Scheduled tasks in ChatGPT
77 comments
January 15, 2025
UmYeahNo
I tried this yesterday, asking it to create a simple daily reminder task, which it happily did. Then when the time came and went, I simply got a chat message saying the task had failed, with no explanation of why or how it failed. When I asked it why, it hallucinated that I had too many tasks. (I only had the one.) So now I don't know why it failed or how to fix it. Which leads to two related observations:
1) I find it interesting that the LLM rarely seems trained to understand its own features, or about your account, or how the LLM works. It seems strange that it has no idea about its own support.
2) Which leads me to the Open AI support docs[0]. It seems pretty telling to me that they use old-school search and not an LLM for its own help docs, right?
[0] https://help.openai.com/
Terretta
Same experience except mine insisted I had no tasks.
It does say it's a beta on the label, but the thing inside doesn't seem to know that, nor what it's supposed to know. Your point 1, for sure.
Point 2 is a SaaS from before LLMs+RAG beat the normal approaches. Status page: a SaaS. API membership, metrics, and billing: a SaaS. These are all undifferentiated, but arguably they were selected quite well at the time the selections were made, and unless the help center is going to sell more users, they arguably shouldn't spend time on undifferentiated heavy lifting.
varispeed
> it hallucinated that I had too many tasks.
How do you know it hallucinated? Maybe your task was one too many and it is only able to handle zero tasks (which would appear to be true in your case).
derefr
Re: 2 — for the same reason that you shouldn't host your site's status page on the same infrastructure that hosts your site (if people want to see your status page, that probably means your infra is broken), I would guess that OpenAI think that if you're looking at the support docs, it might be because the AI service is currently broken.
reustle
> It seems pretty telling to me that they use old-school search and not an LLM for its own help docs, right?
Just not a priority, most likely. Check out the search in the Mintlify docs to see a very well-built implementation.
Example docs site that uses it: https://docs.browserbase.com
fooker
You can hardly blame a product for not doing something that we don't know for certain to be possible.
neom
I've thought about this a lot too, and my guess is that because foundational models take a lot to train, they aren't trained very often, and from my experience you can't easily train in new data. So you'd have to have some little up-to-date side system, and I suspect they're very thoughtful about which "side systems" they place; from trying to build some agent orchestration stuff myself, nothing ends up being as simple as I expect with "side systems", and things easily go off the rails. So my thought was: given the scale they're dealing with, this is probably a low-priority and not actually particularly easy feature.
miltonlost
> So my thought was: given the scale they're dealing with, this is probably a low-priority and not actually particularly easy feature.
"working like OpenAI said it should" is a weird thing to put low priority. Why do they continuously put out features that break and bug? I'm tired of stochastic outputs and being told that we should accept sub-90% success rates.
At their scale, being less than 99.99% right results in thousands of problems. So their scale and the outsized impact of their statistical bugs is part of the issue.
neom
Why are you setting your bar this way? Is it because of how they do their feature releases (no warning of it being an alpha or beta feature)? Their product, ChatGPT, was released two years ago and is a fairly complicated product. My understanding is that the whole thing is still a pretty early product generally. It doesn't seem unusual for any startup doing something as big as they are to release features that don't have all the kinks ironed out. I've released some kinda janky features to hundreds of thousands of users before without totally knowing how they were going to perform at that scale; I don't think that is very controversial in product development.
Also, in my earlier comment I was specifically talking about it being able to understand the features it has; I don't think that is the same problem as the remind-me feature not working consistently.
yosito
I regularly use Perplexity and Cursor which can search the internet and documentation to answer questions that aren't in their training data. It doesn't seem that hard for ChatGPT to search and summarize their own docs when people ask about it.
neom
You would want a feature like "self-awareness" to be pretty canonical, not based on a web search. And even if they had a discrete internal side system it could query, one you controlled, if the training data was a year old, how would you keep it matched up from a systems point of view over time? It's also unclear how the model would interpret the data each time it ran on the new context. It seems like a pretty complicated system to build, tbh, especially when maintaining human-created help docs and FAQs etc. is A LOT simpler and a more reliable source of truth. That said, my understanding is that behind the scenes they are working towards the product we experience being built around the foundational model, not THE foundational model as it pretty much is today. Once they have a bunch of smaller LLMs that do discrete standard tasks set up, I would guess they will become considerably more "aware".
baxtr
Now imagine giving this "agent" a task like booking a table at a restaurant or similar.
"Yeah sure I got you a table at a nice restaurant. Don’t worry."
behnamoh
> 2) Which leads me to the Open AI support docs[0]. It seems pretty telling to me that they use old-school search and not an LLM for its own help docs, right?
I agree, but then again, if you're a dev in this space, presumably you know what keywords to use to refine your search. RAG'ed search implies that the user (dev) is not "in the know".
dgfitz
New killer feature: cron
Can’t imagine why everyone doesn’t pay $200/mo for even more features. Eventually I bet they can clean out /tmp!
chairhairair
cron, but completely unreliable. How nice.
LLM heads will say “it’s not completely unreliable, it works very often”. That is completely unreliable. You cannot rely on it to work.
Please product people, stop putting LLMs at the core of products that need reliability.
kenjackson
It's all a matter of degree. Even in deterministic systems, bit flipping happens. Rarely, but it does. You don't throw out computers as a whole because of this phenomenon, do you? You just assess the risk and determine whether the scenario you care about sits above or below the threshold.
dkjaudyeqooe
A bit flip is a rare occurrence in an array of typically tens of billions of bits.
The chance that the flipped bit results in a new valid state, and one that does something actually damaging, is astronomically small.
Meanwhile, LLM errors are common and directly affect the result.
great_psy
When’s the last time you personally had a bit flip on you?
rsynnott
Not just that, cron, only non-deterministic! The future is now.
theshrike79
An actual killer feature would be a system that lets me define repeating tasks with natural language.
Then it would translate that into cron commands in the background.
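Roughly this kind of thing, as a minimal sketch (assuming the current OpenAI Python client; the model name and prompt wording are just placeholders, and the LLM only produces the cron expression, so the schedule itself stays deterministic once installed):

    # Sketch: turn a natural-language schedule into a cron expression,
    # then let plain old cron handle the reliable part.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def to_cron(natural_language: str) -> str:
        """Ask the model for a five-field cron expression and nothing else."""
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[
                {"role": "system",
                 "content": "Convert the user's schedule into a standard "
                            "five-field cron expression. Reply with the "
                            "expression only."},
                {"role": "user", "content": natural_language},
            ],
            temperature=0,
        )
        return response.choices[0].message.content.strip()

    print(to_cron("every weekday at 7:30am"))  # e.g. 30 7 * * 1-5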
postsantum
I feel like the obligatory comment about Dropbox is coming your way
headcanon
I'm trying to figure out how this would be useful with the existing feature set.
It seems like it would be good for summarizing daily updates against a search query, but all it would do is display them. I would probably want to connect it with some tools, at minimum, for it to be useful.
DeepYogurt
They're really trying to juice the usage numbers
42lux
"How chatgpt reminders saved my life and made me more productive." Videos on YouTube in 3,2,1.
JTyQZSnP3cQGa8B
As long as it’s generating hype and funding, it brings us closer to their own definition of AGI. It’s the perfect plan.
srid
Important caveat:
> ChatGPT has a limit on 10 active tasks at any time. If you reach this limit, ChatGPT will not be able to create a new task unless you pause or delete an existing active task or it completes per its scheduled time.
So this is pretty much useless for most real-world use cases.
jumploops
I'm surprised it took OpenAI this long to launch scheduled tasks, but as we've seen from our users[0], pure LLM-based responses are quite limited in utility.
For context: ~50% of our users use a time-triggered Loop, often with an LLM component.
Simple stuff I've used it for: baby name idea generator, reminder to pay housekeeper, pre-natal notifications, etc.
We're moving away from cron-esque automations as one of our core-value props (most new users use us for spinning up APIs really quickly), but the base functionality of LLM+code+cron will still be available (and migrated!) to the next version of our product.
MattDaEskimo
This was a weak citation.
> Simple stuff I've used it for: baby name idea generator, reminder to pay housekeeper, pre-natal notifications, etc.
None of these require an LLM. It seems like you own this service yet can't find any valuable use for it.
---
ChatGPT tasks will become a powerful tool once incorporated into GPTs.
I produce lots of data. Lots of it, and I'd like my clients to have daily updates on it, or even have content created based on it.
jumploops
> None of these require an LLM. It seems like you own this service yet can't find any valuable use for it.
Sorry? My point was that these are the only overlapping features I've personally found useful that could be replaced with the new scheduled tasks from ChatGPT.
Even these shouldn't require an LLM. A simple cron+email would suffice.
The web scraping component is neat, but for my personal use-cases (tide tracking) I've had to use LLM-generated code to get the proper results. Pure LLMs were lacking in following the rules I wanted (tide less than 1 ft, between sunrise and sunset). Sometimes the LLM would get it right, sometimes it would not.
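For what it's worth, the deterministic part ends up tiny once it's written as code; a rough sketch of the kind of rule I mean, with a hypothetical shape for the tide data (not the actual code I use):

    # Sketch: pick tide windows that satisfy fixed rules, no LLM at runtime.
    from datetime import datetime

    def good_tide_windows(predictions, sunrise: datetime, sunset: datetime,
                          max_height_ft: float = 1.0):
        """predictions: list of (timestamp, height_ft) tuples (hypothetical shape)."""
        return [
            (ts, height)
            for ts, height in predictions
            if height < max_height_ft and sunrise <= ts <= sunset
        ]

    # Example with one fake day of predictions
    preds = [
        (datetime(2025, 1, 15, 6, 0), 2.3),    # too high, before sunrise
        (datetime(2025, 1, 15, 12, 30), 0.4),  # low tide, midday
        (datetime(2025, 1, 15, 18, 45), 0.8),  # low-ish, but after sunset
    ]
    print(good_tide_windows(preds,
                            sunrise=datetime(2025, 1, 15, 7, 10),
                            sunset=datetime(2025, 1, 15, 17, 5)))
    # keeps only the 12:30 low tide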
For our customers, purely scheduling an LLM call isn't that useful. They require pairing multiple LLM and code execution steps to get repeatable and reliable results.
> ChatGPT tasks will become a powerful tool once incorporated into GPTs.
Out of curiosity, do you use GPTs?
duskwuff
> Simple stuff I've used it for: baby name idea generator, reminder to pay housekeeper, pre-natal notifications, etc.
Baby name generator: why would this be a scheduled task? Surely you aren't having that many children... :)
Reminder to pay, notifications: what value does OpenAI bring to the table here over other apps which provide calendar / reminder functionality?
jumploops
> Baby name generator: why would this be a scheduled task? Surely you aren't having that many children... :)
So far it's helped name two children :) -- my wife and I like to see the same 10 ideas each day (via text), so that we can discuss what we like/don't like daily. We tried the sift-through-1,000-names thing and it didn't work well for us.
> Reminder to pay, notifications: what value does OpenAI bring to the table here over other apps which provide calendar / reminder functionality?
That's exactly my point. Without further utility (i.e. custom code execution), I don't think this provides a ton of value at present.
dimitri-vs
"ok Google, remind me to ____ every ____"
Am I missing something or is there exactly zero benefit here over native Apple/Google calendar/todo apps?
jumploops
You're not missing anything, other than us using Siri :)
My point was that this new functionality, while neat at a surface level, doesn't provide much real utility.
Without custom code execution, you're limited to very surface-level tasks that should be doable with a cron+sms/email.
darkteflon
Surely we want to be scheduling and calling LLMs from temporalio, dagster - even cron - instead of whatever this is. Why put the LLM in the middle?
joshstrange
This feature is really bad (unreliable) and they don't even make a good case for _why_ you would want to use this over literally any other reminder system. I guess it can execute an LLM to decide what to send to you at the scheduled time, but its unreliability would never have me relying on it.
Some use cases that might be interesting:
* Let me know the closing stock price for XXXXX
* Compile a list of highlights from the XXXX game after it finishes
But everything I can think of is just a toy: cool if it works, but not groundbreaking, and possible with much more reliable methods. OpenAI really seems to just be throwing stuff at the wall to see if it sticks, then moving on and never iterating on the previous features. DALL-E is kind of a joke compared to other things (one-shot only), I trust Claude more for programming, o1 was ho-hum for my needs, the desktop app still feels like a solution in search of a problem, etc.
reustle
Has been consistently working for me, and it does web searching within the tasks.
e.g. look up some niche news on a topic and format it in a particular way
android521
I tried it and it failed to send me a desktop notification. I did receive emails (at the wrong time). I do think it was too early to launch; a 5-minute test could have found these bugs. It really hurts their brand.
ilaksh
This will be a lot more useful when it's able to combine with more tools, such as in custom GPT actions, APIs, "computer use", the Python interpreter, etc.
ProofHouse
Yeah, it's pretty bad, embarrassingly so quite honestly. Literally a single developer in a day could probably significantly improve it. I'm sure that's coming, but why don't they just launch these MVP features at least a quarter baked? It's essentially unusable as is. If it could ping me on my phone and Advanced Voice could open, or it could go do a basic task, great, I'm back to using it. But as it is rolled out, it's hilariously minimal and borderline unusable.
elif
Works on my machine. (tm)
But it won't let me reschedule my task execution time or change its prompting... It will just go forever now I guess