GPT-4o with scheduled tasks (jawbone) is available in beta
97 comments
January 14, 2025 · ttul
qgin
Recurring schedules across time zones are an unbelievably maddening thing to implement. At first glance it seems simple, but it gets very weird very quickly.
wkat4242
Yeah, summer time in different countries switches on different days and often in a different direction (other hemisphere). I used to work on such matters, and those weeks were the toughest.
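For a concrete picture of the hemisphere problem, here's a minimal sketch (assuming only Python 3.9+ and its standard zoneinfo module, nothing specific to ChatGPT): the same 08:30 local alarm gains an hour of UTC offset in Berlin and loses one in Sydney over the same two weeks.

    from datetime import datetime
    from zoneinfo import ZoneInfo  # standard library since Python 3.9

    # The "same" 08:30 local alarm before and after the late-March /
    # early-April 2025 transitions: Europe springs forward while Australia
    # falls back, so the UTC offsets move in opposite directions.
    for tz_name in ("Europe/Berlin", "Australia/Sydney"):
        tz = ZoneInfo(tz_name)
        for local in (datetime(2025, 3, 28, 8, 30, tzinfo=tz),
                      datetime(2025, 4, 7, 8, 30, tzinfo=tz)):
            print(tz_name, local.isoformat(), "offset:", local.utcoffset())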
echeese
Considering my iPhone alarm still sometimes fails to go off (it just shows the alarm screen silently), I'd be inclined to believe you.
ineedasername
Thanks for that. I thought I was going crazy (well, still could be, I guess) or had some strange habit or gesture I didn't realize was silencing the alarm somehow.
yakz
Whenever I have to wake for something that I absolutely can’t miss, I set 2-3 extra reminders 5 minutes apart precisely because of this “silent alarm” bug. It’s only happened to me a couple of times but twice was enough to completely destroy my trust in the alarm. The first time I thought I just did something in my sleep to cause it, but the UI shows it as if the alarm worked. I’m lucky to have the privilege that if I oversleep an hour or so it’s no big deal, otherwise ye olde tabletop alarm clock would be back.
android521
And Gmail's scheduled send just won't work if you want to email yourself a month later.
imsotiredspacex
This is the prompt describing the function call parameters:
When calling the automation, you need to provide three main parameters:

1. Title (title): A brief descriptive name for the automation. This helps identify it at a glance. For example, "Check for recent news headlines".

2. Prompt (prompt): The detailed instruction or request you want the automation to follow. For example: "Search for the top 10 headlines from multiple sources, ensuring they are published within the last 48 hours, and provide a summary of any recent Russian military strikes in the Lviv Oblast."

3. Schedule (schedule): This uses the iCalendar (iCal) VEVENT format to specify when the automation should run. For example, if you want it to run every day at 8:30 AM, you might provide:

    BEGIN:VEVENT
    RRULE:FREQ=DAILY;BYHOUR=8;BYMINUTE=30;BYSECOND=0
    END:VEVENT

Optionally, you can also include:

• DTSTART (start time): If you have a specific starting point, you can include it. For example:

    BEGIN:VEVENT
    DTSTART:20250115T083000
    RRULE:FREQ=DAILY;BYHOUR=8;BYMINUTE=30;BYSECOND=0
    END:VEVENT

In summary, the call typically includes:

• title (string): A short name.
• prompt (string): What you want the automation to do.
• schedule (string): The iCal VEVENT defining when it should run.
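If you want to sanity-check what a given RRULE actually expands to, here's a minimal sketch using python-dateutil (an assumption on my part; nothing says OpenAI uses it) that computes the next few occurrences of the rule quoted above:

    from datetime import datetime
    from dateutil.rrule import rrulestr  # pip install python-dateutil

    # Expand the RRULE quoted above to see what "every day at 8:30 AM"
    # resolves to. The datetimes are naive; the user's time zone has to be
    # layered on separately, which is exactly where things tend to go wrong.
    rule = rrulestr(
        "FREQ=DAILY;BYHOUR=8;BYMINUTE=30;BYSECOND=0",
        dtstart=datetime(2025, 1, 15),
    )
    for occurrence in rule[:3]:
        print(occurrence)  # 2025-01-15 08:30:00, 2025-01-16 08:30:00, ...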
dmadisetti
The beta is inconsistently showing (required a few refreshes to get something to show up), but my limited usage of it showed a plethora of issues:
- Assumed UTC instead of EST. Corrected it and it still continued to bork
- Added random time deltas to the times I asked for (+2, -10 min).
- A couple of notifications didn't go off at all.
- The one that did go off didn't provide a push notification.
---
On top of that, it's only usable without search mode. In search mode, it was totally confused and gave me a Forbes article.
Seems half-baked to me.
Doing scheduled research behind the scenes or sending a push notification to my phone would be cool, but I'm surprised they thought this was OK for a public beta.
gukov
You'd think OpenAI's dev velocity and quality would be off the charts, since they live and breathe "AI." If the company building ChatGPT itself often delivers buggy features, it doesn't bode well for this whole 'AI will eat the world' notion.
golergka
So far, I've found AI to be a great force multiplier on small, green-field projects. In a huge corporate codebase, it has the power of advanced refactoring (which doesn't touch more than a handful of files at a time) and a CSS wizard.
practice9
Well, none of the labs have good frontend or mobile engineers, or even infra engineers.
Anthropic is ahead in this because they keep their UIs simplistic, so the failure modes are also simple (bad connection).
OpenAI is just pushing half-baked stuff to prod and moving on (GPTs, Canvas).
I find it hilarious and sad that o1-pro just times out thinking on very long or image-heavy chats. You need to reload the page multiple times after it fails to reply, and maybe the answer will appear (or not? or in five minutes?). It kinda shows they're not testing enough and not eating their own dog food; it feels like the ChatGPT 3.5 UI before the redesign.
lolinder
> Anthropic is ahead in this because they keep their UIs simplistic ... OpenAI is just pushing half-baked stuff to prod and moving on (GPTs, Canvas).
What's funny is that OpenAI's Canvas was their attempt to copy Anthropic's Artifacts! So it's not like Anthropic is stagnant and OpenAI is at least shipping; Anthropic is shipping, and OpenAI can't even copy them right.
jeffgreco
It's a good point: Anthropic is being VERY choosy and winds up knocking it out of the park with stuff like Artifacts. Meanwhile their macOS app is junk, but that's obviously not a priority.
cma
> because they keep their UIs simplistic
How do I edit a sent message in the Claude Android app? It's so simplistic I can't find it.
cruffle_duffle
According to all the magazines I've been reading, all that is required is to just prompt it with "please fix all of these issues" and give it a bulleted list with a single sentence describing each issue. I mean, it's AI powered and therefore much better than overpaid prima-donna engineers, so obviously it should "just work" and all the problems will get fixed. I'm sure most of the bugs were the result of humans meddling in the AI's brilliant output.
Right now, in fact, my understanding is that OpenAI is using their current LLMs to write the next-generation ones, which will far surpass anything a developer can currently do. Obviously we'll need to keep management around to tell these things what to do, but the days of being a paid software engineer are numbered.
xarope
I think you forgot the /s (sarcasm) in your post!
ineedasername
When I have it do a search, I have to tell it to just get all the info it can in the search but wait for the next request. Then I explicitly tell it we're done searching and to treat the next prompt as a new request, but using the new info it found.
That's the only way I get it to have a halfway decent brain after a web search. Something about that mode makes it more like a PR-drone version of whatever I asked it to search for, repeating things verbatim even when I ask for more specifics in follow-up.
imsotiredspacex
I posted the system prompt part describing the function call; if you read it and adjust your prompt for creating the task, it works way better.
potatoman22
I'd rather have buggy things now than perfect things in a year.
dmadisetti
It doesn't need to be perfect, but using this would actively reduce productivity.
sprobertson
First impressions matter; if the experience is this bad, you're probably waiting a year to come back anyway.
jahewson
Worked out great for Sonos when their timers and alarms didn’t work.
broknbottle
Found the PM
arthurcolle
DateTime stuff is generally super annoying to debug, so I can't fault them too badly. Adding a scheduler is a key enabling idea for a ton of use cases.
sensanaty
> Can't fault them too badly
The same company that touts to the world their super hyper advanced AI tool that can do everyone's jobs (except the C-level's, apparently) can't figure out how to make a functional cron job happen? And we're giving them a pass, despite the bajillions of dollars that M$ and VCs are funneling their way?
Quite interesting that they wouldn't just throw the "proven to be AGI because it passes some IQ tests sometimes" tooling at it and be done with it.
arthurcolle
It would explain the bugs if they used the AI to write the datetime implementation, though.
dmadisetti
Yeah, they're not exactly a scrappy startup; I'd be surprised if they had zero QA.
Makes me wonder if they internally track "press releases / Q" as a metric to keep up the hype.
airstrike
Maybe that's the Q* we've been hearing rumors about
cbeach
Agreed on date/time being a frustrating area of software development.
But wouldn't a company like OpenAI use a tick-based system in this architecture? I.e., there's an event emitter that ticks every second (or maybe every minute), and consumers that act on these events in real time. Obviously things get complicated due to the time consumed by the inference models, but if OpenAI knows the task upfront, it could make an allowance for the inference time.
If the logic is event driven and deterministic, it's easy to test and debug, right?
singron
The original cron was programmed this way, but it has to examine every job every tick to check whether it should run, which doesn't scale well. Instead, you predict when the next run for a job will be and insert that into an indexed schedule. Then each tick it checks the front of the schedule in ascending order of timestamps until the remaining jobs are in the future.
This is also a bad case in terms of queueing theory. Looking at Kingman's formula, the arrival variance is very high (a ton of jobs will run at 00:00 and far fewer at 00:01), and the service time also has pretty high variance. That combo will require either high queue-delay variance, low utilization (i.e. over-provisioning), or a sophisticated auto-scaler that aggressively starts and stops instances to anticipate the schedule. Most of the time it's OK to let jobs queue, since most use cases don't care if a daily or weekly job is 5 minutes late.
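A minimal sketch of that indexed-schedule idea (just an illustration with made-up job names, not anything OpenAI has described): keep (next_run, job) entries in a min-heap, pop whatever is due on each tick, and push each job back with its next predicted run.

    import heapq
    from datetime import datetime, timedelta, timezone

    def next_daily_run(hour, minute, now):
        """Predict the next time a daily job at hour:minute should fire."""
        candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
        return candidate if candidate > now else candidate + timedelta(days=1)

    # Heap of (next_run, job_name, hour, minute); only the front is inspected.
    now = datetime.now(timezone.utc)
    schedule = []
    for name, hour, minute in [("morning digest", 8, 30), ("midnight batch", 0, 0)]:
        heapq.heappush(schedule, (next_daily_run(hour, minute, now), name, hour, minute))

    def tick():
        now = datetime.now(timezone.utc)
        while schedule and schedule[0][0] <= now:
            due, name, hour, minute = heapq.heappop(schedule)
            print("running", name, "scheduled for", due)
            heapq.heappush(schedule, (next_daily_run(hour, minute, now), name, hour, minute))

    # A real service would call tick() every second or minute from a timer loop.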
sky2224
Pretty useless so far. I'm not sure what the intended application of this is, but I wanted it to schedule some work for me.
It only scheduled the first thing, and that was after I had to be specific by saying "7:30pm-11pm". I wanted to say "from now to 11pm", but it couldn't process "now".
PittleyDunkin
Where are the release notes?
Edit: I suppose they'll be here at some point: https://help.openai.com/en/articles/9624314-model-release-no...
These seem like extremely shitty release notes. I have no clue why anybody pays for this model.
ben_w
You might want this? It's more technical than the one you linked to:
throwup238
The docs for the beta seem to already be up: https://help.openai.com/en/articles/10291617-scheduled-tasks...
speedgoose
It has consistently been the best model for the last two years, and only Gemini is perhaps slightly better now.
TheJCDenton
Nothing yet
phgn
What am I supposed to see at the link?
swifthesitation
You click the drop-down menu for model selection and choose "4o with scheduled tasks".
picografix
Why are they trying to be a model provider as well as a service provider?
simple10
The UI is different in the macOS desktop app. The ability to edit the scheduled task is only available in the web UI for me.
I got the best results by not enabling Search the Web when I was trying to create tasks. It confuses the model. But scheduled tasks can successfully search the web.
It's flaky, but looks promising!
throwaway314155
Less relevant, but why isn't Canvas available in the desktop app? I thought they had feature parity, but it seems not.
reversethread
Does the world need another reminder/todo app?
Many existing apps (like Todoist) have already had LLM integrations for a while now, and have more features like calendars and syncing.
Or do I completely not understand what this product is trying to be?
bogdan
Why not? I already pay for ChatGPT but I don't pay for Todoist, so that doesn't help me.
elyase
There is more information in these Twitter threads:
https://x.com/karinanguyen_/status/1879270529066262733
https://x.com/OpenAI/status/1879267276291203329
cbeach
I'm sure it's brilliant, but I have no idea what it's capable of. What will it do? Send me a push notification? Have an answer waiting for me when I come back to it in a while?
I switched over to the "GPT-4o with scheduled tasks" model and there were no UI hints as to how I might use the feature. So I asked it, "What can you follow up on later, and how?"
It replied, "Could you clarify what specifically you'd like me to follow up on later?"
This is a truly awful way to launch a new product.
benaduggan
After asking it to schedule something, it prompted me to allow or block notifications, so it sounds like this is just ChatGPT scheduling push notifications? We'll see!
jerpint
So basically cannibalizing Siri?
1propionyl
Siri has access to a wealth of private existing and future on-device APIs to fuel context-sensitive responses to queries on vendor-locked devices used all day long. (Which Apple has apparently decided just not to use yet.)
OpenAI doesn't, they just have a ton of funding and (up to recently) a good mass media story, and the best natural language responses.
The moat around Siri is much deeper, and I don't really see any evidence OpenAI has any special sauce that can't be reproduced by others.
My prediction is that OpenAI's reliance on AI doomerism to generate a regulatory moat falters as they become unable to produce step changes in new models, while Apple's efforts despite being halting and incomplete become ubiquitous thanks to market share and access to on device context.
I wouldn't (and don't) put my money in OpenAI anymore. I don't see a future for them beyond being the first mover in an "LLM as a service" space in which they have no moat. On top of that, they've managed to absorb the worst of the criticism as a sort of LLM lightning rod. Worst of all, it may turn out that off-device inference isn't even really necessary for most consumer applications, in which case they'll have to make most of their money on corporate contracts.
Maybe something will change, but right now OpenAI is looking like a bubble company with no guarantee of its dominant position. Because it is what it is: simply the largest pooling of money to try to corner this market. What else do they have?
siva7
Yep, this is a truly bad feature launch. I have no clue what this model does. Did they somehow lose their competent product people?
cbeach
Ah, I've just stumbled on some hints after clicking around: click on your avatar image (top right) and then click "Tasks".
Then there are some UI hints.
"Remind me of your mom's birthday on [X] date"
Wow, really maximising that $10bn GPU investment!
Amazon had an insane number of people working on just the alarms feature in Alexa when they interviewed me for a position years ago. They had entire teams devoted to the tiniest edge case within the realm of scheduling things with Alexa. This is no doubt one of the biggest use cases in computing: getting your computer to tell you what to do at a given time.