Ace: Realtime Computer Autopilot
18 comments
April 2, 2025
vivzkestrel
There was a very nice article yesterday titled "The case against conversational interfaces." What do you have to say about that? https://julian.digital/2025/03/27/the-case-against-conversat...
quantumHazer
Very good article.
The founder of generalagents even says that they want to free humans from digital labor. I can’t stand these takes. Leave my computer to me!
iamleppert
The recruitment use-case was hilarious, thanks I needed that.
dilDDoS
That example made me wonder if this was satire...I'm guessing not, but pretty funny nonetheless. I bet my boss would love me texting him the names of random people from LinkedIn.
unfunnytard
i'd love to see it being real (someday)
the__alchemist
This isn't what I was expecting at all based on the title!
sherjilozair
I'm the founder and CEO of General Agents. Happy to answer questions!
chews
From your site, "Ace works like we do—performing mouse clicks and keystrokes based on the screen and prompt—trained with <3 by our team of software specialists and domain experts on over a million tasks."
Is there a way to train or augment training on applications you've never seen before? We have a bunch of custom Java applications that we use in finance, curious about some additional automation.
sherjilozair
Ace is actually uniquely designed to support that. Our training staff simply record their screens and mouse+keyboard events. We transform that into behavior cloning data to train the model. It's quite easy for us to do custom agents for enterprise or other lesser-known software and workflows. Reach out to us at contact@generalagents.com if you're interested.
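The pipeline described above (screen recordings plus mouse/keyboard event logs turned into behavior cloning data) can be sketched roughly as below. This is a hypothetical illustration, not General Agents' actual format: the `Event` structure, the `to_training_pairs` helper, and the frame/event pairing rule are all assumptions for the sake of the example.

```python
# Hypothetical sketch: pairing each recorded input event with the most
# recent screen frame to produce (observation -> action) training pairs.
from dataclasses import dataclass


@dataclass
class Event:
    t: float          # timestamp in seconds
    kind: str         # e.g. "click" or "key"
    payload: tuple    # e.g. (x, y) for clicks, ("a",) for keys


def to_training_pairs(frames, events, tolerance=0.5):
    """Pair each event with the last frame captured at or shortly before it.

    `frames` is a list of (timestamp, frame) tuples sorted by time; the
    frame becomes the observation and the event becomes the action label.
    Events with no frame within `tolerance` seconds are dropped.
    """
    pairs = []
    for ev in events:
        prior = [f for t, f in frames if t <= ev.t and ev.t - t <= tolerance]
        if prior:
            pairs.append((prior[-1], (ev.kind, ev.payload)))
    return pairs


# Example: two frames, then a click 0.2s after the second frame.
frames = [(0.0, "frame0"), (1.0, "frame1")]
events = [Event(t=1.2, kind="click", payload=(640, 360))]
print(to_training_pairs(frames, events))
# -> [('frame1', ('click', (640, 360)))]
```

A model trained on such pairs learns to predict the next click or keystroke given the current screen, which matches the "performing mouse clicks and keystrokes based on the screen and prompt" description from the site.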
gkolli
Hi! Looks pretty interesting - few questions/thoughts:
1. Could you talk a bit more about your behavioral training? If ace-control is trained on behavioral recordings, would it choose the most efficient path to complete a task? I'm guessing humans naturally take less-optimal steps.
2. What causes the huge speed increase? I'm guessing a lot of optimizations were made, especially since this behavioral training seems very different from vision models. I'm guessing the model is smaller, so it's interesting that accuracy is highest. I'd be interested to see a comparison vs. 4o-mini.
3. Would be neat for it to handle instructions offline/locally - like "connect me to wifi" ;)
4. Would be cool if agent could work in the background so I can do something else in the meantime. ;)
timabdulla
How does it perform on e.g. WebVoyager, WebArena, or OSWorld? These seem to be the oft-cited benchmarks when comparing computer-use agents.
xfr
First, I am extremely impressed by the demo. It looks truly groundbreaking.
Could you elaborate on the types of tasks and data sources used to train Ace, and how these contribute to its performance on desktop automation?
Ace is said to outperform other models on your suite of computer use tasks. Can you provide more details on these benchmarks and how Ace compares to existing automation tools?
martin_
Amazing performance! Do you anticipate making the model available for commercial use or are you primarily focused on releasing agents built upon it?
lilyhills03
so excited!!
cbiscuit
cool!!
misbah143
This is super fast. Future of computer agents. Bullish on this.
Every time I see "Book me a trip to X" I immediately shut down. I have yet to see any LLM handle all the cases that a human would/needs to. Those sites are hard to navigate even for a human; doing a search and clicking on the first result is not "autopilot". If that's all I was going to do, I'd just do it myself.
Instead I need to read through tens of listings, keeping track of cleaning and other fees, weighing location and price as well as amenities, if any. [0]
I have yet to see a model do any of that (yes, I'm aware it's possible and maybe someone is doing that).
> We're just beginning. As we increase training resources, Ace will become more intelligent and capable.
Can we not? "As time goes on we will get better because magic" - I truly hate this hopium in the LLM community. LLM problems are not like normal software problems; you cannot code your way out of a hole. You can prompt or re-train, and both suck: training, at least, has a long turnaround time and is not cheap.
I really enjoy LLMs and love trying new things that use them. This idea/product/service/whatever is just not compelling. It feels like I'd need to babysit this process to make sure it didn't do something stupid. It's the same reason I have never in my life bought something through my Amazon Echos, the upside is minimal and the downside can be massive.
[0] OpenAI's Deep Research is closer to what I'm talking about, but even that is laughably bad sometimes. It looks impressive as hell, it impressed me, and then I asked it an "easy" question so I could share it with a friend to show them how cool it was. The "easy" question was something I was familiar with, and the final results were lacking (to be nice). I asked it to research local bakeries and it missed a ton of places that show up in 1-2 Google searches. -- This is the problem with LLMs across the board: they are great at producing good-sounding output, but that doesn't make it right/true/complete.