
R1 Computer Use

8 comments · February 6, 2025

crazygringo

I can't wait for something like this to be built.

People have tons of workflows that involve a lot of clicks and typing in response to data, workflows that are too difficult or too one-off to automate with fragile macros.

But if my computer can quickly realize that I'm deleting every odd-numbered page of a PDF, or renaming every file to add a prefix, or following each link on a website and saving an image... and then just instantly automate the next 100 times... that's going to be huge!
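For a sense of scale, the "add a prefix to every file" case is already just a few lines of throwaway Python today (the folder and prefix here are made up for illustration); the point is that nobody should have to write even this much for a one-off:

    from pathlib import Path

    # Hypothetical example: prepend "2025_" to every file in a folder.
    folder = Path("~/Documents/scans").expanduser()
    prefix = "2025_"

    for path in folder.iterdir():
        if path.is_file() and not path.name.startswith(prefix):
            path.rename(path.with_name(prefix + path.name))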

falcor84

> @software{r1_computer_use,
>   title  = {R1-Computer-Use: Reasoning-First Computer Interaction},
>   author = {Barker, Patrick},
>   year   = {2025},
>   url    = {https://github.com/agentsea/r1-computer-use},
> }

Sorry to be a party-pooper, but does it really make sense to add a citation when you don't have fully working code yet, let alone a paper about it?

mountainriver

Fair point. We're in the process of opening more of it up, so the citation does seem a bit odd right now.

mountainriver

Hey HN,

We are working to apply the ideas of R1 to computer use. The primary struggle is creating reliable neural reward models since hard-verification rewards are not available at scale in GUI interactions.

Our team is currently deep in the weeds of collecting reasoning annotation data for GUI interfaces to train a reliable reward model.
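For concreteness, here's a minimal sketch of the preference-style reward-model training we have in mind: a small scorer over embedded (reasoning trace, GUI action) trajectories, trained on pairs where annotators preferred one trajectory over another. Names, shapes, and the embedding step are placeholders, not our actual pipeline:

    import torch
    import torch.nn as nn

    class TrajectoryRewardModel(nn.Module):
        """Scores an embedded (reasoning trace, GUI action) trajectory."""
        def __init__(self, embed_dim: int = 768):
            super().__init__()
            self.scorer = nn.Sequential(
                nn.Linear(embed_dim, 256),
                nn.ReLU(),
                nn.Linear(256, 1),
            )

        def forward(self, trajectory_embedding: torch.Tensor) -> torch.Tensor:
            return self.scorer(trajectory_embedding).squeeze(-1)

    def preference_loss(model, preferred, rejected):
        """Bradley-Terry style loss: the preferred trajectory should score higher."""
        return -torch.nn.functional.logsigmoid(
            model(preferred) - model(rejected)
        ).mean()

    # Toy usage with random embeddings standing in for encoded annotations.
    model = TrajectoryRewardModel()
    preferred = torch.randn(8, 768)  # trajectories annotators preferred
    rejected = torch.randn(8, 768)   # trajectories annotators rejected
    loss = preference_loss(model, preferred, rejected)
    loss.backward()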

We would love all thoughts, feedback, and collaborations!

refulgentis

Free advice (though worth less than free, because (a) it's unsolicited and (b) it's saying "don't do it"):

TL;DR:

- Turns out that if you do UXR, even if computer use is 100% successful at action execution and there's no latency, people don't use it. (Interesting to me: the core demo was buying airline tickets, and so is OpenAI's. No one would defer to a computer on that, for humanist / design reasons.)

- You're not going to be able to out-do the model companies at building models; they have too much funding.

- Try writing GUI-based integration tests. Then imagine an LLM that, miraculously, always chooses the right route. Does the UX look good?

- Note that the reasoning models are worse at tool calling. It's very, very, VERY stark when you have Claude next to o1/4o. OpenAI also owns up to this in the o3-mini paper, though it's not under a blaring red headline or phrased that straightforwardly.

- Why is that? You're fighting against the current when you try to teach a next-token predictor to throw a bunch of text out there in <think>, then generate perfectly correct JSON/Python/whatever given N tools. (A toy sketch of what the consuming side has to do is below this list.)

CLI, though....
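To make that concrete, here's a toy sketch of the consuming side: the reasoning block gets stripped, and then a single stray token in the JSON or one hallucinated argument kills the whole action. The tool names and the output format are invented for illustration:

    import json
    import re

    # Hypothetical model output: free-form reasoning followed by a tool call.
    raw_output = """<think>
    The user wants the invoice PDF. I should call the file-search tool.
    </think>
    {"tool": "search_files", "arguments": {"query": "invoice.pdf"}}"""

    TOOLS = {"search_files": {"query"}, "click": {"x", "y"}}

    def parse_tool_call(text: str) -> dict:
        """Strip the <think> block, then require strictly valid JSON for the call."""
        body = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
        call = json.loads(body)         # any stray token here raises
        expected = TOOLS[call["tool"]]  # an unknown tool name raises
        if set(call["arguments"]) != expected:
            raise ValueError(f"bad arguments for {call['tool']}")
        return call

    print(parse_tool_call(raw_output))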

llama-mini

It seems like a placeholder for now? No content? Right?

mountainriver

We are in the process of collecting the data right now, which is fairly involved. We are going to be opening that platform up for others shortly as well.

fkyoureadthedoc

This is the type of post some VP at my company sees and then starts telling people that R1 can use a computer, and then I have to be like "well actually" to 25 people.

Computer use is pretty exciting stuff in general though, good luck