
Claude Code on the web

115 comments

October 20, 2025

mmaunder

We were heavy users of Claude Code ($70K+ spend per year) and have almost completely switched to codex CLI. I'm doing massive lifts with it on software that would never before have been feasible for me personally, or any team I've ever run. I'll use Claude Code maybe once every two weeks as a second set of eyes to inspect code and document a bug, with mixed success. But my experience has been that initially Claude Code was amazing and a "just take my frikkin money" product. Then Codex overtook CC and is much better at longer runs on hard problems. I've seen Claude Code literally just give up on a hard problem and tell me to buy something off the shelf. Whereas Codex's ability to profoundly increase the capabilities of a software org is a secret that's slowly getting out.

I don't have any relationship with any AI company, and honestly I was rooting for Anthropic, but Codex CLI is just way way better.

Also Codex CLI is cheaper than Claude Code.

I think Anthropic are going to have to somehow leapfrog OpenAI to regain the position they were in around June of this year. But right now they're being handed their hat.

pkreg01

I totally agree. I remember the June magic as well - almost overnight my abilities and throughput were profoundly increased, and I had many weeks of late nights in awe and wonder, trying things that were beyond my ability to implement technically but within the bounds of my conceptual understanding.

Initially, I found Codex CLI with GPT-5 to be a substitute for Claude Code - now GPT-5 Codex materially surpasses it in my line of work, with a huge asterisk. I work in a niche industry, and Codex has generally poor domain understanding of many of the critical attributes and concepts. Claude happens to have better background knowledge for my tasks, so I've found that Sonnet 4.5 with Claude Code generally does a better job at scaffolding any given new feature. Then, I call in Codex to implement actual functionality since Codex does not have the "You're absolutely right" and mocked/placeholder implementation issues of CC, and just generally writes clean, maintainable, well-planned code. It's the first time I've ever really felt the whole "it's as good as a senior engineer" hype - I think, in most cases, GPT5-Codex finally is as good as a senior engineer for my specific use case.

I think Codex is a generally better product with better pricing, typically 40-50% cheaper for about the same level of daily usage for me compared to CC. I agree that it will take a genuinely novel and material advancement to dethrone Codex now. I think the next frontier for coding agents is speed. I would use CC over Codex if it was 2x or 3x as fast, even at the same quality level. Otherwise, Codex will remain my workhorse.

WXLCKNO

I agree with this, and actually Claude Code agrees with it too. I've had Codex CLI (gpt-5-codex high) and Claude Code (Sonnet 4.5, and sometimes Opus 4.1) do the same lengthier task with the same prompt in cloned folders about 10x now, and then I ask each to review the work in the other folder and determine who did the best job.

100% of the time Codex has done a far better job according to both Codex and Claude Code when reviewing, meeting all the requirements where Claude would leave things out, do them lazily or badly, and lose track overall.
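
Roughly, the loop looks like this (treat the non-interactive invocations as assumptions about the two CLIs; folder and file names are placeholders):

    # same prompt against two cloned working copies
    git clone myrepo run-codex && git clone myrepo run-claude
    (cd run-codex && codex exec "$(cat task.md)")
    (cd run-claude && claude -p "$(cat task.md)")
    # then cross-review: each agent judges both folders against the spec
    (cd run-codex && codex exec "Compare this repo's implementation of task.md with ../run-claude")
    (cd run-claude && claude -p "Compare this repo's implementation of task.md with ../run-codex")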

Codex high just feels much smarter and more capable than Claude currently and even though it's quite a bit slower, it's work that I don't have to go over again and again to get it to the standards I want.

pkreg01

I share your observations. It's strange to see Anthropic losing so much ground so fast - they seemed to be the first to crack long-horizon agentic tasks via what I can only assume is an extremely exotic RL process.

Now, I will concede that for non-coding long-horizon tasks, GPT-5 is marginally worse than Sonnet 4.5 in my own scaffolds. But GPT-5 is cheaper, and Sonnet 4.5 is about 2 months newer. However, for coding in a CLI context, GPT-5-Codex is night-and-day better. I don't know how they did it.

maherbeg

Yeah this has been my experience as well. The Claude Code UI is still so much better, and the permissioning policy system is much better. Though I'm working on closing that gap by writing a custom policy: https://github.com/openai/codex/blob/main/codex-rs/execpolic...

Kinda sick of Codex asking for approval to run tests for each test instance.

mmaunder

Ah the tension between cybersecurity best practices and productivity is brutal right now.

rtfeldman

You don't have to use Codex in its terminal UI - e.g. you can use it in the Zed IDE out of the box:

https://zed.dev/blog/codex-is-live-in-zed

cesarvarela

Can you share an example of the tasks where you found Codex to be much better? In my experience Claude Code is much better.

the_duke

IMO gpt5-codex medium is much better as soon as the task becomes slightly complex, or the context grows a bit.

Sonnet 4.5 tends to randomly hallucinate odd/inappropriate decisions and goes on to make stupid changes that have to be patched up manually.

mordymoop

I'm on the same page here. I have seen this sentiment about Codex suddenly being good a few times now, so I booted Codex CLI thinking-high back up after a break and asked it to look for bugs. It promptly found five bugs that didn't actually exist. It was the kind of truly impressively stupid mistake that I haven't seen Claude Code make essentially ever, and made me wonder if this isn't the sort of thing that's making people downplay the power of LLMs for agentic coding.

intellectronica

Codex works much better for long-running tasks that require a lot of planning and deep understanding.

Claude, especially Sonnet 4.5, is a lot nicer to interact with, so it may be a better choice in cases where you are co-working with the agent. Its output is nicer, and it "improvises" really well even if you give it only vague prompts. That's valuable for interactive use.

But for delegating complete tasks, Codex is far better. The benchmarks indicate that, as do most practitioners I talk to (and it is indeed my own experience).

In my own work, I use Codex for complete end-to-end tasks, and Claude Sonnet for interactive sessions. They're actually quite different.

simplify

Same here. I tried Codex a few days ago for a very simple task (remove any references to X within this long text string) and it fumbled it pretty hard. Very strange.

mmaunder

I cannot. We're all racing very hard to take full advantage of these new capabilities before they go mainstream. And to be honest, sharing problem domains that are particularly attractive would be sharing too much. Go forth and experiment. Have fun with it. You'll figure it out pretty fast. You can read my other post here about the kinds of problem spaces I'm looking at.

deadbabe

Ah, super secret problem domains that have been thoroughly represented in the LLM training data. Nice.

mmaunder

I'm seeing the downvotes. I'm sorry folks feel that way. I'm regretting my honesty.

Edit: I'd like to reply to this comment in particular but can't in a threaded reply, so will do that here: "Ah, super secret problem domains that have been thoroughly represented in the LLM training data. Nice."

This exhibits a fundamental misunderstanding of why coding agents powered by LLMs are such a game changer.

The assumption this poster is making is that LLMs are regurgitating whole cloth after being trained on whole cloth.

This is a common mistake among lay people and non-practitioners. The reality is that LLMs have gained the ability to program, by learning from the code of others. Much like a human would learn from the code of others, and then be able to create a completely novel application.

The difference between a human programmer and an agentic coder is that the agent has much broader and deeper expertise across more programming languages, and understands more design patterns, more operating systems, more about programming history, etc., and it uses all this knowledge to fulfill the task you've set it. That's not possible for any single human.

It's important for the poster to take two realities on board. Firstly, coding agents are not regurgitating whole cloth from whole cloth; they are weaving new creations because they have learned how to program. Secondly, coding agents have broader and deeper knowledge than any human that will ever exist, they never tire, and their mood and energy level never change. In fact their capability improves continuously as the months go by and progress continues. This means we can, as individual practitioners or fast-moving teams, create things that were never before possible for us without raising huge amounts of money, hiring large and very expensive teams, and then carrying the overhead of lining everyone up behind a goal and dealing with the human issues that arise, including communication overhead.

This is a very exciting time. Especially if you're curious, energetic, and are willing to suspend disbelief to go and take a look.

bcrosby95

Yeah, after correcting it several times I've gotten Claude Code to tell me it didn't have the expertise to work in one of my problem domains. It was kinda surprising but also kinda refreshing that it knew when to give up. For better or worse I haven't noticed similar things with Codex.

mmaunder

I've chosen problems with non-negotiable outcomes. In other words, problem domains where you either are able to clearly accomplish the very hard thing, or not, and there's no grey area. I've purposely chosen these kinds of problems to prove what AI agents are capable of, so that there is no debate in my mind. And with Codex I've accomplished the previously impossible. Unambiguously. Codex did this. Claude gave up.

It's as if there are two vendors saying they can give you incredible superpowers for an affordable price, and only one of them actually delivers the full package. The other vendor's powers only work on Tuesdays, and when you're lucky. In that situation, in an environment as competitive as things currently stand, and given the trajectory we're on, Claude is an absolute non-starter for me. Without question.

corndoge

Can you expound a bit on the problem domains? I am curious

lherron

Still a toss-up for me which one I use. For deep work Codex (codex-high) is the clear winner, but when you need to knock out something small Claude Code (sonnet) is a workhorse.

Also CC tool usage is so much better! Many, many times I've seen Codex writing a Python script to edit a file, which seems to bypass the diff view, so you don't really know what's going on.

poorman

Totally agree. I was just thinking that I wouldn't want this feature for Claude Code, but for Codex right now it would be great! I can simply let tasks run in Codex and I know it's going to eventually do what I want, whereas with Claude Code I feel like I have to watch it like a hawk and interrupt it when it goes off the rails.

mi_lk

What model are you using respectively? Not sure I share your observations

mmaunder

Have tried all and continue to eval regularly. I spend up to 14 hours a day. Currently recovering from a herniated disk because I spent 6 weeks sitting at a dining room table, 14 hours a day, leaning forward. Don't do that. lol. So my coverage is pretty good. I'm using GPT5-codex-high for 99% of my work. Also I have a team of 40 folks, about a third of whom are software engineers and another third are cybersecurity analysts, so I get feedback from them too, and we go deep on our engineering calls re the latest learnings and capabilities.

simonw

I had a preview of this over the weekend, notes here plus some example PRs: https://simonwillison.net/2025/Oct/20/claude-code-for-web/

It's really solid. It's effectively a web (and native mobile) UI over Claude Code CLI, more specifically "claude --dangerously-skip-permissions".

Anthropic have recognized that Claude Code where you don't have to approve every step is massively more productive and interesting than the default, so it's worth investing a lot of resources in sandboxing.
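
If you want a rough local approximation of that today, one option is to run the CLI with that flag inside a throwaway container (a sketch assuming Docker, an ANTHROPIC_API_KEY in your environment, and an arbitrary base image):

    # everything outside the mounted project dir is disposable
    docker run --rm -it \
      -v "$PWD":/work -w /work \
      -e ANTHROPIC_API_KEY \
      node:22 \
      npx -y @anthropic-ai/claude-code --dangerously-skip-permissions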

extr

It's interesting because I've slowly arrived at the opposite conclusion: for much of my practical day-to-day work, using CC with "allow edits" turned OFF results in a much better end product. I can correct it inline, I pseudo-review the code as it's produced, etc etc. Codex is better for "fire and forget" features for sure. But Claude remains excellent at grokking intent for problems where you aren't quite sure what you want to build yet or are highly opinionated. Mostly due to the fact that it's faster, so the iteration loop is tighter.

simonw

That approach should work well for projects where you are directly working on the code in tandem with Claude, but a lot of my own uses are much more research oriented. I like sending Claude Code off on a mission to figure out how to do something.

Here's an example from this morning, getting CUDA working on a NVIDIA Spark: https://simonwillison.net/2025/Oct/20/deepseek-ocr-claude-co...

I have a few more in https://github.com/simonw/research

extr

Very fair. Interesting how much feedback on models/tools is different right now depending on what you're doing.

ryoshu

Agreed. I use CC a lot for exploratory work. It's great with fast iteration for throwaway code.

brynary

The most interesting parts of this to me are somewhat buried:

- Claude Code has been added to iOS

- Claude Code on the Web allows for seamless switching to Claude Code CLI

- They have open sourced an OS-native sandboxing system which limits file system and network access _without_ needing containers

However, I find the emphasis on limiting the outbound network access somewhat puzzling, because the allowlists invariably include domains like gist.github.com and dozens of others which act effectively as public CMSes and would still permit exfiltration with just a bit of extra effort.

minimaxir

Link to the GitHub for the native sandboxing: https://github.com/anthropic-experimental/sandbox-runtime

navanchauhan

I used `sandbox-exec` previously before moving to a better solution (done right, sandboxing on macOS can be more powerful than Linux imo). The way `sandbox-exec` works is that all child processes inherit the same restrictions. For example, if you run `sandbox-exec $rules claude --dangerously-skip-permissions`, any commands executed by Claude through a shell will also be bound by those same rules. Since the sandbox settings are applied globally, you currently can’t grant or deny granular read/write permissions to specific tools.
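
For the curious, here's a minimal, illustrative deny-by-default profile; the paths are placeholders, and I'd treat the exact filter syntax as an assumption to verify against Apple's (mostly undocumented) profile language:

    ;; policy.sb - everything not explicitly allowed below is denied
    (version 1)
    (deny default)
    (allow process-fork)
    (allow process-exec)
    (allow file-read* (subpath "/usr") (subpath "/System") (subpath "/Library"))
    (allow file-read* file-write* (subpath "/Users/me/project"))
    (allow network-outbound (remote ip "*:443"))

You'd then run something like `sandbox-exec -f policy.sb claude --dangerously-skip-permissions`, and every child process inherits the same rules.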

Using a proxy through the `HTTP_PROXY` or `HTTPS_PROXY` environment variables has its own issues. It relies on the application respecting those variables; if it doesn't, the connection will simply fail. Sure, in this case, since all other network connection requests are dropped, you are somewhat protected, but an application that doesn't respect them will simply not work.
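
i.e. something like this only constrains well-behaved children:

    # only tools that honor the convention will route through the filtering proxy
    HTTP_PROXY=http://127.0.0.1:8080 HTTPS_PROXY=http://127.0.0.1:8080 \
      claude --dangerously-skip-permissions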

You can also have some fun with `DYLD_INSERT_LIBRARIES`, but that often requires creating shims to make it work with codesigned binaries.

fragmede

Exfiltration is always going to be possible; the question is whether it's difficult enough that an attacker can't succeed against the defenses I've put in place. The problem is, I really want to share and help protect others, but if I write it up somewhere anybody can read, it's gonna end up in the training data.

mdeeks

I feel like these background agents still aren't doing what I want from a developer experience perspective. Running in an inaccessible environment that pushes random things to branches that I then have to check out locally doesn't feel great.

AI coding should be tightly in the inner dev loop! PRs are a bad way to review and iterate on code. They are a last line of defense, not the primary way to develop.

Give me an isolated environment that is one click hooked up to Cursor/VSCode Remote SSH. It should be the default. I can't think of a single time that Claude or any other AI tool nailed the request on the first try (other than trivial things). I always need to touch it up or at least navigate around and validate it in my IDE.

justinram11

Have you checked out Ona [1] (gitpod's pivot)?

[1] https://ona.com/

mdeeks

This is possibly what I want? It's hard to tell from all of the marketing on the site.

I want to run a prompt that operates in an isolated environment that is open in my IDE where I can iterate with the AI. I think maybe it can do this?

simonw

Not quite. This doesn't (yet) have an option where you can connect your local IDE to their remote containers to edit files directly. It's more of a fire-and-forget thing where you can eventually suck the resulting code down to your local machine using "claude --teleport ..." - but then it's not running in the cloud any more.

asdev

so the biggest issue is having to pull down and manually edit changes? can't you just @claude on the PR to make any changes?

mdeeks

Yes, but my point is that often I don't want to. Sometimes there are changes I can make in seconds. I don't want to wait 15+ seconds for an AI that might do it wrong or do too much.

Also it isn't always about editing. It is about seeing the surrounding code, navigating around, and ensuring the AI did the right thing in all of the right places.

ea016

No relation to them, but I've started using Happy[0]'s iOS app to start and continue Claude Code sessions on my iPhone. It allows me to run sessions in a custom environment, like a machine with a GPU to train models.

[0] https://github.com/slopus/happy/

hmokiguess

This still seems to be the only solution if you're using Bedrock or direct API access instead of a Pro/Max plan; Claude Code on the web doesn't seem to let you use it that way.

lysecret

Just played around with it; the fact it's on the phone is a big bonus.

I have set up a little workflow where, given Linear tags, it sets up a git worktree on my dev box, installs deps, and starts the implementation so I can take it over. I prefer this workflow to the fully managed cloud-based solutions.
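
A rough sketch of the glue, with the issue id, deps step, and non-interactive flag as placeholders/assumptions:

    # one worktree per issue, deps installed, agent kicked off
    ISSUE=ENG-123
    git worktree add "../wt-$ISSUE" -b "$ISSUE"
    cd "../wt-$ISSUE" && npm install
    claude -p "Implement Linear issue $ISSUE per its description"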

This kind of fits in for issues where I'm basically sure I won't have to take it over (and it can do it fully on its own), which aren't that many.

Very simple example: there was a warning popup on something where I thought there shouldn't be one; now it's fixed fully automatically from my phone in 5 minutes. I quite like that these small changes become so easy.

arjie

A thing I really like with Claude Code is how well it uses the bash scripts you give it. I also have a browser control MCP installed, and it's pretty good at using it to full-cycle through an approach. I have a staging database that it has the passwords to, which it logs in to and runs queries on. This whole thing means it loops and delivers good results for me.

I'll try this, but the grounding seems crucial for these LLMs to deliver results in fewer shots than otherwise.

hugs

which specific functions/features of the browser control MCP do you lean on the most?

neilv

Nit about doing your AI interfaces on the Web: I really want claude.ai and chatgpt.com to offer a standard username+password login without 2FA. The kind my privacy-friendly browser of short-lived sessions can complete in a couple clicks, like for most other SaaSes, and then I'm in and using the tool.

I don't want to leak data either way by using some "let's throw SSO from a sketchy adtech company into the trust loop" arrangement.

I don't want to wait a minute for Anthropic's login-by-email link, and have the process slam the brakes on my workflow and train of thought.

I don't want to wait a minute for OpenAI's MFA-by-email code (even though I disabled that in the account settings, it still did it).

I don't want to deal with desktop clients I don't trust, or that might not keep up with feature improvements. Nor have to kludge up a clumsy virtualization sandbox for an untrusted client, just to ask an LLM questions that could just be in a Web browser.

linkregister

In the modern age of mass credential stuffing attacks exploiting password reuse, MFA is one of the most effective tools for reducing unauthorized logins. Companies that don't adopt it are risking unacceptably high levels of credit card chargebacks.

I wish the standard were for companies to check new passwords against leaked password lists, e.g. what https://haveibeenpwned.com uses.
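
That check is cheap to wire up via the Pwned Passwords range API, which only ever sees the first five hex characters of the SHA-1 (k-anonymity); the password here is obviously a placeholder:

    hash=$(printf '%s' 'hunter2' | shasum -a 1 | awk '{print toupper($1)}')
    prefix=${hash:0:5}; suffix=${hash:5}
    # any returned line matching the suffix means the password is in a known breach
    curl -s "https://api.pwnedpasswords.com/range/$prefix" | grep -c "$suffix"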

I use a similar workflow and have found that websites that allow passkey-based login can avoid the friction of waiting for TOTP codes or magic links.

amluto

How about supporting WebAuthn?

The current claude.ai signin mechanism is rather annoying.

ubj

Very curious to see what usage limits are like for paid plans. Anthropic was already experiencing issues with high-volume model usage for Pro and Max users. I hope their infrastructure is able to adequately support running these additional coding environments on top of model inference.

Just to be clear, I'm excited for the capability to use Claude Code entirely within the browser. However, I've heard reports of Max users experiencing throttled usage limits in recent months, and am concerned as to whether this will exacerbate that issue or not.

CharlesW

Anecdotally, as a Max user typically using Claude Code for >8 hours/day, I've never experienced that. That said, I'm not one of those people using Opus for everything, and in fact I've been happy using Sonnet 4.5 even for planning.

minimaxir

I suspect the release of Claude Haiku 4.5 was done to help reduce usage costs for Anthropic and any use of Claude Code will differ to it if capacity is limited.

EDIT: I had meant defer, which is the first time I've made a /r/boneappletea in a while

chrisweekly

"differ"? did you mean "default"?

scubbo

I imagine "defer"

Redster

Here's the link about the sandbox environment and features they're using for this Claude Code: https://www.anthropic.com/engineering/claude-code-sandboxing

fny

I've been using Happy Coder[0] for some time now on web and mobile. I run it in `--yolo` mode on an isolated VM across multiple projects.

With Happy, I managed to turn one of these Claude Code instances into a replacement for Claude that has all the MCP goodness I could ever want and more.

[0]: https://happy.engineering/

ShipEveryWeek

This looks nice! I’ve been using terminus + tailscale to get similar results, but I’ll give this a go

jngiam1

I got so used to having Claude Code read some of my MCP tools, and was bummed to see that it couldn't connect to them yet on the web.

Pretty cool though! Will need to use it for some more isolated work/code edits. Claude Code is now my workhorse for a ton of stuff including non-coding work (esp. with the right MCPs)