Cloudlflare builds OAuth with Claude and publishes all the prompts
519 comments
·June 2, 2025rienbdj
The commits are revealing.
Look at this one:
> Ask Claude to remove the "backup" encryption key. Clearly it is still important to security-review Claude's code!
> prompt: I noticed you are storing a "backup" of the encryption key as `encryptionKeyJwk`. Doesn't this backup defeat the end-to-end encryption, because the key is available in the grant record without needing any token to unwrap it?
I don’t think a non-expert would even know what this means, let alone spot the issue and direct the model to fix it.
victorbjorklund
That is how LLM:s should be used today. An expert prompts it and checks the code. Still saves a lot of time vs typing everything from scratch. Just the other day I was working on a prototype and let claude write code for a auth flow. Everything was good until the last step where it was just sending the user id as a string with the valid token. So if you got a valid token you could just pass in any user id and become that user. Still saved me a lot of time vs doing it from scratch.
Vinnl
At least for me, I'm fairly sure that I'm better at not adding security flaws to my code (which I'm already not perfect at!) than I am at spotting them in code that I didn't write, unfortunately.
bryant
They're different mindsets. Some folks are better editors, inspectors, auditors, etc, whereas some are better builders, creators, and drafters.
So what you're saying makes sense. And I'm definitely on the other side of that fence.
XCSme
> Still saves a lot of time vs typing everything from scratch.
In my experience, it takes longer to debug/instruct the LLM than to write it from scratch.
Culonavirus
Depends on what you're doing. For example when you're writing something like React components and using something like Tailwind for styling, I find the speedup is close to 10X.
zx8080
> An expert prompts it and checks the code. Still saves a lot of time vs typing everything from scratch.
It's a lie. The marketing one, to be specific, which makes it even worse.
victorbjorklund
huh?
0points
I really don't agree with the idea that expert time would just be spent typing, and I'd be really surprised if that's the common sentiment around here.
An expert reasons, plans ahead, thinks and reasons a little bit more before even thinking about writing code.
If you are measuring productivity by lines of code per hour then you don't understand what being a dev is.
brailsafe
> I really don't agree with the idea that expert time would just be spent typing, and I'd be really surprised if that's the common sentiment around here.
They didn't suggest that at all, they merely suggested that the component of the expert's work that would otherwise be spent typing can be saved, while the rest of their utility comes from intense scrutiny, problem solving, decision making about what to build and why, and everything else that comes from experience and domain understanding.
victorbjorklund
Yea, and you still do that now. Lets say you spend 30% of your time coding and the rest planning. Well, now you got even more time for planning.
otabdeveloper4
> Still saves a lot of time vs typing everything from scratch
No it doesn't. Typing speed is never the bottleneck for an expert.
As an offline database of Google-tier knowledge, LLM's are useful. Though current LLM tech is half-baked, we need:
a) Cheap commodity hardware for running your own models locally. (And by "locally" I mean separate dedicated devices, not something that fights over your desktop's or laptop's resources.)
b) Standard bulletproof ways to fine-tune models on your own data. (Inference is already there mostly with things like llama.cpp, finetuning isn't.)
boruto
I realize I procrastinate less when using LLM to write code which I know I could write.
victorbjorklund
Maybe you type faster than me then :) I for sure type slower than Claude code. :)
brailsafe
> No it doesn't. Typing speed is never the bottleneck for an expert
How could that possibly be true!? Seems like it'd be the same as suggesting being constrained to analog writing utensils wouldn't bottleneck the process of publishing a book or research paper. At the very least such a statement implies that people with ADHD can't be experts.
blinded
Sure! But over half the fun of coding is writing and learning.
signa11
> ... Still saves a lot of time vs typing everything from scratch ...
how ? the prompts have still to be typed right ? and then the output examined in earnest.
fastball
A prompt can be as little as a sentence to write hundreds of lines of code.
victorbjorklund
Latest project I been working on. Prompts are a few sentences (and technically I dictate them instead of typing) and the LLM generates a few hundred lines of code.
fragmede
not if you don't want to. speech to text is pretty good these days, and even eg aider has a /voice command thanks to OpenAI's whisper.
dismalaf
> Still saves a lot of time vs typing everything from scratch
Probably very language specific. I use a lot of Ruby, typing things takes no time it's so terse. Instead I get to spend 95% of my time pondering my problems (or prompting the LLM)...
deepsun
With a proper IDE you don't type much even in Java/.Net, it's all autocomplete anyway. "Too verbose" complaints are mostly from Notepad lovers, and those who never needed to read somebody else's code.
victorbjorklund
It can create a whole dashboard view in elixir in a few seconds that is 100 lines long. No way I can type that in the same time.
i5heu
Revealing against what?
If you look at the README it is completely revealed... so i would argue there is nothing to "reveal" in the first place.
> I started this project on a lark, fully expecting the AI to produce terrible code for me to laugh at. And then, uh... the code actually looked pretty good. Not perfect, but I just told the AI to fix things, and it did. I was shocked.
> To emphasize, this is not "vibe coded". Every line was thoroughly reviewed and cross-referenced with relevant RFCs, by security experts with previous experience with those RFCs.
JW_00000
I think OP meant "revealing" as in "enlightening", not as "uncovering something that was hidden intentionally".
rienbdj
> Revealing against what?
Revealing of what it is like working with an LLM in this way.
kortilla
Revealing the types of critical mistakes LLMs make. In particular someone that didn’t already understand OAuth likely would not have caught this and ended up with a vulnerable system.
risyachka
If the guy knew how to properly implement oauth - did he save any time though by prompting or just tried to prove a point that if you actually already know all details of impl you can guide llm to do it?
Thats the biggest issue I see. In most cases I don't use llm because DIYing it takes less time than prompting/waiting/checking every line.
JimDabell
> did he save any time though
Yes:
> It took me a few days to build the library with AI.
> I estimate it would have taken a few weeks, maybe months to write by hand.
– https://news.ycombinator.com/item?id=44160208
> or just tried to prove a point that if you actually already know all details of impl you can guide llm to do it?
No:
> I was an AI skeptic. I thoughts LLMs were glorified Markov chain generators that didn't actually understand code and couldn't produce anything novel. I started this project on a lark, fully expecting the AI to produce terrible code for me to laugh at. And then, uh... the code actually looked pretty good. Not perfect, but I just told the AI to fix things, and it did. I was shocked.
— https://github.com/cloudflare/workers-oauth-provider/?tab=re...
theshrike79
Do people save time by learning to write code at 420WPM? By optimising their vi(m) layouts and using languages with lots of fancy operators that make things quicker to write?
Using an LLM to write code you already know how to write is just like using intellisense or any other smart autocomplete, but at a larger scale.
DonHopkins
[flagged]
throwaway2037
While I think this is a cool (public) experiment by Claude, asking an LLM to write security-sensitive code seems crazy at this point. Ad absurdum: Can you imagine asking Claude to implement new functionality in OpenSSL libs!?
PeterStuer
Which is exactly why AI coding assistants work with your expertise rather than replace it. Most people I see fail at AI assisted development are either non-technical people expecting the AI will solve it all, or technical people playing gotcha with the machine rather than collaborating with it.
bootsmann
There is also one quite early in the repo where the dev has to tell Claude to store only the hashes of secrets
kentonv
Yeah I was disappointed in that one.
I hate to say, though, but I have reviewed a lot of human code in my time, and I've definitely caught many humans making similar-magnitude mistakes. :/
hn_throwaway_99
I just wanted to say thanks so much publishing this, and especially your comments here - I found them really helpful and insightful. I think it's interesting (though not unexpected) that many of the other commenters' comments here show what a Rorschach test this is. I think that's kind of unfortunate, because your experience clearly showed some of the benefits and limitations/pitfalls of coding like this in an objective manner.
I am curious, did you find the work of reviewing Claude's output more mentally tiring/draining than writing it yourself? Like some other folks mentioned, I generally find reviewing code more mentally tiring than writing it, but I get a lot of personal satisfaction by mentoring junior developers and collaborating with my (human) colleagues (most of them anyway...) Since I don't get that feeling when reviewing AI code, I find it more draining. I'm curious how you felt reviewing this code.
kentonv
I find reviewing AI code less mentally tiring that reviewing human code.
This was a surprise to me! Until I tried it, I dreaded the idea.
I think it is because of the shorter feedback loop. I look at what the AI writes as it is writing it, and can ask for changes which it applies immediately. Reviewing human code typically has hours or days of round-trip time.
Also with the AI code I can just take over if it's not doing the right thing. Humans don't like it when I start pushing commits directly to their PR.
There's also the fact that the AI I'm prompting is, obviously, working on my priorities, whereas humans are often working on other priorities, but I can't just decline to review someone's code because it's not what I'm personally interested in at that moment.
When things go well, reviewing the AI's work is less draining than writing it myself, because it's basically doing the busy work while I'm still in control of high-level direction and architecture. I like that. But things don't always go well. Sometimes the AI goes in totally the wrong direction, and I have to prompt it too many times to do what I want, in which case it's not saving me time. But again, I can always just cancel the session and start doing it myself... humans don't like it when I tell them to drop a PR and let me do it.
Personally, I don't generally get excited about mentoring and collaborating. I wish I did, and I recognize it's an important part of my job which I have to do either way, but I just don't. I get excited primarily about ideas and architecture and not so much about people.
jjcm
Most interesting aspect of this is it likely learned this pattern from human-written code!
ActionHank
But AIbros will be running around telling everyone that Claude invented OAuth for Cloudflare all on its own and then opensourced it.
null
paxys
This is exactly the direction I expect AI-assisted coding to go in. Not software engineers being kicked out and some business person pressing a few buttons to have a fully functional app (as is playing out in a lot of fantasies on LinkedIn & X), but rather experienced engineers using AI to generate bits of code and then meticulously reviewing and testing them.
The million dollar (perhaps literally) question is – could @kentonv have written this library quicker by himself without any AI help?
kentonv
It took me a few days to build the library with AI.
I estimate it would have taken a few weeks, maybe months to write by hand.
That said, this is a pretty ideal use case: implementing a well-known standard on a well-known platform with a clear API spec.
In my attempts to make changes to the Workers Runtime itself using AI, I've generally not felt like it saved much time. Though, people who don't know the codebase as well as I do have reported it helped them a lot.
I have found AI incredibly useful when I jump into other people's complex codebases, that I'm not familiar with. I now feel like I'm comfortable doing that, since AI can help me find my way around very quickly, whereas previously I generally shied away from jumping in and would instead try to get someone on the team to make whatever change I needed.
michelsedgh
The fascinating part is that each person is finding their own way of using these tools from kids to elders and everyone in between no matter what your background or language or whatever is
protocolture
This. Lots of people talking up agents right now, but the conversational rubber duck thing hits the spot well for me.
srhtftw
> It took me a few days to build the library with AI. ... > I estimate it would have taken a few weeks, maybe months to write by hand.
I don't think this is a fair assessment give the summary of the commit history https://pastebin.com/bG0j2ube shows your work started on 2025-02-27 and started trailing off at 2025-03-20 as others joined in. Minor changes continue to present.
> That said, this is a pretty ideal use case: implementing a well-known standard on a well-known platform with a clear API spec.
Still, this allowed you to complete in a month what may have taken two. That's a remarkable feat considering the time and value of someone of your caliber.
kentonv
I think the data supports that there were about 5 distinct days when I did a large amount of work on this library, and a sprinkling of minor commits through the rest of the month. Glen's commits, while numerous, were also fairly minor, mostly logistical details around releases.
This library is not the only thing I was working on, nor even the main thing. As the lead engineer of Cloudflare Workers I have quite a few other things demanding my time.
motorest
> (...) your work started on 2025-02-27 and started trailing off at 2025-03-20 as others joined in. Minor changes continue to present.
Your analysis is far too superficial to extract anything meaningful. I know for a fact that I have small projects that took me only a couple of days to get done which have a commit history ranging a few months. Also, software is never done. There's always room to refactor, and LLMs turn that into trivial problems. Lastly, is your project still under development if your commits are README updates, linter runs, and renaming variables?
There is a reason why commit history is not used to track productivity.
manquer
Is it though?
Would someone of author's caliber even be working on trivial slog item like Oauth2 implementation, if not for the novel development approach he wanted to attempt here ?
For the kind of regular jobs a engineer typically is expected to do, would it give 100% productivity jump ?
9dev
Funny thing. I have built something similar recently, that is a 2.1-compliant authorisation server in TypeScript[0]. I did it by hand, with some LLM help on the documentation. I think it took me about two weeks full time, give or take, and there’s still work to do, especially on the testing side of things, so I would agree with your estimate.
I’m going to take a very close look at your code base :)
[0] https://github.com/colibri-hq/colibri/blob/next/packages/oau...
upstairs-war
Thanks kentonv. I picked up where you left off, supported with oauth2.1 rfc, and integrated ms oauth to our internal mcp server. Cool to have Claude be business aware
graeme
>I have found AI incredibly useful when I jump into other people's complex codebases, that I'm not familiar with. I now feel like I'm comfortable doing that
This makes sense. Are there codebases where you find this doesn't work as well, either from the codebase's min required context size or the code patterns not being in the training data?
aprilthird2021
Matches my experiences well. Making changes to large, complex codebases I know well? Teaching the AI to get up to speed with me takes too much time.
Code I know nothing about? AI is very helpful there
philipwhiuk
> Though, people who don't know the codebase as well as I do have reported it helped them a lot.
My problem I guess is that maybe this is just Dunning-Kruger esq. When you don't know what you don't know you get the impression it's smart. When you do, you think it's rubbish.
Like when you see a media report on a subject you know about and you see it's inaccurate but then somehow still trust the media on a subject you're a non-expert on.
motorest
> My problem I guess is that maybe this is just Dunning-Kruger esq. When you don't know what you don't know you get the impression it's smart. When you do, you think it's rubbish.
I see your point. Indeed there are two completely different points of view regarding the output of LLMs:
* Hey, I managed to vibecode my way into a fully working web service with a React SPA after a couple of prompts, and a full automated test suite to boot.
* This project is nowhere as clean as I would have written it, and doesn't even follow my pet coding conventions.
One side lauds LLMs, the other complains they output mainly crap.
The truth of the matter is that the vast majority of software engineers write crap code, as the definition of "crap code" is "something I would have done differently". Opinionated engineers look at the output of LLMs and accuse it of being crap code. Eppur si muove.
throwaway314155
I think most of this just amounts to the same old good developers vs. bad developers situation that we've been in for decades.
giantrobot
> Like when you see a media report on a subject you know about and you see it's inaccurate but then somehow still trust the media on a subject you're a non-expert on.
Gell-Mann Amnesia https://en.m.wikipedia.org/wiki/Gell-Mann_amnesia_effect
gokhan
> Not software engineers being kicked out ... but rather experienced engineers using AI to generate bits of code and then meticulously reviewing and testing them.
But what if you only need 2 kentonv's instead of 20 at the end? Do you assume we'll find enough new tasks that will occupy the other 18? I think that's the question.
And the author is implementing a fairly technical project in this case. How about routine LoB app development?
thewebguyd
> But what if you only need 2 kentonv's instead of 20 at the end? Do you assume we'll find enough new tasks that will occupy the other 18? I think that's the question.
This is likely where all this will end up. I have doubts that AI will replace all engineers, but I have no doubt in my mind that we'll certainly need a lot less engineers.
A not so dissimilar thing happened in the sysadmin world (my career) when everything transitioned from ClickOps to the cloud & Infrastructure as Code. Infrastructure that needed 10 sysadmins to manage now only needed 1 or 2 infrastructure folks.
The role still exists, but the quantity needed is drastically reduced. The work that I do now by myself would have needed an entire team before AWS/Ansible/Terraform, etc.
kentonv
I think there's a huge huge space of software to build that isn't being touched today because it's not cost-effective to have an engineer build them.
But if the time it takes an engineer to build any one thing goes down, now there are a lot more things that are cost effective.
Consider niche use cases. Every company tends to have custom processes and workflows. Think about being an accountant at one company vs. another -- while a lot of the job is the same, there will always be parts that are significantly different. Those bespoke processes often involve manual labor because off-the-shelf accounting software cannot add custom features for every company.
But what if it could? What if an engineer working with AI could knock out customer-specific features 10x as fast as they could in the past. Now it actually makes sense to build those features, to improve the productivity of each company's accounting department.
It's hard to say if demand for engineers will go down or up. I'm not pretending to know for sure. But I can see a possibility that we actually have way more developers in coming years!
mikeocool
Though arguably cloud infra made it so that a lot more companies who never would have built out a data center or leased a chunk of space in one were spinning up some serious infra in AWS or Azure -- and thus hiring at least 1-2 devops engineers.
Before the end of zero interest rate policy, all the sysadmins I knew who the made the transition to devops were never stuck looking for a job for long.
achierius
To be clear, the number of people employed as "SREs" or "production engineers" is actually far, far higher (at least an order of magnitude) than in the days before cloud became a thing. There are simply far more apps / companies / businesses / etc. who use cloud hosting than there ever were doing on-prem work.
tkiolp4
I don’t think we would need less engineers… the work to be done will increase instead. Example: now it takes 10 engineers to release a product in 10 months without AI. With AI it takes lets say 1 engineer to release the same product in 1 month. What’s the company gonna do now? Release 10 products in 10 months without AI 10 engineers (each using AI).
It’s an exaggeration I know, but you get the point.
intended
I’m going to bet that it’s going to need far less AI.
There was another article posted somewhere that made a parallel between the AI hype and no-code, outsourcing and other waves that have come.
paxys
Increased productivity means increased opportuntity. There isn't going to be a time (at least not anytime soon) when we can all sit back and say "yup, we have accomplished everything there is to do with software and don't need more engineers".
spiderice
But there very well might be a time very soon where human's no longer offer economic value to the software engineering process. If you could (and currently you can't) pay an AI $10k/year to do what a human could do in a year, why would you pay the human 6 figures? Or even $20k?
Nobody is claiming that human's won't have jobs simply because "we have accomplished everything this is to do". It's that humans will offer zero economic value compared to AI because AI gets so good and so cheap.
simonw
I guess I have trouble emphasizing with "But what if you only need 2 kentonv's instead of 20 at the end?" because I'm an open source oriented developer.
What's open source for if not allowing 2 developers to achieve projects that previously would have taken 20?
dkdcio
> The million dollar (perhaps literally) question is – could @kentonv have written this library quicker by himself without any AI help?
I *think* the answer to this is clearly no: or at least, given what we can accomplish today with the tools we have now, and that we are still collectively learning how to effectively use this, there's no way it won't be faster (with effective use) in another 3-6 months to fully-code new solutions with AI. I think it requires a lot of work: well-documented, well-structured codebases with fast built-in feedback loops (good linting/unit tests etc.), but we're heading there no
motorest
> I think the answer to this is clearly no: or at least, given what we can accomplish today with the tools we have now, and that we are still collectively learning how to effectively use this, there's no way it won't be faster (with effective use) in another 3-6 months to fully-code new solutions with AI.
I think these discussions need to start from another point. The techniques changed radically, and so did the way problems are tackled. It's not that a software engineer is/was unable to deliver a project with/without LLMs. That's a red herring. The key aspects are things like the overall quality of the work being delivered vs how much time it took to reach that level of quality.
For example, one of the primary ways a LLM is used is not to write code at all: it's to explain to you what you are looking at. Whether it's used as a Google substitute or a rubber duck, developers are able to reason with existing projects and even explore approaches and strategies to tackle problem like they were never able to do so. You no longer need to book meetings with a principal engineer to as questions: you just drop a line in Copilot Chat and ask away.
Another critical aspect is that LLMs help you explore options faster, and iterate over them. This allows you to figure out what approach works best for your scenario and adapt to emerging requirements without having to even chat with anyone. This means that, within the timeframe you would deliver the first iteration of a MVP, you can very easily deliver a much more stable project.
james_marks
Exactly this
> Another critical aspect is that LLMs help you explore options faster, and iterate over them. This allows you to figure out what approach works best for your scenario and adapt to emerging requirements without having to even chat with anyone. This means that, within the timeframe you would deliver the first iteration of a MVP, you can very easily deliver a much more stable project
colonCapitalDee
I'm had great success with downloading source code and docs and using Claude Code to query them
necovek
In a "well-documented, well-structured codebase with fast built-in feedback loops", a human programmer is really empowered to make changes fast. This is exactly what's needed for fast iteration, including in unfamiliar codebases.
When you are not introducing a new pattern in the code structure, it's mostly copy-paste and then edit.
But it's also extremely rare, so a pretty high bar to be able to benefit from tools like AI.
bigstrat2003
> but rather experienced engineers using AI to generate bits of code and then meticulously testing and reviewing them.
My problem is that (in my experience anyways) this is slower than me just writing the code myself. That's why AI is not a useful tool right now. They only get it right sometimes so it winds up being easier to just do it yourself in the first place. As the saying goes: bad help is worse than no help at all, and AI is bad help right now.
motorest
> My problem is that (in my experience anyways) this is slower than me just writing the code myself.
In my experience, the only times LLMs slow down your task is when you don't use them effectively. For example, if you provide barely any context or feedback and you prompt a LLM to write you the world, of course it will output unusable results, primarily because it will be forced to interpolate and extrapolate through the missing context.
If you take the time to learn how to gently prompt a LLM into doing what you need, you'll find out it makes you far more productive.
JimDabell
> My problem is that (in my experience anyways) this is slower than me just writing the code myself.
How much experience do you have writing code vs how much experience do you have prompting using AI though? You have to factor in that these tools are new and everybody is still figuring out how to use them effectively.
imiric
> You have to factor in that these tools are new and everybody is still figuring out how to use them effectively.
I think that the skills required are highly overblown.
The user should be aware of what each model excels at, its context size, temperature, and other parameters; how to communicate well, set system prompts and phrase tasks in a clear, succinct yet informative way; how to refocus the session when it veers off track; keep up to date with the latest (<~6mo) concepts and tooling, and so on.
All of this is trivial for a competent software engineer. The idea that it requires some specialized training that couldn't be attained by experimentation and reading a blog post is absurd. "Prompt engineering" just isn't a thing.
uludag
I feel this is on point. So not only is there the time lost correcting and testing AI generated code, but there's also the mental model you build of the code when you write it yourself.
Assuming you want a strong mental model of what the code does and how it works (which you'd use in conversations with stakeholders and architecture discussions for example), writing the code manually, with perhaps minor completion-like AI assistance, may be the optimal approach.
0xbadcafebee
That's not the million dollar question; anyone who's done any kind of AI coding will tell you it's ridiculously faster. I haven't touched JavaScript, CSS & HTML in like a decade. But I got a whole website created with complex UI interactions in 20 minutes - and no frameworks - by just asking ChatGPT to write stuff for me. And that's the crappy, inefficient way of doing this work. Would have taken me a week to figure out all that. If I'd known how to do it already, and I was very good, perhaps it would have taken the same amount of time? But clearly there is a force-multiplier at work here.
The million dollar question is, what are the unintended, unpredicted consequences of developing this way?
If AI allows me to write code 10x faster, I might end up with 10x more code. Has our ability to review it gotten equally fast? Will the number of bugs multiply? Will there be new classes of bugs? Will we now hire 1 person where we hired 5 before? If that happens, will the 1 person leaving the company become a disaster? How will hiring work (cuz we have such a stellar track record at that...)? Will the changing economics of creating software now make SaaS no longer viable? Or will it make traditional commercial software companies no longer viable? Will the entire global economy change, the way it did with the rise of the first tech industry? Are we seeing a rebirth?
We won't know for sure what the consequences are for a while. But there will be consequences.
motorest
> This is exactly the direction I expect AI-assisted coding to go in. Not software engineers being kicked out and some business person pressing a few buttons to have a fully functional app (as is playing out in a lot of fantasies on LinkedIn & X), but rather experienced engineers using AI to generate bits of code and then meticulously reviewing and testing them.
There is a middle ground: software engineers being kicked out because now some business person can hand over the task of building the entire OAuth infrastructure to a single inexperienced developer with a Claude account.
petersellers
I'm not so sure that would work well in practice. How would the inexperienced developer know that the code created by the AI was correct? What if subtle bugs are introduced that the inexperienced developer didn't catch until it went out into production? What if the developer didn't even know how to debug those problems correctly? Would they know that the code they are writing is maintainable and extensible, or are they just going to generate a new layer of code on top of the old one any time they need a new feature?
motorest
> I'm not so sure that would work well in practice. How would the inexperienced developer know that the code created by the AI was correct?
Not a problem. The industry has evolved to tolerate buggy code that barely works. In fact, in some circles that's what's already expected from the baseline. LLMs change nothing in this regard. In fact, they arguably improve upon this problem as it becomes trivial to implement extensive automated test suites.
> What if subtle bugs are introduced that the inexperienced developer didn't catch until it went out into production?
That's what is happening in the real world without LLMs entering the picture.
intended
By then the person who suggested the idea has left the firm.
stackskipton
>experienced engineers using AI to generate bits of code and then meticulously reviewing and testing them
And where are supposed to get experienced engineers if replaced all Jr Devs with AI? There is a ton of benefit from drudgery of writing classes even if seems like grunt work at the time.
jstummbillig
This is not where AI-assisted coding is going. Where it is going is: The AI will quickly become better at avoiding these types of mistakes than humans ever were (and are ever going to be), because they can and thus will be RL'ed away. What will be left standing longest is providing the vision wrt what the actual problem is, you want to solve.
c-linkage
I very much appreciate the fact that the OP posted not just the code developed by AI but also posted the prompts.
I have tried to develop some code (typically non-web-based code) with LLMs but never seem to get very far before the hallucinations kick in and drive me mad. Given how many other people claim to have success, I figure maybe I'm just not writing the prompts correctly.
Getting a chance to see the prompts shows I'm not actually that far off.
Perhaps the LLMs don't work great for me because the problems I'm working on a somewhat obscure (currently reverse engineering SAP ABAP code to make a .NET implementation on data hosted in Snowflake) and often quite novel (I'm sure there is an OpenAuth implementation on gitbub somewhere from which the LLM can crib).
8-prime
This is something that I have noticed as well. As soon as you venture into somewhat obscure fields, the output quality of LLMs drastically drops in my experience.
Side note, reverse engineering SAP ABAP sounds torturous.
Vicinity9635
The worst part isn't even that the quality drops off, it's that the quality drops off but the tone of the responses don't. So hallucinations can start and it's just confidently wrong or even dangerous code and the only way to know better is to be better than the LLM in the first place.
They might surpass us someday, but we aren't there yet.
theshrike79
The usual solution is a multi-tiered one.
First you use any LLM with a large context to write down the plan - preferably in a markdown file with checkboxes "- [ ] Task 1"
Then you can iterate on the plan and ask another LLM more focused on the subject matter to do the tasks one by one, which allows it to work without too much hallucination as the context is more focused.
mtlynch
>In all seriousness, two months ago (January 2025), I (@kentonv) would have agreed.
I'm confused by "I (@kentonv)" means here because kentonv is a different user.[0] Are you saying this is your alt? Or is this a typo/misunderstanding?
Edit: Figured out that most of your post is quoting the README. Consider using > and * characters to clarify.
kentonv
He is quoting from the project readme. I wrote all this text.
mdaniel
Thanks for weighing in here
If I might make a suggestion, based on how fast things change, even within a model family, you may benefit from saying Claude what. I was especially cognizant of this given the recent v4 release which (of course) hailed as the second coming. Regardless, you may want to update your readme to say
It may also be wildly out of scope for including in a project's readme, but knowing which of the bazillions of coding tools you used would also help a tiny bit with this reproduction crises found in every single one of these style threads
kentonv
I believe it's important to say when AI was used so heavily in building a library -- it would feel dishonest to me to claim I wrote it all myself. I also think it's just a pretty interesting thing to know about. So I think it belongs in the readme. (But I'm not making a moral judgment on what anyone else does.)
It was almost entirely Claude Sonnet 3.7. I agree I should add the version to the readme.
diggan
> It may also be wildly out of scope for including in a project's readme
The entire point of the repository seems to be to invalidate/validate the thesis if LLMs are good enough to be pair programmers right now. Removing it from the README makes no sense in that context.
diggan
It's a literal copy-paste from the README, I think it was supposed to be quoted but parent messed it up somehow.
https://github.com/cloudflare/workers-oauth-provider/blob/fe...
dang
(this comment was originally a reply to https://news.ycombinator.com/item?id=44159167, which summarized the readme in a confusing way.)
null
null
jes5199
I’ve been using Claude (via Cursor) on a greenfield project for the last couple months and my observation is:
1. I am much more productive/effective
2. It’s way more cognitively demanding than writing code the old-fashioned way
3. Even over this short timespan, the tools have improved significantly, amplifying both of the points above
pton_xd
This mirrors my experience and those I've talked to.
LLM assisted coding is a way to get stuff done much faster, at a greatly increased mental cost / energy spent. Oddly enough.
piker
The small dopamine hits you get from "it compiles" are completely automated away, and you're forced to survive on the goal alone. The issues are necessarily complex and require thinking about how the LLM has gotten it subtly wrong.
Painful, but effective?
chii
> Oddly enough.
i actually dont find that outcome odd at all. The high cognative demand comes from the elimination of spurious busy work that would normally come with coding (things like syntax sugars, framework outline, and such). If an AI takes care of all of these things, and lets an author "code" at the speed of thought, you'd be running your engine at maximum.
Not to mention the need to also critically look at the generated code to ensure it's actual correctness (hopefully this can also be helped/offloaded by an ai in the future).
diggan
> It’s way more cognitively demanding than writing code the old-fashioned way
How are you using it?
I've been mainly doing "pair programming" with my own agent (using Devstral as of late) and find the reviewing much easier than it would been to literally type all of the code it produces, at least time wise.
I've also tried vibe coding for a bit, and for that I'd agree with you, as you don't have any context if you end up wanting to review something. Basically, if the project was vibe coded from the beginning, it's much harder to get into the codebase.
But when pair programming with the LLM, I already have a built up context, and understand how I want things to be and so on, so reviewing pair programmed code goes a lot faster than reviewing vibe coded code.
jes5199
I’ve tried a bunch of things but now I’m mostly using Cursor in agent mode with Claude Sonnet 4, doing small-ish pull-request-sized prompts. I don’t have to review code as carefully as I did with Claude 3.7
but I’m finding the bottleneck now is architecture design. I end up having these long discussions with chatGPT-o3 about design patterns, sometimes days of thinking, and then relatively quick implementation sessions with Cursor
jcims
It will be interesting to see if and how all of this improves standards around how we document the architecture and concepts of software.
SkyPuncher
> 2. It’s way more cognitively demanding than writing code the old-fashioned way
Funnily, enough, I find the exact opposite. I feel so much relief that I don't have to waste time figuring out every, single detail. It frees me up to focus on architectural and higher level changes.
jes5199
I guess what I mean is, I found the details sort of “mindless” before. Code that I could write in my sleep. Now I only have to do the thinky parts
layer8
This means that you fully trust the LLM to get the details right.
null
infinitebattery
From this commit: https://github.com/cloudflare/workers-oauth-provider/commit/...
===
"Fix Claude's bug manually. Claude had a bug in the previous commit. I prompted it multiple times to fix the bug but it kept doing the wrong thing.
So this change is manually written by a human.
I also extended the README to discuss the OAuth 2.1 spec problem."
===
This is super relatable to my experience trying to use these AI tools. They can get halfway there and then struggle immensely.
diggan
> They can get halfway there and then struggle immensely.
Restart the conversation from scratch. As soon as you get something incorrect, begin from the beginning.
It seems to me like any mistake in a messages chain/conversation instantly poisons the output afterwards, even if you try to "correct" it.
So if something was wrong at one point, you need to go back to the initial message, and adjust it to clarify the prompt enough so it doesn't make that same mistake again, and regenerate the conversation from there on.
int_19h
Chatbot UIs really need better support for conversation branching all around. It's very handy to be able to just right-click on any random message in the conversation in LM Studio and say, "branch from here".
diggan
Maybe it's contrarian, maybe it's not, but I don't think Chat UIs are well suited for software engineering/programming at all, we need something completely different. Being able to branch conversations and such would be useful, but probably not for the way I do software. Besides, I'm rarely beyond 3 messages (1 system, 1 user, 1 assistant) in any usage of the chat UIs. Maybe it's more useful to people with different workflows.
impure-aqua
Certainly in my version of LM Studio (0.3.15) it has a branch button at the end of every message [0]
carlosjs23
AI Studio has this, I usually ask it to plan and I do some rounds of refining until the plan covers all my requirements, then I branch this conversation, a branch for each feature, none of the branches get polluted this way.
eikenberry
I thought Claude still has a problem generating the same output for the same input? That you can't just rewind and rerun and get to the same point again.
diggan
> I thought Claude still has a problem generating the same output for the same input?
I haven't used Anthropic's models/software in a long time (months, basically forever in AI ecosystem), so don't know exactly how it works now.
But last time I used Claude, you could edit the first message, and then re-generate the assistants next message based on your edit. Most of the LLM interfaces has one or another way of doing this, I can't imagine they got rid of that feature.
What I'm suggesting isn't to use the exact same input (the first message), but rather change it so you remove the chances of something incorrect happening later after that.
throwaway314155
> can't just rewind and rerun and get to the same point again
Why would you want to? The whole point of a retry is that your previous conversation attempt went poorly.
viktorcode
It can be done, but for my environment the sum of all prompts that I end up typing to get the right result ends up being longer than the actual code.
So now I'm using LLMs as crapshoot machines for generating ideas which I then implement manually
dingnuts
Can you imagine if Excel worked like this? the formula put out the wrong result, so try again! It's like that scene from The Office where Michael has an accountant "run it again." It's farcical. They have created computers that are bad at math and I will never forgive them.
Also, each try costs money! You're pulling the lever on a god damned slot machine!
I will TRY AGAIN with the same prompt when I start getting a refund for my wasted money and time when the model outputs bullshit, otherwise this is all confirmation and sunk cost bias talking, I'm sure if it.
diggan
> Can you imagine if Excel worked like this?
I mean, why would I imagine that? Who would want that? It's like the argument against legal marijuana, and someone replies "But would you like your pilot to be high when flying?!". Right tool for the right job, clearly when you want 100% certainty then LLMs aren't the tool for that. Just because they're useful for some things don't mean we have to replace everything with them.
> Also, each try costs money!
I guess you're using some paid API? Try a different way then. I mostly use the web UI from OpenAI, or Codex lately, or ran locally with my own agent using local weights, neither is "each try costs money" more than writing data to my SSD is costing me money.
It's not a holy grail some people paint it, and not sure we're across the "productivity threshold" (https://news.ycombinator.com/item?id=44160664) yet, but it's worth trying it out probably before jumping to conclusions. But no one is forcing you either, YMMV and all that.
krooj
The comment in lines 163 - 172 make some claims that are outright false and/or highly A/S dependent, to the point where I question the validity of this post entirely. While it's possible that an A/S can be pseudo-generated based on lots of training data, each implementation makes very specific design choices: i.e.: Auth0's A/S allows for a notion of "leeway" within the scope of refresh token grant flows to account for network conditions, but other A/S implementations may be far more strict in this regard.
My point being: assuming you have RFCs (which leave A LOT to the imagination) and some OSS implementations to train on, each implementation usually has too many highly specific choices made to safely assume an LLM would be able to cobble something together without an amount of oversight effort approaching simply writing the damned thing yourself.
arendtio
One way to mitigate the issue is to use tests or specifications and let the AI find a solution to the spec.
A few months ago, solving such a spec riddle could take a while, and most of the time, the solutions that were produced by long run times were worse than the quick solutions. However, recently the models have become significantly better at solving such riddles, making it fun (depending on how well your use case can be put into specs).
In my experience, sonnet 3.7 represented a significant step forward compared to sonnet 3.5 in this discipline, and Gemini 2.5 Pro was even more impressive. Sonnet 4 makes even fewer mistakes, but it is still necessary to guide the AI through sound software engineering practices (obtaining requirements, discovering technical solutions, designing architecture, writing user stories and specifications, and writing code) to achieve good results.
Edit: And there is another trick: Provide good examples to the AI. Recently, I wanted to create an app with the OpenAI Realtime API and at first it failed miserably, but then I added the most important two pages of the documentation and one of the demo projects into my workspace and just like that it worked (even though für my use-case the API calls had to be use quite differently).
fxnn
That's one thing where I love Golang. I just tell Aider to `/run go doc github.com/some/package`, and it includes the full signatures in the chat history.
It's true: often enough AI struggles to use libraries, and doesn't remember the usage correctly. Simply adding the go doc fixed that often.
nicce
I am waiting for studies whether we have just an illusion of production or these actually save man hours in the long term in creation of production-level systems.
mysterydip
This to me is why I think these tools don't have actual understanding, and are instead producing emergent output from pooling an incomprehensibly large set of pattern-recognized data.
diggan
> these tools don't have actual understanding, and are instead producing emergent output from pooling an incomprehensibly large set of pattern-recognized data
I mean, bypassing the fact that "actual understanding" doesn't have any consensus about what it is, does it matter if it's "actual understanding" or "kind of understanding", or even "barely understanding", as long as it produces the results you expect?
sceptic123
> as long as it produces the results you expect?
But it's more the case of "until it doesn't produce the results you expect" and then what do you do?
mysterydip
No, I was not making a critique on its effectiveness at generating usable results. I was responding to what I've seen in several other articles here arguing towards anthropomorphism.
nisegami
Same. But I personally find it a lot easier to do those bits at the end than to begin from a blank file/function, so it's a good match for me.
SkyPuncher
Same here. Sometimes you just need time to stew in the problem/solution space.
LLMs let me be ultraproductive upfront then come in at the end to clean up when I have a full understanding.
aeneas_ory
Very impressive, and at the same time very scary because who knows what security issues are hidden beneath the surface. Not even Claude knows! There is very reliable tooling like https://github.com/ory/hydra readily available that has gone through years of iteration and pentests. There are also lots of libraries - even for NodeJS - that have gone through certification.
In my view this is an antipattern of AI usage and „roll your own crypto“ reborn.
vaidhy
I think the discussions are also missing another key element. The time in takes to read someone else code is way more mentally tiring.
When I am writing the code, my mind tracks what I have done and the new pieces flow. When I am reading code written by someone else, there is no flow.. I have to track individual pieces and go back and forth on what was done before.
I can see myself using LLMs for short snippets rather than start something top down.
jauntywundrkind
> Again, please check out the commit history -- especially early commits -- to understand how this went.
Direct link to earliest page of history: https://github.com/cloudflare/workers-oauth-provider/commits...
A lot of very explicit & clear prompting, with direct directions to go. Some examples on the first page: https://github.com/cloudflare/workers-oauth-provider/commit/... https://github.com/cloudflare/workers-oauth-provider/commit/...
simonw
The most clearly Claude-written commits are on the first page, this link should get you to them: https://github.com/cloudflare/workers-oauth-provider/commits...
Luker88
Hello Cloudflare, impressive result, I did not think things were this advanced.
Still, legal question where I'd like to be wrong: AFAIK (and IANAL) if I use AI to generate images, I can't attach copyright to it.
But the code here is clearly copyrighted to you.
Is that possible because you manually modify the code?
How does it work in examples like this one where you try to have close to all code generated by AI?
kentonv
I am also not a lawyer, but I believe the law here is yet to be fully settled. Here in the US, there have been lower-court rulings but surely it will go to the supreme court.
There are parts of the library that I did write by hand, which are presumably copyright Cloudflare either way. As for whether the AI-generated parts are, I guess we'll see.
But given the thing is MIT-licensed, it doesn't seem like it matters much in this case?
dvrp
Did you check the latest documents from copyright.gov? They’re interesting exactly because of what you’re saying
Luker88
I did not, especially seeing as I am not from the USA, so I'd like to have the point of view of a multinational company
--edit: didn't the same office have a controversy a few weeks ago where AI training was almost declared not-fair-use, and the boss was fired on the spot byt the administration, or something like that?
Things sounds confusing to me, which is why I'm asking
fastball
I believe you are wrong about AI-generated images as well.
declan_roberts
Getting a "Too Many Requests" error is kind of hilarious given the company involved.
rcastellotti
same
See also https://github.com/cloudflare/workers-oauth-provider/commits... (via https://news.ycombinator.com/item?id=44161672)