
The new calculus of AI-based coding

41 comments

October 27, 2025

Animats

> Instead, we use an approach where a human and AI agent collaborate to produce the code changes. For our team, every commit has an engineer's name attached to it, and that engineer ultimately needs to review and stand behind the code. We use steering rules to setup constraints for how the AI agent should operate within our codebase,

This sounds a lot like Tesla's Fake Self Driving. It self drives right up to the crash, then the user is blamed.

groby_b

Except here it's made abundantly clear, up front, who has responsibility. There's no pretense that it's fully self driving. And the engineer has the power to modify every bit of that decision.

Part of being a mature engineer is knowing when to use which tools, and accepting responsibility for your decisions.

It's not that different from collaborating with a junior engineer. This one just churns out a lot more code, with occasional flashes of brilliance and occasional flashes of inanity.

Animats

> Except here it's made abundantly clear, up front, who has responsibility.

By the people who are disclaiming it, yes.

cadamsdotcom

"We have real mock versions of all our dependencies!"

Congratulations, you invented end-to-end testing.

"We have yellow flags when the build breaks!"

Congratulations! You invented backpressure.

Every team has different needs and path dependencies, so each settles on its own interpretation of CI/CD and software engineering process. Productizing anything in this space is going to be an uphill battle of yanking teams away from their hard-earned processes.

Productizing process is hard, but it's been done before! When paired with a LOT of spruiking it can really move the field forward. It's how we got the first CI/CD tools (e.g. https://en.wikipedia.org/wiki/CruiseControl) and testing libraries (e.g. pytest).

So I wish you luck!

gachaprize

Classic LLM article:

1) Abstract data showing an increase in "productivity" ... CHECK

2) Completely lacking in any information on what was built with that "productivity" ... CHECK

Hilarious to read this on the backend of the most widely publicized AWS failure.

alfalfasprout

Yep. The problem is then leadership sees this and says "oh, we too can expect 10x productivity if everyone uses these tools. We'll force people to use them or else."

And guess what happens? Reality doesn't match expectations and everyone ends up miserable.

Good engineering orgs should have engineers deciding what tools are appropriate based on what they're trying to do.

gtsop

[dead]

r0x0r007

"For me, roughly 80% of the code I commit these days is written by the AI agent" Therefore, it is not commited by you, but by you in the name of AI agent and the holy slop. What to say, I hope that 100x productivity is worth it and you are making tons of money. If this stuff becomes mainstream, I suggest open source developers stop doing the grind part, stop writing and maintaining cool libraries and just leave all to the productivity guys, let's see how far they get. Maybe I've seen too many 1000x hacker news..

philipp-gayret

This is the first time I've seen "steering rules" mentioned. I do something similar with Claude; curious what it looks like for them and how they integrate it with Q/Kiro.

manmal

Those rules are often ignored by agents. Codex is known to adhere quite well, but it still falls back on its own ideas, which run counter to the rules I've given it. The longer a session goes on, the more it goes off the rails.

philipp-gayret

I'm aware of the issues around rules as in a default prompt. I had hoped the author of the blog meant a different mechanism when they mentioned "steering rules". I do mean something different, where an agent will self-correct when it is seen going against rules in the initial prompt. I have a different setup myself for Claude Code, and would call parts of that "steering"; adjusting the trajectory of the agent as it goes.
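Roughly, the shape of it is a supervisor loop that checks each agent action against the session rules and feeds a correction back in when one is broken. A minimal sketch of that idea in Python, with entirely hypothetical names (not the actual Claude Code setup, and not whatever Amazon does with Q/Kiro):

    # Minimal sketch of "steering": check each agent action against session
    # rules and inject a correction so the agent adjusts its trajectory.
    # All names here are hypothetical illustrations.
    RULES = {
        "no-vendor-edits": lambda action: "vendor/" not in action,
        "no-force-push": lambda action: "push --force" not in action,
    }

    def steer(agent_actions):
        """Yield agent actions, interleaving corrections when a rule is broken."""
        for action in agent_actions:
            broken = [name for name, ok in RULES.items() if not ok(action)]
            if broken:
                # In a real setup this message is fed back into the agent's
                # context so its next step self-corrects.
                yield f"CORRECTION: '{action}' violates {broken}; revert and retry."
            yield action

    # Example run: the second action trips a rule and draws a correction.
    for msg in steer(["edit src/app.py", "edit vendor/lib.js"]):
        print(msg)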

CharlesW

Everything related to LLMs is probabilistic, but those rules are also often followed well by agents.

CharlesW

I'd assume it's related to this Amazon "Socratic Human Feedback (SoHF): Expert Steering Strategies for LLM Code Generation" paper: https://assets.amazon.science/bf/d7/04e34cc14e11b03e798dfec5...

moron4hire

If you are producing real results at 10x then you should be able to show that you are a year ahead of schedule in 5 weeks.

Waiting to see anyone show even a month ahead of schedule after 6 months.

ned_roberts

Looking at the “metrics” they shared, going from committing just about zero code over the last two years to more than zero in the past two months may well be a 10x improvement. I haven't seen any evidence that more experienced developers see anywhere near that speedup.

__MatrixMan__

I've never worked anywhere that knew where they were going well enough that it was even possible to be a month ahead of schedule. By the time a month has elapsed the plan is entirely different.

AI can't keep up because its context window is full of yesteryear's wrong ideas about what next month will look like.

whiterook6

This reads like "Hey, we're not vibe coding, but when we do, we're careful!" with hints of "AI coding changes the costs associated with writing code, designing features, and refactoring" sprinkled in to stand out.

brazukadev

But here's the critical part: the quality of what you are creating is way lower than you think, just like AI-written blog posts.

collingreen

Upvoted for a dig that is also an accurate and insightful metaphor.

exasperaited

Absolutely nothing in that article so much as brushes past the colloquial definition of "calculus".

These guys actually seem rattled now.


photochemsyn

Well, 'calculus' is the kind of marketing word that sounds more impressive than 'arithmetic', and I think 'quantum logic' has gone a bit stale. 'AI-based' might also give more hope to the anxious investor class, since 'AI-assisted' is a bit weak: it means the core developer team isn't going to be cut from the labor costs on the balance sheet, they're just going to be 'assisted' (with things like AI-written unit tests that still need some checking).

"The Arithmetic of AI-Assisted Coding Looks Marginal" would be the more honest article title.

collingreen

"Galaxy-brain pair programming with the next superintelligence"

skinnymuch

Interesting enough to me though I only skimmed.

I switched back to Rails for my side project a month ago, and AI coding has been great for not-too-complex stuff, while the old NextJS code base was in shambles.

Before I was still doing a good chunk of the NextJS coding. I’m probably going to be directly coding less than 10% of the code base from here on out. I’m now spending time trying to automate things as much as possible, make my workflow better, and see what things can be coded without me in the loop. The stuff I’m talking about is basic CRUD and scraping/crawling.

For serious coding, I’d think coding yourself and having ai as your pair programmer is still the way to go.

Madmallard

first the Microsoft guy touting agents

now the AWS guy is doing it!

"My team is no different—we are producing code at 10x of typical high-velocity team. That's not hyperbole - we've actually collected and analyzed the metrics."

Rofl

"The Cost-Benefit Rebalance"

Here he basically just talks about setting up mock dependencies and introducing intermittent failures into them. Mock dependencies have been around for decades; nothing new here.
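To be concrete, a mock dependency with injected intermittent failures is plain old fault injection. A minimal sketch (made-up names; not whatever harness the article actually describes):

    # A mock payment service that fails intermittently -- classic fault
    # injection, nothing AI-specific. FlakyPaymentMock is a made-up name
    # used purely for illustration.
    import random

    class FlakyPaymentMock:
        def __init__(self, failure_rate=0.2, seed=None):
            self.failure_rate = failure_rate
            self.rng = random.Random(seed)  # seed it so test runs are reproducible

        def charge(self, amount_cents):
            if self.rng.random() < self.failure_rate:
                raise TimeoutError("injected intermittent failure")
            return {"status": "ok", "amount_cents": amount_cents}

Tests then assert that the caller retries or surfaces the error cleanly, which teams have been doing for decades.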

It sounds like this test system you set up is as time-consuming as solving the actual problems you're trying to solve, so what time are you saving?

"Driving Fast Requires Tighter Feedback Loop"

Yes if you're code-vomiting with agents and your test infrastructure isn't rock solid things will fall apart fast, that's obvious. But setting up a rock solid test infrastructure for your system involves basically solving most of the hard problems in the first place. So again, what? What value are you gaining here?

"The communication bottleneck"

Amazon was doing this when I worked there 12 years ago. We all sat in the same room.

"The gains are real - our team's 10x throughput increase isn't theoretical, it's measurable."

Show the data and proof. Doubt.

Yeah I don't know. This reads like complete nonsense honestly.

Paraphrasing: "AI will give us huge gains, and we're already seeing it. But our pipelines and testing will need to be way stronger to withstand the massive increase in velocity!"

Velocity to do what? What are you guys even doing?

Amazon is firing 30,000 people by the way.

lispisok

We're back to using LOC as a productivity metric because LLMs are best at cranking out thousands of LOC really fast. Personal experience: I had a colleague use Claude Code to create a PR consisting of a dozen files and thousands of lines of code for something that could have been done in a couple hundred LOC in a single file.

yahoozoo

I had a coworker use Copilot to implement tab indexing through a Material UI DataGrid. The code was a few hundred lines. I showed them a way to do it in literally one line passed in the slot properties.

CharlesW

> We're back to using LOC as a productivity metric because LLMs are best at cranking out thousands of LOC really fast.

Can you point me to anyone who knows what they're talking about declaring that LOC is the best productivity metric for AI-assisted software development?

chipsrafferty

Are you implying that the author of this article doesn't know what they are talking about? Because they basically declared it in the article we just read.

Can you point me to where the author of this article gives any proof of the claim of 10x increased productivity other than the screenshot of their git commits, which shows more squares in recent weeks? I know git commits could be net-deleting code rather than adding it, but that's still using LOC, or the number of commits as a proxy for it, as a metric.

p1necone

"Our testing needs to be better to handle all this increased velocity" reads to me like a euphemistic way of saying "we've 10x'ed the amount of broken garbage we're producing".

blibble

if you've ever had a friend from before they went to work at amazon, it's like watching someone get indoctrinated into a cult

and this guy didn't survive there for a decade by challenging it

reenorap

No.

The way to code going forward with AI is Test Driven Development. The code itself no longer matters. You give the AI a set of requirements, i.e. tests that need to pass, and then let it code whatever way it needs to in order to fulfill those requirements. That's it. The new reality we programmers need to face is that code itself has an exact value of $0. That's because AI can generate it, and with every new iteration of the AI, the internal code will get better. What matters now are the prompts.

I always thought TDD was garbage, but now with AI it's the only thing that makes sense. The code itself doesn't matter at all; the only thing that matters is the tests that prove to the AI that its code is good enough. It can be dogshit code, but if it passes all the tests, then it's "good enough". Then just wait a few months, rerun the code generation with a new version of the AI, and the code will be better. The humans don't need to know what the code actually is. If they find a bug, they write a new test and force the AI to rewrite the code to satisfy the new test.

I think TDD has really found its future now that AI coding is here to stay. Human-written code doesn't matter anymore, and in fact I would wager that hand-modifying AI-generated code is just as much of a burden. We will need to make sure the test cases are accurate and describe what the AI needs to generate, but that's it.
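Concretely, in this workflow the test file is the spec you hand to the agent. A minimal sketch with pytest (make_slug and its behavior are made up for illustration, and the implementation shown is just a stand-in for whatever the agent generates):

    # The tests are the "prompt": whatever implementation the agent produces
    # only has to make them pass. make_slug is a made-up example function.
    import re
    import pytest

    def make_slug(title):
        # Stand-in for the agent-generated implementation; per the argument
        # above, nobody cares what this looks like as long as the tests pass.
        cleaned = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
        if not cleaned:
            raise ValueError("empty title")
        return cleaned

    def test_lowercases_and_hyphenates():
        assert make_slug("Hello World") == "hello-world"

    def test_strips_punctuation():
        assert make_slug("Rails 8.0, finally!") == "rails-8-0-finally"

    def test_rejects_empty_input():
        with pytest.raises(ValueError):
            make_slug("   ")

Find a bug, add a failing test, regenerate.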

HellDunkel

No.

The reason AI code generation works so well is that a) it is text-based and the training data is huge, and b) the output is not the final result but a human-readable blueprint (source code), ready to be made fit by a human who can form an abstract idea of the whole in his head. The final product is the compiled machine code; we use compilers for that, not LLMs.

AI-generated code is not suitable to be transferred directly into the final product with TDD as its only validation; it would simply be very inefficient to do so.

pcarolan

I mostly agree, but why stop at tests? Shouldn't it be spec-driven development? Then neither the code nor the language matters. Wouldn't user stories and requirements à la BDD (see Cucumber) be the right abstraction?

reenorap

I don't think you're wrong, but I feel like there's a big gap to bridge between the spec and the code. I think the tests are the part that will be able to give the AI enough context to "get it right" quicker.

It's sort of like a director telling an AI the high level plot of a movie, vs giving an AI the actual storyboards. The storyboards will better capture the vision of the director vs just a high level plot description, in my opinion.

__MatrixMan__

Maybe one day. I find myself doing plenty of course correction at the test level. Safely zooming out doesn't feel imminent.

gmd63

Why stop there? Whichever shareholders flood the datacenter with the most electrical signals get the most profits.

blibble

you will end up with something that passes all your tests then smashes into the back of the lorry the moment it sees anything unexpected

writing comprehensive tests is harder than writing the code

reenorap

Then you write another test. That's the whole point of TDD. The more tests you write, the closer it gets to its final form.

blibble

right, and by the time I have 2^googolplex tests the "AI" will finally be able to produce a correctly operating hello world

oh no! another bug!