AI code review: Should the author be the reviewer?
37 comments · May 1, 2025
simonw
As a programmer, your responsibility is to produce working code that you are confident in. I don't care if you used an LLM to help you get to that point, what matters is that when you submit a PR you are effectively saying:
"I have put the work into this change to ensure that it solves the problem to the best of my ability. I am willing to stake my reputation on this being good to the best of my knowledge, and it is not a waste of your time to review this and help confirm I didn't miss anything."
So no, AI assisted programming changes nothing: you're still the author, and if your company policy dictates it then you should still have a second pair of human eyes look at that PR before you land it.
ebiester
The example they were using, however, was Devin, which is supposed to be autonomous. I think they're presenting a slightly different use case than the rest of us are.
simonw
Oh interesting, I missed that detail.
I don't believe in the category of "autonomous AI coding agents". That's not how responsible software development works.
codydkdc
We basically use the same policy as above, but for reviews: you can use AI to help with reviews, but you're signing off when you approve the PR.
nemomarx
If you were writing code for a business and actually paid someone else to do a module of the code, I don't think that would change the use case. If you're submitting it as your work through the normal flow, it should go through a normal reviewer, right?
godelski
> your responsibility is to produce working code that you are confident in
I highly agree with this, but can we recognize that even before AI coding this (low) standard was not being met? We've created a culture that encourages cutting corners and rushing. We pretend there is a divide between "people who love to code" and "people who see coding as a means to an end," and we pretend that "end" is "a product" rather than money. The ones who "love to code" are using code as a means to make things. Beautiful code isn't about aesthetics; it is about elegant solutions that solve problems. Loving to code is about taking pride in the things you build.
We forgot that there was something magic about coding: we can make a lot of money and make the world a better place. But we got too obsessed with the money and let it get in the way of the latter. We've become rent-seeking, we've become myopic. Look at Apple. They benefit from developers making apps even without taking a 30% cut; they would still come out ahead if they paid developers! Apps are the whole fucking reason we have smartphones, the whole reason we have computers in the first place. I call this myopic because both parties would benefit in the long run, getting higher rewards than if we had not worked together. It was the open system that made this world, and in response we decided to put up walls.
You're right, it really doesn't matter who or what writes the code. At the end of the day it is the code that matters. But I think we would be naive to dismiss the prevalence of "AI Slop". Certainly AI can help us, but are we going to use it to make better code or are we going to use it to write shit code faster? Honestly, all the pushback seems to just be the result of going too far.
dakshgupta
I'm not sure that commercially-motivated, mass-produced code takes away from "artisan" code. The former is off-putting for the artisans among us, but if you were to sort the engineers at a well-functioning software company by how good they are/how well they're compensated, you'd have approximately produced a list of who loves the craft the most.
godelski
I'm not talking about "artisan code". I'm talking about having pride in your work. I'm talking about being an engineer. You don't have to love your craft to make things have some quality. It helps, but it isn't necessary.
But I disagree. I don't think you see these strong correlations between compensation and competency. We use dumb metrics like LeetCode, Jira tickets filled, and lines of code written. It's hard to measure how many Jira tickets someone's code results in, and hard to determine whether that's because they wrote shit code or because they wrote a feature that is now getting a lot of attention. But we often know the answer intuitively.
There's so much low hanging fruit out there. We were dissing YouTube yesterday right?
Why is my home page 2 videos taking up 70% of the row, then 5 shorts, 2 videos taking 60% of the row, 5 shorts, and then 3 videos taking the whole row? All those videos are aligned! Then I refresh the page and it is 2 rows of 3.
I search a video and I get 3 somewhat related videos and then just a list of unrelated stuff. WHY?!
Why is it that when you have captions on, they display directly on top of captions (or other text) embedded in the video? You tell me you can auto-generate captions but can't auto-detect them? This is super clear if you watch any shorts.
Speaking of shorts do we have to display comments on top of the video? Why are we filling so much of the screen real estate with stuff that people don't care about and cover the actual content? If you're going to do that at least shrink the video or add an alpha channel.
I'm not convinced because I see so much shit. Maybe you're right and that the "artisans" are paid more, but putting a diamond in a landfill doesn't make it any less of a dump. I certainly think "the masses" get in the way of "the artisans".
The job of an engineer is to be a little grumpy. The job of an engineer is to identify problems and fix them. The grumpiness is just direction and motivation.
Edit:
It may be worth disclosing that you're the CEO of an AI code review bot. It doesn't invalidate your comment but you certainly have a horse in the race. A horse that benefits from low quality code becoming more prolific.
rienbdj
Except lots of engineers now sling AI generated slop over the wall and expect everyone else to catch any issues. Before, generating lots of realistic code was time consuming, so this didn’t happen so much.
simonw
Those engineers are doing their job badly and should be told to do better.
gilleain
Agreed. To put it another way, a few years ago, you could copy/paste from a similar code example you found online or elsewhere in the same repository, tweak it a bit then commit that.
Still bad. AI just makes it faster to make new bad code.
edit : to be clearer, the problem in both copy/paste and AI examples is the lack of thought or review.
svieira
I want to highlight this bit:
> 2. Engineers underestimate the degree to which this is true, and don’t carefully review AI-generated code to the degree to which they would review their own.
> The reason for #2 isn’t complacency, one can review code at the speed at which they can think and type, but not at the speed at which an LLM can generate. When you’re typing code, you review-as-you-go, when you AI-generate the code, you don’t.
> Interestingly, the inverse is true for mediocre engineers, for whom AI actually improves the quality of the code they produce. AI simply makes good and bad engineers converge on the same median as they rely on it more heavily.
I find it interesting that the mode of interaction is different (I definitely find it that way myself: code review is a different activity than designing, which is different than writing, though code review tends to have aspects of design-and-write in different orders).
throwup238
Different people also work better in different modes of interaction which is what I think this article papers over.
For me, reviewing code is much easier than writing it because, while the amount of codebase context stays the same in both modes, writing takes up quite a bit more mental space on top of that. I rarely get a nicely specced-out issue where I can focus on writing the code; instead I spend a lot of mental capacity trying to figure out how to fill in the details that were left out.
Focusing on the codebase during review reallocates that context to just the codebase. My brain then pattern matches against code that’s already in front of me much easier than when writing new code. Unfortunately LLMs are largely at the junior engineer level and reviewing those PRs takes a lot more mental effort than my coworkers’.
okdood64
A good engineer != a good coder.
SOTGO
I thought the section on finding bugs was interesting. I'd be curious how many false positives the LLM identified to get the true positive rate that high. My experience with LLMs is that they will find "bugs" if you ask them to, even if there isn't one.
dakshgupta
In this specific case, each file had a single bug in it, and the bot was instructed to find exactly one bug. The wrong cases were all false positives, in that it made up a bug.
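The setup described can be sketched as a small scoring harness. Everything below (the stub reviewer, the toy cases) is invented for illustration, not Greptile's actual evaluation code:

```python
# Toy harness for the evaluation described above: each file contains exactly
# one seeded bug, and the bot must report exactly one bug line.
# `stub_reviewer` is an invented stand-in for the real bot.

def stub_reviewer(source):
    """Pretend reviewer: flags the first line containing 'TODO' as the bug."""
    for i, line in enumerate(source.splitlines(), start=1):
        if "TODO" in line:
            return i
    return 1  # must report exactly one bug, so it guesses

def evaluate(cases):
    """cases: list of (source, true_bug_line) pairs.
    Returns (true_positive_rate, false_positive_count)."""
    hits = sum(1 for source, bug_line in cases
               if stub_reviewer(source) == bug_line)
    # Every miss is a false positive: the bot always reports some line.
    return hits / len(cases), len(cases) - hits

cases = [
    ("x = 1\n# TODO off-by-one\nreturn x", 2),
    ("a = []\nb = a[0]\nprint(b)", 2),
]
rate, false_positives = evaluate(cases)
```

Because the bot is forced to name a bug every time, a miss and a false positive are the same event, which is what makes the false-positive question above interesting.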
never_better
In my experience that's what's separated the different AI code review tools on the market right now.
Some are tuned far better than others on signal/noise ratio.
pjmlp
Eventually, being a technical architect doing acceptance reviews of deliverables would be the only thing left.
gonzan
I think it absolutely makes sense. Especially if the bot and prompts that go in the code review are different from the bot/prompts that wrote the code. But sometimes even the same one can find different errors if you just give it more cycles/iterations to look at the code.
We humans (most of us anyways) don't write everything perfectly in one go, AI doesn't either.
AI tooling is improving so the AI can write tests for its own code and do pre-reviews but I don't think it ever hurts to have both an AI and a human review the code of any PR opened, no matter who or what opened it.
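The "more cycles" idea amounts to unioning findings across passes. A minimal sketch, where `review_once` is an invented stand-in simulating a nondeterministic LLM reviewer:

```python
# Run the reviewer several times and merge the findings. `review_once` is a
# hypothetical stub; real LLM reviewers are nondeterministic, which is
# exactly why extra passes can surface different errors.
from itertools import cycle

# Simulate nondeterminism: each pass surfaces a different subset of issues.
_passes = cycle([
    {"off-by-one in loop bound"},
    {"unchecked None return"},
    {"off-by-one in loop bound", "missing await"},
])

def review_once(diff):
    return next(_passes)

def review_n(diff, n=3):
    findings = set()
    for _ in range(n):
        findings |= review_once(diff)  # union across cycles
    return findings

issues = review_n("example diff", n=3)
```

Here three passes surface three distinct issues where any single pass found at most two.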
I'm also building a tool in the space https://kamaraapp.com/ and I found many times that Kamara's reviews find issues in Kamara's own code. I can say that I also find bugs in my own code when I review it too!
We've also been battling the same issue Greptile has in the example provided, where the code suggestion lands on the completely wrong line. We got it kind of under control, but I haven't found any tool that gets it right 100% of the time. Still a bit to go before the big AI takeover.
devrandoom
Nothing wrong with reviewing your own code per se. But it's not a "code review" as such.
It's very hard to spot your own mistakes; you tend to read what you intended to write, not what's actually written.
This applies both to code and plain text.
phamilton
We used to have a "2 approvals" policy on PRs. It wasn't fully enforced; it was a plugin for GitLab we built that would look for two "+1" comments before unhiding the merge button.
I used to create PRs and then review my own code. If I liked it, I'd +1 it. If I saw problems, I'd -1 it. Other engineers would yell at me that I couldn't +1 my own code. When I showed them PRs that had my -1 on it, they just threw their hands up and moved on, exasperated.
I've carried that habit of reviewing my own code forward, even though we now have real checks that enforce a separate reviewer. It's a good habit.
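The plugin described above isn't public, but the gating logic it implies can be sketched like this (the comment shapes and the `allow_self_vote` knob are invented for illustration):

```python
# Minimal sketch of the "+1"/"-1" merge gate described above.

def merge_allowed(comments, pr_author, required=2, allow_self_vote=True):
    """comments: list of (username, vote) pairs, vote in {+1, -1}.
    Any standing -1 blocks the merge; otherwise `required` +1s
    unhide the merge button."""
    latest = {}
    for user, vote in comments:
        if not allow_self_vote and user == pr_author:
            continue  # the stricter policy the other engineers wanted
        latest[user] = vote  # a user's latest vote wins
    if any(v == -1 for v in latest.values()):
        return False
    return sum(1 for v in latest.values() if v == 1) >= required

# The self-review habit: the author +1s their own PR.
votes = [("phamilton", 1), ("alice", 1)]
```

With `allow_self_vote=True` the author's own +1 counts toward the two required; flipping it off reproduces the policy the objecting engineers wanted.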
gitroom
honestly I still trust a second pair of human eyes more than any bot, but gotta admit AI can find some funky bugs I'd never spot
dakshgupta
Something to be said about having two sets of eyes that are as different from one another as possible, which is achieved by one of them not being human.
JonChesterfield
> As a result - asking an LLM to review its own code is looking at it with a fresh set of eyes.
I wonder how persuasive that line of reasoning is. It's nonsense in a few dimensions but that doesn't appear to be a blocker to accepting a claim.
Anyone remember explaining to someone that even though the computer said a thing, that thing is still not right? Really strong sense that we're reliving that experience.
logicchains
A nice thing about AI is that it can write far more unit tests than a human could (or would) in a short span of time, and often even fix the issues it encounters on its own. The ideal workflow is not just having the AI do code review, but also write and run a bunch of tests that confirm the code behaves as specified (assuming a clear spec).
If too many unit tests slowing down future refactoring is a problem (or e.g. AI writing tests that rely on implementation details), the extra AI-written tests can just be thrown away once the review process is complete.
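One way to keep such tests disposable is to generate them from a small spec table in a module of their own, so the whole file can be deleted after review. A sketch, with an invented function and spec:

```python
# Disposable, spec-driven tests: generated cases live in one module (say
# tests_generated/test_parse.py) so the whole file can be thrown away once
# the review process is complete. Function and spec table are invented.

def parse_duration(s):
    """Toy function under test: '1h30m' -> 5400 seconds."""
    total, num = 0, ""
    for ch in s:
        if ch.isdigit():
            num += ch
        else:
            total += int(num) * {"h": 3600, "m": 60, "s": 1}[ch]
            num = ""
    return total

# The "spec": input/expected pairs an AI could expand far beyond what a
# human would bother typing, then discard after review.
SPEC = [
    ("1h", 3600),
    ("90s", 90),
    ("1h30m", 5400),
    ("2m5s", 125),
]

def run_generated_tests():
    for text, expected in SPEC:
        assert parse_duration(text) == expected, (text, expected)
    return len(SPEC)
```

Because the cases are a flat data table rather than hand-written test functions, deleting or regenerating them later doesn't tangle with the hand-maintained suite.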
JonChesterfield
I love having loads of unit tests that get regenerated whenever they're an inconvenience. There's a fantastic sense of progress from the size of the diff put up for review, plus you get to avoid writing boring old fashioned tests. Really cuts down on the time wasted on understanding the change you're making and leaves one a rich field of future work to enjoy.
logicchains
You shouldn't need to write unit tests to understand the change you're making if you wrote a sufficiently detailed specification beforehand. Now, writing a sufficiently detailed spec is itself an art and a skill that takes practice, but ultimately when mastered it's much more efficient than writing a bunch of boilerplate tests that a machine's now perfectly capable of generating by itself.
nemomarx
Don't you have to review the tests to make sure they really meet the spec / cover all the cases of the spec anyway? It feels a little fragile to have less oversight there, compared to being able to talk to whoever wrote the test cases, or being that person yourself.
dakshgupta
Programming today still has "cruft", unit tests being an example. The platonic ideal is to have AI reduce the cruft so engineers can focus on the creativity and problem solving. In practice, AI does end up taking over the creative bits as people prompt at higher levels of abstraction.
When we first started OpenHands (fka OpenDevin) [1], AI-generated PRs would get opened with OpenHands as the PR creator/owner. This created two serious problems:
* First, the person who triggered the AI could approve and merge the PR. No second set of human eyes needed. This essentially bypassed the code review process.
* Second, the PR had no clear owner. Many of them would just languish with no one advocating for them to get merged. Worse, if one did get merged and caused problems, there was no one you could hold responsible.
We quickly switched strategies: every PR is owned by a human being. You can still see which _commits_ were done by OpenHands, but your face is on the PR, so you're responsible for it.
[1] https://github.com/All-Hands-AI/OpenHands
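The first problem above boils down to requiring an approval from someone other than the person who triggered the AI. A minimal sketch of such a check (data shapes are invented; this is not OpenHands' actual implementation):

```python
# Require at least one approval that isn't the PR owner or the person who
# triggered the AI, so the trigger-and-self-approve loophole is closed.

def can_merge(pr_owner, triggered_by, approvals):
    """approvals: usernames who approved the PR."""
    return any(a not in (pr_owner, triggered_by) for a in approvals)

# The failure mode described: the triggering engineer approves their own AI PR.
self_approved = can_merge("alice", "alice", ["alice"])
# With a second set of human eyes, merging is allowed.
peer_approved = can_merge("alice", "alice", ["alice", "bob"])
```

In practice this is the kind of rule branch protection settings enforce; the sketch just makes the invariant explicit.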