How to build a coding agent
66 comments
·August 24, 2025ofirpress
We (the Princeton SWE-bench team) built an agent in ~100 lines of code that does pretty well on SWE-bench, you might enjoy it too: https://github.com/SWE-agent/mini-swe-agent
simonw
OK that really is pretty simple, thanks for sharing.
The whole thing runs on these prompts: https://github.com/SWE-agent/mini-swe-agent/blob/7e125e5dd49...
Your task: {{task}}. Please reply
with a single shell command in
triple backticks.
To finish, the first line of the
output of the shell command must be
'COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT'.
sireat
Pretty sure you also need about 120 lines of prompting from default.yaml
https://github.com/SWE-agent/mini-swe-agent/blob/7e125e5dd49...
nivertech
system_template: str = "You are a helpful assistant that can do anything."
anything? Sounds like an AI Safety issue ;)pinoy420
[dead]
meander_water
> 1. Analyze the codebase by finding and reading relevant files 2. Create a script to reproduce the issue 3. Edit the source code to resolve the issue 4. Verify your fix works by running your script again 5. Test edge cases to ensure your fix is robust
This prompt snippet from your instance template is quite useful. I use something like this for getting out of debug loops:
> Analyse the codebase and brainstorm a list of potential root causes for the issue, and rank them from most likely to least likely.
Then create scripts or add debug logging to confirm whether your hypothesis is correct. Rule out root causes from most likely to least by executing your scripts and observing the output in order of likelihood.
faangguyindia
when a problem is entirely self contained in a file, it's very easy to edit it with LLM.
that's not the case with a codebase, where things are littered around in tune with specific model of organisation the developer had in mind.
fmbb
Lumpers win again!
koakuma-chan
> in tune with specific model of organisation
You wish
BenderV
Nice but sad to see lack of tools. Most your code is about the agent framework instead of specific to SWE.
I've built a SWE agent too (for fun), check it out => https://github.com/myriade-ai/autocode
diminish
> sad to see lack of tools.
Lack of tools in mini-swe-agent is a feature. You can run it with any LLM no matter how big or small.
BenderV
I'm trying to understand what does it got to do with LLM size? Imho, right tools allow small models to perform better than undirected tool like bash to do everything. But I understand that this code is to show people how function calling is just a template for LLM.
ghuntley
cheers i'll add it in.
Teever
What sort of results have you had from running it on its own codebase?
johannesboyne
A very similar "how to guide" can be found here https://ampcode.com/how-to-build-an-agent written by Thorsten Ball. In general Amp is quite interesting - obviously no hidden gem anymore ;-) but great to see more tooling around agentic coding being published. Also, because similar agentic-approaches will be part of (certain/many?) software suits in the future.
campbellbell
Makes sense, the author says he also works at Amp
manojlds
Ghuntley also works at Amp
akk0
If a picture is usually worth 1000 words, the pictures in this are on a 99.6% discount. What the actual...?
digitcatphd
The problem I have with this is that this style of agent design, providing enormous autonomy, makes sense in coding while keeping an expert human in the loop since it can self-correct via debugging. What would the other use cases of giving an agent this much autonomy be today versus a more structured flow versus something more like LangGraph?
Western0
Instead of writing about how to build an agent, show us one project that this agent has built.
normie3000
Why are any of the tools beyond the bash tool required?
Surely listing files, searching a repo, editing a file can all be achieved with bash?
Or is this what's demonstrated by https://news.ycombinator.com/item?id=45001234?
the_mitsuhiko
Technically speaking, you can get away with just a Bash tool, and I had some success with this. It's actually quite interesting to take away tools from agents and see how creative they are with the use.
One of the reasons why you get better performance if you give them the other tools is that there has been some reinforcement learning on Sonne with all these tools. The model is aware of how these tools work, it is more token-efficient and it is generally much more successful at performing those actions. The Bash tool, for instance, at times gets confused by bashisms, not escaping arguments correctly, not handling whitespace correctly etc.
normie3000
> The model is aware of how these tools work, it is more token-efficient and it is generally much more successful at performing those actions.
Interesting! This didn't seem to be the case in the OP's examples - for instance using a list_files tool and then checking if the json result included README vs bash [ -f README ]
dotancohen
> The Bash tool, for instance, at times gets confused by bashisms, not escaping arguments correctly, not handling whitespace correctly etc.
This was the only informative sentence in the reply. Can you please go on in this manner - it was an important question.zarzavat
Separate tools is simpler than having everything go through bash.
If everything goes through bash then you need some way to separate always safe commands that don't need approval (such as listing files), from all other potentially unsafe commands that require user approval.
If you have listing files as a separate tool then you can also enforce that the agent doesn't list any files outside of the project directory.
normie3000
> you need some way to separate always safe commands that don't need approval (such as listing files), from all other potentially unsafe commands that require user approval.
This is a very strong argument for more specific tools, thanks!
BenderV
Why do humans need a IDE when we could do anything in a shell? Interface give you the informations you need at a given moment and the actions you can take.
normie3000
To me a better analogy would be: if you're a household of 2 who own 3 reliable cars, why would you need a 4th car with smaller cargo & passenger capacities, higher fuel consumption, worse off-road performance and lower top speed?
faangguyindia
>Why are any of the tools beyond the bash tool required?
My best guess is they started out with a limited subset of tools and realised they can just give it bash later.
kissgyorgy
This is explained in 3.2 How to design good tools?
This saves the LLM from having to do multiple low level clicking and typing and keeps it on track. Help the poor model out, will ya!?
normie3000
I'm not sure where this quote is from - it doesn't seem to appear in the linked article.
hobofan
I hate to do meta-commentary (the content is a decent beginner level introduction to the topic!), but this is some of the worst AI-slop-infused presentation I've seen with a blog post in a while.
Why the unnecessary generated AI pictures in between?
Why put everything that could have been a bullet point into it's own individual picture (even if it's not AI generated)? It's very visually distracting, breaks the flow of reading, and it's less accessible as all the picture lack alt-text.
---
I see that it's based on a conference talk, so it's possibly just 1:1 the slides. If that's the case please put it up in it's native conference format, rather than this.
gregrata
Wow. Yeah. That's unreadable - my frustration and annoyance levels got high fast, had to close the page before I went for the power button on my machine :)
bambax
Agreed. It's unreadable.
anonzzzies
what's the best current cli (with a non interactive option) that is on par with Claude code but can work with other llms like ollama, openrouter etc? I tried stuff like aider but it cannot discover files, the open source gemini one but it was terrible; what is a good one that maybe is the same as CC if you plug in Opus?
akdev1l
Haven’t tried many but the LLM cli seems alright to me
Tewboo
Building a coding agent involves defining clear goals, leveraging AI, and iterating based on feedback. Start with a simple task and scale up.
fullstackwife
Exactly my approach to gaining knowledge and learning through building your own(`npx genaicode`). When I was presenting my work on a local meetup I got this exact question: "why u building this instead of just using Cursor". The answer is explained in this article(tl;dr; transformative experience), even though some parts of it are already outdated or will be outdated very soon as the technology is making progress every day.
revskill
Where is the program synthesis? My way of thinking is given primitives as tools, i want the model to construct and return the program to execute.
Of course following nix philosophy is another way.
Very simplistic view on the problem domain IMHO. Yah sure we can add a bunch of functions... ok. But how about snapshotting (or at least work with git), sandboxing both process and network level, prompt engineering, detect when stuck, model switching with parallel solvers for better solutions. These are the kind of things that make coding agents reliable - not function declarations.