Kilo Code: Speedrunning open source coding AI
59 comments
March 26, 2025 · adamgordonbell
janpaul123
You're too kind!! The speedrun ethos has already been super fun with this team. :)
I hope that we'll also be able to bring enough skills, strategy, and taste to the space. Time will tell, but we're giving it our best shot!
garyrob
> It won’t be OpenAI or Claude. They have other priorities. The real opportunity is for small teams who move fast, stay close to users, and keep ahead of the pack.
It could be small teams within those companies, however, who have special access to the full power of the platform.
I don't know where things will end up, but I'm leaning towards the big guys dominating coding, because they do coding themselves and so are acutely sensitive to the issues in that particular task. They can build tools for themselves and share them with the world.
It's true that an external team may end up doing it better and be used internally. I just don't think the outcome is predictable at this point.
gtirloni
> Those will get fixed. The question is who gets there first.
Why is being first an advantage? Developers change tools all the time. If the OpenAI API is giving better results, just switch. If Kilo Code or whatever tool is producing fewer bugs, switch.
adamgordonbell
Hence my comment:
>The counterpoint to that would be that all these tools are gonna end up sort of the same and there won't be a way to differentiate.
I mean, I have LLM preferences, but the competition does put downward pressure on the market. The competition benefits me, but not OpenAI.
realharo
>We want to build for the dream of billions of programmers; billions of artists; billions of scientists—using computing as moldable clay.
At that point, why even keep humans in the loop? Just let it run in the background and generate better ideas than any human would anyway.
observationist
Just let the car go wherever it wants, faster than any human would anyway. Or - just let the fire exist in the background, it'll generate more heat than humans will ever need anyway.
The point isn't to make humans pointless. The point is to empower humans. We need to remain in control, and be the users of the tool, and not a tool for some mindless system.
Intelligence and consciousness are separate things - you can automate a lot of intelligence without having even rudimentary consciousness or self awareness - LLMs currently in operation are at most pseudo-conscious within their test-time contexts, and even then, every pass resets whatever awareness there might be. With millions of tokens context length, that might start to enter into the realm of a thing we should be concerned about, but even then, there's no ongoing persisted state to carry anything between passes aside from the text or image patch tokens or what have you.
What this means, essentially, is that we can augment our human capabilities without usurping the agency of some artificial being - these AIs are not individual moral agents in their own right, and likely will never be unless we specifically build that recursion and persistent state into the models, and incorporate a realtime adaptive self and world construct.
This means that the software is a tool - use the tool to augment your life and be a force multiplier in everything you do. The scope of intelligence augmentation has leapt from spreadsheets to nearly every cognitive domain in the human experience - people proficient with Excel were better accountants than people using pen and paper. People using delivery vans are better than people using a horse and wagon. This new technology means that people using AI will be able to do more, faster, and likely better, than people who don't.
With neural lace - whatever form it ends up being - we'll end up with genuine exocortex augmentation. Even without that direct integration, however, the human in the loop is the entire point of this technology. There's a tiny list of things conscious machines might be good for, and all sorts of deep and obvious arguments for not creating a new, self aware, agentic species that's immediately in conflict with and on a trajectory to outcompete humans.
Use the tool of AI to be a force multiplier for everything in your life that AI is capable of handling well. This makes you a benevolent dictator for life in your own life, delegating everything that makes sense, working with it to free up your resources for the things that you decide are the highest priority. Spend more time brainstorming, building relationships, deploying resources, and getting the most out of being human. This is the promise of AI, and why people get excited about it. We're going to have a huge struggle, as humanity, in dealing with the empowerment and amplification of everything in our lives. Making sure that we retain agency, that humans are ultimately in charge of our own destiny, is probably the most important principle to adhere to, above all others.
realharo
I was mainly referring to phrases such as "billions of scientists" - the point of science is to solve problems and discover knowledge. If you have an AI good enough to achieve that (billions of scientists), that means it can probably progress without being actively driven by people at all - and probably do a better job at it too.
We can still do things "for fun", but our efforts will be more toys than serious projects (except when it comes to relationships with other people).
xnx
Gemini 2.5 seems to be the current king of AI coding. In addition to being "smart", it has a huge context window. The one-shot examples on Twitter are astounding.
outside2344
Really? I would have said it was Claude 3.7, based on experience.
khaledh
I too thought Sonnet 3.7 was hard to beat. But from my few interactions with Gemini 2.5, it is freakishly good. The level of discourse is close to talking to an experienced staff engineer who is almost always right.
Workaccount2
Claude was, until yesterday.
Imustaskforhelp
The world of AI moves so fast ...
Seriously though, I get a really great feel from Claude 3.7, but let's see about Gemini 2.5. I have tried it but didn't like its "style"; then again, I only used it for a simple sort example in Go, nothing too fancy. Might need to benchmark it more.
janpaul123
JP here! Would love to answer your questions!
We listed a bunch of ideas for larger improvements in the blog: Instant app; Up-to-date docs; Prompt/product-first workflows; Browser IDE; Local/on-prem models; Live collaboration; Parallel agents; Code variants; Shared context; Open source sharing; MCP marketplace; Integrated CI; Monitoring/production agents; Security agents; Sketching.
What would you like us to build?
arevno
The obvious thing would be LSP interrogation, which would allow the token context to be significantly smaller than entire files. If you have one file open, and you are working on a function that calls out to N other modules, instead of packing the context with N files, you get ONLY the sections of those files the LSP tells you to look at.
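Roughly what that could look like, as a minimal sketch (the "textDocument/definition" method is standard LSP; lsp_request is a hypothetical stand-in for a real JSON-RPC transport to a running language server):

    from pathlib import Path

    def lsp_request(method: str, params: dict) -> list[dict]:
        """Hypothetical: send a JSON-RPC request to the language server
        and return the resulting LSP Location objects."""
        raise NotImplementedError

    def snippet_for_location(loc: dict, padding: int = 2) -> str:
        """Pack only the lines the LSP points at, plus a little padding."""
        path = Path(loc["uri"].removeprefix("file://"))
        lines = path.read_text().splitlines()
        start = max(loc["range"]["start"]["line"] - padding, 0)
        end = min(loc["range"]["end"]["line"] + padding + 1, len(lines))
        return f"# {path.name}:{start + 1}-{end}\n" + "\n".join(lines[start:end])

    def context_for_symbol(uri: str, line: int, character: int) -> str:
        """Resolve a symbol to its definition(s); return just those spans."""
        locations = lsp_request("textDocument/definition", {
            "textDocument": {"uri": uri},
            "position": {"line": line, "character": character},
        })
        return "\n\n".join(snippet_for_location(loc) for loc in locations)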
janpaul123
Yes! This is high on our list. Context window compression is a big deal, and this is one of the main ways to do it, IMO.
Have you tried any tools that do this particularly well?
amarant
One thing that I think would be cool, and that could perhaps be good starting point, is a TDD agent. How I imagine this working:
User (who is a developer) writes tests and a description of the desired application. The agent attempts to build the application, compiles the code, runs the tests, and automatically feeds any compiler errors and test failures back to the agent so that it can fix its own mistakes without input from the user.
Based on my experience of current programming agents, I imagine it'll take the agent a couple of attempts to get an application that compiles and passes all the tests. What would be really great to see is an agent (with a companion application probably) that automates all those retries in a good way.
I imagine the hardest parts will be interpreting compiler output and (this is where things get really tricky) test output, and translating that into code changes in the existing code base.
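A sketch of that retry loop, assuming a Makefile with build/test targets as placeholders for whatever the project actually uses (generate_code is a hypothetical wrapper around the model):

    import subprocess

    def generate_code(prompt: str) -> None:
        """Hypothetical: ask the model to (re)write the implementation."""
        raise NotImplementedError

    def run(cmd: list[str]) -> tuple[bool, str]:
        result = subprocess.run(cmd, capture_output=True, text=True)
        return result.returncode == 0, result.stdout + result.stderr

    def tdd_loop(spec: str, max_attempts: int = 5) -> bool:
        feedback = ""
        for _ in range(max_attempts):
            generate_code(f"{spec}\n\nPrevious errors:\n{feedback}")
            ok, output = run(["make", "build"])    # compile first
            if ok:
                ok, output = run(["make", "test"])  # then the user's tests
            if ok:
                return True    # compiles and passes; hand back to the user
            feedback = output  # feed compiler/test output back to the model
        return False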
janpaul123
Yeah, this is a great workflow! What's more, agents are particularly good at writing tests, since they're simpler and mostly linear, so they can even help with that part.
As to your point of automating retries, with my last prototype I played a lot with having agents do multiple parallel implementations, and then pick the first one that works, or lets you choose (or even have another agent choose).
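A crude sketch of that pick-the-first-that-works approach (build_variant is hypothetical; it would run one agent attempt in an isolated sandbox or worktree):

    from concurrent.futures import ThreadPoolExecutor, as_completed

    def build_variant(task: str, variant: int) -> tuple[str, bool]:
        """Hypothetical: one agent attempt in its own worktree;
        returns (worktree_path, tests_passed)."""
        raise NotImplementedError

    def first_passing(task: str, n: int = 4) -> str | None:
        with ThreadPoolExecutor(max_workers=n) as pool:
            futures = [pool.submit(build_variant, task, i) for i in range(n)]
            for future in as_completed(futures):
                path, passed = future.result()
                if passed:
                    return path  # or collect all and let the user (or a judge agent) pick
        return None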
Have you tried any tools that have this workflow down, or at least approach it?
amarant
I have not! But I've often been frustrated when an agent gives me code that doesn't compile, and I keep thinking that should be a solvable problem. One computer program should be able to talk to the other.
999900000999
This is going to sound a bit odd, but I suggest you detail what your tools do well and what they struggle with. For example I love Haxe, which is a niche programming language primarily for game development.
The vast majority of the time I try to use an llm with it, the code is essentially useless as it will try to invent methods that don't even exist.
For example, if your coding agents are really only good at JavaScript and a little bit of Python, tell me that front and center.
janpaul123
Good point! In that sense we're similar to most AI coding agents in that the languages we do well are the languages the mainstream LLMs do well. We might zoom in and add really good support for particular languages though (not decided yet), in which case we'll def mention that front and center!
Have you found any LLMs or coding agents that work well with Haxe? It might be a bit too niche for us (again, not sure yet), but I'd be very curious to see what they do well!
999900000999
This works well, however it literally needs to digest an entire repository. So for example, if I feed it a repository for a Haxe framework, it'll work much better than something like ChatGPT.
Zondartul
In my unqualified opinion, LLMs would do better at niche languages, specific versions of mainstream languages, and niche frameworks if they were better at consulting the documentation for the language or framework. For example, the user could give the LLM a link to the docs or an offline copy, and the LLM would prioritize the docs over the pretrained code. Currently this is not feasible because 1. limited context is shared with the actual code, and 2. RAG is a one-way injection into the LLM; the LLM usually wouldn't "ask for a specific docs page" even if it probably should.
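One way to make it two-way would be to expose the offline docs as a tool the model can call on demand. A sketch, using the common OpenAI-style function schema (the docs directory and per-page file layout are hypothetical):

    from pathlib import Path

    DOCS_DIR = Path("docs")  # hypothetical offline copy of the framework docs

    def fetch_docs_page(page: str) -> str:
        """Return one docs page verbatim, so it can outrank the model's
        stale pretrained knowledge of a niche language or framework."""
        path = DOCS_DIR / f"{page}.md"
        return path.read_text() if path.exists() else f"No page named {page!r}."

    docs_tool = {
        "type": "function",
        "function": {
            "name": "fetch_docs_page",
            "description": "Fetch a specific page from the framework docs.",
            "parameters": {
                "type": "object",
                "properties": {"page": {"type": "string"}},
                "required": ["page"],
            },
        },
    }
    # An agent loop would advertise docs_tool on each request and, whenever
    # the model calls it, run fetch_docs_page and append the result to the
    # conversation before the next model call.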
eutropia
Local Agent, 100%.
If I'm just exploring ideas for fun or scratching my own itch, I have no desire to be thinking about a continuous stream of expenditure happening in the background when I have an Apple Silicon Mac with 64 GB of RAM fully capable of running an agentic stack with tool calling etc.
Please make it trivial to setup and use a llamafile or similar as the LLM for this.
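For what it's worth, llamafile already serves an OpenAI-compatible API on localhost, so the wiring can be about this small (a sketch; the model name and port depend on the llamafile you run):

    # In a shell first:  ./your-model.llamafile --server --nobrowser

    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:8080/v1",  # llamafile's default local endpoint
        api_key="sk-no-key-required",         # the key is ignored locally
    )
    response = client.chat.completions.create(
        model="local",  # llamafile generally accepts any model name
        messages=[{"role": "user", "content": "Write a sort example in Go."}],
    )
    print(response.choices[0].message.content)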
janpaul123
I agree, this would be good to have soon, especially as good models keep getting smaller, and hardware keeps getting cheaper.
spankalee
Your timeline is indeed crazy fast. Did you recruit the 9 others in your first week? Did you pitch and secure funding in that week too?
janpaul123
In roughly the last 2 weeks, yes. It helped that everyone involved also activated their network, so we got a multiplicative effect. Can't speak to funding for now unfortunately.
tristor
To be honest, I have yet to use any GenAI tool that makes me feel like it can replace me just writing code (I write this as an engineer turned PM who would really like the promise of GenAI to be true). What I'd actually like to see more than anything is a GenAI "agent" that can act like the *user* of my software: helping me identify gaps in documentation as the software changes and the documentation drifts and becomes stale, and generally helping me explore code paths that are off the happy path but will get hit by real users. I think there's a lot more value in having GenAI help me test and document my work than in trying to do my work, because I will always write higher-quality code than GenAI can produce.
janpaul123
Totally agree!
bluelightning2k
I usually support everything, but isn't this literally just "we are trying to fork Roo Code and cover $15 of your tokens so we can show VCs that we have users"? As in, people like free money. But that wouldn't be enough of a bribe to justify using the fork over the real project, for me at least.
janpaul123
Our backers have no interest in fake metrics. ;) It's a good way to quickly get feedback, which is key to our strategy. Totally fine to keep using Roo Code (or Cline) of course!
quikoa
>We don't take any cut, either per token or per top-up. In the future we'll add more LLM providers.
So where does the money come from?
janpaul123
At this point we plan to monetize enterprise features (LDAP login, things like that).
yapyap
> Since then I’ve been thinking a lot about AI agents. They’re the closest I’ve seen to the dream of “programming for all”
Programming is programming for all, you just have to put some effort in. This is akin to saying you wish there was a 'Spanish for all', so you invented Google Translate.
cpldcpu
Their approach seems very compelling, but I don't understand if/how they are building a differentiated product. The space of code agents is already pretty crowded.
janpaul123
We’ll take all the features people love in other products, and implement them in a coherent package as quickly as we can.
rounce
Is this a real open source project or a pretend 'source (maybe-kinda) available' kinda thing where the really useful part is stuffed behind a paywall and the 'open source' part is just to lure you into the walled garden?
jiri
How is it different from Roo Code or GitHub Copilot?
adamgordonbell
I talked to JP about this project. He's excited in a way that's hard not to catch. His core thesis is simple: coding agents are the future, and the winners will be the ones who can execute.
It won’t be OpenAI or Claude. They have other priorities. The real opportunity is for small teams who move fast, stay close to users, and keep ahead of the pack.
That makes sense. LLMs are already powerful, almost magical at times. But using them as coding agents still takes real work. They can do amazing things, but they can also be frustrating and make a mess. There are rough edges and big gaps.
Those will get fixed. The question is who gets there first.
The counterpoint to that would be that all these tools are gonna end up sort of the same and there won't be a way to differentiate.
Which way will it play out? I'm not really sure.