Agentic Coding Recommendations
June 12, 2025 · darkxanthos
namaria
> "Today is the worst day you will have with this technology for the rest of your life."
Why do we trust corporations to keep making things better all of a sudden?
The most jarring effect of this hype cycle is that everyone appears to be referring to some imaginary set of corporate entities.
GardenLetter27
I trust it more with Rust than Python tbh, because with Python you need to make sure it runs every code path as the static analysis isn't as good as clippy + rust-analyzer.
diggan
I agree, I've had more luck with various models writing Rust than Python, but only in the case where they have tools available so that one way or another they can run `cargo check` and see the nice errors, otherwise it's pretty equal between the two.
I think the excellent error messages in Rust help humans as much as they help LLMs, but some of the weaker models get misdirected by some of the "helpful" tips, like when an error message suggests "Why don't you try .clone() here?" while the actual way to address the issue was something else.
redman25
That's true, typed languages seem to handle the slop better. One thing I've noticed specifically with rust is that agents tend to overcomplicate things though. They tend to start digging into the gnarlier bits of the language much quicker than they probably need to.
ajdidbdbsgs
> Last night I had a file with 38 mypy errors
Fixing type checker errors should be one of the least time-consuming things you do. This was previously consuming a lot of your time?
A lot of the AI discourse would be more effective if we could all see the actual work one another is doing with it (similar to the cloudflare post).
diggan
> AI discourse would be more effective if we could all see the actual work one another is doing with it
Yes, this is a frequent problem both here and everywhere else. The discussions need to include things like exact model version, inference parameters, what system prompt you used, what user prompt, what code you gave it, what exactly it replied and so many more details, as currently almost every comment is "Well, I used Sonnet last week and it worked great" without any details. Not to mention discussions around local models missing basic stuff like what quantization (if any) and what hardware you're running it on. People just write out "Wow, fast model" or stuff like that, and call it a day.
Although I understand why: every comment would be huge if everyone always added sufficient context. I don't know the solution to this, but it does frustrate me.
square_usual
There are many examples of exactly what you're asking for, such as Kenton Varda's Cloudflare OAuth provider [1] and Simon Willison's tools [2]. I see a new blog post like this with detailed explanations of what they did pretty frequently, like Steve Klabnik's recent post [3], which, while it isn't as detailed, has a lot of very concrete facts. There are even more blog posts from prominent devs like antirez who talk about other things they're doing with AI, like rubber ducking [4], if you're curious about how some people who say "I used Sonnet last week and it was great" are working, because not everyone uses it to write code - I personally don't because I care a lot about code style.
[1]: https://github.com/cloudflare/workers-oauth-provider/
[2]: https://tools.simonwillison.net/
[3]: https://steveklabnik.com/writing/a-tale-of-two-claudes/
BeetleB
> The discussions need to include things like exact model version, inference parameters, what system prompt you used, what user prompt, what code you gave it, what exactly it replied and so many more details... Not to mention discussions around local models missing basic stuff like what quantization (if any) and what hardware you're running it on.
While I agree with "more details", the amount of details you're asking for is ... ridiculous. This is a HN comment, not a detailed study.
SparkyMcUnicorn
I feel like that would get tiresome to write, read, and sort through. I don't like everyone's workflow, but if I notice someone making a claim that indicates they might be doing something better than me, then I'm interested.
Maybe keeping your HN profile/gist/repo/webpage up to date would be better.
dimal
I don’t know about fixing python types, but fixing typescript types can be very time consuming. A LOT of programming work is like this: not solving anything interesting or difficult, but just time-consuming drudgery.
These tools have turned out to be great at this stuff. I don’t think I’ve turned over any interesting problems to an LLM and had it go well, but by using them to take care of drudgery, I have a lot more time to think about the interesting problems.
I would suggest that instead of asking people to post their work, try it out on whatever bullshit tasks you’ve been avoiding. And I specifically mean “tasks”. Stuff where the problem has already been solved a thousand times before.
andnand
What's your workflow? I've been playing with Claude Code for personal use, usually new projects for experimentation. We have Copilot licenses through work, so I've been playing around with VS Code agent mode for the last week, usually using 3.5 or 3.7 Sonnet or o4-mini. This is in a large Go project. It's been abysmal at everything other than tests. I've been trying to figure out if I'm just using the tooling wrong, but I feel like I've tried all the current "best practices": contexts, switching models for planning and coding, rules, better prompting. Nothing's worked so far.
namaria
I really don't get it. I've tested some agents and they can generate boilerplate. It looks quite impressive if you look at the logs, actually seems like an autonomous intelligent agent.
But I can run commands on my local linux box that generate boilerplate in seconds. Why do I need to subscribe to access gpu farms for that? Then the agent gets stuck at some simple bug and goes back and forth saying "yes, I figured out and solved it now" and it keeps changing between two broken states.
The rabid prose, the Fly.io post deriding detractors... To me it seems same hype as usual. Lots of words about it, the first few steps look super impressive, then it gets stuck banging against a wall. If almost all that is said is prognostication and preaching, and we haven't seen teams and organizations racing ahead on top of this new engine of growth... maybe it can't actually carry loads outside of the demo track?
It can be useful. Does it merit 100 billion dollar outlays and datacenter-cum-nuclear-powerplant projects? I hardly think so.
SparkyMcUnicorn
Switch to using Sonnet 4 (it's available in VS Code Insiders for me at least). I'm not 100% sure but a Github org admin and/or you might need to enable this model in the Github web interface.
Write good base instructions for your agent[0][1] and keep them up to date. Have your agent help you write and critique it.
Start tasks by planning with your agent (e.g. "do not write any code."), and have your agent propose 2-3 ways to implement what you want. Jumping straight into something with a big prompt is hit or miss, especially with increased task complexity. Planning also gives your agent a chance to read and understand the context/files/code involved.
Apologies if I'm giving you info you're already aware of.
[0] https://code.visualstudio.com/docs/copilot/copilot-customiza...
[1] Claude Code `/init`
andnand
This is exactly what I was looking for. Thanks! I'm trying to give these tools a fair shot before I judge them. I've had success with detailed prompts and letting the agent jump straight in when working on small/new projects. I'll give more planning prompts a shot.
Do you change models between planning and implementation? I've seen that recommended but it's been hard to judge if that's made a difference.
polskibus
My experiments with copilot and Claude desktop via mcp on the same codebase suggest that copilot is trimming the context much more than desktop. Using the same model the outputs are just less informed.
8note
make sure it writes a requirements and design doc for the change it's gonna make, and review those. and, ask it to ask you questions about where there's ambiguity, and to record those responses.
when it has a work plan, track the workplan as a checklist that it fills out as it works.
you can also start your conversations by asking it to summarize the code base
km144
> Is there more we can add to the AI conventions markdown in the repo to guide the Agent to make fewer mistaken assumptions?
Forgive my ignorance, but is this just a file you're adding to the context of every agent turn, or is this a formal convention in the VS Code copilot agent? And I'm curious if there are any resources you used to determine the structure of that document, or if it was just a refinement over time based on mistakes the AI was repeating?
jnwatson
I just finished writing one. It is essentially the onboarding doc for your project.
It is the same stuff you'd tell a new developer on your team: here are the design docs, here are the tools, the code, and this is how you build and test, and here are the parts you might get hung up on.
In hindsight, it is the doc I should have already written.
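Roughly the shape such a doc can take (an illustrative skeleton; the section names and commands below are placeholders, not from any specific project):

```markdown
<!-- AGENTS.md / CLAUDE.md -- illustrative skeleton only; adapt names and commands -->
# Project onboarding (for agents and new developers)

## What this is
One paragraph on what the service does and who calls it.

## How to build and test
- `make build`
- `make test` (unit), `make e2e` (slow, needs Docker)

## Layout
- `cmd/` entrypoints, `internal/` business logic, `docs/design/` design docs

## Things you will get hung up on
- Migrations are generated with `make migration`, never written by hand.
- The legacy `billing/` package has its own conventions; read `billing/README.md` first.
```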
apwell23
> you wouldn't be able to tell from my git history.
I can easily tell from git history which commits were heavily AI generated
fvdessen
I find it excellent news that all the techniques that make agentic coding more efficient also make human coding more efficient. There was a worry that code would become big mud balls that only AI understand, but it looks like the opposite. Clear code is important for AI productivity, so it now matters even more, because the difference in productivity is immediately and objectively measurable. Before AIs, whether code was well factored or not was largely a matter of opinion. Now you can say: look how much better Claude works on codebase A vs codebase B, and present your case with numbers.
diggan
> There was a worry that code would become big mud balls
That's always been a worry with programming (see basically all Rich Hickey talks), and is still a problem since people prefer "moving fast today" instead of "not having 10 tons of technical debt tomorrow"
LLMs make it even easier for people to spend the entire day producing boilerplate without stopping for a second to rethink why they are producing so much boilerplate. If the pain goes away, why fix it?
jnwatson
Literally less than an hour ago, I reviewed a bunch of LLM-generated boilerplate. I then told the agent to show me a plan to refactor it. I suggested some tweaks, and then it implemented the plan and then tested that it didn't break anything.
It isn't much different than dealing with an extremely precocious junior engineer.
Given how easy it is to refactor now, it certainly makes economic sense to delay it.
diggan
But I'm guessing you're doing those refactors because you know they're ultimately worth it, because you have experience programming since before LLMs?
Like I know boilerplate and messy code sucks because I've had to work with it, without LLMs, and I know how much it sucks. I think you do too, but I think we know that, because we had to fight with it in the past.
dimal
The type of person that would do that would have done the same thing without LLMs. LLMs don’t change anything except now they can just create their big ball of mud faster.
The pain of shitty code doesn’t go away. They can ship their crappy MVP faster, but technical debt doesn’t magically go away.
This is an awesome opportunity for those people to start learning how to do software design instead of just “programming”. People that don’t are going to be left behind.
v5v3
"There was a worry that code would become big mud balls that only AI understand, but it looks like the opposite."
For now...
soulofmischief
As long as interfaces are well defined, comprehensive tests are written, memory is safely handled and time complexity is analyzable, who cares what the rest of the code looks like.
I understand programming for the sake of programming, chasing purity and really digging into the creative aspects of coding. But I get that same kick out of writing perfect interfaces, knowing that the messier the code underneath is, the more my beautiful interface gets to shine. But transformers are offering us a way to build faster, to create more, and to take on bigger complexity while learning deeply about new domains as we go. I think the more we lean into that, we might enter a software golden age where the potential for creativity and impact can enter a whole new level.
bluefirebrand
> As long as interfaces are well defined, comprehensive tests are written, memory is safely handled and time complexity is analyzable, who cares what the rest of the code looks like
The thing is, code that does all of the things you listed here is good looking code almost by definition
If AI was anywhere near capable of producing this quality then it would be so thrilling, wouldn't it?
But it's not. The consensus seems to be pretty universal that AI code is Junior to Intermediate quality at best, the majority of the time
That generally isn't code that satisfies the list of quality criteria you mentioned
iDont17
Yeh, exactly. Code doesn’t matter. Correct and stable electrical states matter.
Energy-based models, and machines that bootstrap from models and organize their state around a prompt, are on their way. The analog hole for coders is closing.
Most software out there is layers of made-up tools and such to manage and deploy software. We’ll save a lot of cycles pruning it all for generic patterns.
In 5-10 more years it’s all hardware again. Then there's no longer a need to program a computer like it’s 1970.
ath3nd
> As long as interfaces are well defined, comprehensive tests are written, memory is safely handled and time complexity is analyzable, who cares what the rest of the code looks like.
Software engineers and anyone who'd get hired to fix the mess that the LLM created. Also, ironically, other LLMs would probably work better on...not messy code.
> But transformers are offering us a way to build faster, to create more, and to take on bigger complexity
Wow. You sound as if building faster or creating more is synonymous with quality or utility. Or as if LLMs allow us to take on a bigger level of complexity (this is where they notoriously crumble).
> we might enter a software golden age where the potential for creativity
I haven't heard of a single (good) software engineer whose creativity was stifled by their inability to code something. Is an LLM generating a whole book in Hemingway style considered creative, or a poem? Or a program/app?
physicles
I was struck by this too. Good error messages, fast tools, stable ecosystems, simple code without magic, straight SQL… it’s what I always want. Maybe agents will be what raises the bar for dev experience, simply because they work so quickly that every slowdown matters.
yuri91
So using agents forces (or at least nudges) you to use go and tailwind, because they are simple enough (and abundant in the training data) for the AI to use correctly.
Does this mean that eventually in a world where we all use this stuff, no new language/framework/library will ever be able to emerge?
Competing with the existing alternatives will be too hard. You won't even be able to ask real humans for help on platforms like StackOverflow because they will be dead soon.
furyofantares
> Does this mean that eventually in a world where we all use this stuff, no new language/framework/library will ever be able to emerge?
I highly doubt it. These things excel at translation.
Even without training data, if you have an idiosyncratic-but-straightforward API or framework, they pick it up no problem just looking at the codebase. I know this from experience with my own idiosyncratic C# framework that no training data has ever seen, that the LLM is excellent at writing code against.
I think something like Rust lifetimes would have a harder time getting off the ground in a world where everyone expects LLM coding to work off the bat. But something like Go would have an easy time.
Even with the Rust example though, maybe the developers of something that new would have to take LLMs into consideration, in design choices, tooling choices, or documentation choices, and it would be fine.
chuckadams
> So using agents forces (or at least nudges) you to use go and tailwind
Not even close, and the article betrays the author's biases more than anything else. The fact that their Claude Code (with Sonnet) setup has issues with the `cargo test` cli for instance is hardly a categorical issue with AIs or cargo, let alone rust in general. Junie can't seem to use its built-in test runner tool on PHP tests either, that doesn't mean AI has a problem with PHP. I just wrote a `bin/test-php` script for it to use instead, and it figures out it has to use that (telling it so in the guidelines helps, but it still keeps trying to use its built-in tool first)
As for SO, my AI assistant doesn't close my questions as duplicates. I appreciate what SO is trying to do in terms of curation, but the approach to it has driven people away in droves.
rolisz
I tried Junie in PyCharm and it had big problems with running tests or even using the virtual environment set up in PyCharm for that project.
You'd expect more from the company that is developing both the IDE and the AI agent...
chuckadams
JB's product strategy is baffling. The AI assistant is way more featureful, but it's a lousy agent. Junie is pretty much only good as an agent, but it's hardwired to one model, doesn't support MCP, but does have a whole lot of internal tools ... which it can't seem to use reliably. They really need to work on having just one good AI product that does it all.
I really liked Augment, except for its piggish UI. Then they revealed the price tag, and back to Junie I went.
PeterStuer
A traditional digital stack's lifecycle is:
1. The previous gen has become bloated and complex because it widened its scope to cover every possible niche scenario and got infiltrated by 'expert' language and framework specialists that went on an architecture binge.
2. As a result a new stack is born, much simpler and more back-to-basics than the poorly aged incumbent. It doesn't cover every niche, but it does a few newly popular things really easily and well, and rises on the coattails of this new thing as the default environment for it.
3. Over time the new stack ages just as poorly as the old stack, for all the same reasons. So the cycle repeats.
I do not see this changing with AI-assisted coding, as context enrichment is getting better, allowing a full stack specification in post-training.
bluefirebrand
> It doesn't cover every niche, but it does a few newly popular things really easily and well, and rises on the coattails of this new thing as the default environment for it
How will it ever rise on the coattails of anything if it isn't in the AI training data so no one is ever incentivized to use it to begin with?
jacobr1
AI-legible documentation. If you optimize for a "1-pager" doc you can add to the context of an LLM, and that is all it needs to know to use your package or framework ... people will use it if it has some kind of non-technical advantage. deepwiki.com is sorta an attempt to automate doing something like this.
bluehatbrit
Just yesterday I gave Claude (via Zed) a project brief and a fresh elixir phoenix project. It had 0 problems. It did opt for tailwind for the css, but phoenix already sets it up when using `mix phx.new` so that's probably why.
I don't buy that it pushes you into using Go at all. If anything I'd say they push you towards Python a lot of the time when asking it random questions with no additional context.
The elixir community is probably only a fraction of the size of Go or Python, but I've never had any issues with getting it to use it.
uncircle
> Does this mean that eventually in a world where we all use this stuff, no new language/framework/library will ever be able to emerge?
That's a very good question.
Rephrased: as good training data will diminish exponentially with the Internet being inundated by LLM regurgitations, will "AI savvy" coders prefer old, boring languages and tech because there's more low-radiation training data from the pre-LLM era?
The most popular language/framework combination in the early 2020s is JavaScript/React. It'll be the new COBOL, but you won't need an expensive consultant to maintain it in the 2100s because LLMs can do it for you.
Corollary: to escape the AI craze, let's keep inventing new languages. Lisps with pervasive macro usage and custom DSLs will be safe until actual AGIs that can macroexpand better than you.
NitpickLawyer
> Rephrased: as good training data will diminish exponentially with the Internet being inundated by LLM regurgitations
I don't think the premise is accurate in this specific case.
First, if anything, training data for newer libs can only increase. Presumably code reaches github in an "at least it compiles" state. So you have lots of people fighting the AIs and pushing code that at least compiles. You can then filter for the newer libs and train on that.
Second, pre-training is already mostly solved. The pudding seems to be now in post-training. And for coding a lot of post-training is done with RL / other unsupervised techniques. You get enough signals from using generate -> check loops that you can do that reliably.
The idea that "we're running out of data" is way too overblown IMO, especially considering the last ~6mo-1y advances we've seen so far. Keep in mind that the better your "generation" pipeline becomes, the better will later models be. And the current "agentic" loop based systems are getting pretty darn good.
bluefirebrand
> First, if anything, training data for newer libs can only increase.
How?
Presumably in the "every coder is using AI assistants" future, it will be an incredible amount of friction to get people to adopt languages that AI assistants don't know anything about
So how does the training data for a new language get made, if no programmers are using the language, because the AI tools that all programmers rely on aren't trained on the language?
The snake eating its own tail
Tomte
I’m wondering whether we may see programming languages that are either unreadable to humans or at least designed towards use by LLMs.
energy123
Yes, and an efficient tokenizer designed only for that language. As the ratio of synthetic data to human data grows this will become more plausible.
temp0826
LLM as a frontend to LLVM IR maybe.
Lapel2742
> Does this mean that eventually in a world where we all use this stuff, no new language/framework/library will ever be able to emerge?
If you truly believe in the potential of agentic AI, then the logical conclusion is that programming languages will become the assembly languages of the 21st century. This may or may not become the unfortunate reality.
fhd2
I'd bet money that in less than six months, there'll be some buzz around a "programming language for agents".
Whether that's going to make sense, I have some doubts, but as you say: For an LLM optimist, it's the logical conclusion. Code wouldn't need to be optimised for humans to read or modify, but for models, and natural language is a bit of an unnecessary layer in that vision.
Personally I'm not an LLM optimist, so I think the popular stack will remain focused on humans. Perhaps tilting a bit more towards readability and less towards typing efficiency, but many existing programming languages, tools and frameworks already optimise for that.
pelagicAustral
My best results have been with Ruby/Rails and either vanilla Bootstrap, or something like Tabler UI, Tailwind seems to be fine as well, but I'm still not a fan of the verbosity.
With a stable enough boilerplate you can come up with outstanding results in a few hours. Truly production ready stuff for small size apps.
cpursley
How are you getting results when Ruby has no type system? That seems to be where half the value of LLM coding agents is (dumping in type errors and having them solved).
diggan
Bunch of unit, functional and E2E tests, just like before LLMs :) Haven't tried with Ruby specifically but works well with JavaScript and other dynamic languages so should work fine with Ruby too.
Keats
I've been trying Claude Code with Sonnet 4.0 for a week or so now for Rust code but it feels really underwhelming (and expensive since it's via Bedrock right now). Every time it does something, it misses half of it despite spending a lot of time planning at the beginning of the session. What am I missing?
dimitri-vs
Same. I have a very efficient workflow with Cursor Edit/Agent mode where it pretty much one-shots every change or feature I ask it to make. Working inside a CLI is painful. Are people just letting Claude Code churn for 10-15 minutes and then reviewing the diff? Are people even reviewing the code?
danielbln
This sort of asynchronous flow will become more and more mainstream. chatgpt.com/codex, Google's Jules and to a degree Claude Code (even though that's local) are all following that pattern: phrase a goal, send it off to the agent, review the diff and request changes, rinse and repeat until ready for PR review.
For me this only works for fairly tightly scoped tasks that aren't super complex, but it does work. And I think the days of staring down the IDE will be coming to a close for all but the most complex coding tasks in the future.
apwell23
> Are people even reviewing the code?
No, because it's boring. That's why we don't have airplane pilots just watch the machine that's fully on autopilot.
exfalso
Exact same experience. I have no clue what other people are doing. I was hunting for use cases where it could be used and it kept not working. I don't get it.
energy123
Is it only Rust that you've had this experience with or is it a general thing?
Keats
I'm not sure if it's Rust related. It manages to write the Rust code just fine, it's just that it doesn't seem to
- think of everything that is needed for a feature (fixable via planning at the beginning)
- actually follow that plan correctly
I just tried with a slightly big refactor to see if some changes would improve performance. I had it write the full plan and baseline benchmarks to disk, then let it go in yolo mode. When it was done it only implemented something like half of the planning phases and was saying the results look good despite all the benchmarks having regressed.
exfalso
Also tried it with Python. The autocomplete there was ok-ish (although to me the "wait for it -> review suggested code" cycle is a bit too slow), but getting it to code even standalone well-defined functions was a clear failure. I spent more time trying to fix prompts than it took to write the functions in the first place.
andyferris
I had been trying with Rust, but after this article I think I might change tack and attempt a project in Go...
bananapub
it shouldn't be expensive - you can pay for Pro ($20/month) or Max ($100 or $200/month) to get what would cost >> $1000/month in API costs.
andyferris
Can you use Claude Code with Pro? I was trying to figure this out and I thought you couldn't (unless you enter an API key and pay for tokens).
rolisz
Yes, since last week or so.
Keats
Yep, I know, but I have free AWS credits sooo
gk1
Nice to see container use mentioned (https://github.com/dagger/container-use). I work with the team that made it (a lot of ex-Docker folks including the creator of Docker.)
Running agents in parallel will be a big deal as soon as we learn (or the agents learn) how to reliably work with just one.
Even before then, if you're trying to get work done while the agent is doing its own thing or you find yourself watching over the agent's "shoulder" out of fear it'll change something you didn't ask it to change, then it's useful to run it in a containerized dev environment.
Container use is definitely early but moving quickly, and probably improved even since this post was published. We're currently focused on stability, reducing git confusion, better human<>agent interaction, and environment control.
linguistbreaker
My take on choice of language:
1) Java has the largest, oldest and most explicit data set for the LLM to reference, so it's likely to be the most thorough, if not the most correct.
2) Go with the language YOU know best because you'll be able to spot when the LLM is incorrect, flawed in its 'reasoning', hallucinating etc.
diggan
> Java has the largest, oldest and most explicit data set for the LLM to reference
That seems to be a recommendation for coding with LLMs that don't have access to tools to look up APIs, docs and 3rd party source-code, rather than something you'd chose for "Agentic Coding".
Once the tooling can automatically figure out what is right, what language you use matters less, as long as the source code ends up available somewhere the agent can read it when needed.
Agree much with your 2nd point though, all outputs still require careful review and what better language to use than one you know inside-out?
tough
I have been learning Go, Swift, and Rust with the help of LLM/ Agents.
basically the terser/safer syntax and runtime compilation errors are a great tight feedback loop for the agent to fix stuff by itself.
fibers
Why is this? Is there just an insanely large codebase of open source projects in Java (the only thing I can think of is the entire Apache suite)? Or is it because the docs are that expressive and detailed for a given OSS library?
linguistbreaker
Java's API docs are very complete and explicit.
Certain points about the language, as well as certain long-existing open source projects have been discussed ad-nauseum online. This all adds to the body of knowledge.
Macha
I always assumed the LLMs had the most python code to reference, as they seem to default to Python most often if you don't specify
haiku2077
> Context system: Go provides a capable copy-on-write data bag that explicitly flows through the code execution path, similar to contextvars in Python or .NET's execution context. Its explicit nature greatly simplifies things for AI agents. If the agent needs to pass stuff to any call site, it knows how to do it.
I believe this is considered a bad practice: the general attitude is that the only sane use case for values in context.Context is tracing data, and all other data should be explicitly passed via arguments.
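A minimal Go sketch of the difference (the function and key names are made up for illustration; only the standard library's context package is real):

```go
package main

import (
	"context"
	"fmt"
)

type userIDKey struct{}

// Opaque: the dependency is hidden in the context and only checked at runtime.
func processOrderFromCtx(ctx context.Context) error {
	uid, ok := ctx.Value(userIDKey{}).(string)
	if !ok {
		return fmt.Errorf("user id missing from context")
	}
	fmt.Println("processing order for", uid)
	return nil
}

// Explicit: the dependency is visible in the signature and checked at compile time.
func processOrder(ctx context.Context, userID string) error {
	fmt.Println("processing order for", userID)
	return nil
}

func main() {
	ctx := context.WithValue(context.Background(), userIDKey{}, "u-123")
	_ = processOrderFromCtx(ctx)                    // only works if the caller remembered to set the value
	_ = processOrder(context.Background(), "u-123") // cannot be forgotten
}
```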
physicles
Agreed on all points.
The only place I’ve encountered this pattern is in chromedp, the go wrapper for the chrome headless browser driver. Its API… isn’t good.
Most methods you use are package globals that take a context.Context as a first parameter. But you have to understand that this context is a _special_ one: you can’t pass any old context like context.Background(), you must pass a context you got from one of the factory methods.
If you want to specify a timeout, you use context.WithTimeout. Clever I guess, but that’s the only setting that works like that.
It’s essentially a void*.
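For reference, roughly what that looks like (a sketch assuming the usual chromedp import path and its Navigate/Title actions; the URL is a placeholder):

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/chromedp/chromedp"
)

func main() {
	// The context must come from chromedp.NewContext; a plain
	// context.Background() won't carry the browser state chromedp needs.
	ctx, cancel := chromedp.NewContext(context.Background())
	defer cancel()

	// The timeout, at least, composes with the standard context API.
	ctx, cancel = context.WithTimeout(ctx, 15*time.Second)
	defer cancel()

	var title string
	if err := chromedp.Run(ctx,
		chromedp.Navigate("https://example.com"), // placeholder URL
		chromedp.Title(&title),
	); err != nil {
		panic(err)
	}
	fmt.Println(title)
}
```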
the_mitsuhiko
I'm really not an expert in Go, but the data that I'm passing at the moment via context is the type of data which is commonly placed there by libraries I use: database connections, config, rate limiters, cache backends etc. Does not seem particularly bad to me at least.
haiku2077
If you use context.Context for this you give up a lot of type safety and generally make your data passing opaque.
It's totally fine to put multiple values into a different data bag type that has explicit, typed fields. For example, the Echo framework has its own strongly typed and extensible Context interface for request scoped data: https://pkg.go.dev/github.com/labstack/echo#Context
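A sketch of that style (the struct, field, and function names here are invented for illustration, not taken from Echo or any other framework):

```go
package app

import (
	"context"
	"database/sql"
	"errors"

	"golang.org/x/time/rate"
)

// Deps is an explicit, typed bag of dependencies instead of values buried in a context.
type Deps struct {
	DB      *sql.DB
	Limiter *rate.Limiter
}

func HandleSignup(ctx context.Context, d Deps, email string) error {
	if !d.Limiter.Allow() {
		return errors.New("rate limited")
	}
	// A missing or wrongly typed dependency is a compile error here,
	// rather than a failed type assertion on ctx.Value at runtime.
	_, err := d.DB.ExecContext(ctx, "INSERT INTO signups (email) VALUES ($1)", email)
	return err
}
```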
the_mitsuhiko
> If you use context.Context for this you give up a lot of type safety and generally make your data passing opaque.
The data passing maybe, but I'm not sure how you lose type safety. The value comes from the context with the right type just fine. The stuff that I'm attaching to the context is effectively globals; it's just that this way you can enable proper isolation in tests and elsewhere.
From my limited experience with echo, the context there is not at all the same thing.
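The pattern being defended looks roughly like this (WithDB/DBFrom are illustrative names, not from any particular library):

```go
package dbctx

import (
	"context"
	"database/sql"
)

// Unexported key type prevents collisions with other packages' context values.
type dbKey struct{}

// WithDB attaches the database handle to the context.
func WithDB(ctx context.Context, db *sql.DB) context.Context {
	return context.WithValue(ctx, dbKey{}, db)
}

// DBFrom retrieves it with the right type; callers never touch ctx.Value directly.
func DBFrom(ctx context.Context) (*sql.DB, bool) {
	db, ok := ctx.Value(dbKey{}).(*sql.DB)
	return db, ok
}
```

Tests then build a context carrying a test database instead, which gives the isolation mentioned above.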
EdwardDiego
"Write the simplest code you can, so the dumb AI can understand it" isn't the massive sell I was expecting.
I wonder how that interacts with his previous post?
horsawlarway
Honestly, I find this approach to be useful pretty much anytime you're working with other people as well.
There are absolutely times to be extremely focused and clever with your code, but they should be rare and tightly tied to your business value.
Most code should be "blindingly obvious" whenever possible.
The limit on developers isn't "characters I can type per minute" it's "concepts I can hold in my head."
The more of those there are... The slower you will move.
Don't create more interfaces over the existing ones, don't abstract early, feel free to duplicate and copy liberally, glue stuff together obviously (even if it's more code, or feels ugly), declare the relevant stuff locally, stick with simple patterns in the docs, don't be clever.
You will write better code. Code shouldn't be pretty, it should be obvious. It should feel boring, because the hard part should be making the product not the codebase.
benob
> This is not an advertisement for Claude Code. It's just the agent I use at the moment. What else is there? Alternatives that are similar in their user experiences are OpenCode, goose, Codex and many others. There is also Devin and Cursor's background agents but they work a bit different in that they run in the cloud.
What do you recommend to get a Claude-code-like experience in the open-source + local llm ecosystem?
the_mitsuhiko
> What do you recommend to get a Claude-code-like experience in the open-source + local llm ecosystem?
There is nothing at the moment that I would recommend. However I'm quite convinced that we will see this soon. First of all, I quite like where SST's OpenCode is going. The upcoming UX looks really good. Secondly, because having that in place will make it quite easy to put local models in when they get better. The issue really is that there are just not enough good models for tool usage yet. Sonnet is so shockingly good because it was trained for excellent tool usage. Even Gemini does not come close yet.
This is all just a question of time though.
hucker
Have you tried aider, and if so, how is it lacking compared to Claude Code in your opinion?
the_mitsuhiko
I only tried aider with hosted models and it's too expensive compared to Claude Code so I did not give it a real proper try.
CuriouslyC
Aider is almost there, in fact it's intentionally "not" there. You can set it up to do things like run test/static analysis automatically and fix errors, and work with it to get a to-do list set up so the entire project is spec'd out, then just keep prompting it with "continue...". It has a hard coded reflection limit of 3 iterations right now, but that can also be hacked to whatever you want. The only thing missing for full agentic behavior is built in self prompting behavior.
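For example, the automatic test/lint loop can be configured roughly like this (a sketch of a .aider.conf.yml; the test and lint commands are project-specific placeholders):

```yaml
# .aider.conf.yml (project root) -- commands below are placeholders for your own project
model: sonnet
auto-test: true            # re-run the test command after each edit and feed failures back
test-cmd: pytest -q
auto-lint: true            # same for the linter
lint-cmd: ruff check --fix
```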
diggan
> The only thing missing for full agentic behavior is built in self prompting behavior.
Correct me if I'm wrong, but Aider still doesn't do proper tool calling? Last time I tried it, they did it the "old school" way of parsing out unix shell commands from the output text and ran it once the response finished streaming, instead of the sort of tool call/response stuff we have today.
sandinmyjoints
I think this is still the case. There are some open issues around this. I am surprised they have not moved forward more. I find Aider hugely useful, but would like the opportunity to try out MCP with it.
Karrot_Kream
There's an open PR for MCP integration (actually 2 PRs but one has more community consensus around it) with a Priority label on it but it hasn't been merged yet. Hopefully soon.
CuriouslyC
Yeah, tool/mcp integration isn't great with Aider out of the box.
mickeyp
Shameful plug: my upcoming app perhaps?
Single-file download, fuss-free and install-less that runs on mac, windows and linux (+ docker of course.) It can run any model that talks to openai (which is nearly all of them), so it'll work with the big guys' models and of course other ones like ones you run privately or on localhost.
Unlike Claude Code, which is very good, this one runs in your browser with a local app server to do the heavy lifting. A console app could be written to use this self-same server, too, of course (but that's not priority #1) but you do get a lot of nice benefits that you get for free from a browser.
One other advantage, vis-a-vis Armin's blog post, is that this one can "peek" into terminals that you _explicitly_ start through the service.
It's presently in closed alpha, but I want to open it up to more people to use. If you're interested, you and anyone else who is interested can ping me by email -- see my profile.
elpocko
>run any model that talks to openai (which is nearly all of them)
What does that mean? I've never seen any locally run model talk to OpenAI, how and why would they? Do you mean running an inference server that provides an OpenAI-compatible API?
mickeyp
Sorry, to clarify: OpenAI has a specification for their API endpoints that most vendors are compatible with or have adopted wholesale.
So, if your model inference server understands the REST API spec that OpenAI created way back, you can use a huge range of libraries that in theory only "work" with OpenAI.
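Concretely, "talking the OpenAI API" just means accepting requests like this one (a Go sketch; the localhost URL and model name are placeholders for whatever inference server you run):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// Request body follows the OpenAI chat completions schema.
	body, _ := json.Marshal(map[string]any{
		"model": "local-model", // placeholder: whatever your server exposes
		"messages": []map[string]string{
			{"role": "user", "content": "Say hello"},
		},
	})

	// Placeholder URL: any server implementing the OpenAI-compatible REST API works the same way.
	resp, err := http.Post("http://localhost:8080/v1/chat/completions",
		"application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var out map[string]any
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	fmt.Println(out["choices"])
}
```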
gk1
I see a new alternative (or attempt at one) come out every few days, so it shouldn't be long before we have "the one" alternative.
https://www.app.build/ was just launched by the Neon -- err, Databricks -- team and looks promising.
saint_yossarian
The Neovim plugin CodeCompanion is currently moving into a more agentic direction, it already supports an auto-submit loop with builtin tools and MCP integration.
Yes it's not a standalone CLI tool, but IMHO I'd rather have a full editor available at all times, especially one that's so hackable and lightweight.
yroc92
I’m also interested to hear ideas for this.
tonnydourado
Gotta say, 100/200 bucks monthly feels prohibitively expensive for even trying out something, particularly something as unproven as code-writing AI, even more particularly when other personal experiences with AI have been at the very least underwhelming, and extra particularly when the whole endeavor is so wrapped up in ethical concerns.
BeetleB
Try Aider with API usage. Learn how to control context size (/clear, /add, /drop). Limit context to 25K. Use whatever model you want (Sonnet 4 or Gemini 2.5 Pro).
For simple scripts, it often costs me under $1 to build. I'm working on a bigger tool these days, and I've done lots of prompts, a decent amount of code, over 100 tests, and my running total is right now under $6.
I'd suggest learn the basics of using AI to code using Aider, and then consider whether you want to try Claude Code (which is likely more powerful, but also more expensive unless you use it all the time).
Karrot_Kream
Yeah I've been using Aider mostly and just started using Codex, very similar to Claude Code, yesterday. Aider is more manual and requires more guiding but it's also an order of magnitude cheaper.
The monkey brain part of me that really doesn't trust an LLM and trusts my decades of hard-won programming experience also prefers using Aider because the usage flow generally goes:
1. Iterate with Aider on a plan
2. Tell Aider to write code
3. Review the code
4. Continue hacking myself until I want to delegate something to an LLM again.
5. Head back to Step 1.
Codex automates this flow significantly but it's also a lot more expensive. Just the little bits of guiding I offer an LLM through Aider can make the whole process a lot cheaper.
It's unclear to me whether the full agentic Claude Code/Codex style approach will win or whether Aider's more carefully guided approach will win in the marketplace of ideas, but as a pretty experienced engineer Aider seems to be the sweet spot between cost, impact, and authorial input.
BeetleB
Yes, my concerns as well about the more powerful tools (which I admit I haven't tried).
Even with Aider, I feel it goes too fast and I sometimes actively slow it down (by giving it only very incremental changes rather than get it to do a larger chunk). I think I'd be totally lost with a more powerful agentic tool.
jononor
One month at 20 USD seems like it should be plenty to try it out on a small project or two to decide whether it is worth trying 100 bucks/month? Or one can just wait a couple of months as people report their learnings.
dukeyukey
You can use Claude Code either pay-as-you-go with an API key, or subscribe to the $20 Pro subscription.
jpadamspdx
https://github.com/dagger/container-use (cu) is improving daily. Happy to help get it working if you're hitting anything (we're all in dagger.io discord). Last night I tried it with Amazon Q Developer CLI chat (with claude-3.7-sonnet), which I hadn't touched before (will PR how-to to the README today). MCP integration just worked for me. Figured out where to put the agent rules for Q and how to restrict to just the tools from cu. I kicked off three instances of Q to modify my flask app project with the same prompt in parallel (don't step on the local source) and got three variants to review in short order. I merged the one I liked into the repo and tossed the rest.
linsomniac
I had a recent discussion with another member of the Python community (OA is written by a big name in Python).
He started off saying "learning to code with AI is like learning to cook by ordering off the menu". I know he meant "an AI being the way you learn how to code", but there's another meaning that I've been thinking a lot about because my 16yo son is really into coding and I'm trying to come up with how I can help him be successful in the world at the horizon where he starts doing it professionally.
In that way, "learning how to work together with an AI to code" is a really, really interesting question. Because the world is going to look VERY different in 2-6 years.
The thread in question: https://bsky.app/profile/alsweigart.bsky.social/post/3lr6guv...
bgwalter
I think this discussion boxes new students into the mediocre category right from the start.
Do we really want to tell Fabrice Bellard that he isn't productive enough?
If you want to train people to become fungible factory workers on the other hand, train them to work on the conveyor belt.
linsomniac
I get your point, but I'm envisioning a different endpoint.
Let's take your factory example: Factories are just a fact of life right now, almost nobody is producing bespoke cars or phones or clothing. So given that my son is basically 100% likely to be working with an automation line, how do I get him on the track to being a machine operator or a millwright rather than doing conveyor belt work?
I stumbled into Agentic Coding in VS Code Nightlys with co-pilot using Claude Sonnet 4 and I've been silly productive. Even when half my day is meetings, you wouldn't be able to tell from my git history.
My thinking now is removed from the gory details and is a step or two up. How can I validate the changes are working? Can I understand this code? How should it be structured so I can better understand it? Is there more we can add to the AI conventions markdown in the repo to guide the Agent to make fewer mistaken assumptions?
Last night I had a file with 38 mypy errors. I turned it over to the agent and went and had a conversation with my wife for 15 minutes. I came back, it summarized the changes it made and why, I debated one of the changes with it but ultimately decided it was right.
Mypy passed. Good to go.
I'm currently trying to get my team to really understand the power here. There are a lot of skeptics, and the AI still isn't perfect, and people who are against the AI era will latch onto that as validation, but that's exactly the opposite of the correct reaction. It's really validation because, as a friend of mine says:
"Today is the worst day you will have with this technology for the rest of your life."