The Claude Code Framework Wars

66 comments · September 7, 2025

grim_io

I've tried some of those "frameworks" for Claude Code, but it's difficult to measure any objective improvement.

I tend to lean towards them being snake oil. A lot of process and ritual around using them, but for what?

I don't think the models themselves are a good fit for the way these frameworks are being used. It probably goes against their training.

Now we try to poison the context with lots of information that is useless for my actual task at hand, so that the model can conform to my superficial song-and-dance process? This seems backwards.

I would argue that we need less context poisoning with useless information. Give the model the most precise information for the actual work to be done and iterate on that. The song-and-dance process should happen outside of the context-constrained agent.

bicx

I adopted a couple practices (using dev containers and worktrees) just to make life a little easier. I also built my own shell script “framework” to help manage the worktrees and create project files. However, that took me just a couple days to do on my own (also using CC), and it didn’t lock me into a specific tool.
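The core of such a helper is tiny. A minimal sketch, where the script name and the seeded files are my own assumptions:

    #!/usr/bin/env bash
    # new-task.sh (hypothetical): spin up an isolated worktree for one agent task.
    set -euo pipefail
    task="$1"                               # e.g. ./new-task.sh fix-login-bug
    git worktree add -b "$task" "../$task"  # new branch in a sibling directory
    cp -r .claude "../$task/" 2>/dev/null || true  # seed untracked agent config
    echo "worktree ready at ../$task"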

I do agree that context poisoning is a real thing to watch out for. Coincidentally, I’d noticed MCP endpoint definitions had started taking a substantial block of context for me (~20k tokens), and that’s now something I consider when adopting any MCP.

grim_io

I'm considering removing the Serena MCP server, since Claude Code has gotten better with its own tools.

The new /context command in Claude Code is great for visualizing what uses how much of the context.

On the other hand, I'm curious about dagger's container-use MCP. https://container-use.com/agent-integrations

nicwolff

This article doesn't mention "subagents" https://docs.anthropic.com/en/docs/claude-code/sub-agents which makes me wonder when it was written. I'm finding that just delegating "scan the memory bank for information relevant to the current task" and "run the unit and functional tests and report back only the failures or coverage" to subagents does a lot to keep the main agent's context from filling up.
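A subagent is just a Markdown file with YAML frontmatter under .claude/agents/. A rough sketch of the test-running delegate described above, as .claude/agents/test-runner.md (the name, tool list, and wording are my own):

    ---
    name: test-runner
    description: Runs the unit and functional tests, reporting only failures and coverage.
    tools: Bash, Read
    ---
    Run the project's unit and functional tests. Report back only the
    failing tests and the coverage summary; never paste full logs, so
    the main agent's context stays small.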

3uler

Many of the frameworks mentioned leverage subagents.

bdangubic

> which makes me wonder when it was written

On Sept 6, 2025

montroser

It's kinda the same as real-life project managers.

musbemus

One thing I hope to see included is a precursor step when constructing specs, where Claude is used to intelligently ask about gaps whose answers would disambiguate the implementation. If you told an engineer to do something with a set of requirements and outcomes, they'd naturally also have follow-up questions to ensure alignment before executing.
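Claude Code's custom slash commands can approximate this today: a prompt file under .claude/commands/ that forces a question pass before any code gets written. A sketch of a hypothetical .claude/commands/clarify.md:

    Read the spec in $ARGUMENTS. Before writing any code, list every
    ambiguity, missing requirement, or unstated assumption you find,
    phrased as numbered questions. Wait for my answers, then fold them
    back into the spec before starting the implementation.

Then "/clarify specs/feature.md" runs the gap-finding pass up front.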

joshstrange

Yes, kind of like OpenAI's deep research tool. I often find that a number of mistakes are made because no clarification questions are asked or even considered.

CompoundEyes

For anyone who's applied one of these: at what level of autonomy are you using it? And in what setting? Greenfield?

I see one mention of brownfield development. Has anyone with experience using these frameworks fired up Claude Code on enterprise software and had confident results? I have unchecked access to Claude Code at work, and based on personal agentic coding I'm sure they do aid it. I have decent but not consistent results with my own "system" in our code base, at least until front-end UI components are involved, even with Playwright. But I'm curious: how much litter is left behind? How is your coworker tolerance? How large are your pull requests? What is your inference cost? How do these manage parallel work?

The README documentation for many is a mix of fevered infomercial, system-specific jargon, emoji splatter, and someone's dad's very specific toolbox-organization approach that only he understands. Some feel like they're setting the stage to sell something…trademarked!? Won't Anthropic and others just incorporate the best of the bunch into their CLI tools in time?

Outside of work I've regularly used a reasoning model to produce a ten-page spec, wired my project with the strictest lint, type checks, formatter, and hooks, and instructed it to check off items as it goes and do red-green TDD. I can tell gpt-5 in Cursor to "go", occasionally nudge it to stay on task and say "ok next", and then I'll end up with what I wanted in time, plus gold plating. The last one was a CLI tool for my agents to invoke and track their own work. Anyone with the same tools can just roll their own.
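The lint/type-check/hooks wiring is just ordinary tooling, e.g. a git pre-commit hook so the agent gets deterministic feedback on every commit (the commands are placeholders for whatever your project uses):

    #!/usr/bin/env sh
    # .git/hooks/pre-commit: refuse any commit that fails the checks.
    # (Remember to chmod +x this file.)
    npm run lint && npm run typecheck && npm test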

dsiegel2275

I'm only three weeks into using Claude Code but I'm now seeing impressive results using a structured, "role" or "persona" based approach in a large (500K+ SLOC) Elixir / Phoenix codebase. I'm using the $200 Max plan - so my inference costs are fixed.

For certain, the results are better when I use it to build new features into our platform - as opposed to making complicated refactors or other deep changes to existing parts of the system. But even in the latter case, if we have good technical documentation capturing the design and how parts of the system work (which we don't in many places), Claude Code can make good progress.

At first I was seeing a fair amount of what I would consider "bad code": implementations that either didn't follow accepted coding style and patterns or simply weren't structured for reusability and maintainability. But after strengthening the CLAUDE.md file and adding an "elixir-code-reviewer" subagent which the "developer" persona had to use, the quality of the code improved significantly.

Our platform is open source, you can see our current Claude commands and subagents here: https://github.com/Simon-Initiative/oli-torus/tree/master/.c...

fny

I didn't realize just how bad LLMs are with unpopular languages until now:

"Elixir lists do not support index based access via the access syntax"

"Never use else if or elseif in Elixir, always use cond or case for multiple conditionals."

LiamPowell

Interestingly I've found that Claude is very good at writing valid Ada, it just writes complete garbage that doesn't follow the specification at all. As an example I asked it to change the handling of command line arguments in a program so that duplicates would cause an error rather than being ignored. If I recall correctly it took 6 course corrections to get to what I asked for, but each time it wrote valid code that just didn't complete the task. One I remember was arbitrarily limiting the length of an argument to 4096 characters and then only accepting arguments that were exactly 4096 characters.

Here is the relevant change, it didn't have any sort of hidden complexity: https://github.com/Prunt3D/prunt/commit/b4d7f5e35be6017846b8...

iandanforth

Thanks for sharing that extensive documentation!

stocksinsmocks

I got a strong sense of LLM style in the blog. Interesting information, but the fact that I'm learning about AI from AI is amusing.

stingraycharles

Yeah, unfortunately this is very often the case with articles about AI.

In my own experience, this type of stuff is just wishful thinking right now: for anything non-trivial, you still need to monitor Claude Code closely and interrupt when you discover it has gone down the wrong train of thought.

Additionally, for security reasons, you don't want to give it too many permissions, and you do want to actually see which commands it's executing.
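Claude Code does support locking this down with permission rules in its settings; a sketch of a restrictive .claude/settings.json (the specific allow/deny rules are just examples):

    {
      "permissions": {
        "allow": ["Bash(git diff:*)", "Bash(npm run test:*)", "Edit(src/**)"],
        "deny": ["Bash(curl:*)", "Read(./.env)"]
      }
    }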

The "frameworks" OP talks about are still far away. Right now the best way to think about it is as an intern who is usually wrong but can crank out code at lightning speed.

Szpadel

Looks like either the author didn't check the repos they link to very well, or it's indeed redacted deep-research output.

Examples: SuperClaude is not an MCP server at all; MetaGPT looks like it is not compatible with Claude Code at all.

matt3D

Pretty surprised BMAD-method wasn't mentioned.

For my money it's by far the best Claude Code complement.

touristtam

BMAD is mentioned in the QA part, FWIW.

3uler

The BMAD system seems similar to the AgentOS mentioned in the post.

This way of context engineering has definitely been the way to go for me, although I've just implemented it myself: using Claude to help generate commands and agents and tweaking them to my liking. Lately I've been using JSON as well as Markdown to share context between steps.
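For example, a small handoff file that one step writes and the next step reads (the schema here is entirely my own invention):

    {
      "task": "add password reset flow",
      "decisions": ["reuse the existing mailer", "token TTL is 30 minutes"],
      "open_questions": ["do we rate-limit reset requests?"],
      "files_touched": ["lib/accounts.ex", "lib/accounts/mailer.ex"]
    }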

CGamesPlay

What is this? Just a system prompt? What makes it so good for you?

https://github.com/bmad-code-org/BMAD-METHOD

matt3D

It manifests as a sort of extension for Claude Code.

When I'm in the terminal I can call on Agents who can create standardised documents so there is a memory of the product management side of things that extends beyond the context window of Claude.

It guides you through the specification process so that you have extremely tight tasks for Claude to churn through, with any relevant context, documentation, and acceptance criteria.

Perhaps there are others similar, but I have found it completely transformative.

3uler

It’s basically a set of commands and agents and a way to structure context.

touristtam

It is Agile for interactive sessions with an LLM.

imiric

This amuses me to no end: https://github.com/bmad-code-org/BMAD-METHOD/issues/546

An AI tool finding issues in a set of YAML and Markdown files generated by an AI tool, and humans puzzled by all of it.

> We should really have some code reviewer...

Gemini to the rescue!

MarcelOlsz

Same with taskmaster, also not there.

3uler

I never found Taskmaster that useful; something about how it forced you to work didn't click with me…

MarcelOlsz

Yeah, that's fair, it doesn't feel great. It does work if you have something very concrete you want to make and know how to do it, so it's pretty easy to scope out into some tasks and subtasks. But when you're working on something where you generate it as you go and need to edit tasks, it's pretty bad.

faangguyindia

I often wonder why the agent isn't allowed to manage its own context like humans do.

Why recycle the full history into every future turn until you run out of context window?

Perhaps if the agent could manage its own context, knowing what makes an effective context and the harm of going over it, and smartly making that tradeoff, it could navigate tasks better.

vivzkestrel

What is your opinion of the research paper published recently that claims AI coding actually slows you down by 20%?

3uler

That’s like saying a vim expert would be slower in VS Code - technically the IDE does more, but expertise with your existing tools often beats learning new ones.

Also, that study was from early 2025, before Claude 4, which to me was a big breakthrough in productivity; I did not really find these tools too useful before using Sonnet 4.

vivzkestrel

But has that cut down your prompting time? I assume an AI agent takes a roughly fixed amount of time to generate N lines of code. Constructing effective prompts is probably where most time is spent. Has this time been cut down with newer releases, or has it been shown somehow that we need fewer prompts to achieve the same result with newer AI models?

anonyfox

Isn't this the textbook "bitter lesson" playing out here again? Whatever "frameworks" people try to build, the next generation of models will make them obsolete, no?

---

link: http://www.incompleteideas.net/IncIdeas/BitterLesson.html

touristtam

The models are still ingesting text, are they not? Those frameworks are providing textual guidance about what the task at hand should aim for. They are formalising part of the context passed to the LLM, regardless of the model itself.

g42gregory

I tried the BMAD framework and it was like night and day. I can't work without it. I'd like to see more frameworks like that.

sharts

Context management seems like low-level programming, where you need to carefully place the right things in CPU registers to perform the operation you want correctly.

One difference is that we have less control over the context to add/remove things as each task requires.

AbuAssar

Did anyone try any of these so-called frameworks? Do they deliver, or are they just riding the hype wagon?

stuartjohnson12

Having occasionally looked into these, the overwhelming issue is that pretty much all of them are built with themselves, and the output is what you'd expect: vast quantities of untested functionality, no documentation beyond a wall of Claude-isms about how next-generation and feature-rich it is, and no thought to applicability beyond the narrow set of projects that interest the author.

troupo

> a set of rules, roles, and workflows that make its output predictable and valuable.

Let me stop you right there. Are you seriously talking about predictable when talking about a non-deterministic black box over which you have no control?

andsoitis

> Are you seriously talking about predictable when talking about a non-deterministic

Predictability and determinism are related but different concepts.

A system can be predictable in a probabilistic sense, rather than an exact, deterministic one. This means that while you may not be able to predict the precise outcome of a single event, you can accurately forecast the overall behavior of the system and the likelihood of different outcomes.

https://philosophy.stackexchange.com/questions/96145/determi...

Similarly, a system can be deterministic yet unpredictable due to practical limitations like sensitivity to initial conditions (chaos theory), lack of information, or the inability to compute predictions in time.

sublinear

The topic of chaos is underrated when people talk about deterministic systems, but I think it's at least (usually?/always?) a tractable problem to draw up a fractal or something and find the non-chaotic regions of a solution space. You have nice variables to work with when you draw up a model of the problem.

Maybe someone can elaborate better, but it seems there is no such luck trying to map probability onto problems the way "AI" is being used today. It's not just a matter of feeding it more data, but finding what data you haven't fed it or in some cases knowing you can't feed it some data because we have no known way to represent what is obvious to humans.

troupo

From the discussion in the link: "Predictability means that you can figure out what will happen next based on what happened previously."

Having used nearly all of the methods in the original article, I can predict that the output of the model is nearly indistinguishable from a coin toss for many, many, many rather obvious reasons.

raincole

Yes, and there is absolutely nothing wrong with that. Living creatures are mostly black boxes. It doesn't mean we don't aim for making medicine with predictable effects (and side effects).

sublinear

I don't think that's accurate.

Don't most people working in medicine usually have biology and chemistry degrees? Are you saying those sciences are dark arts?

raincole

Having a biology degree doesn't make you understand every detail of the human body. There are many, many drugs that are known to work (by double-blind testing), but we don't know exactly how.

The details of how penicillin kills bacteria were discovered in the 2000s, only about half a century after its commercial production began. And I'm quite sure we'll still see some more missing puzzle pieces in the future.

troupo

Medicine that can either kill you, cure you, or have no effect at any given time for the same disease is quite unlikely to even pass certification.

Do you know why?

raincole

That is exactly my point.

brookst

Would you say a top tier human developer produces predictable output? I would, in the sense that it will be well designed and implemented code that meets the requirements. Can we guess every variable name and logic choice? Probably not.

troupo

> Would you say a top tier human developer produces predictable output?

First you'd have to prove that LLMs can be equated to a "top tier human developer"

> I would, in the sense that it will be well designed and implemented code that meets the requirements.

Indeed. Something LLMs can or cannot do with all the predictability of a coin toss.

signatoremo

Huh? I guarantee you, if you give two different developers the exact same set of requirements, you'd get two very different programs. Try it. They'd likely perform differently as well, performance- or resource-wise.

Would you still call that predictable? Of course you would, as long as they meet your requirements. Put it another way: anything is unpredictable depending on your level of scrutiny. AI is likely less predictable than a human; that doesn't mean it isn't helpful. You are free to dismiss it, of course.

troupo

> Of course you would, as long as they meet your requirements.

Key word: "as long as they meet your requirements".

I've yet to meet an LLM that can predictably do that. Even on the same code with the same tools/prompt/rituals a few hours apart.

> AI is likely less predictable than human, doesn’t mean it isn’t helpful.

I'm struggling to see where I said they weren't helpful or that I dismissed them

dist-epoch

Non-deterministic does not mean not-predictable.

Quantum mechanics is non-deterministic, yet you can predict the motion of objects with exquisite precision.

All these "non-deterministic boxes" will give the same answer to the question "What is the capital of France?"

sublinear

Yes, but the "exquisite precision" comes from the deterministic parts of physics.

baq

Nah. The only thing we can establish precisely at the lowest levels is probability. We can and do engineer systems to maximize probabilities of desired outcomes and minimize probabilities of undesirable ones.

Frankly, I don't understand how software engineers (not coders, mind you) can have issues with non-deterministic tools while browsing the web on a network that can stop working at any time for any reason.
