Don't Build Multi-Agents
57 comments
·September 1, 2025colonCapitalDee
I'm building a simple agent accessible over SMS for a family member. One of their use cases is finding recipes. A problem I ran into was that doing a web search for recipes would pull tons of web pages into the context, effectively clobbering the system prompt that told the agent to format responses in a manner suited for SMS. I solved this by creating a recipe tool that uses a sub-agent to do the web search and return the most promising recipe to the main agent. When the main agent uses this tool instead of performing the web search itself, it is successfully able to follow the system prompt's directions to format and trim the recipe for SMS. Using this sub-agent to prevent information from entering the context dramatically improved the quality of responses. More context is not always better!
I bring this up because this article discusses context management mostly in terms of context windows having a maximum size. I think that context management is far more than that. I'm still new to this building agents thing, but my experience suggests that context problems start cropping up well before the context window fills up.
_0ffh
You mean sub-agent as in the formatting agent calls on the the search-and-filter agent? In that case you might just make a pipeline. Use a search agent, then a filter agent (or maybe only one search-and-filter agent), then a formatting agent. Lots of tasks work better with a fixed pipeline than with freely communicating agents.
edoceo
Are you in USA? How to get around those 10DLC limits on typical SMS/API things (eg Twilio). Or did you go through that process (which seems a lot for a private use-case)
colonCapitalDee
I am in the USA! Although these days that exclamation point doesn't feel great...
I'm using an old Android phone (Pixel 2 from 2017), a 5$ a month unlimited SMS plan from Tello, and https://github.com/capcom6/android-sms-gateway. For bonus points (I wanted to roll my own security, route messages from different numbers to prod and ppe instances of my backend, and dedup messages) I built a little service in Go that acts as an intermediary between my backend and android-sms-gateway. I deploy this service to my android device using ADB, android-sms-gateway talks to it, and it talks to my backend. I also rooted the android device so I could disable battery management for all apps (don't do this if you want to walk around with the phone of course). It works pretty well!
I plan to open-source this eventually TM, but first I need to decouple my personal deployment infra from the bits useful to everyone else
faangguyindia
There’s both “no multi-agent system” and “multi-agent system,” depending on how you look at it. In reality, you’re always hitting the same /chat/completion API, itself has no awareness of any agents. Any notion of an agent comes purely from the context and instructions you provide.
Separating agents, has a clear advantage. For example, suppose you have a coding agent with a set of rules for safely editing code. Then you also have a code search task, which requires a completely different set of rules. If you try to combine 50 rules for code editing with 50 rules for code searching, the AI can easily get confused.
It’s much more effective to delegate the search task to a search agent and the coding task to a code agent. Think of it this way: when you need to switch how you approach a problem, it helps to switch to a different “agent”, a different mindset with rules tailored for that specific task.
Do i need to think differently about this problem? if yes, you need a different agent!
So yes, conceptually, using separate agents for separate tasks is the better approach.
eab-
There's both "no multi-program system" and "multi-program system", depending on how you look at it. In reality, you're always executing the same machine code, itself has no awareness of programs.
stirfish
This unironically helped me work through a bug just now
datadrivenangel
Calling a different prompt template an 'agent' doesn't help communicate meaningful details about an overall system design. Unnecessary verbiage or abstraction in this case.
CuriouslyC
We're in the context engineering stone age. You the engineer shouldn't be trying to curate context, you should be building context optimization/curation engines. You shouldn't be passing agents context like messages, they should share a single knowledge store with the parent, and the context optimizer should just optimally pack their context for the task description.
hansvm
You're not wrong. This is just a storage/retrieval problem. But ... the current systems have limits. If you want commercial success in <3yrs, are any of those ideas remotely viable?
CuriouslyC
Oh yeah, and if you tried to do one now it'd be a bad idea because I'm almost done :)
The agentic revolution is very different from the chatbot/model revolution because agents aren't a model problem, they're a tools/systems/process problem. Honestly the models we have now are very close to good enough for autonomous engineering, but people aren't giving them the right tools, the right processes, we aren't orchestrating them correctly, most people have no idea how to benchmark them to tune them, etc. It's a new discipline and it's very much in its infancy.
nickreese
Is there a framework for this?
CuriouslyC
I have one that's currently still cooking, I have good experimental validation for it but I need to go back and tune the latency and improve the install story. It should help any model quite a bit but you have to hack other agents to integrate it into their api call machinery, I have a custom agent I've built that makes it easy to inject though.
jrvarela56
Ive used CrewAI to compose agents, it’s easy to mix and match and it does seem to change context based on roles https://docs.crewai.com/en/guides/agents/crafting-effective-...
sippeangelo
"should just"?
CuriouslyC
It's really not hard. It's just all the IR/optimization machinery we already have applied to a shared context tree with locality bias.
TuringNYC
What software are you using to create these beautiful diagrams in the article?
curl-up
In the context compression approach, why aren't the agents labeled as subagents instead? The compressed context is basically a "subtask".
This is my main issue with all these agentic frameworks - they always conviniently forget that there is nothing "individual" about the thing they label "an agent" and draw a box around.
Such "on demand" agents, spawned directly from previos LLM output, are never in any way substantially different from dynamic context compression/filtering.
I think the only sensible framework is to think in terms of tools, with clear interfaces, and a single "agent" (single linear interaction chain) using those tools towards a goal. Such tools could be LLM-based or not. Forcing a distinction between a "function tool" and an "agent that does somethng" doesn't make sense.
peab
Yeah, i agree with thinking of things as a single agent + tools.
From the perspective of the agent, whether the tools are deterministic functions, or agents themselves, is irrelevant.
jskalc92
I think the most common implementation of "subagents" doesn't get full context of a conversation, rather just an AI-generated command.
Here task is fullfilled with the full context so far, and then compressed. Might work better IMO.
adastra22
In my experience it does not work better. There are two context related values to subagent tool calls: (1) the subagent trials and deliberations don’t poison the callers context [this is a win here]; and (2) the called agent isn’t unduly influenced by the caller’s context. [problem!]
The latter is really helpful for getting a coding assistant to settle on a high quality solution. You want critic subagents to give fresh and unbiased feedback, and not be influenced by arbitrary decisions made so far. This is a good thing, but inheriting context destroys it.
adastra22
> As of June 2025, Claude Code is an example of an agent that spawns subtasks. However, it never does work in parallel with the subtask agent, and the subtask agent is usually only tasked with answering a question, not writing any code.
Has this changed since June? Because I’ve been experimenting over the last month with Claude Cide subagents that work in parallel and agents which write code (doing both simultaneously is inadvisable for obvious reasons, at least without workspace separation).
clbrmbr
I’ve been quite successful since June doing parallel edits just on different components within the same codebase. But I’ve not been able to do it with “auto-accept” because I need a way to course correct if one of the agents goes off the rails.
mreid
Is it concerning to anyone else that the "Simple & Reliable" and "Reliable on Longer Tasks" diagrams look kind of like the much maligned waterfall design process?
CuriouslyC
Waterfall is just a better process with agents. Agile is garbage when inserting yourself in the loop causes the system to drop to 10% velocity.
amelius
It looks more like alchemy, thb.
DarkNova6
To me it seems more like the typical trap of a misfit bounded context.
sputknick
This is very similar to the conclusion I have been coming to over the past 6 months. Agents are like really unreliable employees, that you have to supervise, and correct so often that its a waste of time to delegate to them. The approach I'm trying to develop for myself is much more human centric. For now I just directly supervise all actions done by an AI, but I would like to move to something like this: https://github.com/langchain-ai/agent-inbox where I as the human am the conductor of work agents do, then check in with me for further instructions or correction.
avereveard
"I designed a bad system so all system of these class must be bad"
They really handing out ai domain to anyone these days.
codelion
Whom to believe? Devin or Claude? - https://www.anthropic.com/engineering/multi-agent-research-s...
KaoruAoiShiho
If you actually read the claude article it says the same things as the cognition article, it just has a different definition of multi-agent.
null
> By using React, you embrace building applications with a pattern of reactivity and modularity, which people now accept to be a standard requirement, but this was not always obvious to early web developers.
This is quite a whopper. For one thing, the web started off reactive. It did take a while for a lot of people to figure out how to bring that to client-side rendering in a reasonably decent way (though, I'm sorry, IMO that doesn't actually include react). Second, "modularity" has been a thing for quite some time before the web existed. (If you want to get down to it, separating and organizing and your processes in information systems predate computers.)