SWE-Grep and SWE-Grep-Mini: RL for Fast Multi-Turn Context Retrieval
12 comments · October 16, 2025
kburman
I thought https://playground.cognition.ai/ was just returning some cached query results, but no, they’re actually spinning up real VMs and running live queries without any authentication or restrictions. That must be costing them a fortune.
breadislove
guys please release the benchmark or the benchmark code. like this is just "trust me bro"
marstall
SWE-1 has been getting booped up to me by WindSurf lately and I've been impressed - often (enough?) getting me the same answers as GPT5 etc., but almost instantly. Gotta say speed is nice.
swyx
nice, what does booped up mean? is this gen z lingo?
marstall
ha more like how i talk to my two year old. WindSurf's Cascade sidebar tool (which i use in RubyMine) has a stable of LLMs and it somewhat randomly switches the active one out from time to time. So I get a taste of what different ones are like, it's kind of cool.
silasalberti
hey I'm from the SWE-grep team - feel free to ask me any questions :)
daralthus
this would be useful outside of coding. could you release a benchmark so we can have more models tuned for this?
tifa2up
Searched for 'hi' and it took 166s to return a response using this model: https://pasteboard.co/oB4VqVC5FGkl.png
Claude Code took 0.1s, Cursor CLI 19s
mgambati
If you ask a real question, then you might get real results.
swyx
(coauthor) main charts/evals here https://x.com/cognition/status/1978867021669413252
you can try it yourself at https://playground.cognition.ai/
i wrote a longer explainer here https://x.com/swyx/status/1978874342743343254 but saving you the click
this was a perspective cut from the blogpost, but let me explain why subagents kill long context
Like you can spend $500m building 100-million-token context models, and they would be 1) slow, 2) expensive to use, 3) full of context rot. O(n) in context length is the lower bound on cost.
Cog's approach is something you learn on day 1 of CS50 - divide and parallelize. Embeddings are too dumb; Agentic Search is too slow. So train limited-agency (max 4 turns), natively-parallel-tool-calling (avg parallelism of 7-8, custom toolset), fast (2,800 tok/s) subagents that deliver the performance of Agentic Search within an acceptable "Flow Window" that feels immaterially slower than Embeddings.
The benefit of this is threefold:
- 8^4 (= 4,096) tool calls (8-wide parallelism compounded over 4 turns) cover a very large code search space; subagent calls can be compounded if more coverage is needed.
- predictable cost & end to end latency
- subagents output "clean" context, free of context failure modes like context poisoning and context rot
we originally called this Rapid Agentic Search, to contrast with RAG. but Fast Context rolls off the tongue better.
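to make that concrete, here's a minimal python sketch of the turn-capped parallel fan-out, assuming a bare regex grep is the only tool. everything in it (`grep`, `plan_queries`, `fast_context`, the constants) is a hypothetical stand-in for illustration, not cognition's implementation - in the real thing a trained model generates the queries itself.

```python
# Minimal sketch of a limited-agency, natively parallel search subagent.
# Illustrative stand-in only; error handling and real query planning omitted.
import re
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

MAX_TURNS = 4     # hard agency cap: at most 4 tool-calling turns
MAX_PARALLEL = 8  # up to 8 tool calls per turn, issued in parallel

def grep(root: Path, pattern: str) -> set[str]:
    """One 'tool call': regex search over a file tree, returning hit paths."""
    rx = re.compile(pattern)
    return {
        str(p) for p in root.rglob("*.py")
        if rx.search(p.read_text(errors="ignore"))
    }

def plan_queries(task: str, hits: set[str], turn: int) -> list[str]:
    """Stand-in for the trained policy: fan out over task keywords on the
    first turn. A real subagent model would refine later turns from hits."""
    if turn == 0:
        return [re.escape(w) for w in task.split() if len(w) > 3][:MAX_PARALLEL]
    return []  # this toy policy stops after one turn

def fast_context(root: Path, task: str) -> list[str]:
    """Run <= MAX_TURNS turns of <= MAX_PARALLEL parallel tool calls, then
    return a deduplicated file list: 'clean' context for the main agent."""
    hits: set[str] = set()
    with ThreadPoolExecutor(max_workers=MAX_PARALLEL) as pool:
        for turn in range(MAX_TURNS):
            queries = plan_queries(task, hits, turn)
            if not queries:
                break
            # natively parallel: all of this turn's tool calls run at once
            for found in pool.map(lambda q: grep(root, q), queries):
                hits |= found
    return sorted(hits)

if __name__ == "__main__":
    print(fast_context(Path("."), "retry logic in the http client"))
```

the hard MAX_TURNS cap is what buys the predictable latency from the list above: worst case is 4 sequential rounds, no matter how wide each round fans out.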
-- Second perspective --
The Fundamental Equation of Coding Agents is:
Coding Agent Performance = Ability to Read the Right Files * Ability to Generate the Right Diffs
Fast Context is Cognition's first solution for the Read. As codebases get larger and tasks get more complex, Reads get more important: for the average production codebase, the first query in Cascade is >60% just searching and reading files.
But if this were just about speed, it might not be that exciting. I think there are unappreciated effects in performance as well when you have very good context. In other words:
Context Engineering is Actually Very Important. Too important for humans and hardcoded rules.
The SWE-grep models are the first dedicated context-engineering agent models.
ntntnt
lol dead thread, cognition begging to grab some traction in this space.
LLM product managers: show me what's in the context, conveniently, right where I'm prompting. Knowing and editing the precise context between requests will likely remain a user task for a long time.