Why Cline doesn't index your codebase

jeffchuber

This is still retrieval and RAG, just not vector search and indexing. It's incredibly important to be clear about terms, and this article does not hit the mark.

nick-baumann

Fair point Jeff -- you're right that we're still doing retrieval. The key distinction is how we retrieve.

Traditional RAG for code uses vector embeddings and similarity search. We use filesystem traversal and AST parsing - following imports, tracing dependencies, reading files in logical order. It's retrieval guided by code structure rather than semantic similarity.
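
To make that concrete, here's a minimal sketch of structure-guided retrieval in Python (illustrative only, not Cline's actual implementation; the entry point and project layout are assumed): start from a file and read whatever it imports, breadth-first, the way a developer would.

  import ast
  from pathlib import Path

  def follow_imports(entry: Path, root: Path, max_files: int = 20) -> list[Path]:
      # Visit files in the order a reader would: the entry point first,
      # then whatever it imports, breadth-first.
      seen, queue, ordered = set(), [entry], []
      while queue and len(ordered) < max_files:
          path = queue.pop(0)
          if path in seen or not path.exists():
              continue
          seen.add(path)
          ordered.append(path)
          tree = ast.parse(path.read_text(), filename=str(path))
          for node in ast.walk(tree):
              modules = []
              if isinstance(node, ast.Import):
                  modules = [alias.name for alias in node.names]
              elif isinstance(node, ast.ImportFrom) and node.module:
                  modules = [node.module]
              for mod in modules:
                  # Only modules that resolve to files under the project root are followed.
                  queue.append(root / (mod.replace(".", "/") + ".py"))
      return ordered

  # e.g. follow_imports(Path("src/auth/login.py"), Path("src"))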

I highly recommend checking out what the Claude Code team discovered (48:00 https://youtu.be/zDmW5hJPsvQ?si=wdGyiBGqmo4YHjrn&t=2880). They initially experimented with RAG using embeddings but found that giving the agent filesystem tools to explore code naturally delivered significantly better results.

From our experience, vector similarity often retrieves fragments that mention the right keywords but miss the actual implementation logic. Following code structure retrieves the files a developer would actually need to understand the problem.

So yes -- I should have been clearer about the terminology. It's not "no retrieval" -- it's structured retrieval vs similarity-based retrieval. And with today's frontier models having massive context windows and sophisticated reasoning capabilities, they're perfectly designed to build understanding by exploring code the way developers do, rather than needing pre-digested embeddings.

phillipcarter

Probably good to add a disclaimer at the top that clarifies the definition, since RAG is ultimately just a pattern, and vector indexes are just one way to implement the pattern.

Indeed, industry at large sees RAG as equivalent to "vector indexes and cosine similarity w.r.t. input query", and the rest of the article explains thoroughly why that's not the right approach.

anon373839

> industry at large sees RAG as equivalent to "vector indexes and cosine similarity w.r.t. input query"

Yep, and this is getting really old. Information retrieval is not a new problem domain. Somehow, when retrieved info is fed into an LLM, all nuance is lost and we end up with endless pronouncements about whether retrieval is/is not "dead".

aryamaan

Hi Nick, given that this product is open source, I have a request/wish:

It would be wonderful if some of the tools the project uses were exposed to build on, like the tools related to ASTs, finding definitions, and many more.

dcreater

If you're putting everything in the context window, is it still considered "retrieval"? Did we have a preexisting robust definition of what constitutes retrieval?

Tycho

Don’t take this the wrong way, but did you use an LLM to generate this reply? The reply is good, but the writing style just piqued my curiosity.

kunzhi

This may not be what is meant here, but I wonder if in the future anybody who actually bothered to learn to write well will automatically be flagged as likely having used AI to write their prose.

For instance, everyone seems to believe that em dashes are something only an AI would use -- but I've been using them in my writing for a long time.

jasonjmcghee

This doesn't read like an LLM at all...

dkubb

I wonder if the more we use LLMs, the more our writing patterns will begin to match theirs. Those of us who work with them the most are likely to be affected earliest.

WhitneyLand

The article reads fine to me.

Yes, technically RAG could mean any kind of retrieval, but in practice when people use the term they're almost always referring to some sort of vector embedding and similarity search.

jamesblonde

Wittgenstein would concur

paxys

> it’s incredibly important to be clear about terms

Is it? None of these terms even existed a couple of years ago, and their meaning is changing day by day.

throwaway314155

Fairly pedantic take.

colordrops

I guess that's technically true, but RAG has colloquially taken on the meaning of vector database retrieval. Perhaps there's a paper out there that defines RAG as any data retrieval, but at this point that's so general a term that it's bordering on useless. It's like saying "network connected application". No one has said that for decades, now that it's the status quo. Also, there are many types of networks, but "network connected app" generally meant TCP, despite that not being in the name.

silverlake

Have you guys at Cline considered using LLMs to create summaries of files and complex functions? Rather than reading a 500-line function, the model could be fed a short comment on what the function does. I'd like to use a local LLM to create summaries at every level: function, file, directory. Then let the LLM use those to find the right code to read. This is basically how I navigate a large code base.
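
A rough sketch of the idea in Python (summarize is a placeholder for whatever local LLM call you'd use; the file layout is illustrative, and none of this is something Cline ships):

  import ast
  import json
  from pathlib import Path

  def summarize(text: str) -> str:
      # Placeholder: call a local LLM here and return a short summary.
      raise NotImplementedError

  def summarize_file(path: Path) -> dict:
      src = path.read_text()
      tree = ast.parse(src)
      functions = {
          node.name: summarize(ast.get_source_segment(src, node) or "")
          for node in ast.walk(tree)
          if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
      }
      return {"file": str(path), "functions": functions, "summary": summarize(src)}

  def summarize_dir(root: Path) -> dict:
      files = [summarize_file(p) for p in root.rglob("*.py")]
      return {"dir": str(root), "files": files,
              "summary": summarize("\n".join(f["summary"] for f in files))}

  # Persist so the agent can consult summaries instead of reading whole files:
  # Path("summaries.json").write_text(json.dumps(summarize_dir(Path("src")), indent=2))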

dpe82

I've just used Cline to produce files like that, and then later when starting a task in plan mode I tell it to read those files to get a sense of the project's structure. I also tell it to update them as necessary after whatever task we're doing is finished.

WhitneyLand

I generally agree with the article and the approach given practical constraints; however, it's all a stopgap anyway.

Using Gemini 2.5's 1M-token context window to work with large systems of code at once immediately feels far superior to any other approach. It allows using an LLM for things that are not possible otherwise.

Of course it's damn expensive, and so hard to do in a high-quality way that it's a rare luxury, for now…

orbital-decay

It's always a tradeoff, and most of the time chunking and keeping the context short performs better.

I feed long-context tasks to each new model and snapshot just to test the performance improvements, and every time it's immediately obvious that no current model can handle its own max context. I don't believe any benchmarks, because contrary to the results of many of them, no matter what the (coding) task is, the results start getting worse after just a few tens of thousands of tokens, and after a hundred thousand the accuracy becomes unacceptable. Lost-in-the-middle is still a big issue as well, at least for reasoning if not for direct recall, despite benchmarks showing it's not. LLMs are still pretty unreliable at one-shotting big things, and everything around that is still alchemy.

loandbehold

1 million tokens is still not enough for real-life codebases (hundreds of thousands to millions of LOC).

simonklee

And it's obviously expensive to use this approach.

ramoz

I kept wondering why Cursor was indexing my codebase; it was never clear.

Anyway, context to me enables a lot more assurance and guarantees. RAG never did.

My favorite workflow right now is:

  - Create context with https://github.com/backnotprop/prompt-tower
  - Feed it to Gemini
  - Gemini Plans
  - I pass the plan into my local PM framework
  - Claude Code picks it up and executes
  - repeat

anukin

How does this work?

It's not clear how the context is used by Gemini to plan, or how the plan is then fed to the local framework. Do I have to replan every time the context changes?

woah

They use a tool like the one they linked to put all their files into one file that they give to Gemini.

Then they put the plan into their "PM framework" (some markdown files?) to have Claude Code pick tasks out of.
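
Roughly, that first step is just concatenating files with path headers into one blob (a loose sketch, not prompt-tower itself; the extension filter is an assumption):

  from pathlib import Path

  def build_context(root: Path, exts: tuple[str, ...] = (".py", ".ts", ".md")) -> str:
      # Concatenate matching files, each prefixed with its path, into one prompt blob.
      parts = []
      for path in sorted(root.rglob("*")):
          if path.is_file() and path.suffix in exts:
              parts.append(f"===== {path} =====\n{path.read_text(errors='ignore')}")
      return "\n\n".join(parts)

  # blob = build_context(Path(".")) -- then ask Gemini for a plan over the blob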

greymalik

Can you give an example of a local PM framework? What happens in this step - ticket creation?

mindcrash

Warning: I don't know what this post does in the background, but it definitely slows down Firefox 138 to the point that it is barely usable.

sbarre

Yeah same here, the whole site is just unresponsive... So it's not just you.

lsaferite

I almost had to force-quit FF as well.

nchmy

I can't help but think this article was published as a direct response to Cline's (arguably superior) fork, Roo Code, releasing an experimental Codebase Indexer in the past week...

https://docs.roocode.com/features/experimental/codebase-inde...

Augment Code's secret sauce is largely its code indexer, and I find it to be the best coding agent around.

olejorgenb

I've been wondering when someone would finally use the actual code structure to do RAG. It seems like such an obvious, if somewhat harder (at least if you need to support many languages), approach.

The vector/keyword-based RAG results I've seen so far for large code bases (my experience is Cody) have been quite bad. For smaller projects (using Cursor) it seems to work quite well, though.

arthur-st

Aider has been doing that for a long time now; it was the first project to do this, AFAIK.

behnamoh

I never had a good experience with RAG anyway, and it felt "hacky". Not to mention most of it basically died when most models started supporting +1M context.

LLMs are already stochastic. I don't want yet another layer of randomness on top.

WaltPurvis

>Not to mention most of it basically died when most models started supporting +1M context.

Do most models support that much context? I don't think anything close to "most" models support 1M+ context. I'm only aware of Gemini, but I'd love to learn about others.

fkyoureadthedoc

GPT-4.1 / mini / nano

consumer451

As the context grows, all LLMs appear to turn into idiots, even just at 32k!

> We evaluate 12 popular LLMs that claim to support contexts of at least 128K tokens. While they perform well in short contexts (<1K), performance degrades significantly as context length increases. At 32K, for instance, 10 models drop below 50% of their strong short-length baselines. Even GPT-4o, one of the top-performing exceptions, experiences a reduction from an almost-perfect baseline of 99.3% to 69.7%.

https://news.ycombinator.com/item?id=44107536

heyhuy

> and it felt "hacky"

I think the pattern that the term "RAG" was coined for is outdated, that pattern being reliance on cosine similarity against an index. It was a stopgap for the 4K-token-window era. For AI copilots, I love Claude Code's and Cline's approach of just following imports and dependencies naturally. Land on a file and let it traverse.

No more crossing your fingers with cosine matching and hoping your reranker didn't drop a critical piece.

k__

Cursor does it.

Cline doesn't.

Aider goes the middle way with repo maps.

Let's see what works best.

therealmarv

I would even classify Aider differently, because it avoids what the blog post is talking about (or moves the responsibility more to the human). In Aider you can manually define, or basically override, which files are important for your context and for the LLM to know about. With 1M context windows that's a no-brainer (except for extremely massive projects). I mean, the repo map truly helps, but I compare it to driving a manual car: you shift the gears yourself by selecting the relevant files, rather than your "automatic car"/AI code assistant doing it for you like in Cursor, Cline, Roo, Claude Code, etc.

bryanlarsen

What's the difference between Aider's repo map and the Cline AST? Sounds more like a small difference in implementation than a difference in concept.

k__

I think the repo map is an AST that doesn't go down all the way.
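
A minimal Python sketch of that idea (not Aider's or Cline's actual code): keep the top-level signatures, drop the bodies.

  import ast
  from pathlib import Path

  def outline(path: Path) -> str:
      # Emit only top-level signatures: an AST view that "doesn't go down all the way".
      tree = ast.parse(path.read_text())
      lines = [f"# {path}"]
      for node in tree.body:
          if isinstance(node, ast.ClassDef):
              lines.append(f"class {node.name}:")
              lines += [f"    def {m.name}(...)" for m in node.body
                        if isinstance(m, (ast.FunctionDef, ast.AsyncFunctionDef))]
          elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
              lines.append(f"def {node.name}(...)")
      return "\n".join(lines)

  # A repo map is roughly this, concatenated across every file in the project.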

wiradikusuma

When I'm in the middle of a conversation with the AI, sometimes I like to change the code immediately ("Oh, that's wrong, let me fix that first"). Does that mess up the context? Or do I need to close/Ctrl+C and reopen?

didibus

I thought Cursor had support for MCP now? So in theory it can now navigate the code base, query for code structure, and so on as well, no?

NitpickLawyer

Eh... I'm not convinced. I like Cline, I've been using it here and there, and I think it found a good mix between "vibesus take the wheel" and "hey, I'm still here and I enjoy doing this". I was particularly surprised that it worked pretty well with local models. A lot of that is on the model (tested w/ Devstral), but a lot of it is on the cradle (e.g. Aider is great at what it does, but its local model support is hit and miss).

First, like some other comments have mentioned, RAG is more than result = library.rag(). I get that a lot of people feel RAG is overhyped, but it's important to have the right mental model around it. It is a technique first. A pattern. Whenever you choose what to include in the context you are performing RAG: retrieve something from somewhere and put it in context. Cline seems to delegate this task to the model via agentic flows, and that's OK. But it's still RAG. The model chooses (via tool calls) what to Retrieve.

I'm also not convinced that embeddings can't be productive. I think Nick is right to point out some flaws in the current implementations, but that doesn't mean the concept in itself is flawed. You can always improve the flows. I think there's a lot to gain from having embeddings, especially since they capture things that ASTs don't (comments, doc files, etc.).

Another aspect is overall efficiency. If you have somewhat repetitive tasks, you'll do this dance every time. Hey, fix that thing in auth. Well, let's see where auth is. Read file1. Read file2. Read fileN. OK, the issue is in ... You can RAG this whole process once and re-use (some of) this computation. Or you can do "graphRAG" and do this heavy lifting once per project, and have an AST + graph + model dump that can be RAG'd. There are a lot of cool things you can do.
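
As a toy sketch of re-using that exploration (the cache shape and the "auth" key are assumptions, not something Cline does today):

  import json
  from pathlib import Path

  CACHE = Path("retrieval_cache.json")

  def _load() -> dict[str, list[str]]:
      return json.loads(CACHE.read_text()) if CACHE.exists() else {}

  def remember(topic: str, files: list[str]) -> None:
      # Record which files an earlier task on this topic ended up reading.
      cache = _load()
      cache[topic] = sorted(set(cache.get(topic, [])) | set(files))
      CACHE.write_text(json.dumps(cache, indent=2))

  def recall(topic: str) -> list[str]:
      # Seed the next "fix that thing in auth" task with the files found last time.
      return _load().get(topic, [])

  # remember("auth", ["src/auth/login.py", "src/auth/token.py"])
  # recall("auth")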

In general I don't think we know enough about the subject, best practices, and useful flows to confidently say "NO, never, nuh-huuh". I think there might be value there, and efficiencies to be gained, and some of them seem like really low-hanging fruit. Why not take them?

avereveard

At some point they will move from scanning files to scanning the AST, and then token consumption will be greatly reduced by default. The challenge is that you then need something generic enough, like tree-sitter, to reduce the monumental effort of integrating a number of parsers.

layer8

Why would an AST greatly reduce LLM token consumption?

avereveard

A lot of tokens are used reading whole files just to understand where the requested feature fits and where the edit point is. Access to an AST would allow the LLM to see the project "wireframe", so to speak, by asking at class- or method-level granularity, and only then retrieving the source for the symbol that most likely contains the edit point it needs. Some token consumption there is unavoidable, since the LLM needs the source to build a diff, but even then it's focused on the feature rather than the search.
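
A hedged Python sketch of that last step: given a symbol picked from the wireframe, pull only that symbol's source (the file and symbol names are illustrative):

  import ast
  from pathlib import Path

  def source_of(path: Path, symbol: str) -> str | None:
      # Return just one function/class body instead of the whole file.
      src = path.read_text()
      for node in ast.walk(ast.parse(src)):
          if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)) \
                  and node.name == symbol:
              return ast.get_source_segment(src, node)
      return None

  # The model scans the wireframe, picks e.g. "validate_token", then reads only:
  # print(source_of(Path("src/auth.py"), "validate_token"))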

bicepjai

More focus on what to use, rather than the whole file where the code snippet sits.

jdoliner

Does anybody else see high CPU and GPU utilization on this site with a process called ClineBot?

crop_rotation

After trying Cline, Aider, Codex, and whatnot, I feel Claude Code is just so much better than all of them. E.g. it takes far fewer prompts to do the same thing compared to Cline. TBH I am not sure how Cline will compete against something like Claude Code, given the resource/capability imbalance. Does anyone else have a different experience?

XenophileJKO

I really felt like Claude Code would benefit greatly from a similar structural map. The little map it made in the Claude.md is insufficient. When the code base grows or you adopt a more componentized approach, Claude Code starts favoring a locality bias, which increases the architectural entropy a LOT.

loandbehold

Same experience. Claude Code is much better than all the other tools. I suspect Claude Code uses some private features of the Claude model that aren't available to other tools. It only makes sense that Anthropic would develop their model in conjunction with the tool to produce the best results.