
DeepRAG: Thinking to Retrieval Step by Step for Large Language Models

brunohaid

Noice!

Does anyone have a good recommendation for a local dev setup that does something similar with available tools? I.e. one that incorporates a bunch of PDFs (~10,000 pages of datasheets) and other docs, plus a curl-style importer?

Trying to wean myself off the next tech molochs, ideally with local functionality similar to OpenAI's Search + Reason, and gave up on LangChain during my first attempt 6 months ago.

throwup238

Honestly you're better off rolling your own (but avoid LangChain like the plague). The actual implementation is simple but the devil is in the details - specifically how you chunk your documents to generate vector embeddings. Every time I've tried to apply general-purpose RAG tools to specific types of documents like medical records, internal knowledge bases, case law, datasheets, and legislation, it's been a mess.

Best case scenario, you can come up with a chunking strategy specific to your use case that makes it work: stuff like grouping all the paragraphs/tables about a register together, grouping tables of physical properties in a datasheet with the table title, or grouping the paragraphs of a PCB layout guideline into a single unit. You also have to figure out how much overlap to allow between the different types of chunks and how many dimensions you need in the output vectors. Then you have to link chunks together, so that when your RAG matches the register description, it knows to also pull in the chunk with the actual documentation, and the LLM gets the documentation instead of just the description. I've had to train many a classifier to get this part even remotely usable in nontrivial use cases like case law.
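To make the linking idea concrete, here's a toy in-memory sketch (all names are mine, not from any library): a matched "description" chunk carries explicit links, and retrieval follows them so the full documentation chunk reaches the LLM.

```python
# Toy sketch of chunk linking: when retrieval matches a short description
# chunk, follow its links so the detailed documentation chunk is included
# in the LLM context too. Names and schema are illustrative only.

from dataclasses import dataclass, field

@dataclass
class Chunk:
    chunk_id: str
    text: str
    kind: str                                   # e.g. "register_description"
    links: list = field(default_factory=list)   # ids of chunks to pull in together

def expand_matches(matched_ids, store):
    """Return matched chunks plus everything they link to, deduplicated."""
    seen, out = set(), []
    stack = list(matched_ids)
    while stack:
        cid = stack.pop()
        if cid in seen:
            continue
        seen.add(cid)
        chunk = store[cid]
        out.append(chunk)
        stack.extend(chunk.links)
    return out

store = {
    "desc:CTRL": Chunk("desc:CTRL", "CTRL register: enables the peripheral.",
                       "register_description", links=["docs:CTRL"]),
    "docs:CTRL": Chunk("docs:CTRL", "CTRL bit 0: EN ... bit 3: IRQ_EN ...",
                       "register_docs"),
}

# A vector match on the short description alone would starve the LLM of
# detail; expansion brings the full documentation chunk along with it.
context = expand_matches(["desc:CTRL"], store)
```

In practice the `links` would be produced at indexing time (by layout heuristics or the classifiers mentioned above), but the retrieval-side expansion is this simple.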

Worst case scenario, you have to finetune your own embedding model, because the colloquialisms the general-purpose ones are trained on have little overlap with how terms of art and jargon are used in the documents (this is especially bad for legal and highly technical texts IME). This generally requires thousands of examples created by an expert in the field.
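For anyone wondering what those expert-created examples look like: the common shape is (query, positive passage, hard negative) triplets, usually serialized as JSONL for whatever contrastive trainer you end up using. A minimal sketch of the format (field names are mine, adjust to your trainer):

```python
# Sketch of the training data an embedding finetune typically consumes:
# (query, positive passage, hard negative) triplets written by a domain
# expert. Field names are illustrative; real trainers expect something
# structurally equivalent.

import json

def make_triplet(query, positive, negative):
    return {"query": query, "pos": positive, "neg": negative}

triplets = [
    make_triplet(
        "estoppel by deed",  # term of art a colloquial model won't handle
        "A grantor who conveys land by deed is precluded from denying ...",
        "Promissory estoppel requires a clear and definite promise ...",
    ),
]

# One JSON object per line is the usual interchange format for finetuning.
jsonl = "\n".join(json.dumps(t) for t in triplets)
```

The hard negatives matter most: they're passages that look superficially similar (here, a different estoppel doctrine) but must not match, which is exactly where general-purpose embeddings fail on jargon.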

3abiton

I've been looking up chunking techniques, but resources on this are so scarce. What's your recommendation?

crishoj

> but avoid LangChain like the plague

Can you elaborate on this?

I have a proof-of-concept RAG system implemented with LangChain, but would like input before committing to this framework.

byefruit

> This generally requires thousands of examples created by an expert in the field.

Or an AI model pretending to be an expert in the field... (works well in a few niche domains I have used this in)

deoxykev

Don't forget to finetune the reranker too if you end up doing the embedding model. That tends to have outsized effects on performance for out of distribution content.
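For anyone unfamiliar with where the reranker sits: retrieval returns candidates cheaply, then a second model rescores each (query, passage) pair and keeps the best. A toy sketch, with simple token overlap standing in for the cross-encoder you'd actually finetune:

```python
# Toy illustration of the rerank stage: cheap retrieval produces
# candidates, then a scorer (normally a finetuned cross-encoder, here
# just token overlap) reorders them against the query.

def rerank(query, candidates, top_k=2):
    q_tokens = set(query.lower().split())

    def score(passage):
        # Stand-in for a cross-encoder relevance score.
        return len(q_tokens & set(passage.lower().split()))

    return sorted(candidates, key=score, reverse=True)[:top_k]

candidates = [
    "The CTRL register enables the peripheral clock.",
    "Layout guidelines for high-speed differential pairs.",
    "CTRL register bit fields and reset values.",
]
best = rerank("CTRL register bit fields", candidates)
```

The out-of-distribution point above is why finetuning matters here: a stock reranker scores jargon-heavy pairs almost as badly as a stock embedder does, and since it sees the full query-passage pair it has more leverage to fix once finetuned.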

kordlessagain

I’ve been working on something that provides document search for agents to call if they need the documents. Let me know if you are interested. It’s Open Source. For this many documents it will need some bucketing with semantic relationships, which I’ve been noodling on this last year. Still needs some tweaking for what you are doing, probably. Might get you further along if you are considering rolling your own…

heywoods

Could I take a look at the repo? Thanks!

jondwillis

Continue and Cline work with local models (e.g. via Ollama) and have good UX for including different kinds of context. Cursor uses remote models, but provides similar functionality.

amrrs

Sorry, just trying to clarify - why would you use Cline (which is a coding assistant) for RAG?

brunohaid

Appreciated! Didn’t know Cline already does RAG handling, thought I’d have to wire that up beforehand.

jondwillis

The title reads awkwardly to a native English speaker. A search of the PDF for "latency" returns one result, discussing how naive RAG can result in latency. What are the latency impacts and other trade-offs to achieve the claimed "[improved] answer accuracy by 21.99%"? Is there any way that I could replicate these results without having to write my own implementation?