DeepSearcher: A Local open-source Deep Research
13 comments
February 25, 2025
parhamn
I think the magic of Grok's implementation of this is that they already have most of the websites cached (guessing via their Twitter crawler), so it all feels very snappy. Bing/Brave search don't seem to offer that in their search APIs. Does such a thing exist as a service?
tekacs
I’ve been wondering about this and searching for solutions too.
For now we’ve just managed to optimize how quickly we download pages, but haven’t found an API that actually caches them. Perhaps companies are concerned that they’ll be sued for it in the age of LLMs?
The Brave API provides ‘additional snippets’, meaning that you at least get multiple slices of the page, but it’s not quite a substitute.
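A minimal sketch of the do-it-yourself approach described above (download pages concurrently and keep a local copy so repeat lookups skip the network). It uses only the Python standard library; `PageCache`, `http_fetch`, and the one-file-per-URL layout are hypothetical names and choices made for illustration, not part of any search API mentioned in this thread.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict, Iterable
from urllib.request import urlopen


def http_fetch(url: str) -> bytes:
    """Download one page over HTTP (stdlib only, 10 s timeout)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read()


class PageCache:
    """Hypothetical on-disk cache: one file per URL, named by the URL's SHA-256."""

    def __init__(self, root: Path, fetch: Callable[[str], bytes] = http_fetch):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.fetch = fetch  # injectable, so the downloader can be swapped out

    def _path(self, url: str) -> Path:
        return self.root / hashlib.sha256(url.encode("utf-8")).hexdigest()

    def get(self, url: str) -> bytes:
        path = self._path(url)
        if path.exists():       # cache hit: no network round trip
            return path.read_bytes()
        body = self.fetch(url)  # cache miss: download, then store
        path.write_bytes(body)
        return body

    def get_many(self, urls: Iterable[str], workers: int = 8) -> Dict[str, bytes]:
        """Fetch several pages concurrently; cached pages are read from disk."""
        unique = list(dict.fromkeys(urls))  # de-duplicate, keep first-seen order
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return dict(zip(unique, pool.map(self.get, unique)))
```

Usage would look like `PageCache(Path("pages")).get_many(result_urls)`, where `result_urls` comes from whichever search API you use. The sketch has no expiry, robots.txt handling, or rate limiting, all of which a real crawler would need.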
fragmede
The Common Crawl dataset is rather massive, though I can't speak to how well it would perform here.
stefanwebb
There are two blog posts that go with this; check them out:
https://milvus.io/blog/i-built-a-deep-research-with-open-sou...
https://milvus.io/blog/introduce-deepsearcher-a-local-open-s...
zitterbewegung
I actually tried using this and ran into some issues; I had to replace the OpenAI text embeddings with the MilvusEmbedding.
https://gist.github.com/zitterbewegung/086dd344d16d4fd4b8931...
The QuickStart had a good response. [1] https://gist.github.com/zitterbewegung/086dd344d16d4fd4b8931...
bilater
Nice - I like people's different twist on Deep Research. Here is mine...with Flow I'm trying a new workflow.
Daniel_Van_Zant
I have been searching for a deep research tool that I can hook up to both my personal notes (in Obsidian) and the web, and it looks like this has those capabilities. Now the only piece left is to figure out a way to export the deep research outputs back into my Obsidian vault somehow.
jianc1010
Sometimes I want to do a little coding to automate things with my personal productivity tool, so I feel the programmatic interface that an open-source implementation like this provides is very convenient.
vineyardmike
I’m curious how this compares to the open-source version made by HuggingFace [1]. As far as I can tell, the HF version uses reasoning LLMs to search/traverse and parse the web and gather results, then evaluates the results before eventually synthesizing a final answer.
This version appears to show off a vector store for documents generated from a web crawl (the writer is a vector-store-aaS company).
[1] https://github.com/huggingface/smolagents/tree/main/examples...
fuddle
Considering all the major AI companies have basically created the same deep research product, it would make sense that they focus on a shared open source platform instead.
cma
Cloudflare is going to ruin self-hosted things like this and force centralization to a few players. I guess we'll need decentralized efforts to scrape the web and be able to run it on that.
redskyluan
Amazing!
Search is not the problem. What to search for is!
Using a reasoning model, it is much easier to split the task and focus on what to search for.
This doesn't seem to use local LLMs... so it's not really local. :-\
Is there a deep searcher that can also use local LLMs like those hosted by Ollama and LM Studio?