Open Deep Research

12 comments

·February 4, 2025

transpute

https://techcrunch.com/2025/02/04/hugging-face-researchers-a...

> On GAIA, a benchmark for general AI assistants, Open Deep Research achieves a score of 54%. That’s compared with OpenAI deep research’s score of 67.36%. [...] Worth noting is that there are a number of OpenAI deep research “reproductions” on the web, some of which rely on open models and tooling. The crucial component they — and Open Deep Research — lack is o3, the model underpinning deep research.

Blog post, https://huggingface.co/blog/open-deep-research

swyx

there's always a lot of openTHING of THING after THING is announced. they all usually (not always!) disappoint/don't get traction. i think the causes are

1. running things in production/self hosting is more annoying than just paying like $20-200/month

2. openTHING makers are overhyping their superficial repros and trivializing the minor touches, most particularly in this case...

3. long horizon planning trained with RL in a tight loop that is not available in the open (yes, even with deepseek). the thing that makes OAI work as a product+research company is that products are never launched without first establishing a "prompted baseline" and then finetuning the model from there (we covered this process in https://latent.space/p/karina recently) - which becomes an evals/dataset suite that eventually gets merged in once performance impacts stabilize

4. that said, smolagents and HF are awesome and I like that they are always this on the ball. how does this make money for HF?

ai-christianson

This is pretty awesome—great to see this use of smolagents. I analyzed the code with RA.Aid and here's what it says about how it works:

  The Open Deep Research system implements a sophisticated multi-agent architecture for handling both textual and visual content through several key components:

  **1. Agent Hierarchy:**  
    - Manager agent (CodeAgent) coordinates overall task processing  
    - Specialized sub-agents handle specific types of tasks  
    - Web browser agent with tools for searching and navigation  
    - All agents maintain memory of intermediate steps  

  **2. Core Components:**  
    - SimpleTextBrowser: Text-based web browser with viewport management  
    - TextInspectorTool: Handles document content analysis  
    - VisualQATool: Processes image analysis and captions  
    - Various web tools for search, navigation, and content inspection  

  **3. Key Features:**  
    - Multi-modal processing supporting text, web, and visual content  
    - Hierarchical delegation of tasks to specialized components  
    - Integrated memory management for tracking steps  
    - Support for multiple file types with specialized handlers  
    - Web search capabilities through SERP API  
    - Visual analysis using IDEFICS and GPT-4 models  
    - Markdown conversion for consistent text formatting  

  **4. Tool Integration:**  
    - Clear separation of responsibilities between tools  
    - Coordinated processing of different content types  
    - Structured response formatting  
    - Error handling for unsupported operations  
    - Memory maintenance across operations  

  **5. Content Processing:**  
    - Web content handled by browser tools  
    - Documents processed by text inspector  
    - Images analyzed by visual QA tools  
    - File type-specific conversion and handling  
    - Support for large document processing  

  This architecture enables systematic processing of complex queries involving multiple types of content while maintaining clear separation of concerns and coordinated information flow between components.

Pretty cool approach! Here's the gist of the full research agent trace if anyone is interested: https://gist.github.com/ai-christianson/43447275d5cc0966b1b6...
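
For anyone who wants the delegation pattern from that summary in concrete form, here is a minimal sketch. It is not the smolagents API and not the Open Deep Research code: `SubAgent`, `Manager`, and the lambda handlers are hypothetical stand-ins for LLM-driven agents, and the routing key (`kind`) is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SubAgent:
    """A specialized worker; `handler` stands in for an LLM-driven tool loop."""
    name: str
    handler: Callable[[str], str]
    memory: List[str] = field(default_factory=list)

    def run(self, task: str) -> str:
        result = self.handler(task)
        # Each agent keeps its own record of intermediate steps.
        self.memory.append(f"{task} -> {result}")
        return result


@dataclass
class Manager:
    """Routes a task to a sub-agent by content type and logs the delegation."""
    agents: Dict[str, SubAgent]
    memory: List[str] = field(default_factory=list)

    def run(self, kind: str, task: str) -> str:
        agent = self.agents.get(kind)
        if agent is None:
            # Unsupported content types fail loudly instead of being guessed at.
            raise ValueError(f"unsupported content type: {kind}")
        result = agent.run(task)
        self.memory.append(f"[{agent.name}] {task}")
        return result


# Hypothetical handlers; a real system would call a model and tools here.
manager = Manager(agents={
    "web": SubAgent("web_browser", lambda t: f"searched: {t}"),
    "document": SubAgent("text_inspector", lambda t: f"read: {t}"),
    "image": SubAgent("visual_qa", lambda t: f"captioned: {t}"),
})
```

Calling `manager.run("web", "GAIA benchmark")` hands the task to the `web_browser` stand-in and appends one entry to both the manager's and the sub-agent's memory.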

rvz

Of course an open source version of 'Deep Research' is available as predicted [0] in less than a month.

Open source is already at the finish line.

[0] https://news.ycombinator.com/item?id=42913379

tkellogg

it's just an example, but it's great to see smolagents in practice. I wonder how well the import whitelist approach works for code interpreter security.
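
For context on what an import allowlist check can look like, here is a rough sketch using Python's standard `ast` module. It is an illustration only, not how smolagents implements its interpreter; `ALLOWED_MODULES` and `disallowed_imports` are hypothetical names.

```python
import ast
from typing import List, Set

ALLOWED_MODULES: Set[str] = {"math", "re", "json"}  # hypothetical allowlist


def disallowed_imports(source: str, allowed: Set[str] = ALLOWED_MODULES) -> List[str]:
    """Return the top-level modules imported by `source` that are not allowed."""
    blocked = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Relative imports are treated as blocked in this sketch.
            names = ["<relative>"] if node.level else [node.module]
        else:
            continue
        for name in names:
            root = name.split(".")[0]  # "os.path" is judged by "os"
            if root not in allowed:
                blocked.append(root)
    return blocked
```

A static check like this only sees `import` statements. A call such as `__import__("os")` passes through untouched, so an allowlist on its own is a weak boundary unless the interpreter also restricts builtins and attribute access.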

tptacek

I know some of the point of this is running things locally, but for agent workflows like this some of this seems like a solved problem: just run it on a throwaway VM. There's lots of ways to do that quickly.

ATechGuy

A VM is not the right abstraction because of performance and resource requirements. VMs are used because nothing exists that provides the same or better isolation. Using a throwaway VM for each AI agent would be highly inefficient (think wasted compute and other resources, which is the opposite of what DeepSeek exemplified).

vineyardmike

Is “DeepSeek” going to be the new trendy way to say “don't be wasteful”? I don’t think DS is a good example here. Mostly because it’s a trendy thing, and the company still has $1B in capex spend to get there.

Firecracker has changed the nature of “VMs” into something cheap and easy to spin up and throw away while maintaining isolation. There’s no reason not to use it (besides complexity, I guess).

Besides, the entire rest of this is a python notebook. With headless browsers. Using LLMs. This is entirely setting silicon on fire. The overhead from a VM is the least of the compute efficiency problems. Just hit a quick cloud API and run your python or browser automation in isolation and move on.

tptacek

To which performance and resource requirements are you referring? A cloud VM runs as long as the agent runs, then stops running.

cma

The rise of captchas on regular content, no longer just for posting content, could ruin this. Cloudflare and other companies have set things up to go through a few hand-selected scrapers, and only they will be able to offer AI browsing and research services.

lasermike026

I think I'm in love.