New tools for building agents
138 comments · March 11, 2025
bob1029
I don't know how much this API churn is going to help developers who are trying to integrate OAI into real, actual, non-wrapper products. Every vendor-managed state machine that handles conversation, messages, prompt hand-off, etc., has ultimately proven inadequate, presumptive or distracting for my use cases.
At the end of the day, all I ever seem to use is the chat completion API with structured outputs turned on. Despite my "basic" usage, I am employing tool use, recursive conversations, RAG, etc. I don't see the value in outsourcing state management of my "agent" to a 3rd party. I have way more autonomy if I keep things like this local.
The entire premise of these products is that you are feeding a string literal into some black box and it gives you a new string. Hopefully, as JSON or whatever you requested. If you focus just on the idea of composing the appropriate string each time, everything else melts away. This is the only grain that really matters. Think about other ways in which we compose highly-structured strings based upon business state stored in a database. It's literally the exact same thing you do when you SSR a webpage with PHP. The only real difference is how it is served.
cpfiffer
This is my sense too.
I haven't really found any agent framework that gives me anything I need above a simple structured gen call.
As you say, most requests to LLMs are (should be?) prompt-in structure-out, in line with the Unix philosophy of doing precisely one thing well.
Agent frameworks are simply too early. They are layers built to abstract a set of design patterns that are not common. We should only build abstractions when it is obvious that everyone is reinventing the wheel.
In the case of agents, there is no wheel to invent. It's all simple language model calls.
I commonly use the phrase "the language model should be the most boring part of your code". You should be spending most of your time building the actual software and tooling -- LLMs are a small component of your software. Agent frameworks often make the language model too large a character in your codebase, at least for my tastes.
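For reference, the kind of boring call I mean (a minimal sketch with the OpenAI Python SDK; the Invoice schema is made up for illustration):

    from openai import OpenAI
    from pydantic import BaseModel

    class Invoice(BaseModel):  # illustrative schema, not from the announcement
        vendor: str
        total_usd: float

    client = OpenAI()
    # prompt in, structure out -- the model stays the boring part
    completion = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Extract: ACME invoice, $42.50"}],
        response_format=Invoice,
    )
    print(completion.choices[0].message.parsed)  # Invoice(vendor='ACME', total_usd=42.5)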
sippeangelo
I mirror this sentiment. Even their "function calling" abstraction still hallucinates parameters and schemas, and the JSON schema itself is clearly way too verbose and breaks down completely if you feed it anything more complex than five very simple function calls. This just seems to build upon their already-broken black-box abstractions and isn't useful for real-world applications, but it's helpful for getting small proof-of-concept apps going, I guess...
swyx
> Even their "function calling" abstraction still hallucinates parameters and schema
huh? sample code please? this should not be true since Structured Outputs came out - the model is literally prevented from generating invalid json
amitness
It's not enabled by default for their function calling API. So, hallucination is possible.
You have to set 'strict' to True manually to use the same grammar-based sampling they use for structured outputs.
https://platform.openai.com/docs/guides/function-calling?api...
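Something like this (a sketch; the weather tool is made up, but the strict flag is the part that matters):

    from openai import OpenAI

    client = OpenAI()
    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",  # illustrative tool
            "description": "Look up current weather for a city.",
            "strict": True,  # opt in to grammar-constrained sampling
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],  # strict mode requires every property listed
                "additionalProperties": False,
            },
        },
    }]
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Weather in Oslo?"}],
        tools=tools,
    )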
Androider
Exactly. You would have to be naive to build a company on top of this kind of API. LLMs are going to become commodities, and this is OpenAI fighting against that fate, as their valuation and continued investment requirements don't make sense otherwise.
If you built on the Assistants API, maybe take the hint and don't just rewrite to the Responses API? Own your product; black-box the LLM-of-the-day.
bob1029
> OpenAI fighting against that fate as their valuation and continued investment requirements doesn't make any sense otherwise.
Is it actually the case that OpenAI couldn't be viable if all they offered was a simple chat completion API on top of the web experience?
It seems to me the devil is all in how the margin plays out. I'd focus on driving down costs and pushing boundaries on foundation models. If you are always a half step ahead, highly reliable and reasonably cheap, your competitors will have a tough time. Valuations can be justified if businesses begin to trust the roadmap and stability of the services.
I'll tell you what's not working right now: the insane model naming scheme and rapid-fire vision changes. This kind of stuff is spooking the technology leaders of large prospective customers. Only the most permanently-online people can keep things straight. Everyone was super excited and on board with AI in 2024, because who wants to be left out? I think that energy is still justified in many ways, but we've also got to find a way to meet more of the customer base where they currently are. Wrappers and agentic SDKs are not what these people are looking for. Many F500s already have gigantic development teams who can deal with deep, nasty API integrations and related state contraptions. They're looking for assurances/evidence that OAI's business and product line will remain stable for the next 5+ years before going all-in.
jjfoooo4
The point of the bear thesis on OpenAI is that training frontier models is extraordinarily expensive. They can't produce cutting-edge models, charge a cheap price, and make a profit all at once.
ozim
Looking at all the "AI specialists" that popped up recently, I have a feeling there is enough naivety out there for it to work.
daviding
This bit feels like we are being pushed away from the existing API for non-technical reasons?
> When using Chat Completions, the model always retrieves information from the web before responding to your query. To use web_search_preview as a tool that models like gpt-4o and gpt-4o-mini invoke only when necessary, switch to using the Responses API.
Porting over to the new Responses API is non-trivial, and we already have history, RAG, and the other things an assistant needs.
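For comparison, the Responses API version looks roughly like this (a sketch based on the docs quoted above):

    from openai import OpenAI

    client = OpenAI()
    resp = client.responses.create(
        model="gpt-4o",
        # here the model decides whether to invoke the search tool
        tools=[{"type": "web_search_preview"}],
        input="What did OpenAI announce on March 11, 2025?",
    )
    print(resp.output_text)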
jjfoooo4
From their perspective, if they don’t have your data, it’s too easy to switch providers.
isoprophlex
This is exactly what's going on: moat-building.
sagarpatil
Couldn't have said it better. I've developed multiple agents with just function calling and structured outputs, and they have been in production for more than a year (back in the day we didn't call them agents, lol). I think this is targeted towards people who are already using agent frameworks + the OpenAI API.
BonoboIO
What are the agents doing for you? Just interested in your actual use cases.
malthaus
Outsourcing state to OpenAI & co. is great for them as vendor lock-in. The real money in AI will be in business- and user-facing tools built on top of the vendors, and it would be a terrible business decision not to abstract away the model provider in the background and not to keep all private data under your own domain, also from a data protection / legal point of view.
I can understand them trying to prevent their business from becoming a commodity, but I don't see that working out for them beyond some short-term buzz; others will run with their ideas in domain-specific applications.
ripped_britches
100000000000%
Don't be fooled into moving state management somewhere other than your business logic unless it enables a novel use case (which these SDKs do not).
With that said, I'm glad to see the agentic endpoints available, but I'm still going to manage my state this way.
edoceo
I just use OpenAI to help me build these "necessary" patterns against their own API. Why make me use some framework when the AI is the framework?
simonw
There's a really good thread on Twitter from the designer of the new APIs going into the background behind many of the design decisions: https://twitter.com/athyuttamre/status/1899541471532867821
Here's the alternative link for people who aren't signed in to Twitter: https://nitter.net/athyuttamre/status/1899541471532867821
bradyriddle
The nitter link is appreciated!
ElijahLynn
TIL about Nitter, so grateful as I have Twitter blocked on my computer and phone.
mrtksn
I feel like all of these AI agent attempts are misguided at their core, because they don't try to create new ways of doing things but to replace humans within legacy systems. This is fundamentally shortsighted, because the economy, life, and everything else is about humans interacting with humans.
The current AI agent approach looks like permutations of the joke where one person's AI expands a single sentence into a long, polite e-mail, and the AI on the receiving end summarizes that long e-mail back into a single sentence.
I get that there's a use case for automating tasks on legacy systems, but IMHO the real opportunity is the chance to remove most of the legacy systems.
Humans are not that bad, you know? Is creating UIs for humans with AI, then making AI use those UIs to do stuff, really the way forward?
NitpickLawyer
> because the economy, life and everything is about humans interacting with humans.
How many hand-crafted clay bowls, baked in a human-powered kiln, do you use every day? Or how many woven baskets, made from hand-picked sticks?
History has shown that anything that can be automated will be automated, and everything that can be made "cheaper" or "faster" will be as well.
mrtksn
That's not the point though; I'm not anti-automation or anything like that. The point is, using robots on interfaces and systems made for people is not the way to go.
Why would you want your swipes on Tinder and your trip planning to Rio automated through a human interface? If it were for legit reasons, it would have happened as machine-to-machine communication. I'm a big fan of the AI agent concept; my objection is that in its current state people don't think outside the box: they propose using the current infrastructure to delegate human functions instead of re-imagining the new world that is possible when working together with AI.
NitpickLawyer
> my objection is that at in its current state people don't think out of the box and propose using the current infrastructure to delegate human functions instead of re-imagining the new world that is possible when working together with AI.
Ah, my bad, I misread your initial post.
If I now understand what you're saying, I think there's a parallel in manufacturing, where "custom-made bots" on an assembly line will win against "humanoid bots" every time. The problem is that you have to build the custom-made bots first, and they only work on that one task, while a "humanoid" bot can, in theory, do more general things with tools already in place for humans.
I think specialised APIs and stuff will eventually be built for AI agents. But in the meantime everyone wants to be first to market, and the "human facing" UI/UX is all we have. So they're trying to make it work with what's available.
bob1029
I think the most valuable path for the current generation of AI models is integrating them with the configuration and administration side of the product.
For example, as a supplemental user experience that power users in your org can leverage to macro out client configuration and project management tasks in a B2B SaaS ecosystem. Tool use can be very reliable when you have a well constrained set of abstractions, contexts and users to work with.
zellyn
Notably not mentioned: Model Context Protocol https://www.anthropic.com/news/model-context-protocol
nilslice
Not implementing it doesn't mean it's not supported: https://github.com/dylibso/mcpx-openai-node (this is for mcp.run tool calling with OpenAI models, not generic)
But yes, it's the strongest anti-developer move not to support MCP directly. Not surprised, given OpenAI generally. It would be a very nice addition, though!
thenameless7741
it's mentioned in the main thread: https://nitter.net/athyuttamre/status/1899511569274347908
> [Q] Does the Agents SDK support MCP connections? So can we easily give certain agents tools via MCP client server connections?
> [A] You're able to define any tools you want, so you could implement MCP tools via function calling
in short, we need to do some plumbing work.
relevant issue in the repo: https://github.com/openai/openai-agents-python/issues/23
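Roughly, the plumbing looks like this (a sketch assuming the mcp Python SDK; session setup and error handling omitted):

    from mcp import ClientSession

    async def mcp_tools_as_functions(session: ClientSession) -> list[dict]:
        # list the server's MCP tools and re-shape them as
        # Chat Completions function tools
        listed = await session.list_tools()
        return [{
            "type": "function",
            "function": {
                "name": t.name,
                "description": t.description or "",
                "parameters": t.inputSchema,  # MCP already uses JSON Schema
            },
        } for t in listed.tools]

    async def dispatch(session: ClientSession, name: str, args: dict):
        # route the model's tool call back to the MCP server
        return await session.call_tool(name, arguments=args)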
dgellow
Do you have experience with MCP? If yes, what do you think of it?
smcleod
It's great! Easy to work with, makes it quick to build tools, and isn't overcomplicated.
singularity2001
not OP but giving Claude access to local files / emails / database / terminal was … futuristic! (until I hit their stupid request limit)
consumer451
I have been using Windsurf+Sonnet for a couple months, and recently adding Supabase MCP was a total game changer for velocity. I can't believe I waited so long to configure that.
Querying schema from prompt is great, but also being able to say "I cannot see the Create Project button on the projects list screen. Use MCP to see if user with email me@domain.com has the appropriate permissions" is just amazing.
knowaveragejoe
You can (somewhat) bridge between them:
esafak
How do they compare?
cowpig
MCP is a protocol, and Anthropic has provided SDKs for implementing that protocol. In practice, I find the MCP protocol to be pretty great, but it leaves basically everything except the model parts out. I.e. MCP really only addresses how "agentic" systems interact with one another, nothing else.
This SDK is trying to provide a bunch of code for implementing specific agent codebases. There are a bunch of open source ones already, so this is OpenAI throwing their hat in the ring.
IMO this OpenAI release is kind of ecosystem-hostile in that they are directly competing with their users, in the same way that the GPT apps were.
TeeWEE
Correction: MCP is a protocol for function calling and for getting context into a model. It can run locally or over a network.
It does not specify how "agentic" systems interact with each other, depending on what you mean there.
esafak
Thank you. Which open source ones do you recommend?
swyx
swyx here. we got some preview and time with the API/DX team to ask FAQs about all the new APIs.
https://latent.space/p/openai-agents-platform
main fun part - since responses are stored for free by default now, how can we abuse the Responses API as a database :)
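for the uninitiated: every response gets an id and is retrievable later, so a poor man's key-value store is roughly this (a sketch, not a recommendation):

    from openai import OpenAI

    client = OpenAI()
    resp = client.responses.create(
        model="gpt-4o-mini",
        input="remember: 42",
        store=True,  # the default, per the docs
    )
    # later, from anywhere with the same API key:
    row = client.responses.retrieve(resp.id)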
other fun questions that an HN crew might enjoy:
- hparams for websearch - depth/breadth of search for making your own DIY Deep Research
- now that OAI is offering RAG/reranking out of the box as part of the Responses API, when should you build your own RAG? (i basically think somebody needs to benchmark the RAG capabilities of the Files API now, because the community impression has not really updated from back when Assistants API was first launched)
- what's the diff between the Agents SDK and OAI Swarm? (basically types, tracing, pluggable LLMs)
- will the `search-preview` and `computer-use-preview` finetunes be merged into GPT5?
mritchie712
For anyone who likes the Agents SDK but doesn't want their framework attached to OpenAI, we're really liking PydanticAI [0].
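A taste of what that looks like (a sketch against PydanticAI's docs at the time of writing; the model string and schema are illustrative):

    from pydantic import BaseModel
    from pydantic_ai import Agent

    class CityInfo(BaseModel):  # illustrative result schema
        city: str
        country: str

    # swap 'openai:gpt-4o' for any supported provider
    agent = Agent('openai:gpt-4o', result_type=CityInfo)
    result = agent.run_sync('Where were the 2012 Olympics held?')
    print(result.data)  # CityInfo(city='London', country='UK')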
darkteflon
There’s also HF’s smolagents[1].
fullstackwife
OpenAI SDK docs:
> Notably, our SDK is compatible with any model providers that support the OpenAI Chat Completions API format.
So you can use it with everything, not only OpenAI?
swyx
yea they mention this on the pod
DrBenCarson
Yes
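Something along these lines should work (a sketch; the local endpoint and model name are hypothetical):

    from openai import AsyncOpenAI
    from agents import Agent, Runner, OpenAIChatCompletionsModel

    # any OpenAI-compatible server, e.g. a local Ollama/vLLM instance
    local = AsyncOpenAI(base_url="http://localhost:11434/v1", api_key="unused")

    agent = Agent(
        name="Assistant",
        instructions="Be concise.",
        model=OpenAIChatCompletionsModel(model="llama3", openai_client=local),
    )
    result = Runner.run_sync(agent, "Say hi")
    print(result.final_output)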
startupsfail
Does it encode everything as a JSON object, so special characters get escaped?
I've noticed that with longer responses (particularly involving LaTeX), models are a lot less accurate when the results additionally need to be encoded into JSON.
I like structured output, but my preference is YAML/Markdown, as it is a lot more readable (and the only thing that works with longer responses, LaTeX, or code generation).
ggnore7452
Appreciate the question on hparams for web search!
One of the main reasons I build these AI search tools from scratch is that I can fully control the depth and breadth (and also customize the loader for whatever data/sites I need). Currently, the web search isn't very transparent about which sites it lacks full text for and where it just uses snippets.
Having computer use + web search is definitely very powerful (essentially OpenAI's Deep Research).
dazzaji
I was fortunate to get early access to the new Agents SDK and APIs that OpenAI dropped today, and I made an open-source project to show some of the capabilities [1]. If you are using any of the other agent frameworks like LangGraph/LangChain, AutoGen, Crew, etc., I definitely suggest giving this SDK a spin.
To ease into it, I added the entire SDK with examples and full documentation as a single text file in my repo [2], so you can quickly get up to speed by adding it to a prompt and just asking about it, or by getting some quick-start code to play around with.
The code in my repo is very modular, so you can try implementing any module using one of the other frameworks to do a head-to-head.
Here's a blog post with some more thoughts on this SDK [3] and some of its major capabilities.
I’m liking it. A lot!
[1] https://github.com/dazzaji/agento6
[2] https://raw.githubusercontent.com/dazzaji/agento6/refs/heads...
[3] https://www.dazzagreenwood.com/p/unleashing-creativity-with-...
anorak27
I have built myself a much simpler and more powerful version of the Responses API, and it works with all LLM providers.
grvdrm
Thank you for your detailed README. A relief/joy to read compared to many other libraries that provide one basic example (if that) and otherwise leave you to your own trial and error.
tiniuclx
At $30 per 1k search queries, the OpenAI search API seems very expensive. Perplexity's Sonar model charges just $5 per thousand searches [0].
I wonder what justifies this drastic difference in price.
falcor84
I'm impressed by the advances in Computer Use mentioned here and this got me wondering - is this already mature enough to be utilized for usability testing? Would I be right to assume that in general, a UI that is more difficult for AI to navigate is likely to also be relatively difficult for humans, and that it's a signal that it should be simplified/improved in some way?
m3t4man
Why would you assume that? The modality of engagement is drastically different between the way an LLM engages with a UI and the way a human being does.
falcor84
Oh, I had assumed it was trained on human interaction data and should be generally similar, and from the examples I saw, it generally was (although still not as good as us). In what sense do you expect it to be drastically different?
jumploops
> “we plan to formally announce the deprecation of the Assistants API with a target sunset date in mid-2026.”
The new Responses API is a step in the right direction, especially with the built-in “handoff” functionality.
For agentic use cases, the new API still feels a bit limited, as there’s a lack of formal “guardrails”/state machine logic built in.
> “Our goal is to give developers a seamless platform experience for building agents”
It will be interesting to see how they move towards this platform; my guess is that we'll see graph-based control flow in the coming months.
Now there are countless open-source solutions for this, but most of them fall short and/or add unnecessary obfuscation/complexity.
We've been able to build our agentic flows using a combination of tool calling and JSON responses, but there's still a missing higher-order component that no one seems to have cracked yet.
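For reference, the handoff piece in the Agents SDK looks roughly like this (a sketch based on the quickstart; the agents are illustrative):

    from agents import Agent, Runner

    billing = Agent(name="Billing", instructions="Handle billing questions.")
    support = Agent(name="Support", instructions="Handle technical issues.")

    triage = Agent(
        name="Triage",
        instructions="Route the user to the right specialist.",
        handoffs=[billing, support],  # built-in handoff mechanism
    )
    result = Runner.run_sync(triage, "I was double-charged last month")
    print(result.final_output)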
ilaksh
The Agents SDK they linked to comes up 404.
BTW, I have something somewhat similar to parts of this (like Responses and File Search) in MindRoot, via the task API: https://github.com/runvnc/mindroot/blob/main/api.md
That could be combined with the query_kb tool from the mr_kb plugin (in my mr_kb repo), which is actually probably better than File Search because it allows searching multiple KBs.
Anyway, if anyone wants to help with my project, create a plugin, open a PR, or anything else, feel free to connect on GitHub, email, or Discord/Telegram (runvnc).
yablak
Loads fine for me. Maybe because I'm logged in?
IncreasePosts
That should be a 403 then. Tsk tsk open ai
29ebJCyy
Technically it should be a 401. Tsk tsk IncreasePosts.
edwinarbus
sorry, fixed link: https://openai.github.io/openai-agents-python/