A PM's Guide to AI Agent Architecture

50 comments · September 4, 2025

fny

I really don't understand how people give AI agents access to a pile of tools and data sources and unleash them on customers. It's horrible UX in my experience and at times worse than a phone tree.

My view is that you need to transition slowly and carefully to AI-first customer support.

1. Know the scope of problems an AI can solve with high probability. Related prompt: "You can ONLY help with the following issues."

2. Escalate to a human immediately if it's out of scope: "If you cannot help, escalate to a human immediately by CCing bob@smallbiz.co"

3. Have an "unlocked agent" that your customer service person can use to answer a question and evaluate how well the agent performs in helping. Use this to drive your development roadmap.

4. If the "unlocked agent" becomes good at solving a problem, add that to the in-scope solutions.

Finally, you should probably have some way to replay existing conversations against the agent when you make changes. (It's on my TODO list.)
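
To make steps 1 and 2 concrete, here's a minimal sketch, assuming an OpenAI-style chat completions API; the scope list, model name, and notification hook are placeholders, not a claim about the one right shape:

    from openai import OpenAI

    client = OpenAI()

    IN_SCOPE = ["password resets", "billing address changes", "invoice copies"]

    SYSTEM_PROMPT = (
        "You are a customer support agent. "
        f"You can ONLY help with the following issues: {', '.join(IN_SCOPE)}. "
        "If you cannot help, reply with exactly the token ESCALATE and nothing else."
    )

    def notify_human(history, user_message):
        ...  # ping the on-call human (email, Slack, phone) - out of scope here

    def handle_message(history: list, user_message: str) -> str:
        messages = (
            [{"role": "system", "content": SYSTEM_PROMPT}]
            + history
            + [{"role": "user", "content": user_message}]
        )
        reply = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model
            messages=messages,
        ).choices[0].message.content
        # Step 2 lives in code, not in the model's prose: escalation is deterministic.
        if "ESCALATE" in reply:
            notify_human(history, user_message)
            return "Let me loop in a teammate who can help with this."
        return reply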

I've implemented this for a few small businesses, and the process is so seamless that no one has suspected they were interacting with an AI. For one client, there's not even a visible escalation step: they get pinged on their phone and take over the chat!

risyachka

> I really don't understand how people give AI agents access to a pile of tools and data sources and unleash them on customers

It’s pretty simple. When a non-technical person sees faked demos of what it can do, it looks epic; everyone extrapolates the results and assumes AI really is that good.

small_scombrus

Doubly so if the person deciding what gets implemented doesn't really get what their staff actually do.

LLMs' ability to give convincing-sounding answers is like catnip for service desk managers who have never actually been on the desk themselves.

gabriel666smith

I MVP'd one of these (a basic sequence of LLM customer support 'agents') at my last job, I guess in spring 2024. So much has changed since then!

'Routing through increasingly specialised agents' was my approach, and the only thing that would've done the job (in MVP form) at the time. There weren't many models that would fit our (v good) CS & Product teams' dataset of "probable queries from customers" into a single context window.
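
For anyone who hasn't built one, the routing shape is roughly a tree walk, one cheap classification call per level, so no single prompt has to hold the whole query-type dataset in context. A sketch - the category names and classify() helper here are illustrative, not the actual MVP:

    ROUTE_TREE = {
        "billing": {"refunds": "refund_agent", "invoices": "invoice_agent"},
        "account": {"login": "login_agent", "data_requests": "data_request_agent"},
    }

    def classify(query: str, options: list) -> str:
        """One small LLM call that picks the best-fitting option; stubbed here."""
        raise NotImplementedError

    def route(query: str) -> str:
        node = ROUTE_TREE
        while isinstance(node, dict):
            node = node[classify(query, list(node.keys()))]
        return node  # the name of the specialised agent that finally answers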

I never personally got my MVP beyond sitting with it beside the customer support inbox, talking to customers. And AFAIK it never moved beyond that after I left.

Nor should it have, probably - there are (wild, & mostly ineffable) trade-offs you make the moment you stop actually talking to users at the very moment they get in touch. I don't remember a trade-off like that ever being worthwhile.

I _do_ remember it as perhaps the most worthwhile time I ever spent doing product-y work.

I say that because: To consider a customer support query type that might be 0.005% of all queries received by the CS team, even my trash MVP had to walk a path down a pretty intricate tree of agents and possible query types.

So - if you believe that 'solving the problems users have with your product' = 'making a better product' - then talking to an LLM that was an advocate for a tiny subset of users, and knew the details of their issue with your product very intimately, felt really good. It felt like a very pure version of what _I_ should be to devs, as any kind of interface between them and our users.

It was very hard to stay a believer in the idea of a 'PM' after seeing that, at least - as a person who preferred to just let people get on with things.

I enjoyed the linked post; it's really interesting to see how far things have come. I'm surprised nobody has built 'talk to your customers at scale', yet - this feels like a far more interesting problem than 'avoid talking to your customers at scale'.

I'm also not surprised, I guess, since doing it properly is an incredibly bespoke job for most products, I imagine.

gillesjacobs

Nice framing for PMs, but technically it is way too rosy. MCP is real but still full of low-utility services and security issues, so “skills as plug-ins” is not production-ready. A2A protocols were only just announced this year (Google, etc.), actual inter-agent interoperability is still research-grade, and debugging across agents is a nightmare. Orchestration layers (skills, workflows, multi-agent) look clean in diagrams but turn into brittle state machines under load. LLM “confidence scores” are basically uncalibrated logits dressed up as probabilities.

In short: nice industry roadmap, but we are nowhere near robust, trustworthy multi-agent systems yet.
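
To illustrate that last point about confidence scores: what usually gets surfaced as a "confidence" is just an exponentiated token logprob. A sketch, assuming an OpenAI-style API with logprobs enabled (the model name is a placeholder):

    import math
    from openai import OpenAI

    client = OpenAI()

    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder
        messages=[{"role": "user", "content": "Answer yes or no: is this query about billing?"}],
        logprobs=True,
        max_tokens=1,
    )
    tok = resp.choices[0].logprobs.content[0]
    # exp(logprob) looks like a probability, but nothing has calibrated it
    # against how often the model is actually right.
    print(tok.token, math.exp(tok.logprob))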

gabriel666smith

The idea of giving an LLM with a tool any kind of control over an actual user's account remains (though you put this more elegantly) batshit insane to me.

Even assuming you've correctly auth'd the user contacting you (big assumption!), allowing that user to very literally prompt a 'semi-confident thing with tools' - however many layers of abstraction away the tool is - feels very, very far away from a real-world, sensible implementation right now.

Just shoot the tool prompts over to a human operator, if it's so necessary! Sense-check!
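
A minimal sketch of that sense-check pattern - the model only *proposes* tool calls, and a human operator gates every one before anything touches the account. Tool names and the approval hook are illustrative:

    from dataclasses import dataclass

    @dataclass
    class ProposedToolCall:
        tool: str       # e.g. "refund_order"
        args: dict      # e.g. {"order_id": "123", "amount": 19.99}
        rationale: str  # the model's stated reason, shown to the operator

    TOOLS = {
        "refund_order": lambda order_id, amount: f"refunded {amount} on order {order_id}",
    }

    def execute_with_approval(call: ProposedToolCall, operator_approves) -> str:
        # The model never touches the account directly; a human gates every call.
        if not operator_approves(call):
            return "Operator rejected the action; nothing was changed."
        return TOOLS[call.tool](**call.args)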

barbazoo

> Confidence calibration: When your agent says it's 60% confident, it should be right about 60% of the time. Not 90%, not 30%. Actual 60%.

With current technology (LLM), how can an agent ever be sure about its confidence?
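
You mostly can't, today - but you can at least measure the gap: bucket answers by the confidence the agent stated, then compare each bucket's stated confidence to its actual hit rate. A sketch, with made-up logged data standing in for labelled production logs:

    from collections import defaultdict

    # (confidence the agent stated, whether the answer was actually correct)
    logged = [(0.6, True), (0.6, False), (0.6, True), (0.9, True), (0.9, True),
              (0.9, False), (0.3, False), (0.3, False), (0.3, True)]

    buckets = defaultdict(list)
    for conf, correct in logged:
        buckets[conf].append(correct)

    for conf, outcomes in sorted(buckets.items()):
        hit_rate = sum(outcomes) / len(outcomes)
        print(f"stated {conf:.0%} -> actually right {hit_rate:.0%} (n={len(outcomes)})")
    # Well-calibrated means stated ~= actual in every bucket, which is exactly
    # what current LLMs don't give you out of the box.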

esafak

I was about to say "Using calibrated models", then I found this interesting paper:

Calibrated Language Models Must Hallucinate

https://arxiv.org/abs/2311.14648

https://www.youtube.com/watch?v=cnoOjE_Xj5g

fumeux_fume

The author's inner PM comes out here and makes some wild claims. Calibration is something we can do with traditional classification models, but not with most off-the-shelf LLMs. Even if you devised a way to determine whether the LLM's confidence claim matched its actual performance, you wouldn't be able to calibrate or tune it the way you would a more traditional model.
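
For contrast, the standard recipe for a traditional classifier, e.g. scikit-learn's CalibratedClassifierCV (synthetic data here, purely for illustration):

    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Isotonic regression learns a stated-probability -> observed-frequency map
    # on held-out folds; no analogous knob exists for a closed LLM's
    # self-reported confidence.
    clf = CalibratedClassifierCV(RandomForestClassifier(random_state=0),
                                 method="isotonic", cv=5)
    clf.fit(X_train, y_train)
    calibrated_probs = clf.predict_proba(X_test)[:, 1]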

dfsegoat

I'm typically pretty critical of PM-oriented pieces, but I found this to be a decent overview of how to reason about building these systems from first principles + some of the non-tech pain points + how to address them.

jbmsf

As an engineer, I like this framework but can think of approximately zero PMs who could use it to build a product.

ricardobeat

What does the PM title even mean at this point? It's a bit surprising to see a deep dive into technical architecture presented as a PM responsibility - though there is massive value in understanding what's involved - since this is more TPM (technical program manager) land, which is a different job.

In my book they ideally focus on understanding scope, user needs, and how to measure success, while implementation details - orchestration strategies, evaluation, and making sure the system delivers the intended capabilities - are engineering responsibilities.

charcircuit

This post does not do a deep dive into technical architecture.

MangoToupe

The PM's role is to whip devs until the requirements are met. That seems apt here, even if the requirements make zero sense.


cyberpunk

[flagged]

tene80i

There are bad PMs and good PMs, and bad engineers and good engineers. If you treat an entire profession with disdain, don’t be surprised if you get treated like that too.

mattmanser

I know you probably feel you're being fair, but you're not.

There's an asymmetry in development: bad PMs can prosper in a way bad engineers can't.

There's no skill test for PMs, unlike engineers. Bad PMs can look like good PMs to senior management simply because they hold tons of meetings, kiss ass, over-promise, or steal credit. Any of those bad traits can fool senior management. But those are bad PMs.

On top of that, when you have a bad PM, there's a good chance the devs themselves will step into the role and still deliver a product.

The bad PM will still take credit, obviously. A bad PM is often circumvented instead of exposed.

Conversely, the opposite doesn't work: a good PM + bad devs turns into never-ending dev cycles. The PM looks bad even though there's nothing he can really do, unless he can fire/hire. The good PM cannot circumvent bad engineers.

And in the end, to find bad engineers you can just look at their code. If you don't have the skill to do that, or don't employ someone you know that can, you probably shouldn't be in the software development business.

tene80i

Well sure, I never said they were equivalent in all respects. Just that you can have good and bad versions of both. For sure the failure modes are different.

I challenge the idea that there is no skill test for PMs, though - take a PM interview at a serious product company some day.

And the PM role is of course more than just delivery. If they dropped dead, the product would still get shipped. But then what? Someone would need to talk to customers, dig into the data, and figure out the roadmap. Other people can do it, but in a sufficiently complex company you might as well get people who are good at it and want to devote their time to it.

I understand why some engineers don’t like PMs. But it is exactly the same reason some PMs (and C-suites) view engineers as fungible resources who waste time on abstractions instead of shipping, pad estimates, and refuse to discuss practical tradeoffs that would move things quicker - an unfair generalisation based on bad experiences.

I just think more respect all around wouldn’t hurt.

ownagefool

PMs that can hire/fire are pretty common, but again, how do they know who's good?

ashtonshears

This is silly; PM is a broader role than SWE.

booleandilemma

Every PM I've ever met has been a loser who has failed in their primary desired career and then made the switch to project management. I have no respect for PMs whatsoever.

ramesh31

Stop trying to treat these things as more than they are. Stop trying to be clever. These models are the single most complex things ever created by humans: the summation of decades of research, trillions in capex, and the countless hours of thousands of people smarter than you and I. You will not meaningfully add to their capabilities with some hacked-together reasoning workflows. Work within the confines of what they can actually do; anything else is complete delusion.

sixo

This is a nonsensical opinion by a person who doesn't know what they're talking about, and probably didn't read the article.

These models are tools; LLM products bundle these tools with other tools, and 90% of the UX amounts to bundling them well. The article here gives a great sense of what that takes.

dang

> This is a nonsensical opinion by a person who doesn't know what they're talking about, and probably didn't read the article.

Ok, but can you please make your substantive points without putting others down? Your comment would be fine without this bit.

https://news.ycombinator.com/newsguidelines.html

CuriouslyC

The AI bundling problem is over. The user interface problem is over. You won't need a UI for your apps in a few years, agents are going to drive _EVERYTHING_. If you want a display for some data, the agent will slap together a dashboard on the fly from a composable UI library that's easy to work with, all hot loaded and live-revised based on your needs.

bopbopbop7

You must be an easy person to market to.

tomrod

I won't use agents for everything, so why would I expect every task to use agents? This is like saying everything is on the web. No, a substantial number of things are on the web, but not everything.

ares623

The Juicero moment for software

anuramat

why would anyone want more non-determinism than absolutely necessary?

alehlopeh

Who maintains that UI library? Or does the AI create it on the fly too? Why even bother with a library at that point? Just do a bespoke implementation.

tomrod

I have a hard time determining whether you are supporting or critiquing the article. I'm 60% confident it's a critique (I jest - a play on the content :) ).
