
I'm betting against AI agents in 2025, despite building them

snappr021

The alternative is building Functional Intelligence process flows from the ground up, on a foundation of established truth.

If 50% of training data is not factually accurate, this needs to be weeded out.

Some industries require a first principles approach, and there are optimal process flows that lead to accurate and predictable results. These need research and implementation by man and machine.

infecto

The link does not work for me, but as someone who does a lot of work with LLMs, I am also betting against agents.

Agents have captivated the minds of groups of people in every large engineering org. I have no idea what their goal is, other than that they work on “GenAI”. For over a year now they have been working on agents with the promise that the next framework MSFT or Alphabet publishes will solve their woes. They don’t actually know what they are solving for, except that everything involves agents.

I have yet to see agents solve anything, but for some reason the idea persists that an agent you can send anything and everything to will solve all of a company's problems. LLMs have a ton of interesting applications, but agents have yet to grab me as interesting, and I also don’t understand why so many large companies have focused time on them. They are not going to crack the code ahead of a commercial tool or open source project. In the time spent toying around with agents, a lot of interesting applications could have been built, some of which might technically be agents, just without so much focus and effort on trying to solve for all use cases.

johnisgood

I have no idea what agents are for, could be my own ignorance.

That said, I have been using LLMs for a while now with great benefit. I did not notice anything missing, and I am not sure what agents bring to the table. Do you know?

mhog_hn

An agent is an LLM + a tool call loop - it is quite a step up in terms of value in my experience
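Roughly, the loop looks like this (a minimal sketch, assuming an OpenAI-style chat completions API; the get_weather tool and run_tool dispatcher are made-up placeholders, not anyone's real agent):

    import json
    from openai import OpenAI

    client = OpenAI()
    messages = [{"role": "user", "content": "What's the weather in Berlin?"}]
    tools = [{  # one hypothetical tool the model is allowed to call
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]

    def run_tool(name, args):  # placeholder: dispatch to your real tools here
        return json.dumps({"city": args["city"], "temp_c": 21})

    while True:
        resp = client.chat.completions.create(
            model="gpt-4o", messages=messages, tools=tools)
        msg = resp.choices[0].message
        if not msg.tool_calls:   # no tool requested: model answered, loop ends
            print(msg.content)
            break
        messages.append(msg)     # keep the assistant's tool request in context
        for call in msg.tool_calls:  # run each requested tool, feed result back
            result = run_tool(call.function.name,
                              json.loads(call.function.arguments))
            messages.append({"role": "tool", "tool_call_id": call.id,
                             "content": result})

The "step up" is that last part: the model's output can trigger actions whose results flow back into the next model call, instead of the human ferrying everything by hand.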

infecto

Not a disagreement with you, but I wanted to clarify further.

I do think it’s a step up when done correctly; think of tools like Cursor. Most of my concern comes from the number of folks I have seen trying to create a system that solves everything. I know people in my org were working on agents without even a problem they were solving for. They are effectively trying to recreate ChatGPT, which to me is a fool's errand.

johnisgood

What is the use case? What does it solve exactly, or what practical value does it give you? I am not sure what a tool call loop is.

JKCalhoun

Link is working for me — perhaps it was not 30 minutes ago? (Safari, MacOS)

Simon_O_Rourke

Don't tell management about this, as they're all betting the house on AI agents next year.

pmg101

Only one of these outcomes will be correct, so it's worth putting money on it if you think they're wrong, a la The Big Short.

ptero

While true, the world doesn't end in 2025. While I would also agree that big financial benefits from agents to companies appear unlikely to arrive this year (and the title specifically mentions 2025) I would bet on agents becoming a disruptive technology in the next 5-10 years. My 2c.

Quarrelsome

Shorting only works if people realise it when you do. The C-suite will run out of makeup before admitting it's a pig, because the payoff is huge for them. I reckon agentic dev can function "just enough" to let them delay reality for a bit while they fire more of their engineering team.

I don't think this one is worth shorting, because there's no specific event to trigger the mindshare starting to move and validate your position.

DavidPiper

Not OP, but I've been thinking about this and concluded it's not quite so clear-cut. If I were going to go down this path, I think I would bet on competitors rather than against incumbents.

My thinking: in a financial system collapse (a la The Big Short), the assets under analysis are themselves the things of value, whereas betting on AI collapsing a technology business is at least one step removed from actual valuation, even assuming:

1. AI Agents do deliver just enough, and stay around long enough, for big corporations to lay off large number of employees

2. After doing so, AI quickly becomes prohibitively expensive for the business

3. The combination of the above factors tanks business productivity

In the event of a perfect black swan, the trouble is that it's not actually clear that this combination of factors would result in concrete valuation drops. The business just "doesn't ship as much" or "ships more slowly". This is bad, but it's only really bad if you have competitors that can genuinely capitalise on that stall.

An example immediately on hand: for non-AI reasons, the latest rumors are that Apple's next round of MacBook Pros will be delayed. This sucks, but it isn't particularly damaging to the company's stock price, because there isn't really a competitor in the market that can capitalise on that delay in a meaningful way.

Similarly, I couldn't really tell you what the most recent non-AI software features shipped by Netflix or Facebook or X actually were. How would I know if they're struggling internally and have stopped shipping features because AI is too expensive and all their devs were laid off?

I guess if you're looking for a severe black swan to bet against AI Agents in general, you'd need to find a company that was so entrenched and so completely committed to and dependent on AI that they could not financially survive a shock like that AND they're in a space where competitors will immediately seize advantage.

Don't get me wrong though, even if there's no opportunity to actually bet against that situation, it will still suck for literally everyone if it eventuates.

conartist6

If you want to bet on a competitor, let's talk cause I'm your guy. While everyone else was looking the other way, I stole home: https://github.com/bablr-lang

danieltanfh95

Same. https://danieltan.weblog.lol/2025/06/agentic-ai-is-a-bubble-...

The fundamental difference is that we need HITL (human-in-the-loop) to reduce errors, instead of HOTL (human-on-the-loop), which leads to the errors you mentioned.
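Concretely, HITL means a human gate before each action executes, rather than a human merely watching the loop run. An illustrative sketch (nothing here is a real agent API):

    def run_with_hitl(steps):
        # steps: hypothetical list of (description, action) pairs
        for description, action in steps:
            print(f"Proposed action: {description}")
            if input("Approve? [y/N] ").strip().lower() != "y":
                print("Stopped by the human reviewer.")  # bad step never runs
                return
            action()  # runs only after explicit sign-off

    run_with_hitl([("delete 42 stale records", lambda: print("deleted"))])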

paradite

This is obviously AI generated, if that matters.

And I have an AI workflow that generates much better posts than this.

Retr0id

I think it's just written by someone who reads a lot of LLM output - lots of lists with bolded prefixes. Maybe there was some AI-assistance (or a lot), but I didn't get the impression that it was AI-generated as a whole.

paradite

"Hard truth" and "reality check" in the same post is dead giveaway.

I read and generate hundreds of posts every month. I have to read books on writing to keep myself sane and not sound like an AI.

squigglydonut

Absolutely! And you're right to think that. Here's why...

Retr0id

True, the graphs are also wonky - the curves don't match the supposed math.

delis-thumbs-7e

I wonder why a person from Bombay, India might use AI to aid with an English-language blog post…

Perhaps more interesting is whether their argument is valid and whether their math is correct.

jrexilius

The thing that sucks about it is that maybe his English is bad (not his native language), so he relies on LLM output for his posts. I'm inclined to cut people slack for this. But the rub is that it is indistinguishable from spam/slop generated for marketing/ads/whatever.

Or it's possible that he is one of those people who _really_ adopted LLMs into _all_ of their workflow, I guess, and he thinks the output is good enough as is, because it captured his general points?

LLMs have certainly damaged trust in general internet reading now, that's for sure.

KoolKat23

Human multi-step workflows tend to have checkpoints where the work is validated before proceeding further, as humans generally aren't 99%+ accurate either.

I'd imagine future agents will include training to design these checks into any output, validating against the checks before proceeding further. They may even include some minor risk assessment beforehand, such as "this aspect is crucial and needs to be 99% correct before proceeding further".
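Sketched out, that might look like the following (purely illustrative; llm_step, validate, and assess_risk are hypothetical stand-ins, not any real framework's API):

    def run_step_with_checkpoint(task, llm_step, validate, assess_risk, retries=3):
        # crude risk assessment picks how strict the checkpoint is
        threshold = 0.99 if assess_risk(task) == "crucial" else 0.90
        for _ in range(retries):
            output = llm_step(task)
            score = validate(output)  # e.g. schema check, test run, judge model
            if score >= threshold:
                return output         # checkpoint passed, proceed to next step
        raise RuntimeError("checkpoint failed repeatedly; escalate to a human")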

a_bonobo

That's what Claude Code does - it constantly stops and asks you whether you want to proceed, including showing you the suggested changes before they're implemented. Helps with avoiding token waste and 'bad' work.

KoolKat23

That's good to hear, they're on their way there!

On a personal note, I'm happy to hear that. I've been apprehensive and haven't tried it, purely out of fear of the cost.

csomar

Lots of applications will have to be redesigned around that. My guess is that microservices architecture will see a renaissance, since it plays well with LLMs.

Retr0id

> Each new interaction requires processing ALL previous context

I was under the impression that some kind of caching mechanism existed to mitigate this

blackbear_

You have to compute attention between all pairs of tokens at each step, making the naive implementation O(N^3). This is optimized by caching the keys and values of previous tokens (the KV cache), so that at each step you only need to compute attention between the new token and all previous ones. That's much better, but still O(N^2) to generate a sequence of N tokens.
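A quick way to see the arithmetic (an illustrative count of attended token pairs, not real inference code):

    # Generating N tokens: how many token pairs attention is computed over.
    def naive_pairs(n):
        # recompute attention over all pairs at every step: sum of t^2 terms
        return sum(t * t for t in range(1, n + 1))   # grows like N^3

    def cached_pairs(n):
        # with a KV cache, step t only attends new token -> t earlier tokens
        return sum(t for t in range(1, n + 1))       # grows like N^2

    print(naive_pairs(1000), cached_pairs(1000))     # 333833500 vs 500500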

ilaksh

Yes, prompt caching helps a lot with the cost. It still adds up if you have some tool outputs with long text. I have found that breaking those out into subtasks makes the overall cost much more reasonable.

_heimdall

Caching only helps keep the context around; it's only needed because the model still ultimately has to read and process that cached context again.

Retr0id

You can cache the whole inference state, no?

They don't go into implementation details, but the Gemini docs say you get a 75% discount if there's a context-cache hit: https://cloud.google.com/vertex-ai/generative-ai/docs/contex...

_heimdall

But that just avoids having to send the full context with follow-up requests, right? My understanding is that caching helps keep the context around but can't avoid the need to process that context over and over during inference.

stpedgwdgfhgdd

Compact the conversation (CC)

csomar

My understanding is that caching reduces computation, but the whole input is still processed. I don’t think Google is fully disclosing how their cache works.

LLMs degrade with long input regardless of caching.

digitcatphd

I’m sure most of the problems cited in this article will be easily solved within the next five years or so. Waiting for perfection and doing nothing won’t pay dividends.

cmsefton

2015? The title should be 2025.

Xmd5a

>A database query might return 10,000 rows, but the agent only needs to know "query succeeded, 10k results, here are the first 5." Designing these abstractions is an art.

It seems the author never used prompt/workflow optimization techniques.

LLM-AutoDiff: Auto-Differentiate Any LLM Workflow https://arxiv.org/pdf/2501.16673
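For what it's worth, the abstraction the quote describes is simple enough to sketch (illustrative only; names made up):

    def summarize_query_result(rows, preview=5):
        # the agent's context gets this compact summary, never the raw rows
        return {
            "status": "query succeeded",
            "row_count": len(rows),
            "preview": rows[:preview],
        }

    summarize_query_result([{"id": i} for i in range(10_000)])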

raincole

> In a Nutshell

> AI tools aren't perfect yet. They sometimes make mistakes, and they can't always understand what you are trying to do. But they're getting better all the time, In the future, they will be more powerful and helpful. They'll be able to understand your code even better, and they'll be able to generate even more creative ideas.

From another post on the same site. [0]

Yep, slop.

[0]: https://utkarshkanwat.com/writing/review-of-coding-tools/