Ask HN: Are there any real examples of AI agents doing work?

80 comments

January 8, 2025

2025 is the year of agents. I’ve heard about SDR AI agents but not great things. Most “agents” sound like workflow automations that have been around forever. Anyone have an example of an “ai” agent which I understand to be intelligent that isn’t a glorified or rebranded workflow automation? Thx.

chevman

Been in BigCo land for 20 years now, and have seen the rise and fall of quite a few AI/ML/RPA etc fads.

Honestly the whole landscape seems broken and unproductive at this point.

Countless vendors, platforms, cloud environments, industry/technical jargon - all with different pricing models, SLAs, tooling, etc etc.

Getting anything usable is a challenge and most orgs spin in a never ending cycle of data integration/normalization work that produces little business value.

My advice to teams now is simplify, reduce, streamline - get to the kernel of what you think you need and protect it at all costs. Most of the shiny new objects being pitched as silver bullets are just ways for other people to make money off your margin.

oraphalous

I too would like to hear some examples.

On the one hand you have gurus claiming that AI agents are going to make all SaaS redundant; on the other, gurus claiming that AI isn't going to take my coding job, but that I need to adapt my workflows to incorporate AI. We all need to start preparing now for the changes that AI is going to cause.

But these two claims aren't compatible. If AGI and these super agents are that bonkers amazeballs that they can replace entire SaaS companies - then there is no way I'm going to be able to adapt my workflows to compete as a programmer.

Further, if the wildest claims about AI end up proving to be true - there is simply no way to prepare. What possible adaptation to my workflow could I possibly come up with that an AI agent could not surpass? Why should I bother learning how to implement (with today's apis) some RAG setup for a SaaS customer service chatbot when presumably an AI agent is going to make that skillset redundant shortly after?

I'm going to be interviewing for frontend roles soon, and for my prep I'm just going back to basics and making sure I can recall the fundamentals on demand: CSS, HTML, JS/TS - fuck the rest of this noise.

fhd2

Programmers don't work in isolation. So I don't know how necessary it would be to quickly adapt your workflows to compete. If there's something that's useful to adopt, there will be a stream of blog posts, coworkers, people at user groups and what not spoon feeding what they learned to others. I don't think there's much cause for FOMO, I don't think it makes a big difference whether you start using a faster way to work a few months earlier or later than others. It can be cheaper to not jump on any hype train and potentially miss out on genuine improvements for a while, than to jump on all the hype trains and waste a lot of time on stuff that goes nowhere.

And like you said, if the wildest claims hold true, all programmers are out of a job by the end of 2026 anyway, with all other jobs following over the course of a few years. There's too many variables to predict what would happen in such a scenario, so probably best to deal with it if it happens.

So to me, your strategy checks out. I've personally invested some time into code generating and agentic tooling, but ultimately went back to Claude-as-Google-replacement. By my estimation, about a 5-10 % productivity boost compared to my workflow in 2022. The work is about the same, I just learn a bit faster.

lolinder

> And like you said, if the wildest claims hold true, all programmers are out of a job by the end of 2026 anyway, with all other jobs following over the course of a few years. There's too many variables to predict what would happen in such a scenario, so probably best to deal with it if it happens.

So much this. AGI is the equivalent of a nuclear apocalypse in many ways—it's unlikely, not unlikely enough for comfort, but also totally not worth preparing for because there's basically no way to predict what preparations would actually be helpful, nor is it obvious that you'd even want to survive it if it happened.

The expected value of prepping for it isn't worth the investment, so it's better to do what most of us already do for nuclear war and pretty much pretend it won't happen.

hnthrow90348765

I need an AI agent to continuously ask questions of PMs or stakeholders until the requirements are less vague. The good thing is this would be a plain english discussion which LLMs are good at. A PM can ask if something is technically feasible to some degree too. Maybe it can even break up tickets in a much better fashion too.

ako

I’m a PM, and today I built a working mockup with Windsurf (golang + wails + vuejs + duckdb). Windsurf is made by Codeium and is branded as the first agentic IDE.

Your requirements will improve; I'm not sure whether in the long run I'll still need developers to build the actual software.

The development process with Windsurf is a bit like rolling a die and hoping for a 6. There's a lot of trial and error, but if you check the git log, you see about 15 minutes between commits, one per feature request. Windsurf does a good job of summarizing the entire feature request chat into a short git commit message. Every git commit reads like a user story.

jondwillis

How… do I find PMs like you? Literally have never worked with a single one that bothered to understand the technology they are building on top of at a deep enough level.

Maybe I just need to teach the ones I work with that it is now possible to trivially prototype many ideas without much or any coding skill.

tommy_axle

As a PM if you're gathering requirements and building too have a look at genatron.ai

ceejayoz

> I need an AI agent to continuously ask questions of PMs or stakeholders until the requirements are less vague.

They’ll just get mad at the AI and tell it to stop asking so many questions. As they already do to humans.

whamlastxmas

As a PM, ChatGPT is great at helping me write tickets in a structured format from me just giving it a single sloppy sentence. I of course review it to make sure it’s understanding me properly. But having to explicitly write stuff like intended behaviors when submitting bugs can be really laborious, though I understand why engineers sometimes need that level of clarity (having been one myself for 15 years)

A4ET8a8uTh0_v2

I have not seen one in production, but I did see 'agent products' sold to financial companies for compliance purposes ( sanctions, mortgage, other regs ). Fascinating stuff that got me mildly interested in MS troupe.

_sword

Could you name any products?

A4ET8a8uTh0_v2

Not by name ( edit: and in corporate, product names seem to change a lot from where I sit ) but every bigger consulting company/vendor[2] that works with banks/brokers/financial institutions right now seems to have at least some offering in that space to ride the AI wave. The presentation I saw specifically was from Crowe[1].

[1] https://www.crowe.com/ae/-/media/crowe/firms/middle-east-and...

[2] https://www.lexisnexis.com/community/insights/legal/b/though...

readyplayernull

I like this distinction from automation by Bartosz Pucek:

At its core, an Agent is software that can:

    Take in a task description

    Break it down into steps

    Execute those steps using available tools

    Adapt its approach based on feedback

The key distinction from traditional automation: Agents handle variance and uncertainty by replanning rather than failing when their happy path breaks.

Source: https://newsletter.pucek.com/p/2025-the-state-of-ai-agents-a...
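To make the "replanning rather than failing" distinction concrete, here is a minimal Python sketch. Everything in it is hypothetical: the tool names, the `plan` function (a stand-in for what would be an LLM call in a real agent) and the failure itself are invented for illustration, not taken from the linked article.

```python
from typing import Callable

# Hypothetical tools: each takes a task string and returns a result or raises.
def primary_search(task: str) -> str:
    raise TimeoutError("primary search backend unavailable")

def fallback_search(task: str) -> str:
    return f"fallback result for {task!r}"

TOOLS: dict[str, Callable[[str], str]] = {
    "primary_search": primary_search,
    "fallback_search": fallback_search,
}

def rigid_automation(task: str) -> str:
    """Traditional automation: one hardcoded happy path, fails if it breaks."""
    return primary_search(task)

def plan(task: str, failed: set[str]) -> list[str]:
    """Stand-in planner (assumption): a real agent would ask an LLM here.
    This one simply proposes every tool that has not failed yet."""
    return [name for name in TOOLS if name not in failed]

def run_agent(task: str, max_replans: int = 3) -> tuple[str, list[str]]:
    """Try the first planned step; on failure, record it and replan."""
    failed: set[str] = set()
    log: list[str] = []
    for _ in range(max_replans):
        steps = plan(task, failed)
        if not steps:
            break
        tool = steps[0]
        try:
            result = TOOLS[tool](task)
        except Exception as exc:
            # Feedback from the failure goes into the next planning round.
            log.append(f"{tool}: failed ({exc})")
            failed.add(tool)
            continue
        log.append(f"{tool}: ok")
        return result, log
    raise RuntimeError(f"no working plan for {task!r}; log: {log}")
```

With these stubs, `rigid_automation("find flights")` raises `TimeoutError`, while `run_agent("find flights")` records the failure, replans, and returns the fallback result. Whether that counts as "handling variance" or just as a retry policy is exactly the question the thread is arguing about.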

alper

This seems impossibly broad. Realistically an agent would only be able to take tasks for whatever limited domain it can execute on.

So either every approach it does has to be hard coded or it would be able to use a bunch of very generic modules to plan and execute an approach.

davidgerard

so is there a real checkable example of one that's doing work?

ayush2390

We are working on a project called Potpie (https://github.com/potpie-ai/potpie). It's basically an open-source AI agentic platform that helps developers build AI agents that truly understand complex codebases and perform desired actions. Unlike the majority of AI agent platforms and GenAI models, Potpie's AI agents can understand the overall context of your codebase thoroughly.

th0ma5

Amazing that, nearing the 50-comment mark, there only seem to be people who have successfully created tutorial examples. And some other things that could be done with more purpose-specific traditional solutions. And some people showing love for the concepts. This is probably the bleakest I've seen an Ask HN thread, considering this is where all the money is going. I think one stark thing that maybe isn't being addressed is that the value of the models is being completely controlled by the model creators, or else there would be at least one success story by now that doesn't involve merely making the LLM products available to customers as a middle entity.

monsieurgaufre

I share your impression. For something that is hyped that much, it does not seem to have much real world use.

p4bl0

> This is probably the bleakest I've seen an Ask HN thread

There was a similar one asking about blockchains solving actual problems. There too the answer was essentially: nope.

Update: there it is: https://news.ycombinator.com/item?id=22914430

kodablah

What's wrong with an agent being a glorified workflow? At Temporal (where I work), it seems plenty natural for agentic AI to just be AI workflows. Here's a video we put out this week demonstrating it: https://youtu.be/GEXllEH2XiQ (code at https://github.com/steveandroulakis/temporal-ai-agent).

lolinder

This is what OP is explicitly not asking for—it's just a demo of a theoretical case, Temporal showing how a company that's hyped up on AI agents could use your platform to do agent-y things.

OP wants to know if anyone is actually using this stuff productively, not if anyone has tech demos. We've all seen more than enough tech demos.

kodablah

I fear I'll come off as a shill, but I've seen dozens of company uses of AI in workflows in the real world. Agents are just orchestrating multiple AI steps basically (granted not all of them are using AI to _pick_ the step to take which is often what "agentic" is seen as). Some are listed at https://temporal.io/in-use alongside the many non-AI things, e.g. https://temporal.io/resources/case-studies/bugcrowd, https://temporal.io/resources/on-demand/arc-xp-washington-po..., https://temporal.io/resources/on-demand/practical-tactical-a..., and more and more. All those companies use AI workflows in real world cases, and there are many more. I only showed the tutorial to agree with OP that it is just workflows with fuzzier steps and that's ok.

lcrmorin

Perfect example of what OP is asking about. This is just a demo. What problems does it solve for you or your clients so that you make money?

kodablah

Replied to another in this thread, but basically https://temporal.io/in-use lists many, AI and not.

theptip

This is interesting stuff, and a great stepping stone. I think the excitement around true agents will come when the AI can author the workflow pipeline, so to speak, in response to a request.

This is an area where terminology is in flux but I think of weak agents as mostly-hardcoded, eg if you wrote a flight booking bot that can converse with you about flight options then go do the booking - but you specified the APIs and workflow engine. Strong agents can self-directedly follow long range goals over long time frames, eg “run this business unit for me” or “manage my portfolio”.

jncfhnb

Most “agents” are just starter prompts + a small set of tools that they can use to respond to things, like access to a database.

They’re workflow automation

dimitri-vs

IMO:

Workflows = automations that use LLMs (or a sequence of LLM calls) at some point. Eg. "Classify this input and respond with JSON"

Is Siri or Google Assistant an "agent"? I would say no it's basically an LLM with function calling. Eg. "What's the weather -> uses predefined weather api"
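As a rough illustration of the "classify, respond with JSON, call a predefined API" pattern, here is a small Python sketch. The model call is replaced by a stub (`fake_llm`), and the tool name, city and canned weather string are all made up; the point is only that the model picks from a fixed menu and the surrounding code decides what is allowed to run.

```python
import json

def fake_llm(prompt: str) -> str:
    """Stand-in for a model call (assumption): a real system would call an
    LLM API here and get back text that is hopefully valid JSON."""
    if "weather" in prompt.lower():
        return json.dumps({"tool": "get_weather", "args": {"city": "Berlin"}})
    return json.dumps({"tool": "none", "args": {}})

def get_weather(city: str) -> str:
    # Hypothetical predefined integration; returns canned data.
    return f"Sunny in {city}"

# The only functions the model can trigger, fixed by the developer.
PREDEFINED_TOOLS = {"get_weather": get_weather}

def handle(user_input: str) -> str:
    raw = fake_llm(f"Classify this input and respond with JSON: {user_input}")
    try:
        decision = json.loads(raw)
    except json.JSONDecodeError:
        return "Sorry, I could not understand that."
    tool = PREDEFINED_TOOLS.get(decision.get("tool"))
    if tool is None:
        return "No predefined tool for this request."
    return tool(**decision.get("args", {}))
```

Here `handle("What's the weather?")` ends up in `get_weather`, while `handle("Book me a flight")` falls through because nothing outside `PREDEFINED_TOOLS` can be called. Nothing in this sketch integrates itself with a new system, which is the gap the comment is pointing at.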

Agents would need to be able to self integrate, which is impossible without giving them full computer use or admin permissions - which creates massive security risks nobody seems to be talking about.

viraptor

By that definition, humans doing tasks are workflow automation.

jncfhnb

Humans are autonomous and not task specific

viraptor

Most people in large work environments are very task specific and deviation from playbook is punished.

varelaseb

What else do you want them to be?

jncfhnb

I’d like it if we could trust agents to have agency, but in practice they don't.

breckenedge

Here’s a talk from a month ago that covers a few use cases that are definitely beyond simple glorified workflow automation.

https://youtu.be/SpKtpW9TGF0?si=TRE6o7FfzCmhBuZq

smt88

This is a demo. OP is asking for examples of usage in production, where the agent is actually doing work.

It's also not really what people are promising with "agentic" because there's a human prompting and assisting it the entire time.

neom

The name matters less to me than whether I can have something that does my work for me. I've been starting to play with my own solutions across the 3 foundation model companies. I've started to try to build my own stuff a bit. I think I need to learn more about AppleScript; also, so far my experiments have required me to have multiple systems running to make it super easy for me.

You're all going to laugh at this stuff because it's so remedial and also clearly not agents, but a couple things I've done... I won't say I really USE this stuff daily, I just play to see what I can do. I've figured out how to pass screenshots back and forth between models (I have one computer take a screenshot every 30 minutes and send it to another machine; that machine is set up with a mouse hovering over the upload button on Perplexity, it uploads the screenshot, and then Perplexity does the work from the screenshot). An example of this that worked OK: I had ChatGPT create all the themes for the social media schedule I needed to do this year, then I passed that screenshot to Perplexity to do the searching on the web, and then I passed that to Claude to write the tweet. This actually works ok-ish and I'm going to expand it a bit over the coming weeks I guess. Things like this are super helpful for weird hacks like that: https://github.com/BlueM/cliclick

Another thing I've found actually works pretty well is setting up two computers next to each other with ChatGPT voice mode. If you give them custom instructions to be sure to wait for the other one to be done talking, they don't interrupt each other and can get quite a bit of work done. Here is just a video of the MVP that I sent to a friend ages ago once I started playing with the idea: https://s.h4x.club/kpuzNkNL - I actually use this method of working quite often now, a couple times a week at least, and I find it's pretty helpful. If I knew how to put 4 or 5 models together in one app and give them each custom instructions, I'd love to try building a team (if someone out there actually knows how to build this kinda stuff, I'm happy to help flesh out how the product would need to work, but I don't think it's super difficult to build at this point, I'm just not technical enough).

neom

Just an update here, I forgot I'm supposed to have childlike wonder and it's the weekend but then I remembered...sooo.... 4 hours later I now have a complete marketing department of agents, and it works pretty well actually. I gave it a high level task around building a full campaign, and it is doing just that. Here is the social media manager agent off on its own composing the tweets. The social media manager agent is built with 4 internal agents, but calls out to my Hacker News agent and my Google search agent when needed. It actually works super well... you can see it running here, the manager even told it to do all the tweets for the year, so I presume it's going to stop at 365 tweets: https://s.h4x.club/eDubwABJ

Going to spend the rest of the day building out the full system till I have a complete complement of agents that can do every task in the startup, heh.

falcor84

I'm not exactly clear what you're asking. Where do you draw the line between "workflow automation" and "doing work"? To me it just seems like a spectrum with rapidly moving goal posts.

A decade ago, enterprises had quite a lot of roles involving essentially moving data from one ERP screen to another. From what I'm seeing, these roles seem to be quickly disappearing, with a combination of proper API-based automation, GUI automation and most recently LLM "agents" in crucial steps.

And on a very different note, I as a developer could ask an AI tool such as Aider or Windsurf to perform a big refactoring or other code change, working autonomously across code changes and shell commands until it passes all tests - this is agentic behavior that I didn't have even a year ago.

jimkri

I've created agents for the following:

- ICP / Sales Agent: I hired an offshore resource and built a GPT that they can send titles and other identifiers to, and it says whether each one is in our ICP or not. I created it for a specific process that has outlined steps, plus an FAQ from that person on things they have encountered; I plan on adding more questions and answers. This was super helpful for saving time answering questions about titles and for improving the results of their work.

- Domain Policy Scan (SPF, DKIM, DMARC): I scan domains and find SPF records and then use an Agent and a prompt to break out all the system tokens from the SPF to understand the systems companies are using. The prompt is a constant work in progress, but I have it down to being really consistent.

Both have been really helpful to my overall workflow.
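For readers unfamiliar with the SPF step in the list above, here is a rough Python sketch of what "breaking out the system tokens" could look like. It deliberately swaps in a deterministic parse plus a lookup table for the LLM prompt the comment describes, skips the DNS lookup entirely, and the `KNOWN_SENDERS` table is a tiny illustrative sample, not a complete mapping.

```python
def spf_includes(record: str) -> list[str]:
    """Return the domains named by include: mechanisms in one SPF TXT record."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return []  # not an SPF record
    domains = []
    for term in terms[1:]:
        # Drop an optional qualifier (+, -, ~, ?) in front of the mechanism.
        mechanism = term.lstrip("+-~?")
        if mechanism.lower().startswith("include:"):
            domains.append(mechanism[len("include:"):].lower())
    return domains

# Illustrative sample only (assumption): a real scan would need a much larger
# table, or an LLM prompt as in the comment, to label unfamiliar domains.
KNOWN_SENDERS = {
    "_spf.google.com": "Google Workspace",
    "spf.protection.outlook.com": "Microsoft 365",
    "sendgrid.net": "SendGrid",
}

def systems_from_spf(record: str) -> list[str]:
    """Map each included domain to a vendor name where one is known."""
    return [KNOWN_SENDERS.get(d, f"unknown ({d})") for d in spf_includes(record)]

# Example record using a documentation-range IP address.
example = "v=spf1 include:_spf.google.com include:sendgrid.net ip4:192.0.2.10 ~all"
```

Calling `systems_from_spf(example)` labels the two `include:` domains and ignores the `ip4:` and `~all` terms. The lookup table covers the easy cases; the long tail of unrecognized domains is presumably where the prompt earns its keep.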

Lionga

Isn't that just simple glorified workflow automation? Shouldn't "agents" act and decide what to do themselves, based on the holy prophecy of VCs and AI startups?

PhilippGille

Yes, for example from Anthropic's definition [1]:

> Workflows are systems where LLMs and tools are orchestrated through predefined code paths.

> Agents, on the other hand, are systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks.

[1] https://www.anthropic.com/research/building-effective-agents

jimkri

This is helpful, thanks for sharing that.

OutOfHere

Those sound like basic LLM workflows with nothing agentic about them.

jimkri

You are right, it is more of an LLM workflow.

The workflows can be easily updated to be an agent, the scan already creates sales opportunities for me and can be updated to create messages based on active email campaigns that I have created.

jokethrowaway

Just a buzzword for investors given we peaked with language models.

Chaining different prompts can be useful, but calling that "agents" is purely marketing: these models are pretty dumb and don't have agency. I'd stay away from related frameworks.