Ask HN: Are there any real examples of AI agents doing work?
53 comments
January 8, 2025 · chevman
Been in BigCo land for 20 years now, and have seen the rise and fall of quite a few AI/ML/RPA, etc. fads.
Honestly the whole landscape seems broken and unproductive at this point.
Countless vendors, platforms, cloud environments, industry/technical jargon - all with different pricing models, SLAs, tooling, etc etc.
Getting anything usable is a challenge and most orgs spin in a never ending cycle of data integration/normalization work that produces little business value.
My advice to teams now is simplify, reduce, streamline - get to the kernel of what you think you need and protect it at all costs. Most of the shiny new objects being pitched as silver bullets are just ways for other people to make money off your margin.
oraphalous
I too would like to hear some examples.
On the one hand you have gurus claiming that AI agents are going to make all SaaS redundant; on the other, gurus claiming that AI isn't going to take my coding job, but that I need to adapt my workflows to incorporate AI. We all need to start preparing now for the changes that AI is going to cause.
But these two claims aren't compatible. If AGI and these super agents are that bonkers amazeballs that they can replace entire SaaS companies - then there is no way I'm going to be able to adapt my workflows to compete as a programmer.
Further, if the wildest claims about AI end up proving to be true - there is simply no way to prepare. What possible adaptation to my workflow could I possibly come up with that an AI agent could not surpass? Why should I bother learning how to implement (with today's apis) some RAG setup for a SaaS customer service chatbot when presumably an AI agent is going to make that skillset redundant shortly after?
I'm going to be interviewing for frontend roles soon, and for my prep I'm just going back to basics and making sure I can recall on demand all the basics (CSS, HTML, JS/TS) - fuck the rest of this noise.
fhd2
Programmers don't work in isolation. So I don't know how necessary it would be to quickly adapt your workflows to compete. If there's something that's useful to adopt, there will be a stream of blog posts, coworkers, people at user groups and whatnot spoon-feeding what they learned to others. I don't think there's much cause for FOMO, and I don't think it makes a big difference whether you start using a faster way to work a few months earlier or later than others. It can be cheaper to not jump on any hype train and potentially miss out on genuine improvements for a while than to jump on all the hype trains and waste a lot of time on stuff that goes nowhere.
And like you said, if the wildest claims hold true, all programmers are out of a job by the end of 2026 anyway, with all other jobs following over the course of a few years. There's too many variables to predict what would happen in such a scenario, so probably best to deal with it if it happens.
So to me, your strategy checks out. I've personally invested some time into code generating and agentic tooling, but ultimately went back to Claude-as-Google-replacement. By my estimation, that's about a 5-10% productivity boost compared to my workflow in 2022. The work is about the same, I just learn a bit faster.
lolinder
> And like you said, if the wildest claims hold true, all programmers are out of a job by the end of 2026 anyway, with all other jobs following over the course of a few years. There's too many variables to predict what would happen in such a scenario, so probably best to deal with it if it happens.
So much this. AGI is the equivalent of a nuclear apocalypse in many ways—it's unlikely, not unlikely enough for comfort, but also totally not worth preparing for because there's basically no way to predict what preparations would actually be helpful, nor is it obvious that you'd even want to survive it if it happened.
The expected value of prepping for it isn't worth the investment, so it's better to do what most of us already do for nuclear war and pretty much pretend it won't happen.
hnthrow90348765
I need an AI agent to continuously ask questions of PMs or stakeholders until the requirements are less vague. The good thing is this would be a plain English discussion, which LLMs are good at. A PM could also ask whether something is technically feasible, to some degree. Maybe it could even break up tickets in a much better fashion.
ako
I’m a PM; today I built a working mockup with Windsurf (golang + Wails + Vue.js + DuckDB). Windsurf uses Codeium and is branded as the first agentic IDE.
Your requirements will improve; I'm not sure whether in the long run I'll still need developers to build the actual software.
The development process with Windsurf is a bit like rolling a die and hoping for a 6. There's a lot of trial and error, but if you check the git log, you see about 15 minutes between commits per feature request. Windsurf does a good job of summarizing the entire feature-request chat into a short git commit message. Every git commit reads like a user story.
jondwillis
How… do I find PMs like you? Literally have never worked with a single one that bothered to understand the technology they are building on top of at a deep enough level.
Maybe I just need to teach the ones I work with that it is now possible to trivially prototype many ideas without much or any coding skill.
ceejayoz
> I need an AI agent to continuously ask questions of PMs or stakeholders until the requirements are less vague.
They’ll just get mad at the AI and tell it to stop asking so many questions. As they already do to humans.
whamlastxmas
As a PM, ChatGPT is great at helping me write tickets in a structured format from just a single sloppy sentence. I of course review it to make sure it’s understanding me properly. But having to explicitly write out stuff like intended behaviors when submitting bugs can be really laborious, though I understand why engineers sometimes need that level of clarity (having been one myself for 15 years).
A4ET8a8uTh0_v2
I have not seen one in production, but I did see 'agent products' sold to financial companies for compliance purposes (sanctions, mortgage, other regs). Fascinating stuff that got me mildly interested in MS troupe.
_sword
Could you name any products?
A4ET8a8uTh0_v2
Not by name (edit: and corporate product names seem to change a lot from where I sit), but every bigger consulting company/vendor[2] that works with banks/brokers/financial institutions right now seems to have at least some offering in that space to ride the AI wave. The presentation I saw was specifically from Crowe[1].
[1] https://www.crowe.com/ae/-/media/crowe/firms/middle-east-and...
[2] https://www.lexisnexis.com/community/insights/legal/b/though...
kodablah
What's wrong with an agent being a glorified workflow? At Temporal (where I work), it seems plenty natural for agentic AI to just be AI workflows. Here's a video we put out this week demonstrating it: https://youtu.be/GEXllEH2XiQ (code at https://github.com/steveandroulakis/temporal-ai-agent).
lolinder
This is what OP is explicitly not asking for—it's just a demo of a theoretical case, Temporal showing how a company that's hyped up on AI agents could use your platform to do agent-y things.
OP wants to know if anyone is actually using this stuff productively, not if anyone has tech demos. We've all seen more than enough tech demos.
lcrmorin
Perfect example of what OP is asking about. This is just a demo. What problems does it solve for you or your clients so that you make money?
theptip
This is interesting stuff, and a great stepping stone. I think the excitement around true agents will come when the AI can author the workflow pipeline, so to speak, in response to a request.
This is an area where terminology is in flux, but I think of weak agents as mostly hardcoded, e.g., if you wrote a flight-booking bot that can converse with you about flight options and then go do the booking, but you specified the APIs and workflow engine. Strong agents can self-directedly follow long-range goals over long time frames, e.g., “run this business unit for me” or “manage my portfolio”.
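A minimal sketch of the weak-agent end of that spectrum, in Python: the developer fixes the order of steps (search, choose, book) and the model only gets to make the choice in the middle. The functions, flight codes, and prices are hypothetical stubs, not any real airline or booking API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Flight:
    code: str
    price: float


def search_flights(origin: str, dest: str) -> list[Flight]:
    # Hypothetical stub standing in for a developer-specified search API.
    return [Flight("XY100", 320.0), Flight("XY200", 275.0)]


def book(flight: Flight) -> str:
    # Hypothetical stub standing in for a booking API; returns a confirmation id.
    return f"CONF-{flight.code}"


def weak_agent_book(origin: str, dest: str,
                    choose: Callable[[list[Flight]], Flight]) -> str:
    # The workflow is hardcoded: always search, then choose, then book.
    options = search_flights(origin, dest)
    chosen = choose(options)  # the only step where the model's judgment enters
    return book(chosen)


# Example: a stand-in "choose" policy that picks the cheapest option.
print(weak_agent_book("SYD", "MEL", lambda fs: min(fs, key=lambda f: f.price)))
```

A strong agent, in this framing, would be the one deciding which of those steps to run, in what order, and when it is done.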
readyplayernull
I like this distinction from automation by Bartosz Pucek:
At its core, an Agent is software that can:
Take in a task description
Break it down into steps
Execute those steps using available tools
Adapt its approach based on feedback
The key distinction from traditional automation: Agents handle variance and uncertainty by replanning rather than failing when their happy path breaks.
Source: https://newsletter.pucek.com/p/2025-the-state-of-ai-agents-a...
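The four steps above, including "replan instead of fail", can be sketched as a small Python loop. This is an illustration under assumptions, not Pucek's code: `plan` stands in for an LLM call that turns a task plus failure feedback into tool steps, and the `fetch`/`summarize` tools and their "primary"/"backup" sources are hypothetical.

```python
from typing import Callable

Step = tuple[str, str]  # (tool name, argument)


def run_agent(task: str,
              plan: Callable[[str, list[str]], list[Step]],
              tools: dict[str, Callable[[str], str]],
              max_replans: int = 3) -> list[str]:
    """Execute a plan step by step; when a step fails, replan using the feedback."""
    feedback: list[str] = []
    for _ in range(max_replans + 1):
        steps = plan(task, feedback)
        results: list[str] = []
        for tool_name, arg in steps:
            try:
                results.append(tools[tool_name](arg))
            except Exception as exc:  # happy path broke: record why, then replan
                feedback.append(f"{tool_name}({arg!r}) failed: {exc}")
                break
        else:
            return results  # every step in this plan succeeded
    raise RuntimeError(f"gave up after {max_replans} replans: {feedback}")


def fetch(source: str) -> str:
    if source == "primary":
        raise ConnectionError("timeout")  # simulated outage
    return f"data from {source}"


def summarize(text: str) -> str:
    return f"summary of {text}"


def planner(task: str, feedback: list[str]) -> list[Step]:
    # Stand-in for an LLM planner: switch sources once the first attempt failed.
    source = "backup" if feedback else "primary"
    return [("fetch", source), ("summarize", task)]


print(run_agent("weekly report", planner, {"fetch": fetch, "summarize": summarize}))
```

Traditional automation would have stopped at the `ConnectionError`; here the failure text becomes input to the next plan.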
neom
The name is less important to me than whether I can have something that does my work for me. I've been starting to play with my own solutions across the 3 foundational model companies, and I've started to try to build my own stuff a bit. I think I need to learn more about AppleScript; also, so far my experiments have required me to have multiple systems running to make it super easy for me.
You're all going to laugh at this stuff because it's so remedial and also clearly not agents, but here are a couple of things I've done. I won't say I really USE this stuff daily; I just play to see what I can do. I've figured out how to pass screenshots back and forth between models: I have one computer take a screenshot every 30 minutes and send it to another machine, which is set up with the mouse hovering over the upload button on Perplexity; it uploads the screenshot, and then Perplexity does the work from the screenshot. An example of this that worked OK: I had ChatGPT create all the themes for the social media schedule I needed to do this year, then I passed that screenshot to Perplexity to do the searching on the web, and then I passed that to Claude to write the tweet. This actually works OK-ish and I'm going to expand it a bit over the coming weeks, I guess. Tools like this are super helpful for weird hacks like that: https://github.com/BlueM/cliclick
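The same themes, then search, then tweet hand-off can be expressed as passing text from one model to the next instead of passing screenshots. A minimal Python sketch, assuming each stage is just a function from prompt to text; the stubs below are hypothetical placeholders where a real version would call each vendor's API (no specific SDK is shown here).

```python
from typing import Callable

Model = Callable[[str], str]  # prompt in, text out


def chain(stages: list[tuple[str, Model]], initial: str) -> str:
    """Feed each stage's text output straight into the next stage's prompt."""
    text = initial
    for instruction, model in stages:
        text = model(f"{instruction}\n\n{text}")
    return text


def make_stub(tag: str) -> Model:
    # Hypothetical stand-in for a model: wraps the last line of its prompt.
    return lambda prompt: f"{tag}({prompt.splitlines()[-1]})"


result = chain(
    [
        ("List the themes for this schedule", make_stub("themes")),  # e.g. ChatGPT
        ("Research each theme on the web", make_stub("search")),     # e.g. Perplexity
        ("Write one tweet per theme", make_stub("tweet")),           # e.g. Claude
    ],
    "2025 social media schedule",
)
print(result)
```

Passing text directly avoids the screenshot, the second machine, and the mouse-positioning step, at the cost of needing API access to each model.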
Another thing I've found actually works pretty well is setting up two computers next to each other with ChatGPT voice mode; if you give them custom instructions to wait for the other one to finish talking, they don't interrupt each other and can get quite a bit of work done. Here is just a video of the MVP that I sent to a friend ages ago once I started playing with the idea: https://s.h4x.club/kpuzNkNL - I actually use this method of working quite often now, a couple of times a week at least, and I find it's pretty helpful. If I knew how to put 4/5 models together in one app and give them each custom instructions, I'd love to try building a team (if someone out there actually knows how to build this kinda stuff, I'm happy to help flesh out how the product would need to work, but I don't think it's super difficult to build at this point; I'm just not technical enough).
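The "team of models with their own custom instructions, taking turns" idea can be sketched in one program as a round-robin over a shared transcript. In this Python sketch each `respond` function is a hypothetical stub where a real model call would go; strict turn order replaces the "wait for the other one to finish talking" instruction.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Agent:
    name: str
    instructions: str                          # the per-agent "custom instructions"
    respond: Callable[[str, list[str]], str]   # (instructions, transcript) -> reply


def round_robin(agents: list[Agent], opening: str, rounds: int) -> list[str]:
    """Each agent speaks only after the previous one has finished its turn."""
    transcript = [f"user: {opening}"]
    for _ in range(rounds):
        for agent in agents:
            reply = agent.respond(agent.instructions, transcript)
            transcript.append(f"{agent.name}: {reply}")
    return transcript


def stub(instructions: str, transcript: list[str]) -> str:
    # Hypothetical stand-in for a model call that sees the whole transcript.
    return f"{instructions} (turn {len(transcript)})"


team = [Agent("planner", "Propose next step", stub),
        Agent("critic", "Find flaws", stub)]
transcript = round_robin(team, "Plan the Q1 newsletter", rounds=2)
for line in transcript:
    print(line)
```

Adding a third or fourth member is just another `Agent` entry with its own instructions.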
jncfhnb
Most “agents” are just starter prompts + a small set of tools that they can use to respond to things, like access to a database.
They’re workflow automation.
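To make "starter prompt + a small set of tools, like access to a database" concrete, here is a minimal Python sketch using the standard-library `sqlite3` module. The prompt text, the `orders` table, and the `order_status` tool are hypothetical, and the LLM call that would decide when to invoke the tool is omitted.

```python
import sqlite3

# Hypothetical starter prompt that would be sent to the model with every request.
STARTER_PROMPT = (
    "You are a support assistant. Answer order questions using only the "
    "order_status tool; if the order is not found, say so."
)


def order_status(conn: sqlite3.Connection, order_id: int) -> str:
    """The one 'tool' this agent gets: a parameterized, read-only lookup."""
    row = conn.execute(
        "SELECT status FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    return row[0] if row else "not found"


# In-memory database with a hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO orders VALUES (1, 'shipped')")

print(order_status(conn, 1))
print(order_status(conn, 2))
```

The parameterized query (`?` placeholder) keeps whatever id the model supplies from being spliced into the SQL text.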
falcor84
I'm not exactly clear what you're asking. Where do you draw the line between "workflow automation" and "doing work"? To me it just seems like a spectrum with rapidly moving goal posts.
A decade ago, enterprises had quite a lot of roles involving essentially moving data from one ERP screen to another. From what I'm seeing, these roles seem to be quickly disappearing, with a combination of proper API-based automation, GUI automation and most recently LLM "agents" in crucial steps.
And on a very different note, I as a developer could ask an AI tool such as Aider or Windsurf to perform a big refactoring or other code change, working autonomously across code changes and shell commands until it passes all tests - this is agentic behavior that I didn't have even a year ago.
breckenedge
Here’s a talk from a month ago that covers a few use cases that are definitely beyond simple glorified workflow automation.
smt88
This is a demo. OP is asking for examples of usage in production, where the agent is actually doing work.
It's also not really what people are promising with "agentic" because there's a human prompting and assisting it the entire time.
williamcotton
Cursor’s agent in the Composer workflow will check the linter as well as run tests in the “yolo” mode.
What makes it an agent is the feedback loop of making a change and then seeing the results and making further changes.
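That change, observe, change-again loop can be sketched in a few lines of Python. This is not Cursor's or Aider's implementation: `propose` is a hypothetical stand-in for the model producing an edit from the last failure output, and `run_tests` is a toy test harness that executes a one-line candidate with `exec`.

```python
from typing import Callable


def feedback_loop(propose: Callable[[str], str],
                  run_tests: Callable[[str], tuple[bool, str]],
                  max_iters: int = 5) -> tuple[str, int]:
    """Make a change, look at the results, make another change, until tests pass."""
    feedback = ""
    for attempt in range(1, max_iters + 1):
        candidate = propose(feedback)          # stand-in for the model's edit
        passed, output = run_tests(candidate)  # e.g. linter + test suite
        if passed:
            return candidate, attempt
        feedback = output                      # failures feed the next attempt
    raise RuntimeError(f"tests still failing after {max_iters} attempts")


# Toy stand-ins: a "model" that tries two candidate edits in order
# (a real one would use the feedback) and a one-assertion "test suite".
candidates = iter(["def add(a, b): return a - b",
                   "def add(a, b): return a + b"])


def propose(feedback: str) -> str:
    return next(candidates)


def run_tests(src: str) -> tuple[bool, str]:
    namespace: dict = {}
    exec(src, namespace)  # toy only: never exec untrusted model output like this
    ok = namespace["add"](2, 3) == 5
    return ok, ("" if ok else "add(2, 3) != 5")


result = feedback_loop(propose, run_tests)
print(result)
```

The `max_iters` cap matters in practice: without it, a model that never fixes the failure would loop forever.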
jimkri
I've created agents for the following:
- ICP / Sales Agent: I hired an offshore resource and built a GPT that they can send titles and other identifiers to, and it says whether each one is in our ICP (ideal customer profile) or not. I created it for a specific process that has outlined steps and an FAQ from that person on things they have encountered, and I plan on adding more questions and answers. This was super helpful for saving time answering questions about titles and improving the results of their work.
- Domain Policy Scan (SPF, DKIM, DMARC): I scan domains to find their SPF records, then use an Agent and a prompt to break out all the system tokens from the SPF record to understand the systems companies are using. The prompt is a constant work in progress, but I have gotten it to be really consistent.
Both have been really helpful to my overall workflow.
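For the SPF part, the token splitting itself is deterministic, so one option (a swapped-in technique, not jimkri's method) is to parse the record in code and keep the LLM for interpreting unfamiliar vendors. A minimal Python sketch following the SPF syntax in RFC 7208; it takes an already-fetched TXT record string (no DNS lookup) and does not follow nested `include:` records. The `KNOWN_SENDERS` table is a small illustrative sample, not a complete list.

```python
def parse_spf(record: str) -> dict[str, list[str]]:
    """Split an SPF TXT record into its terms, grouped by mechanism or modifier name."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF record")
    parsed: dict[str, list[str]] = {}
    for term in terms[1:]:
        # Strip an optional qualifier: + pass, - fail, ~ softfail, ? neutral.
        if term[0] in "+-~?":
            term = term[1:]
        if "=" in term and ":" not in term.split("=", 1)[0]:
            name, _, value = term.partition("=")  # modifiers: redirect=, exp=
        else:
            name, _, value = term.partition(":")  # mechanisms: include:, ip4:, all, ...
        parsed.setdefault(name.lower(), []).append(value)
    return parsed


# Illustrative mapping from common include domains to the services behind them.
KNOWN_SENDERS = {
    "_spf.google.com": "Google Workspace",
    "sendgrid.net": "SendGrid",
    "spf.protection.outlook.com": "Microsoft 365",
}


def senders(record: str) -> list[str]:
    """Name the sending services an SPF record authorizes via include:."""
    return [KNOWN_SENDERS.get(d, d) for d in parse_spf(record).get("include", [])]


record = "v=spf1 include:_spf.google.com include:sendgrid.net ip4:203.0.113.5 ~all"
print(parse_spf(record))
print(senders(record))
```

Unrecognized include domains are passed through as-is, which is the natural hand-off point to a prompt.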
Lionga
Isn't that just simple glorified workflow automation? Shouldn't "agents" decide what to do and do it themselves, according to the holy prophecy of VCs and AI startups?
PhilippGille
Yes, for example from Anthropic's definition [1]:
> Workflows are systems where LLMs and tools are orchestrated through predefined code paths.
> Agents, on the other hand, are systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks.
[1] https://www.anthropic.com/research/building-effective-agents
jimkri
This is helpful, thanks for sharing that.
OutOfHere
Those sound like basic LLM workflows with nothing agentic about it.
idkwhattocallme
Sales ops here. I was just tasked with figuring out how to use AI to generate new quotes from previous quotes so salespeople don't spend so much time creating them. Seems like the perfect thing for an agent. Anyone done this?
AznHisoka
Replace the word “agent” with algorithm and I agree. Why overcomplicate things?
Lionga
Cause he can say he used AI and get a promotion and the company can put AI on the website and make stock price go up.
brookst
The difference is that algorithms have known inputs, while “agent” implies a greater level of adaptability to unforeseen inputs.
odyssey7
That sounds alright, but I'm having difficulty imagining a situation where a business wants to produce a quote with novel element types / parameterizations not yet seen before without a human hand in the loop.
GianFabien
In my pre-sales career, we just did copy and paste for spreadsheets and docs. Most quotes only require finding the nearest recent one and a replace-all for key bits of information followed by careful proof-reading.
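That manual process (find the nearest recent quote, then replace-all the key details) is easy to sketch without any AI, which is a useful baseline before reaching for an agent. In this Python sketch the quote texts are hypothetical, and word-overlap (Jaccard) similarity is my choice of "nearest"; a real system might compare line items or customer fields instead.

```python
import re


def tokens(text: str) -> set[str]:
    # Lowercased alphanumeric words; punctuation is ignored.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a: str, b: str) -> float:
    """Jaccard similarity: shared words divided by all distinct words."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def nearest_quote(request: str, past_quotes: dict[str, str]) -> str:
    """Return the id of the past quote most similar to the new request."""
    return max(past_quotes, key=lambda qid: similarity(request, past_quotes[qid]))


def fill_template(quote: str, replacements: dict[str, str]) -> str:
    """The 'replace-all for key bits of information' step."""
    for old, new in replacements.items():
        quote = quote.replace(old, new)
    return quote


past = {
    "Q-1": "Acme - 10 x Raspberry Pi 5 - delivery Sydney",
    "Q-2": "Globex - 200 x micro:bit - delivery Perth",
}
best = nearest_quote("150 micro:bit for Initech, Perth", past)
draft = fill_template(past[best], {"Globex": "Initech", "200": "150"})
print(best, "->", draft)
```

Plain `str.replace` will also hit substrings (replacing "200" inside "2000", say), which is exactly why the careful proofreading step still matters.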
codingdave
Sounds like a poorly thought-out requirement. If you are tasked with speeding up the generation of quotes and find that AI can do the job well, that is perfectly reasonable. But if you are told what tool to use to make it happen, whoever tasked you with it doesn't understand that AI is a tool, not a goal. (I say that often enough, I may need to put it on t-shirts.)
threatripper
For him and his boss and the boss of his boss it may well be a goal to use more AI in business processes. It may be decided in the strategy to spend X% on AI in the next 3 years. So you will do exactly that and not question if it makes sense at all.
linuxftw
I disagree here. It sounds to me like the requirements are clear: Use some AI "agent" to perform this task. That means it should be trained on a particular dataset, and it should perform a particular function. This would be in place of trying to write software to directly do this, just let the AI perform task processing, proposal drafting, document formatting.
schappim
>> Anyone done this?
Yes, we have and more!
We sell maker and STEM education electronics, but the profit margins on products like Raspberry Pis, Micro:bits, and Arduinos are, well, pretty slim. This has pushed us to become extremely efficient; so much so that we ended up creating our own AI-agent-based ERP platform called Koi [1]
In essence, our work is built on the shoulders of giants like OpenAI’s Assistants API, Anthropic and Rails.
One of our standout demos is that certain objects (Orders, Quotes, Supplier Orders, Customers, etc.) in our database are assigned their own email addresses (using Rails' Action Mailbox[2]). Emails can be forwarded directly to these objects, whether it’s an order, a customer, or a supplier order.
From there, our agent, “Koi,” automatically extracts relevant information from emails and takes appropriate actions. For example, Koi can create a quote, attach a purchase order PDF to an order, or extract tracking information from supplier shipping confirmation emails to provide live tracking updates.
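The routing half of that idea (an email address per database object) can be sketched outside Rails too. This Python version, using the standard-library `email` package, is only an illustration of the concept, not how Koi or Action Mailbox does it: the `<type>-<id>@inbox.example.com` address scheme and object types are hypothetical, and the extraction step that would follow is left out.

```python
import re
from email import message_from_string
from email.utils import getaddresses

# Hypothetical address scheme: <object type>-<id>@inbox.example.com
ADDRESS_RE = re.compile(
    r"^(order|quote|customer|supplier_order)-(\d+)@inbox\.example\.com$",
    re.IGNORECASE,
)


def route(raw_email: str) -> list[tuple[str, int]]:
    """Find every database object this email was sent or forwarded to."""
    msg = message_from_string(raw_email)
    recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
    targets = []
    for _, addr in recipients:
        match = ADDRESS_RE.match(addr)
        if match:
            targets.append((match.group(1).lower(), int(match.group(2))))
    return targets


raw = (
    "From: dispatch@supplier.example.net\n"
    "To: order-1042@inbox.example.com\n"
    "Cc: boss@example.com\n"
    "Subject: Your shipment is on its way\n"
    "\n"
    "Tracking number: ABC123\n"
)
print(route(raw))
```

Each `(type, id)` pair is where an extraction handler for that kind of object would take over.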
It also works the other way around; you can ask Koi to send a customer their tax invoice or inform them that a product they were interested in is out of stock, seamlessly handling typical customer service tasks.
Previously, we integrated speech-to-text functionality using the Whisper API, which made for an impressive demo.
Now, we’re taking it a step further by rebuilding our speech system to leverage OpenAI’s new WebRTC-based Realtime API. The key advantage here is that it comes with function calling support[3]. We already support a variety of automation features using barcodes[4], allowing users to scan a barcode and have Koi perform specific actions. This has proven to be an ideal area in the application to integrate tool use with the Realtime API, creating even more powerful and efficient workflows.
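On the application side, function calling mostly comes down to mapping a model-requested tool name and its arguments onto your own functions. A minimal Python sketch, assuming the model hands back a tool name plus a JSON-encoded arguments string, as OpenAI's function-calling interfaces do; the simplified dict shape and the `mark_picked`/`print_label` actions are hypothetical, not Koi's tools or the exact Realtime API event format.

```python
import json
from typing import Callable


def mark_picked(order_id: int) -> str:
    return f"order {order_id} marked as picked"        # hypothetical action


def print_label(order_id: int, carrier: str = "default") -> str:
    return f"label for order {order_id} via {carrier}"  # hypothetical action


TOOLS: dict[str, Callable[..., str]] = {
    "mark_picked": mark_picked,
    "print_label": print_label,
}


def dispatch(tool_call: dict) -> str:
    """Run one model-requested tool call; arguments arrive as a JSON string."""
    handler = TOOLS.get(tool_call["name"])
    if handler is None:
        return f"unknown tool: {tool_call['name']}"  # report back instead of crashing
    return handler(**json.loads(tool_call["arguments"]))


print(dispatch({"name": "print_label",
                "arguments": '{"order_id": 7, "carrier": "AusPost"}'}))
```

Returning an error string for unknown tools lets the model see its mistake and try again rather than ending the session.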
Our ultimate goal is to integrate this system with Bishop, our product-picking robot[5].
[2] https://guides.rubyonrails.org/action_mailbox_basics.html
[3] https://platform.openai.com/docs/guides/realtime-model-capab...
[4] https://help.koi.app/article/54-barcode-driven-fulfillment
[5] https://piaustralia.com.au/pages/the-raspberry-pi-that-ships...
mattmanser
Your spiel here is much better than the website you've linked.
What you've linked sounds like you're selling a glorified shipping label printer.
I'm curious how this differs from standard TA/TMS systems that have been around for decades. I work in the space and there are plenty of TA/TMS systems that print shipping labels and fulfil orders, update stock levels and send out tracking emails + SMS messages, integrate with carriers for shipment updates, and integrate with Shopify, eBay, Etsy, BigCommerce, etc.
They didn't need AI to do any of that. What's the advantage you're finding?
Here's an example that seems to operate in Australia:
schappim
Shipping is a fraction of what the system does. To completely automate shipping you need an understanding of inventory etc. To do automated customer service, you need knowledge of shipping, inventory etc.
thekevan
I have mentioned this on Twitter recently. My stream there is full of people talking about agents being the future, several posts on how to make them, but almost zero examples of any that they have built or used.
2025 is the year of agents. I’ve heard about SDR AI agents but not great things. Most “agents” sound like workflow automations that have been around forever. Anyone have an example of an “ai” agent which I understand to be intelligent that isn’t a glorified or rebranded workflow automation? Thx.