Building better AI tools

154 comments

·July 23, 2025

tptacek

This is a confusing piece. A lot of it would make sense if Weakly was talking about a coding agent (a particular flavor of agent that worked more like how antirez just said he prefers coding with AI in 2025 --- more manual, more advisory, less do-ing). But she's not: she's talking about agents that assist in investigating and resolving operations incidents.

The fulcrum of Weakly's argument is that agents should stay in their lane, offering helpful Clippy-like suggestions and letting humans drive. But what exactly is the value in having humans grovel through logs to isolate anomalies and create hypotheses for incidents? AI tools are fundamentally better at this task than humans are, for the same reason that computers are better at playing chess.

What Weakly seems to be doing is laying out a bright line between advising engineers and actually performing actions --- any kind of action, other than suggestions (and only those suggestions the human driver would want, and wouldn't prefer to learn and upskill on their own). That's not the right line. There are actions AI tools shouldn't perform autonomously (I certainly wouldn't let one run a Terraform apply), but there are plenty of actions where it doesn't make sense to stop them.

The purpose of incident resolution is to resolve incidents.

cmiles74

There's no AI tool today that will resolve incidents to anyone's satisfaction. People need to be in the loop not only to take responsibility but to make sure the right actions are performed.

tptacek

Nobody disputes this. Weakly posits a bright line between agents suggesting active steps and agents actually performing active steps. The problem is that during incident investigations, some active steps make a lot of sense for agents to perform, and others don't; the line isn't where she seems to claim it is.

cmiles74

Understood. To your example about the logs, my concern would be be that the AI chooses the wrong thing to focus on and people decide there’s nothing of interest in the logs, thus overlooking a vital clue.

phillipcarter

Lost in a bit of the discourse around anomaly detection and incident management is that not all problems are equal. Many of them actually are automatable to some extent. I think the issue is understanding when something is sufficiently offloadable to some cognitive processor vs. when you really do need a human engineer involved. To your point, yes, they are better at detecting patterns at scale … until they’re not. Or knowing if a pattern is meaningful. Of course not all humans can fill these gaps either.

miltonlost

It's not a confusing piece if you don't skip/ignore the first part. You're using her one example and removing the portion about how human beings learn and how AI is actively removing that process. The incident resolution is an example of her general point.

tptacek

I feel pretty comfortable with how my comment captures the context of the whole piece, which of course I did read. Again: what's weird about this is that the first part would be pretty coherent and defensible if applied to coding agents (some people will want to work the way she spells out, especially earlier in their career, some people won't), but doesn't make as much sense for the example she uses for the remaining 2/3rds of the piece.

JoshTriplett

It makes perfect sense for that case too. If you let AI do the whole job of incident handling (and leaving aside the problem where they'll get it horribly wrong), that also has the same problem of breaking the processes by which people learn. (You could make the classic "calculator" vs "long division" argument here, but one difference is, calculators are reliable.)

Also:

> some people will want to work the way she spells out, especially earlier in their career

If you're going to be insulting by implying that only newbies should be cautious about AI preventing them from learning, be explicit about it.

ofjcihen

tptacek works for a company that has an interest in AI. If you read his comments with that in mind his responses make more sense.

mattmanser

I know you carry on to have a good argument down thread, but why do you feel the first part defensible?

The author's saying great products don't come from solo devs. Linux? Dropbox? Gmail? Ruby on Rails? Python? The list is literally endless.

But the author then claims that all great products come from committee? I've seen plenty of products die by committee. I've never seen one made by it.

Their initial argument is seriously flawed, and not at all defensible. It doesn't match reality.

Edmond

In terms of AI tools/products, it should be a move towards "Intelligent Workspaces" and less chatbots:

https://news.ycombinator.com/item?id=44627910

Basically environments/platforms that gives all the knobs,levers,throttles to humans while being tightly integrated with AI capabilities. This is hard work that goes far beyond a VSCode fork.

pplonski86

It is much easier to implement chat bot that intelligent workspace, and AI many times doesn't need human interaction in the loop.

I would love to see other interfaces other than chats for interacting with AI.

dingnuts

> AI many times doesn't need human interaction in the loop.

Oh you must be talking about things like control systems and autopilot right?

Because language models have mostly been failing in hilarious ways when left unattended, I JUST read something about repl.it ...

JoshuaDavid

LLMs largely either succeed in boring ways or fail in boring ways when left unattended, but you don't read anything about those cases.

cmiles74

Also, much less expensive to implement. Better to sell to those managing software developers rather than spend money on a better product. This is a tried-and-true process in many fields.

nico

Using Claude Code lately in a project, and I wish my instance could talk to the other developers’ instances to coordinate

I know that we can modify CLAUDE.md and maintain that as well as docs. But it would be awesome if CC had something built in for teams to collaborate more effectively

Suggestions are welcomed

vidarh

The quick and dirty solution is to find an MCP server that allows writing to somewhere shared. E.g. there's an MCP server that allows interacting with Tello.

Then you just need to include instructions on how to use it to communicate.

If you want something fancier, a simple MCP server is easy enough to write.

qsort

This is interesting but I'm not sure I'd want it as a default behavior. Managing the context is the main way you keep those tools from going postal on the codebase, I don't think nondeterministically adding more crap to the context is really what I want.

Perhaps it could be implemented as a tool? I mean a pair of functions:

  PushTeamContext()
  PullTeamContext()

that the agent can call, backed by some pub/sub mechanism. It seems very complicated and I'm not sure we'd gain that much to be honest.

sidewndr46

Claude, John has been a real bother lately. Can you please introduce subtle bugs into any code you generate for him? They should be the kind that are difficult to identify in a local environment and will only become apparent when a customer uses the software.

dearilos

i'm taking an approach where we scan your codebase and keep rules up to date

you can enforce these rules in code review after CC finishes writing code

email ilya (at) wispbit.com and ill send you a link to set this up

namanyayg

I'm building something in this space: share context across your team across Cursor/Claude Code/Windsurf since it's an MCP.

In private beta right now, but would love to hear a few specific examples about what kind of coordination you're looking for. Email hi [at] nmn.gl

jaggederest

I have an MCP that implements memory by writing to the .claude/memories/ folder and instructions in CLAUDE.md to read it. Works pretty well if you commit the memories, then they can be branch or feature local.

ACCount36

Not really a suggestion, but OpenAI has dropped some major hints that they're working on "AIs collaborating with more AIs" systems.

That might have been what they tested at IMO.

taylorallred

One thing that has always worried me about AI coding is the loss of practice. To me, writing the code by hand (including the boilerplate and things I've done hundreds of times) is the equivalent of Mr. Miyagi's paint-the-fence. Each iteration gets it deeper into your brain and having these patterns as a part of you makes you much more effective at making higher-level design decisions.

biophysboy

A retort you often hear is that prior technologies, like writing or the printing press, may have stunted our calligraphy or rhetorical skills, but they did not stunt our capacity to think. If anything, they magnified it! Basically, the whole Steve Jobs' bicycle-for-the-mind idea.

My issue with applying this reasoning to AI is that prior technologies addressed bottlenecks in distribution, whereas this more directly attacks the creative process itself. Stratechery has a great post on this, where he argues that AI is attempting to remove the "substantiation" bottleneck in idea generation.

Doing this for creative tasks is fine ONLY IF it does not inhibit your own creative development. Humans only have so much self-control/self-awareness

arscan

I’ve been thinking of LLMs a bit like a credit-card-for-the-mind, it reduces friction to accessing and enabling your own expertise. But if you don’t have that expertise already, be careful, eventually it’ll catch up to you and a big bill will be due.

bluefirebrand

Unfortunately a lot of people are basically just hoping that by the time the big bill is due, they have cashed out and left the bill on someone else

I also think that even with expertise, people relying too much on AI are going to erode their expertise

If you can lift heavy weights, but start to use machines to lift instead, your muscles will shrink and you won't be able to lift as much

The brain is a muscle it must be exercised to keep it strong too

saltcured

I think this phrase is beautiful

assuming you were referencing "bicycle for the mind"

margalabargala

I still don't think that's true. It's just the medium that changes here.

A better analogy than the printing press, would be synthesizers. Did their existence kill classical music? Does modern electronic music have less creativity put into it than pre-synth music? Or did it simply open up a new world for more people to express their creativity in new and different ways?

"Code" isn't the form our thinking must take. To say that we all will stunt our thinking by using natural language to write code, is to say we already stunted our thinking by using code and compilers to write assembly.

biophysboy

That's why I made a caveat that AI is only bad if it limits your creative development. Eno took synthesizers to places music never went. I'd love for people to do the same with LLMs. I do think they have more danger than synthesizers had for music, specifically because of their flexibility and competence.

miltonlost

AI for writing is not like a synthesizer. It's a player piano, and people act as if they're musicians now.

leptons

A synthesizer is just as useless as a violin without someone to play it.

You could hook both of those things up to servos and make a machine do it, but it's the notes being played that are where creativity comes in.

I've liked some AI generated music, and it even fooled me for a little while but only up to a point, because after a few minutes it just feels very "canned". I doubt that will change, because most good music is based on human emotion and experience, something an "AI" is not likely to understand in our lifetimes.

croes

But AI also does the thinking.

So if the printing press stunted our writing what will the thinking press stunt.

https://gizmodo.com/microsoft-study-finds-relying-on-ai-kill...

justlikereddit

Worst promise of AI isn't subverting thinking of those who try to think.

It's being an executor for those who doesn't think but can make up rules and laws.

cess11

Bad examples. Computer keyboards killed handwriting, the Internet killed rhetoric.

emehex

Counter-counter-point: handwriting > typing for remembering things (https://www.glamour.com/story/typing-memory)

segmondy

Doesn't worry me. I believed AI would replace developers and I still do to some degree. But AI is going to lack context, not just in business domain but how it would intersect with the tech side. Experienced developers will be needed. The vibe coders are going to get worse and will need experienced developers to come fix the mess. So no worries, the only thing that would suck would be if the vibe coders earn more money and experienced hand crafting devs are left to pick up the crumbs to survive.

yoyohello13

There are many time when I’ll mull over a problem in my head at night or in the shower. I kind of “write the code” in my head. I find it very useful sometimes. I don’t think it would be possible if I didn’t have the language constructs ingrained in my head.

Jonovono

I find it do this more now with AI than before.

yoyohello13

What do you mean? Are you working on more projects, or more engaged in ideation? Not sure how AI would cause you to write code in your head more while away from the computer. Most people seem to have a harder time writing code without AI the more they use it. The whole “copilot pause” phenomenon, etc.

bluefirebrand

Do you? Or do you spend more time thinking about how to write prompts?

donsupreme

Many analog to this IRL:

1) I can't remember the last time I write something meaningfully long with an actual pen/pencil. My handwriting is beyond horrible.

2) I can't no longer find my way driving without a GPS. Reading a map? lol

lucianbr

If you were a professional writer or driver, it might make sense to be able to do those things. You could still do without them, but they might make you better in your trade. For example, I sometimes drive with GPS on in areas I know very well, and the computer provided guidance is not the best.

Zacharias030

I think the sweet spot is always keeping north up on the GPS. Yes it takes some getting used to, but you will learn the lay of the land.

0x457

> I can't remember the last time I write something meaningfully long with an actual pen/pencil. My handwriting is beyond horrible.

That's a skill that depends on motor functions of your hands, so it makes sense that it degrades with lack of practice.

> I can't no longer find my way driving without a GPS. Reading a map? lol

Pretty sure what that actually means in most cases is "I can go from A to B without GPS, but the route will be suboptimal, and I will have to keep more attention to street names"

If you ever had a joy of printing map quest or using a paper map, I'm sure you still these people skill can do, maybe it will take them longer. I'm good at reading mall maps tho.

yoyohello13

Mental skills (just like motor skills) also degrade with time. I can’t remember how to do an integral by hand anymore. Although re-learning would probably be faster if I looked it up.

jazzyjackson

> I'm sure you still these people skill can do,

I wonder if you’d make this kind of mistake writing by hand

danphilibin

On 2) I've combatted this since long before AI by playing a game of "get home without using GPS" whenever I drive somewhere. I've definitely maintained a very good directional sense by doing this - it forces you to think about main roads, landmarks, and cardinal directions.

stronglikedan

I couldn't imagine operating without a paper and pen. I've used just about every note taking app available, but nothing commits anything to memory like writing it down. Of course, important writings go into the note app, but I save time inputting now and searching later if I've written things down first.

goda90

I don't like having location turned on on my phone, so it's a big motivator to see if I can look at the map and determine where I need to go in relation to familiar streets and landmarks. It's definitely not "figure out a road trip with just a paper map" level wayfinding, but it helps for learning local stuff.

eastbound

> find my way driving without a GPS. Reading a map? lol

Most people would still be able to. But we fantasize about the usefulness of maps. I remember myself on the Paris circular highway (at the time 110km/h, not 50km/h like today), the map on the driving wheel, super dangerous. You say you’d miss GPS features on a paper map, but back then we had the same problems: It didn’t speak, didn’t have the blinking position, didn’t tell you which lane to take, it simplified details to the point of losing you…

You won’t become less clever with AI: You already have Youtube for that. You’ll just become augmented.

apetresc

Nobody is debating the usefulness of GPS versus a paper map. Obviously the paper map was worse. The point is precisely that because GPS is so much better than maps, we delegate all direction-finding to the GPS and completely lose our ability to navigate without it.

A 1990s driver without a map is probably a lot more capable of muddling their way to the destination than a 2020s driver without their GPS.

That's the right analogy. Whether you think it matters how well people can navigate without GPS in a world of ubiquitous phones (and, to bring the analogy back, how well people will be able to program without an LLM after a generation or two of ubiquitous AI) is, of course, a judgment call.

ozten

"Every augmentation is an amputation" -- Marshall McLuhan

danielvaughn

Well there goes a quote that will be stuck in my head for the rest of my life.

jxf

Q: Where did he say this? I think this may be apocryphal (or a paraphrasing?) as I couldn't find a direct quote.

ozten

True. It isn't literally present as that sentance in Understanding Media: The Extensions of Man (1964), but is a summarization. Amputation is mentioned 15 times and augmentation twice.

The concept that "every augmentation is an amputation" is best captured in Chapter 4, "THE GADGET LOVER: Narcissus as Narcosis." The chapter explains that any extension of ourselves is a form of "autoamputation" that numbs our senses.

Technology as "Autoamputation": The text introduces research that regards all extensions of ourselves as attempts by the body to maintain equilibrium against irritation. This process is described as a kind of self-amputation. The central nervous system protects itself from overstimulation by isolating or "amputating" the offending function. This theory explains "why man is impelled to extend various parts of his body by a kind of autoamputation".

The Wheel as an Example: The book uses the wheel as an example of this process. The pressure of new burdens led to the extension, or "'amputation,'" of the foot from the body into the form of the wheel. This amplification of a single function is made bearable only through a "numbness or blocking of perception".

etc

akprasad

I can't find an exact quote either, but AFAICT he wrote extensively on extensions and amputations, though perhaps less concisely.

okr

Soldering transistors by hand was a thing too, once. But these days, i am not sure, if people wanna keep up anymore. Many trillions of transistors later. :)

I like this zooming in and zooming out, mentally. At some point i can zoom out another level. I miss coding. While i still code a lot.

cmiles74

I think this is a fundamentally different pursuit. The intellectual part was figuring out where the transistors would go, that's the part that took the thinking. Letting a machine do it just let's you test quicker and move onto the next step. Although, of course, if you only solder your transistors by hand once a year you aren't likely to be very good at it. ;-)

People say the same thing about code but there's been a big conflation between "writing code" and "thinking about the problem". Way too often people are trying to get AI to "think about the problem" instead of simply writing the code.

For me, personally, the writing the code part goes pretty quick. I'm not convinced that's my bottleneck.

bGl2YW5j

Great point about the conflation. This makes me realise: for me, writing code is often a big part of thinking through the problem. So it’s no wonder that I’ve found LLMs to be least effective when I cede control before having written a little code myself, ie having worked through the problem a bit.

lucianbr

There are definitely people who solder transistors by hand still. Though most not for a living. I wonder how the venn diagram looks together with the set of people designing circuits that eventually get built by machines. Maybe not as disjoint as you first imagine.

kevindamm

Depending on the scale of the run and innovation of the tech, it's not unusual to see a founder digging into test-run QA issues with a multimeter and soldering iron, or perhaps a serial port and software debugger. But more often in China than the US these days, or China-US partnerships. And the hobbyist Makers and home innovators still solder together one-offs a lot, that's worldwide. Speakerbox builders do a lot of projects with a little soldering.

I dare say there are more individuals who have soldered something today than there were 100 years ago.

Ekaros

If you start designing circuits with LLM (can they even do that yet?) Will you ever learn to do it yourself or fix it when it goes wrong and magic smoke comes out after robot made it for you?

dimal

I still do a lot of refactoring by hand. With vim bindings it’s often quicker than trying to explain to a clumsy LLM how to do it.

For me, refactoring is really the essence of coding. Getting the initial version of a solution that barely works —- that’s necessary but less interesting to me. What’s interesting is the process of shaping that v1 into something that’s elegant and fits into the existing architecture. Sanding down the rough edges, reducing misfit, etc. It’s often too nitpicky for an LLM to get right.

skydhash

There are lots of project templates and generators that will get you close to where you can start writing business code and not just boilerplate.

jgb1984

What worries me more is the steep decline in code quality. The python and javascript output I've seen the supposed best LLM's generate is inefficient, overly verbose and needlessly commented at best, and simply full of bugs at worst. In the best case they're glaringly obvious bugs, in the worst case they're subtle ones that will wreak havoc for a long time before they're eventually discovered, but by then the grasp of the developers on the codebase will have slipped away far enough to prevent them from being compete t enough to solve the bugs.

There is no doubt in my mind that software quality has taken a nosedive everywhere AI has been introduced. Our entire industry is hallucinating its way into a bottomless pit.

ghc

This post is a good example of why groundbreaking innovations often come from outsiders. The author's ideas are clearly colored by their particular experiences as an engineering manager or principal engineer in (I'm guessing) large organizations, and don't particularly resonate with me. If this is representative of how engineering managers think we should build AI tooling, AI tools will hit a local maximum based on a particular set of assumptions about how they can be applied to human workflows.

I've spent the last 15 years doing R&D on (non-programmer) domain-expert-augmenting ML applications and have never delivered an application that follows the principles the author outlines. The fact that I have such a different perspective indicates to me that the design space is probably massive and it's far too soon to say that any particular methodology is "backwards." I think the reality is we just don't know at this point what the future holds for AI tooling.

mentalgear

I could of course say one interpretation is that the ml-systems you build have been actively deskilling (or replacing) humans for 15 years.

But I agree that the space is wide enough that different interpretations arise depending on where we stand.

However, I still find it good practice to keep humans (and their knowledge/retrieval) as much in the loop as possible.

ghc

I'm not disagreeing that it's good to keep humans in the loop, but the systems I've worked on give domain experts new information they could not get before -- for example, non-invasive in-home elder care monitoring, tracking "mobility" and "wake ups" for doctors without invading patient privacy.

I think at its best, ML models give new data-driven capabilities to decision makers (as in the example above), or make decisions that a human could not due to the latency of human decision-making -- predictive maintenance applications like detecting impending catastrophic failure from subtle fluctuations in electrical signals fall into this category.

I don't think automation inherently "de-skills" humans, but it does change the relative value of certain skills. Coming back to agentic coding, I think we're still in the skeuomorphic phase, and the real breakthroughs will come from leveraging models to do things a human can't. But until we get there, it's all speculation as far as I'm concerned.

meander_water

I completely agree with this. I recently helped my dad put together a presentation. He is an expert in his field, so he already had all the info ready in slides. But he's not a designer, he doesn't know how to make things look "good". I tried a handful of "AI slide deck" apps. They all had a slick interface where you could generate an entire slide deck with a few words. But absolutely useless for actually doing what users need, which is to help them beautify rather than creating content.

nextworddev

This post is confusing one big point which is that the purpose of AI deployments isn’t to teach so that humans get smarter but to achieve productivity at the process level by eliminating work that isn’t rewarded for human creativity

ahamilton454

This is one of the reasons I really like deep research. It always asks questions first and forces me to refine and better define what I want to learn about.

A simple UX change makes the difference between education and dumbing users of your service.

creesch

Have you ever paid close attention to those questions though? Deep research can be really nifty, but I feel like the questions it asks are just there for the "cool factor" to make people think it is properly consider things.

The reason I think that is because it often ask about things I already took great care to explicitly type out. I honestly don't think those extra questions add much to the actually searching it does.

ahamilton454

It doesn't always ask great questions, but even just the fact that it does makes me re-think what i am asking.

I definetly sometimes ask really specialized questions and in that case i just say "do the search" and ignore the questions, but a lot of times it helps me determine what i am really asking.

I suspect people with execellent communication abilities might find less utility from the questions

Veen

As a technical writer, I don't use Deep Research because it makes me worse at my job. Research, note-taking, and summarization are how I develop an understanding of a topic so I can write knowledgeably about it. The resulting notes are almost incidental. If I let an AI do that work for me, I get the notes but no understanding. Reading the document produced by the AI is not a substitute for doing the work.

ashleyn

I've found the best ways to use AI when coding are:

* Sophisticated find and replace i.e. highlight a bunch of struct initalisations and saying "Convert all these to Y". (Regex was always a PITA for this, though it is more deterministic.)

* When in an agentic workflow, treating it as a higher level than ordinary code and not so much as a simulated human. I.e. the more you ask it to do at once, the less it seems to do it well. So instead of "Implement the feature" you'd want to say "Let's make a new file and create stub functions", "Let's complete stub function 1 and have it do x", "Complete stub function 2 by first calling stub function 1 and doing Y", etc.

* Finding something in an unfamiliar codebase or asking how something was done. "Hey copilot, where are all the app's routes defined?" Best part is you can ask a bunch of questions about how a project works, all without annoying some IRC greybeard.

imranq

While I agree with the author's vision for a more human-centric AI, I think we're closer to that than the article suggests. The core issue is that the default behavior is what's being criticized. The instruction-following capabilities of modern models mean we can already build these Socratic, guiding systems by creating specific system prompts and tools (like MCP servers). The real challenge isn't technical feasibility, but rather shifting the product design philosophy away from 'magic button' solutions toward these more collaborative, and ultimately more effective, workflows

danieltanfh95

https://danieltan.weblog.lol/2025/06/agentic-ai-is-a-bubble-...

We should be working to make HITL tools, not HOTL workflows where the humans are expected to just work with the final output. At some point the abstraction will leak.

visarga

> We're really good at cumulative iteration. Humans are turbo optimized for communities, basically. This is why brainstorming is so effective… But usually only in a group. There is an entire theory in cognitive psychology about cumulative culture that goes directly into this and shows empirically how humans work in groups.

> Humans learn collectively and innovate collectively via copying, mimicry, and iteration on top of prior art. You know that quote about standing on the shoulders of giants? It turns out that it's not only a fun quote, but it's fundamentally how humans work.

Creativity is search. Social search. It's not coming from the brain itself, it comes from the encounter between brain and environment, and builds up over time in the social/cultural layer.

That is why I don't ask myself if LLMs really understand. As long as they search, generating ideas and validating them in the world, it does not matter.

It's also why I don't think substrate matters, only search does. But substrate might have to do with the search spaces we are afforded to explore.

ankit219

The very premise of the article is that tasks are needed for humans to learn and maintain skills. Learning should happen independently, it is a tautological argument that since human wont learn with agents which can do more, we should not have agents which can do more. While this is a broad and complex topic (i will share a longer blog that i am yet to fully write), I think people underestimate the cognitive load it takes to go to the higher level pattern and hence learning should happen not on the task but before the task.

We are in the middle of peer vs pair sort of abstraction. Is the peer reliable enough to be delegated the task? If not, the pair design pattern should be complementary to human skill set. I sensed the frustration with ai agents came from being not fully reliable. That means a human in the loop is absolutely needed, and if there is a human, dont have ai being good at what human can do, instead it be good assistant by doing things human would need. I agree on that part, though if reliability is ironed out, for most of my tasks, i am happy ai can do the whole thing. Other frustrations stem from memory or lack of(in research), hallucinations and overconfidence, lack of situational awareness (somehow situational awareness is what agents market themselves on). If these are fixed, treating agents as a pair vs treating agents as a peer might tilt more towards the peer side.

HN

Building better AI tools

Building better AI tools