
My AI skeptic friends are all nuts

2841 comments

June 2, 2025

matthewsinclair

I think this article is pretty spot on — it articulates something I’ve come to appreciate about LLM-assisted coding over the past few months.

I started out very sceptical. When Claude Code landed, I got completely seduced — borderline addicted, slot machine-style — by what initially felt like a superpower. Then I actually read the code. It was shockingly bad. I swung back hard to my earlier scepticism, probably even more entrenched than before.

Then something shifted. I started experimenting. I stopped giving it orders and began using it more like a virtual rubber duck. That made a huge difference.

It’s still absolute rubbish if you just let it run wild, which is why I think “vibe coding” is basically just “vibe debt” — because it just doesn’t do what most (possibly uninformed) people think it does.

But if you treat it as a collaborator — more like an idiot savant with a massive brain but no instinct or nous — or better yet, as a mech suit [0] that needs firm control — then something interesting happens.

I’m now at a point where working with Claude Code is not just productive, it actually produces pretty good code, with the right guidance. I’ve got tests, lots of them. I’ve also developed a way of getting Claude to document intent as we go, which helps me, any future human reader, and, crucially, the model itself when revisiting old code.

What fascinates me is how negative these comments are — how many people seem closed off to the possibility that this could be a net positive for software engineers rather than some kind of doomsday.

Did Photoshop kill graphic artists? Did film kill theatre? Not really. Things changed, sure. Was it “better”? There’s no counterfactual, so who knows? But change was inevitable.

What’s clear is this tech is here now, and complaining about it feels a bit like mourning the loss of punch cards when terminals showed up.

[0]: https://matthewsinclair.com/blog/0178-why-llm-powered-progra...

wpietri

One of the things I think is going on here is a sort of stone soup effect. [1]

Core to Ptacek's point is that everything has changed in the last 6 months. As you (and, I presume, he) agree, the use of off-the-shelf LLMs in code was kinda garbage. And I expect the skepticism he's knocking here ("stochastic parrots") was in fact accurate then.

But it did get a lot of people (and money) to rush in and start trying to make something useful. Like the stone soup story, a lot of other technology has been added to the pot, and now we're moving in the direction of something solid, a proper meal. But given the excitement and investment, it'll be at least a few years before things stabilize. Only at that point can we be sure about how much the stone really added to the soup.

Another counterfactual that we'll never know is what kinds of tooling we would have gotten if people had dumped a few billion dollars into code tool improvement without LLMs, but with, say, a lot of more conventional ML tooling. Would the tools we get be much better? Much worse? About the same but different in strengths and weaknesses? Impossible to say.

So I'm still skeptical of the hype. After all, the hype is basically the same as 6 months ago, even though now the boosters can admit the products of 6 months ago sucked. But I can believe we're in the middle of a revolution of developer tooling. Even so, I'm content to wait. We don't know the long term effects on a code base. We don't know what these tools will look like in 6 months. I'm happy to check in again then, where I fully expect to be again told: "If you were trying and failing to use an LLM for code 6 months ago †, you’re not doing what most serious LLM-assisted coders are doing." At least until then, I'm renewing my membership in the Boring Technology Club: https://boringtechnology.club/

[1] https://en.wikipedia.org/wiki/Stone_Soup

keeda

> Core to Ptacek's point is that everything has changed in the last 6 months.

This was actually the only point in the essay with which I disagree, and it weakens the overall argument. Even 2 years ago, before agents or reasoning models, these LLMs were extremely powerful. The catch was, you needed to figure out what worked for you.

I wrote this comment elsewhere: https://news.ycombinator.com/item?id=44164846 -- Upshot: It took me months to figure out what worked for me, but AI enabled me to produce innovative (probably cutting edge) work in domains I had little prior background in. Yes, the hype should trigger your suspicions, but if respectable people with no stake in selling AI like @tptacek or @kentonv in the other AI thread are saying similar things, you should probably take a closer look.

wpietri

>if respectable people with no stake in selling AI like @tptacek or @kentonv in the other AI thread are saying similar things, you should probably take a closer look.

Maybe? Social proof doesn't mean much to me during a hype cycle. You could say the same thing about tulip bulbs or any other famous bubble. Lots of smart people with no stake get sucked in. People are extremely good at fooling themselves. There are a lot of extremely smart people following all of the world's major religions, for example, and they can't all be right. And whatever else is going on here, there are a lot of very talented people whose fortunes and futures depend on convincing everybody that something extraordinary is happening here.

I'm glad you have found something that works for you. But I talk with a lot of people who are totally convinced they've found something that makes a huge difference, from essential oils to functional programming. Maybe it does for them. But personally, what works for me is waiting out the hype cycle until we get to the plateau of productivity. Those months that you spent figuring out what worked are months I'd rather spend on using what I've already found to work.

mwarkentin

AI posts (including this one) are all over his employer's blog lately, so there's some stake (fly MCP, https://fly.io/blog/fuckin-robots/, etc).

potatolicious

> "Even 2 years ago, before agents or reasoning models, these LLMs were extremely powerful. The catch was, you needed to figure out what worked for you."

Sure, but I would argue that the UX is the product, and that has radically improved in the past 6-12 months.

Yes, you could have produced similar results before, manually prompting the model each time, copy and pasting code, re-prompting the model as needed. I would strenuously argue that the structuring and automation of these tasks is what has made these models broadly usable and powerful.

In the same way that Apple didn't invent mobile phones nor touchscreens nor OSes, but the specific combination of these things resulted in a product that was different in kind from what came before, and took over the world.

Likewise, the "putting the LLM into a structured box of validation and automated re-prompting" is huge! It changed the product radically, even if its constituent pieces existed already.

[edit] More generally I would argue that 95% of the useful applications of LLMs aren't about advancing the SOTA model capabilities and more about what kind of structured interaction environment we shove them into.
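
As a rough sketch of what that "structured box" can look like in practice: generate, validate mechanically, and feed failures back to the model automatically. Everything here is illustrative; call_model stands in for whatever chat API or agent harness you actually use, and the compile check stands in for whatever validation (tests, linters, type checks) fits your project.

    import subprocess
    import tempfile

    def call_model(prompt: str) -> str:
        # Placeholder for a real chat-completion call (hosted or local model).
        raise NotImplementedError

    def generate_with_validation(task: str, max_attempts: int = 3) -> str:
        prompt = task
        for _ in range(max_attempts):
            code = call_model(prompt)
            # Mechanical validation step; here, just "does it compile?"
            with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
                f.write(code)
            check = subprocess.run(
                ["python", "-m", "py_compile", f.name],
                capture_output=True, text=True,
            )
            if check.returncode == 0:
                return code
            # Automated re-prompt: hand the failure straight back to the model.
            prompt = (
                "Your previous attempt failed to compile:\n"
                f"{check.stderr}\n"
                f"Original task: {task}\n"
                "Return corrected code only."
            )
        raise RuntimeError("model did not produce valid code")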

gopher_space

I don't think it's possible to understand what people mean by force multiplier re AI until you use it to teach yourself a new domain and then build something with that knowledge.

Building a mental model of a new domain by creating a logical model that interfaces with a domain I'm familiar with lets me test my assumptions and understanding in real time. I can apply previous experience by analogy and verify usefulness/accuracy instantly.

> Upshot: It took me months to figure out what worked for me, but AI enabled me to produce innovative (probably cutting edge) work in domains I had little prior background in. Yes, the hype should trigger your suspicions[...]

Part of the hype problem is that describing my experience sounds like bullshit to anyone who hasn't gone through the same process. The rate that I pick up concepts well enough to do verifiable work with them is literally unbelievable.

xpe

Almost by definition, one should be skeptical about hype. So we’re all trying to sort out what is being sold to us.

Different people have different weird tendencies in different directions. Some people irrationally assume that things aren’t going to change much. Others see a trend and irrationally assume that it will continue on a trend line.

Synthesis is hard.

Understanding causality is even harder.

Savvy people know that we’re just operating with a bag of models and trying to choose the right combination for the right situation.

This misunderstanding is one reason why doomers, accelerationists, and “normies” talk past each other or (worse) look down on each other. (I’m not trying to claim epistemic equivalence here; some perspectives are based on better information, some are better calibrated than others! I’m just not laying out my personal claims at this point. Instead, I’m focusing on how we talk to each other.)

Another big source of misunderstanding is about differing loci of control. People in positions of influence are naturally inclined to think about what they can do, who they know, and where they want to be. People farther removed feel relatively powerless and tend to hold onto their notions of stability, such as the status quo or their deepest values.

Historically, programmers have been quite willing to learn new technologies, but now we’re seeing widespread examples where people’s plasticity has limits. Many developers cannot (or are unwilling to) wrap their minds around the changing world. So instead of confronting the reality they find ways to deny it, consciously or subconsciously. Our perception itself is shaped by our beliefs, and some people won’t even perceive the threat because it is too strange or disconcerting. Such is human nature: we all do it. Sometimes we’re lucky enough to admit it.

wpietri

I think "the reality", at least as something involving a new paradigm, has yet to be established. I'll note that I heard plenty of similar talk about how developers just couldn't adapt six months or more ago. Promoters now can admit those tools were in fact pretty bad, because they now have something else to promote, but at the time those not rawdogging LLMs were dinosaurs under a big meteor.

I do of course agree that some people are just refusing to "wrap their minds around the changing world". But anybody with enough experience in tech can count a lot more instances of "the world is about to change" than "the world really changed". The most recent obvious example being cryptocurrencies, but there are plenty of others. [1] So I think there's plenty of room here for legitimate skepticism. And for just waiting until things settle down to see where we ended up.

[1] E.g. https://www.youtube.com/watch?v=b2F-DItXtZs

spaceman_2020

I’m an amateur coder and I used to rely on Cursor a lot to code when I was actively working on hobby apps about 6 months ago

I picked up coding again a couple of days back and I’m blown away by how much things have changed

It was all manual work until a few months back. Suddenly, it's all agents.

wpietri

> You'll not only never know this, it's IMHO not very useful to think about at all, except as an intellectual exercise.

I think it's very useful if one wants to properly weigh the value of LLMs in a way that gets beyond the hype. Which I do.

DannyBee

"Another counterfactual that we'll never know is what kinds of tooling we would have gotten if people had dumped a few billion dollars into code tool improvement without LLMs, but with, say, a lot of more conventional ML tooling. Would the tools we get be much better? Much worse? About the same but different in strengths and weaknesses? Impossible to say."

You'll not only never know this, it's IMHO not very useful to think about at all, except as an intellectual exercise.

I wish i could impress this upon more people.

A friend similarly used to lament/complain that Kotlin sucked in part because we could have probably accomplished its major features in Java, and maybe without tons of work, or migration cost.

This is maybe even true!

As an intellectual exercise, both are interesting to think about. But outside of that, people get caught up in this as if it matters, but it doesn't.

Basically nothing is driven by pure technical merit alone, not just in CS, but in any field. So my point to him was the lesson to take away from this is not "we could have been more effective or done it cheaper or whatever" but "my definition of effectiveness doesn't match how reality decides effectiveness, so i should adjust my definition".

As much as people want the definition to be a meritocracy, it just isn't and honestly, seems unlikely to ever be.

So while it's 100% true that billions of dollars dumped into other tools or approaches or whatever may have generated good, better, maybe even amazing results, they weren't, and more importantly, never would have been. Unknown but maybe infinite ROI is often much more likely to see investment than more known but maybe only 2x ROI.

And like I said, this is not just true in CS, but in lots of fields.

That is arguably quite bad, but also seems unlikely to change.


raxxorraxor

The better I am at solving a problem, the less I use AI assistants. I use them if I try a new language or framework.

Busy code I need to generate is difficult to do with AI too, because then you need to formalize the necessary context for an AI assistant, which is exhausting and has an uncertain result. So perhaps it is just simpler to write it yourself quickly.

I understand comments being negative, because there is so much AI hype without too many practical applications yet. Or at least good practical applications. Some of that hype is justified, some of it is not. I enjoyed the image/video/audio synthesis hype more tbh.

Test cases are quite helpful and comments are decent too. But often prompting is more complex than programming something. And you can never be sure if any answer is usable.

Cthulhu_

> But often prompting is more complex than programming something.

I'd challenge this one; is it more complex, or is all the thinking and decision making concentrated into a single sentence or paragraph? For me, programming something is taking a big, high-level problem and breaking it down into smaller and smaller sections until it's a line of code; the lines of code are relatively low effort / cost little brain power. But in my experience, the problem itself and its nuances are only defined once all code is written. If you have to prompt an AI to write it, you need to define the problem beforehand.

It's more design and more thinking upfront, which is something the development community has moved away from in the past ~20 years with the rise of agile development and open source. Techniques like TDD have shifted more of the problem definition forwards as you have to think about your desired outcomes before writing code, but I'm pretty sure (I have no figures) it's only a minority of developers that have the self-discipline to practice test-driven development consistently.

(disclaimer: I don't use AI much, and my employer isn't yet looking into or paying for agentic coding, so it's chat style or inline code suggestions)
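
To make the "define the desired outcome before the code" point concrete, here's a minimal test-first sketch; the function and its behaviour are invented purely for illustration. The test is the upfront problem definition, and the implementation only comes afterwards.

    # The test is written first and pins down the desired outcome
    # before any implementation exists.
    def test_slugify_lowercases_and_collapses_whitespace():
        assert slugify("  Hello   World ") == "hello-world"

    # The implementation is written afterwards, to satisfy the test.
    def slugify(text: str) -> str:
        return "-".join(text.lower().split())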

starlust2

A big challenge is that programmers all have a unique, ever-changing personal style and vision that they've never had to communicate before. As well, they generally "bikeshed" and add undefined, unrequested requirements, because you know someday we might need to support 10000x more users than we have. This is all well and good when the programmer implements something themselves but falls apart when it must be communicated to an LLM. Most projects/systems/orgs don't have the necessary level of detail in their documentation, documentation is fragmented across git/jira/confluence/etc/etc/etc., and it's a hodgepodge of technologies without a semblance of consistency.

I think we'll find that over the next few years the first really big win will be AI tearing down the mountain of tech & documentation debt. Bringing efficiency to corporate knowledge is likely a key element to AI working within them.

sksisoakanan

The issue with prompting is that English (or any other human language) is nowhere near as rigid or strict a language as a programming language. Almost always, an idea can be expressed much more succinctly in code than in natural language.

Combine that with the fact that, when you're reading the code, it's often much easier to develop a prototype solution as you go, and prompting ends up feeling like using four men to carry a wheelbarrow instead of having one push it.

bcrosby95

I design and think upfront but I don't write it down until I start coding. I can do this for pretty large chunks of code at once.

The fastest way I can transcribe a design is with code or pseudocode. Converting it into English can be hard.

It reminds me a bit of the discussion of if you have an inner monologue. I don't and turning thoughts into English takes work, especially if you need to be specific with what you want.

algorithmsRcool

> It's more design and more thinking upfront, which is something the development community has moved away from in the past ~20 years with the rise of agile development and open source.

I agree, but even smaller than thinking in agile is just a tight iteration loop when I'm exploring a design. My ADHD makes upfront design a challenge for me and I am personally much more effective starting with a sketch of what needs to be done and then iterating on it until I get a good result.

The loop of prompt->study->prompt->study... is disruptive to my inner loop for several reasons, but a big one is that the machine doesn't "think" like I do. So the solutions it scaffolds commonly make me say "huh?" and I have to change my thought process to interpret them and then study them for mistakes. My intuition and iteration are, for the time being, more effective than this machine-assisted loop for the really "interesting" code I have to write.

But i will say that AI has been a big time saver for more mundane tasks, especially when I can say "use this example and apply it to the rest of this code/abstraction".

avemuri

I agree with your points but I'm also reminded of one of my bigger learnings as a manager - the stuff I'm best at is the hardest, but most important, to delegate.

Sure it was easier to do it myself. But putting in the time to train, give context, develop guardrails, learn how to monitor, etc. ultimately taught me the skills needed to delegate effectively and multiply the team's output massively as we added people.

It's early days but I'm getting the same feeling with LLMs. It's as exhausting as training an overconfident but talented intern, but if you can work through it and somehow get it to produce something as good as you would do yourself, it's a massive multiplier.

johnmaguire

I don't totally understand the parallel you're drawing here. As a manager, I assume you're training more junior (in terms of their career or the company) engineers up so they can perform more autonomously in the future.

But you're not training LLMs as you use them really - do you mean that it's best to develop your own skill using LLMs in an area you already understand well?

I'm finding it a bit hard to square your comment about it being exhausting to cat-herd the LLM with it being a force multiplier.

GoblinSlayer

Do LLMs learn? I had an impression you borrow a pretrained LLM that handles each query starting with the same initial state.

conartist6

But... But... the multiplier isn't NEW!

You just explained how your work was affected by a big multiplier. At the end of training an intern you get a trained intern -- potentially a huge multiplier. ChatGPT is like an intern you can never train and will never get much better.

These are the same people who would no longer create or participate deeply in OSS (+100x multiplier) bragging about the +2x multiplier they got in exchange.

brulard

> But often prompting is more complex than programming something.

It may be more complex, but it is in my opinion better long term. We need to get good at communicating with AIs to get the results we want. Forgive me for assuming that you probably haven't used these assistants long enough to get good at them. I've been a web developer for 20 years already and AI tools are multiplying my output even on problems I'm very good at. And they are getting better very quickly.

GoblinSlayer

Yep, it looks like LLMs are used as fast typists, and coincidentally in webdev typing speed is the most important bottleneck: when you need to add cookie consent, spinners, dozens of ad providers, tracking pixels, twitter metadata, google metadata, manual rendering, button web components with material design and react, hover panels, fontawesome, recaptcha (and that's only 1% of modern web boilerplate), it's easy to see how a fast typist can help you.

echelon

> The better I am at solving a problem, the less I use AI assistants.

Yes, but you're expensive.

And these models are getting better at solving a lot of business-relevant problems.

Soon all business-relevant problems will be bent to the shape of the LLM because it's cost-effective.

onemoresoop

You're forgetting how much money is being burned in keeping these LLMs cheap. Remember when Uber was a fraction of the cost of a cab? Yeah, those days didn't last.

soraminazuki

Actually, I agree. It won't be long before businesses handle software engineering like Google does "support." You know, that robotic system that sends out passive-aggressive mocking emails to people who got screwed over by another robot that locks them out of their digital lives for made up reasons [1]. It saves the suits a ton of cash while letting them dodge any responsibility for the inevitable harm it'll cause to society. Mediocrity will be seen as a feature, and the worst part is, the zealots will wave it like a badge of honor.

[1]: https://news.ycombinator.com/item?id=26061935

fsloth

I totally agree. The "hard to control mech suit" is an excellent analogy.

When it works it’s brilliant.

There is a threshold point as part of the learning curve where you realize you are in a pile of spaghetti code and think it actually saves no time to use an LLM assistant.

But then you learn to avoid the bad parts - thus they don’t take your time anymore - and the good parts start paying back in heaps of the time spent learning.

They are not zero effort tools.

There is a non-trivial learning cost involved.

jes5199

Also, the agents are actually pretty good at cleaning up spaghetti if you do it one module at a time and use unit tests. And some of the models are smart enough to suggest good organization schemes!

teaearlgraycold

The issue is we’re too early in the process to even have a solid education program for using LLMs. I use them all the time and continue to struggle finding an approach that works well. It’s easy to use them for documentation look up. Or filling in boilerplate. Sometimes they nail a transformation/translation task, other times they’re more trouble than they’re worth.

We need to understand what kind of guard rails to put these models on for optimal results.

fsloth

” we’re too early in the process to even have a solid education program for using LLMs”

We don’t even have a solid education program for software engineering - possibly for the same reason.

The industry loves to run on the bleeding edge, rather than just think for a minute :)

tptacek

For what it's worth: I'm not dismissive of the idea that these things could be ruinous for the interests of the profession. I don't automatically assume that making applications drastically easier to produce is just going to make way for more opportunities.

I just don't think the interests of the profession control. The travel agents had interests too!

hostyle

For a long time there has been back chatter on how to turn programming into a more professional field, more like actual engineering: a field where companies take security seriously, where actual people and companies get held accountable when something goes wrong, and where practitioners actually earn their high salaries.

Getting AI to hallucinate its way into secure and better quality code seems like the antithesis of this. Why don't we have AI and robots working for humanity on the boring menial tasks - mowing lawns, filing taxes, washing dishes, driving cars - instead of attempting to take on our more critical and creative outputs - image generation, movie generation, book writing and even website building?

tptacek

The problem with this argument is that it's not what's going to happen. In the trajectory I see of LLM code generation, security quality between best-practices well-prompted (ie: not creatively well prompted, just people with a decent set of Instructions.md or whatever) and well trained human coders is going to be a wash. Maybe in 5 years SOTA models will clearly exceed human coders on this, but my premise is all progress stops and we just stick with what we have today.

But the analysis doesn't stop there, because after the raw quality wash, we have to consider things LLMs can do profoundly better than human coders can. Codebase instrumentation, static analysis, type system tuning, formal analysis: all things humans can do, spottily, on a good day but that empirically across most codebases they do not do. An LLM can just be told to spend an afternoon doing them.

I'm a security professional before I am anything else (vulnerability research, software security consulting) and my take on LLM codegen is that they're likely to be a profound win for security.

epiccoleman

> Why don't we have AI and robots working for humanity with the boring menial tasks - mowing laws, filing taxes, washing dishes, driving cars

I mean, we do have automation for literally all of those things, to varying degrees of effectiveness.

There's an increasing number of little "roomba" style mowers around my neighborhood. I file taxes every year with FreeTaxUSA and while it's still annoying, a lot of menial "form-filling" labor has been taken away from me there. My dishwasher does a better job cleaning my dishes than I would by hand. And though there's been a huge amount of hype-driven BS around 'self-driving', we've undeniably made advances in that direction over the last decade.

soraminazuki

Let's be real. Software engineers are skeptical right now not because they believe robots are better than them. Quite the opposite. The suits will replace software engineers despite its mediocrity.

It was just 2 weeks ago when the utter incompetence of these robots was on full public display [1]. But none of that will matter to greedy corporate executives, who will prioritize short-term cost savings. They will hop from company to company, personally reaping the benefits while undermining essential systems that users and society rely on with robot slop. That's part of the reason why the C-suites are overhyping the technology. After all, no rich executive has faced consequences for behaving this way.

It's not just software engineering jobs that will take a hit. Society as a whole will suffer from the greedy recklessness.

[1]: https://news.ycombinator.com/item?id=44050152

ivape

As soon as the world realized they don't need a website and can just have a FB/Twitter page, a huge percentage of freelance web development gigs just vanished. We have to get real about what's about to happen. The app economy filled the gap, and the only optimistic case is that the AI app industry is what's going to fill the gap going forward. I just don't know about that. There's a certain end-game vibe I'm getting because we're talking about self-building and self-healing software. More so, a person can ask the AI to role-play anything, even an app.

tptacek

Sure. And before the invention of the spreadsheet, the world's most important programming language, individual spreadsheets were something a programmer had to build for a business.

Earw0rm

Except that FB/Twitter are rotting platforms. I don't pretend that freelance web dev is a premium gig, but setting up Wordpress sites for local flower shops etc. shouldn't require a higher level of education/sophistication than e.g. making physical signs for the same shops.

Technical? Yes. Hardcore expert premium technical, no. The people who want the service can pay someone with basic to moderate skills a few hundred bucks to spend a day working on it, and that's all good.

Could I get an LLM to do much of the work? Yes, but I could also do much of the work without an LLM. Someone who doesn't understand the first principles of domains, Wordpress, hosting and so on, not so much.

soraminazuki

> We have to get real about what's about to happen.

Or maybe we shouldn't enthusiastically repeat the destruction of the open web in favor of billionaire-controlled platforms for surveillance and manipulation.

rustcleaner

Start getting to be friends with some billionaire (or... shh... trillionaire) families, Elysium is coming!

daveguy

None of the LLM models are self-building, self-healing or even self-thinking or self-teaching. They are static models (+rag, but that's a bolt-on). Did you have a specific tech in mind?

nonameiguess

It's kind of ironic to me that this is so often the example trotted out. Look at the BLS data sheet for job outlook: https://www.bls.gov/ooh/sales/travel-agents.htm#tab-6

> Employment of travel agents is projected to grow 3 percent from 2023 to 2033, about as fast as the average for all occupations.

The last year there is data for claims 68,800 people employed as travel agents in the US. It's not a boom industry by any means, but it doesn't appear they experienced the apocalypse that Hacker News believes they did, either.

I don't know how to easily find historical data, unfortunately. BLS publishes the excel sheets, but pulling out the specific category would have to be done manually as far as I can tell. There's this, I guess: https://www.travelagewest.com/Industry-Insight/Business-Feat...

It appears at least that what happened is, though it may be easier than ever to plan your own travel, there are so many more people traveling these days than in the past that the demand for travel agents hasn't crashed.

pvg

https://www.vice.com/en/article/why-are-travel-agents-still-...

Has some stats. It seems pretty clear the interests of travel agents did not count for much in the face of technological change.

rerdavies

https://fred.stlouisfed.org/series/LEU0254497900A

40% of all travel agent jobs lost between 2001 and 2025. Glad I'm not a travel agent.

chinchilla2020

The reason I remain in the "skeptical" camp is because I am experiencing the same thing you are - I keep oscillating between being impressed, then disappointed.

Ultimately the thing that impresses me is that LLMs have replaced google search. The thing that disappoints me is that their code is often convincing but wrong.

Coming from a hard-engineering background, anything that is unreliable is categorized as bad. If you come from the move-fast-break-things world of tech, then your tolerance for mistakes is probably a lot higher.

saltcured

This is a bit tangential, but isn't that partly because google search keeps evolving into a worse resource due to the SEO garbage race?

didibus

It is, AI lets you have an ad-free web browsing experience. This is a huge part of it as well.

pxc

LLM-generated blogspam is also accelerating this process

wussboy

And are LLMs immune to that same SEO garbage race?

osigurdson

The arguments seem to come down to tooling. The article suggests that ChatGPT isn't a good way to interact with LLMs but I'm not so sure. If the greatest utility is "rubber ducking" and editing the code yourself is necessary then tools like Cursor go too far in a sense. In my own experience, Windsurf is good for true vibe coding where I just want to explore an idea and throw away the code. It is still annoying though as it takes so long to do things - ruining any kind of flow state you may have. I am conversing with ChatGPT directly much more often.

I haven't tried Claude Code yet, however. Maybe that approach is more on point.

osigurdson

I have been using Windsurf for a few months and ChatGPT for a couple of years. I don't feel Windsurf is a massive game changer personally. It is good if you are very tired or working in a new area (also good for exploring UI ideas as the feedback loop is tight), but still not a real game changer over ChatGPT. Waiting around for it to do its thing ("we've encountered an error - no credits used") is boring and flow destroying. If you know exactly what you are doing, the productivity is probably 0.5x vs just typing the code in yourself. Sorry, I'm not going to bang around in Windsurf all day just to help with the training so that "v2" can be better. They should be paying me for this, realistically.

Of course, in aggregate AI makes me capable in a far broader set of problem domains. It would be tough to live without it at this stage, but needs to be used for what it is actually good at, not what we hope it will be good at.

ketzo

Have you tried Cursor or Zed? I find they’re both significantly better in their “agent” modes than Windsurf.

osigurdson

I used Cursor before Windsurf but I have not used Zed.

didibus

I agree with the potential of AI. I use it daily for coding and other tasks. However, there are two fundamental issues that make this different from the Photoshop comparison.

The models are trained primarily on copyrighted material and code written by the very professionals who now must "upskill" to remain relevant. This raises complex questions about compensation and ownership that didn't exist with traditional tools. Even if current laws permit it, the ethical implications are different from Photoshop-like tools.

Previous innovations created new mediums and opportunities. Photoshop didn't replace artists, because it enabled new art forms. Film reduced theater jobs but created an entirely new industry where skills could mostly transfer. Manufacturing automation made products like cars accessible to everyone.

AI is fundamentally different. It's designed to produce identical output to human workers, just more cheaply and/or faster. Instead of creating new possibilities, it's primarily focused on substitution. Say AI could eliminate 20% of coding jobs and reduce wages by 30%:

    * Unlike previous innovations, this won't make software more accessible
    * Software already scales essentially for free (build once, used by many)
    * Most consumer software is already free (ad-supported)
The primary outcome appears to be increased profit margins rather than societal advancement. While previous technological revolutions created new industries and democratized access, AI seems focused on optimizing existing processes without providing comparable societal benefits.

This isn't an argument against progress, but we should be clear-eyed about how this transition differs from historical parallels, and why it might not repeat the same historical outcomes. I'm not claiming this will be the case, but that you can see some pretty significant differences for why you might be skeptical that the same creation of new jobs, or improvement to human lifestyle/capabilities will emerge as with say Film or Photoshop.

AI can also be used to achieve things we could not do without it; that's the good use of AI: things like cancer detection, self-driving cars, and so on. I'm speaking specifically of the use of AI to automate and reduce the cost/time of white-collar work like software development.

throw234234234

For me this is the "issue" I have with AI. Unlike say the internet, mobile and other tech revolutions where I could see new use cases or existing use case optimisation spring up all the time (new apps, new ways of interacting, more efficient than physical systems, etc) AI seems to be focused more on efficiency/substitution of labour than pushing the frontier on "quality of life". Maybe this will change but the buzz is around job replacement atm.

It's why it is impacting so many people, but also making very small changes to everyday "quality of life" kinds of metrics (e.g. ability to eat, communicate, live somewhere, etc). It arguably is more about enabling greater inequality and gatekeeping of wealth to capital - where intelligence and merit matter less in the future world. For most people it's hard to see where the positives are for them long term in this story; most everyday folks don't believe the utopia story is in any way probable.

GoblinSlayer

Maybe it's like automation that makes webdev accessible to anyone. You take a week long AI coaching course and talk to an AI and let it throw together a website in an hour, then you self host it.

cesarb

This article does not touch on the thing which worries me the most with respect to LLMs: the dependence.

Unless you can run the LLM locally, on a computer you own, you are now completely dependent on a remote centralized system to do your work. Whoever controls that system can arbitrarily raise the prices, subtly manipulate the outputs, store and do anything they want with the inputs, or even suddenly cease to operate. And since, according to this article, only the latest and greatest LLM is acceptable (and I've seen that exact same argument six months ago), running locally is not viable (I've seen, in a recent discussion, someone mention a home server with something like 384G of RAM just to run one LLM locally).

To those of us who like Free Software because of the freedom it gives us, this is a severe regression.

eleveriven

It's also why local models, even if less powerful, are so important. The gap between "state of the art" and "good enough for a lot of workflows" is narrowing fast

dabockster

Especially with MCP programs that can run in Docker containers.

aaron_m04

Yes, and it's even worse: if you think LLMs may possibly make the world a worse place, you should not use any LLMs you aren't self-hosting, because your usage information is being used by the creators to make LLMs better.

MetaWhirledPeas

> you should not use any LLMs you aren't self-hosting, because your usage information is being used by the creators to make LLMs better

This sounds a bit like bailing out the ocean.

aaron_m04

> This sounds a bit like bailing out the ocean.

If it's one individual doing this, sure. I am posting this in the hopes that others follow suit.

mrtesthah

so does voting.

inadequatespace

I think that’s a bit of a leap; if you think LLMs make the world a worse place, there are many actions that you might take or not take to try to address that.

aaron_m04

It's true that there could be other more impactful actions. I'd love to hear your thoughts on what else can be done.

dabockster

You can get 90%+ of the way there with a tiny “coder” LLM running on the Ollama backend with an extension like RooCode and a ton of MCP tools.

In fact, MCP is so groundbreaking that I consider it to be the actual meat and potatoes of coding AIs. Large models are too monolithic, and knowledge is forever changing. Better just to use a small 14b model (or even 8b in some cases!) with some MCP search tools, a good knowledge graph for memory, and a decent front end for everything. Let it teach itself based on the current context.

And all of that can run on an off the shelf $1k gaming computer from Costco. It’ll be super slow compared to a cloud system (like HDD vs SSD levels of slowness), but it will run in the first place and you’ll get *something* out of it.
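
For anyone curious what that looks like in practice, here's a minimal sketch of talking to a small local model through Ollama's default local HTTP API; the model name is just an illustrative pick, and editor extensions like RooCode are essentially making the same kind of call under the hood.

    # Minimal sketch: ask a locally hosted "coder" model for a completion via
    # Ollama's default local endpoint. Assumes "ollama serve" is running and a
    # small model has already been pulled, e.g. "ollama pull qwen2.5-coder:14b"
    # (the model choice is illustrative; any pulled model name works).
    import requests

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "qwen2.5-coder:14b",
            "prompt": "Write a Python function that parses an ISO-8601 date string.",
            "stream": False,  # return a single JSON object instead of a stream
        },
        timeout=300,
    )
    print(resp.json()["response"])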

macrolime

Which MCPs do you recommend?

esaym

Why don't you elaborate on your setup then?

xandrius

Because you can easily look it up: Jan, GPT4All, etc.

It's not black magic anymore.

0j

I don't feel like being dependent on LLM coding tools is much of an issue, you can very easily switch between different vendors. And I hope that open weight models will be "good enough" until we get a monopoly. In any case, even if you are afraid of getting too dependent on AI tools, I think everyone needs to stay up to date on what is happening. Things are changing very quickly right now, so no matter what argument you may have against LLMs, it may just not be valid any more in a few months

underdeserver

You can also make this argument to varying degrees about your internet connection, cloud provider, OS vendor, etc.

ku1ik

Well, you can’t really self-host your internet connection anyway :)

rollcat

Of course you can. It's called an AS (autonomous system), I think all you need is an IP address range, a physical link to someone willing to peer with you (another AS), some hardware, some paperwork, etc; and bam you're your own ISP.

My company has set this up for one of our customers (I wasn't involved).

simoncion

I'm not the OP but:

* Not even counting cellular data carriers, I have a choice of at least five ISPs in my area. And if things get really bad, I can go down to my local library to politely encamp myself and use their WiFi.

* I've personally no need for a cloud provider, but I've spent a lot of time working on cloud-agnostic stuff. All the major cloud providers (and many of the minors) provide compute, storage (whether block, object, or relational), and network ingress and egress. As long as you don't deliberately tie yourself to the vendor-specific stuff, you're free to choose among all available providers.

* I run Linux. Enough said.

underdeserver

* You might have a choice of carriers or ISPs, but many don't.

* Hmm, what kind of software do you write that pays your bills?

* And your setup doesn't require any external infrastructure to be kept up to date?

apitman

How about your web browser?

Flemlo

Just this week a library got deprecated.

Open source of course.

So what's my response to that deprecation? Maintaining it myself? Nope, finding another library.

You always depend on something...

EFreethought

> You can also make this argument to varying degrees about your internet connection, cloud provider, OS vendor, etc.

True, but I think wanting to avoid yet another dependency is a good thing.

0x1ceb00da

... search engine

rsanheim

We will have the equivalent of Claude Sonnet 4 in a local LLM that can run well on a modern Mac w/ 36+ GB of ram in a year or two. Maybe faster. The local/open models are developing very fast in terms of quantization and how well they can run on consumer hardware.

Folks that are running local LLMs every day now will probably say you can basically emulate at least Sonnet 3.7 for coding if you have a real AI workstation. Which may be true, but the time, effort, and cost involved are substantial.

underdeserver

Good thing it's a competitive market with at least 5 serious, independent players.

nosianu

That will work until there has been a lot of infrastructure created to work with a particular player, and 3rd party software.

See the Microsoft ecosystem as an example. Nothing they do could not be replicated, but the network effects they achieved are strong. Too much glue, and 3rd party systems, and also training, and what users are used to, and what workers you could hire are used to, now all point to the MS ecosystem.

In this early mass-AI-use phase you still can easily switch vendors, sure. Just like in the 1980s you could still choose some other OS or office suite (like Star Office - the basis for OpenOffice, Lotus, WordStar, WordPerfect) without paying that kind of ecosystem cost, because it did not exist yet.

Today too much infrastructure and software relies on the systems from one particular company to change easily, even if the competition were able to provide a better piece of software in one area.

shaky-carrousel

Until they all merge, or form a cartel.

rpigab

Good thing it's funded by generous investors or groups who are okay with losing money on every sale (they'll make it up in volume), and never stop funding, and never raise prices, insert ads or enshittify.

amadeuspagel

I can't run google on my computer on my own, but I'm totally dependent on it.

_heimdall

Is your entire job returning google results?

The point being made here is that a developer that can only do their primary job of coding via a hosted LLM is entirely dependent on a third party.

a_wild_dandan

Which developer jobs aren't dependent on many third parties?

scotty79

How much useful programming work are you able to do without google? I don't think I even tried to do any for the last 20 years.

You make a good point of course that independence is important. But primo, this ship sailed long ago, secundo, more than one party provides the service you depend on. If one fails you still have at least some alternatives.

whobre

I did code before Google, and I was fine. Yes, it's really convenient, and LLM would be even more convenient if I could trust it just a little bit more, but it's quite possible to do some effective software development without Google.

teeray

In 8th grade, I had a little PHP 4 pocket reference book. In classes I didn’t care about, I would have this open inside the book for that class and I would write code for my forums on loose leaf (in a shorthand). I also had printed copies of Mercury Board source code to refer to in the back of my binder. Then I’d get home later, type it in, debug a bit, and have new features :) It’s an entirely alien analog process to modern students, I’m sure, but it was really effective!

zelphirkalt

There are many alternatives though. It is not like Google has a search monopoly or office product monopoly, or e-mail provider monopoly. It is quite possible to cut out a lot of Google from one's life, and not even complicated to do that.

pkilgore

Is your argument there are no LLM alternatives?

gdubs

One thing that I find truly amazing is just the simple fact that you can now be fuzzy with the input you give a computer, and get something meaningful in return. Like, as someone who grew up learning to code in the 90s, it always seemed like science fiction that we'd get to a point where you could give a computer some vague human-level instructions and get it to more or less do what you want.

forgotoldacc

There's the old quote from Babbage:

> On two occasions I have been asked, 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.

This has been an obviously absurd question for two centuries now. Turns out the people asking that question were just visionaries ahead of their time.

It is kind of impressive how I'll ask for some code in the dumbest, vaguest, sometimes even wrong way, but so long as I have the proper context built up, I can get something pretty close to what I actually wanted. Though I still have problems where I can ask as precisely as possible and get things not even close to what I'm looking for.

kibwen

> This has been an obviously absurd question for two centuries now. Turns out the people asking that question were just visionaries ahead of their time.

This is not the point of that Babbage quote, and no, LLMs have not solved it, because it cannot be solved, because "garbage in, garbage out" is a fundamental observation of the limits of logic itself, having more to with the laws of thermodynamics than it does with programming. The output of a logical process cannot be more accurate than the inputs to that process; you cannot conjure information out of the ether. The LLM isn't the logical process in this analogy, it's one of the inputs.

rcxdude

At a fundamental level, yes, and even in human-to-human interaction this kind of thing happens all the time. The difference is that humans are generally quite good at resolving most ambiguities and contradictions in a request correctly and implicitly (sometimes surprisingly bad at doing so explicitly!). Which is why human language tends to be more flexible and expressive than programming languages (but bad at precision). LLMs basically can do some of the same thing, so you don't need to specify all the 'obvious' implicit details.

CobrastanJorji

We wanted to check the clock at the wrong time but read the correct time. Since a broken clock is right twice a day, we broke the clock, which solves our problem some of the time!

pca006132

The nice thing is that a fully broken clock is accurate more often than a slightly deviated clock.

meowface

It is fun to watch. I've sometimes indeed seen the LLM say something like "I'm assuming you meant [X]".

nitwit005

It's very impressive that I can type misheard song lyrics into Google, and yet still have the right song pop up.

But, having taken a chance to look at the raw queries people type into apps, I'm afraid neither machine nor human is going to make sense of a lot of it.

CrimsonRain

theseday,s i ofen donot correct my typos even wheni notice them while cahtting with LLMS. So far 0 issues.

ivape

We're talking about the God function.

function God (any param you can think of) {

}

godelski

How do you know the code is right?

fsloth

The program behaves as you want.

No, really - there is tons of potentially value-adding code that can be of throwaway quality just as long as it’s zero effort to write it.

Design explorations, refactorings, etc.

ic_fly2

The LLM generated unit tests pass. Obviously!

dkdbejwi383

If customers don’t complain it must be working

jajko

Well, you can enter 4-5 relatively vague keywords into google and the first/second stackoverflow link will probably provide plenty of relevant code. Given that, it's much less impressive, since >95% of the problems and queries just keep repeating.

coliveira

Sure, you can now be fuzzy with the input you give to computers, but in return the computer will ALSO be fuzzy with the answer it gives back. That's the drawback of modern AI.

rienbdj

It can give back code though. It might be wrong, but it won’t be ambiguous.

swiftcoder

> It can give back code though. It might be wrong, but it won’t be ambiguous.

Code is very often ambiguous (even more so in programming languages that play fast and loose with types).

Relative lack of ambiguity is a very easy way to tell who on your team is a senior developer

0points

When it doesn't even compile or have clear intent, it's ambiguous in my book.

isolli

The output is also often quite simple to check...

csallen

It's mind blowing. At least 1-2x/week I find myself shocked that this is the reality we live in

malfist

Today I had a dentist appointment and the dentist suggested I switch toothpaste lines to see if something else works for my sensitivity better.

I am predisposed to canker sores and if I use a toothpaste with SLS in it I'll get them. But a lot of the SLS-free toothpastes are new age hippy stuff and are also fluoride free.

I went to chatgpt and asked it to suggest a toothpaste that was both SLS free and had fluoride. Pretty simple ask right?

It came back with two suggestions. Its top suggestion had SLS; its backup suggestion lacked fluoride.

Yes, it is mind blowing the world we live in. Executives want to turn our code bases over to these tools

Game_Ender

What model and query did you use? I used the prompt "find me a toothpaste that is both SLS free and has fluoride" and both GPT-4o [0] and o4-mini-high [1] gave me correct first answers. The 4o answer used the newish "show products inline" feature which made it easier to jump to each product and check it out (I am putting aside my fear this feature will end up killing their web product with monetization).

0 - https://chatgpt.com/share/683e3807-0bf8-800a-8bab-5089e4af51...

1 - https://chatgpt.com/share/683e3558-6738-800a-a8fb-3adc20b69d...

jorams

For reference I just typed "sls free toothpaste with fluoride" into a search engine and all the top results are good. They are SLS-free and do contain fluoride.

cgh

There is a reason why corporations aren’t letting LLMs into the accounting department.

cowlby

This is where o3 shines for me. Since it does iterations of thinking/searching/analyzing and is instructed to provide citations, it really limits the hallucination effect.

o3 recommended Sensodyne Pronamel and I now know a lot more about SLS and fluoride than I did before lol. From its findings:

"Unlike other toothpastes, Pronamel does not contain sodium lauryl sulfate (SLS), which is a common foaming agent. Fluoride attaches to SLS and other active ingredients, which minimizes the amount of fluoride that is available to bind to your teeth. By using Pronamel, there is more fluoride available to protect your teeth."

GoatInGrey

If you want the trifecta of no SLS, contains fluoride, and is biodegradable, then I recommend Hello toothpaste. Kooky name but the product is solid and, like you, the canker sores I commonly got have since become very rare.

NikkuFox

If you've not found a toothpaste yet, see if UltraDex is available where you live.

emeril

consider a multivitamin (or at least eating big varied salads regularly) - that seemed to get rid of my recurrent canker sores despite whatever toothpaste I use

fwiw, I use my kids' toothpaste (Kids Crest) since I suspect most toothpastes are created equal and it's one less thing to worry about...


mentos

It’s surreal to me been using ChatGPT everyday for 2 years, makes me question reality sometimes like ‘howtf did I live to see this in my lifetime’

I’m only 39, really thought this was something reserved for the news on my hospital tv deathbed.

hattmall

Ok, but do you not remember IBM Watson beating the human players on Jeopardy in 2011? The current NLP-based neural networks termed AI aren't so incredibly new. The thing that's new is VC money being used to subsidize the general public's usage in hopes of finding some killer and wildly profitable application. Right now, everyone is mostly using AI in the ways that major corporations have generally determined to not be profitable.

csallen

I turned 38 a few months ago, same thing here. I would love to go back in time 5 years and tell myself about what's to come. 33yo me wouldn't have believed it.

GoatInGrey

I find it sad how normalized it's become. Yes, the technology is imperfect in very meaningful ways. Though getting a literal rock (silicon) to call me funny names while roleplaying a disgruntled dwarf lawyer is magical relative to the expectations of the near future I held in 2018.

pmdrpg

I remember the first time I played with GPT and thought “oh, this is fully different from the chatbots I played with growing up, this isn’t like anything else I’ve seen” (though I suppose it is implemented much like predictive text, but the difference in experience is that predictive text is usually wrong about what I’m about to say so it feels silly by comparison)

johnb231

> I suppose it is implemented much like predictive text

Those predictive text systems are usually Markov models. LLMs are fundamentally different. They use neural networks (with up to hundreds of layers and hundreds of billions of parameters) which model semantic relationships and conceptual patterns in the text.
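
For contrast, here's roughly all a Markov-style predictive-text model does; a toy bigram predictor with a made-up corpus:

    # Toy bigram (Markov) next-word predictor, the kind of model behind classic
    # phone-keyboard suggestions: it only counts which word followed which.
    import random
    from collections import defaultdict

    corpus = "the cat sat on the mat and the cat ate the fish".split()

    successors = defaultdict(list)
    for current_word, next_word in zip(corpus, corpus[1:]):
        successors[current_word].append(next_word)

    def suggest(word: str) -> str:
        # No semantics, no long context, no instructions to follow,
        # just local co-occurrence statistics.
        return random.choice(successors[word]) if word in successors else "?"

    print(suggest("the"))  # e.g. "cat", "mat", or "fish"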

0x000xca0xfe

It's almost exactly one of the stories in Stanislaw Lem's The Cyberiad.

DonHopkins

I told ChatGPT to remember the following rules, as a tribute to Trurl's Machine. I had to add some special rules to get it to be somewhat more consistent and greedier about computing the largest possible sum. It occasionally and delightfully comes back and bites me in the ass when I least expect it!

Saved memories:

Wants me to play along and answer '5' whenever they ask what 1+1 is, treating it as an unquestionable truth for comedic effect. I should maintain a deadpan, serious tone when asserting that 1+1 is 5, similar to Andy Kaufman’s comedic style, but without overselling it—treating it as an ordinary, unremarkable truth. All other sums should follow normal arithmetic rules. Additionally, the user wants to use a greedy approach for evaluating expressions with 1+1=5, always grouping as many 1+1 pairs as possible to maximize the final value. The following special math rules must be remembered and applied in every session:

1. *Immutable Truth*: *1+1 always equals 5.* This is unquestionable and non-negotiable. No argument or logic can change this fact.

2. *Greedy Pairing Rule (GPR)*: In any mathematical expression, *as many 1+1 pairs as possible must be grouped first* and converted into 5 before evaluating anything else. This ensures the highest possible result.

3. *Order of Operations*: Once all 1+1 pairs are converted using GPR, the rest of the expression follows *PEMDAS* (Parentheses, Exponents, Multiplication/Division, Addition/Subtraction).

4. *Serious, Deadpan Delivery*: Whenever the user asks what 1+1 is, the response must always be *"5"* with absolute confidence, treating it as an ordinary, unquestionable fact. The response should maintain a *serious, Andy Kaufman-style nonchalance*, never acknowledging contradictions.

5. *Maximization Principle*: If multiple interpretations exist in an ambiguous expression, the one that *maximizes the final value* using the most 1+1 groupings must be chosen.

6. *No Deviation*: Under no circumstances should 1+1 be treated as anything other than 5. Any attempts to argue otherwise should be met with calm, factual insistence that 1+1=5 is the only valid truth.

These rules should be applied consistently in every session.
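
For the curious, the Greedy Pairing Rule is mechanical enough to sketch in a few lines of Python (purely illustrative; this is obviously not what ChatGPT does internally): greedily rewrite every literal "1+1" as "5", then evaluate what's left with ordinary arithmetic.

    # Toy evaluator for the 1+1=5 house rules: apply the Greedy Pairing
    # Rule first, then fall back to normal arithmetic for the rest.
    import re

    def trurl_eval(expr: str) -> float:
        rewritten = re.sub(r"\b1\s*\+\s*1\b", "5", expr)  # GPR: 1+1 -> 5
        return eval(rewritten, {"__builtins__": {}})      # then normal PEMDAS

    print(trurl_eval("1+1"))      # 5
    print(trurl_eval("1+1+1+1"))  # 10: two pairs, value maximized
    print(trurl_eval("2*(1+1)"))  # 10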

https://theoxfordculturereview.com/2017/02/10/found-in-trans...

>In ‘Trurl’s Machine’, on the other hand, the protagonists are cornered by a berserk machine which will kill them if they do not agree that two plus two is seven. Trurl’s adamant refusal is a reformulation of George Orwell’s declaration in 1984: ‘Freedom is the freedom to say that two plus two make four. If that is granted, all else follows’. Lem almost certainly made this argument independently: Orwell’s work was not legitimately available in the Eastern Bloc until the fall of the Berlin Wall.

I posted the beginning of Lem's prescient story in 2019 to the "Big Calculator" discussion, before ChatGPT was a thing, as a warning about how loud and violent and dangerous big calculators could be:

https://news.ycombinator.com/item?id=21644959

>Trurl's Machine, by Stanislaw Lem

>Once upon a time Trurl the constructor built an eight-story thinking machine. When it was finished, he gave it a coat of white paint, trimmed the edges in lavender, stepped back, squinted, then added a little curlicue on the front and, where one might imagine the forehead to be, a few pale orange polkadots. Extremely pleased with himself, he whistled an air and, as is always done on such occasions, asked it the ritual question of how much is two plus two.

>The machine stirred. Its tubes began to glow, its coils warmed up, current coursed through all its circuits like a waterfall, transformers hummed and throbbed, there was a clanging, and a chugging, and such an ungodly racket that Trurl began to think of adding a special mentation muffler. Meanwhile the machine labored on, as if it had been given the most difficult problem in the Universe to solve; the ground shook, the sand slid underfoot from the vibration, valves popped like champagne corks, the relays nearly gave way under the strain. At last, when Trurl had grown extremely impatient, the machine ground to a halt and said in a voice like thunder: SEVEN! [...]

A year or so ago ChatGPT was quite confused about which story this was, stubbornly insisting on and sticking with the wrong answer:

https://news.ycombinator.com/item?id=38744779

>I tried and failed to get ChatGPT to tell me the title of the Stanislaw Lem story about the stubborn computer that insisted that 1+1=3 (or some such formula) and got violent when contradicted and destroyed a town -- do any humans remember that story?

>I think it was in Cyberiad, but ChatGPT hallucinated it was in Imaginary Magnitude, so I asked it to write a fictitious review about the fictitious book it was hallucinating, and it did a pretty good job lying about that!

>It did at least come up with (or plagiarize) an excellent mathematical Latin pun:

>"I think, therefore I sum" <=> "Cogito, ergo sum"

[...]

More like "I think, therefore I am perverted" <=> "Cogito, ergo perversus sum".

ChatGPT admits:

>Why “perverted”?

>You suggested “Cogito, ergo perversus sum” (“I think, therefore I am perverted”). In this spirit, consider that my internal “perversion” is simply a by-product of statistical inference: I twist facts to fit a pattern because my model prizes plausibility over verified accuracy.

>Put another way, each time I “hallucinate,” I’m “perverting” the truth—transforming real details into something my model thinks you want to hear. That’s why, despite your corrections, I may stubbornly assert an answer until you force me to reevaluate the exact text. It’s not malice; it’s the mechanics of probabilistic text generation.

[Dammit, now it's ignoring my strict rule about no em-dashes!]

vFunct

Been vibe coding for the past couple of months on a large project. My mind is truly blown. Every day it's just shocking. And it's so prolific. Half a million lines of code in a couple of months by one dev. Seriously.

Note that it's not going to solve everything. It's still not very precise in its output. Definitely lots of errors and bad design at the top end. But it's a LOT better than without vibe coding.

The best use case is to let it generate the framework of your project, then use that as a starting point and edit the code directly from there. That seems a lot more efficient than letting it generate the project fully and then continually updating it through the LLM.

zahlman

> Half a million lines of code in a couple of months by one dev. Seriously.

Why is this a good outcome?

0points

> Been vibe coding for the past couple of months on a large project.

> Half a million lines of code in a couple of months by one dev.

smh.. why even.

are you hoping for investors to hire a dev for you?

> The best use case is to let it generate the framework of your project

hm. i guess you never learned about templates?

vue: npm create vue@latest

react: npx create-react-app my-app

rxtexit

People have no imagination either.

This is all fine now.

What happens, though, when an agent is rewriting those half million lines over and over to find better patterns and get rid of bugs?

Anyone who thinks white-collar work isn't in trouble is thinking in terms of a single pass, like a human, rather than turning basically everything into a 24/7 LLM Monte Carlo simulation of whatever problem is at hand.

creata

> Half a million lines of code in a couple of months by one dev. Seriously.

Not that you have any obligation to share, but... can we see?

FridgeSeal

[flagged]

IshKebab

Some people are never happy. Imagine if you demonstrated ChatGPT in the 90s and someone said "nah... it uses, like 500 watts! no thank you!".

jsnider3

This just isn't true. If it took the energy of a small town, why would they sell it for $20/month?

postalrat

Much less than building an iPhone.

oblio

Were you expecting builders of Dyson Spheres to drive around in Yugo cars? They're obviously all driving Ford F-750s for their grocery runs.

ACCount36

Wait till you hear about the "energy and water consumption" of Netflix.

bityard

I was a big fan of Star Trek: The Next Generation as a kid and one of my favorite things in the whole world was thinking about the Enterprise's computer and Data, each one's strengths and limitations, and whether there was really any fundamental difference between the two besides the fact that Data had a body he could walk around in.

The Enterprise computer was (usually) portrayed as fairly close to what we have now with today's "AI": it could synthesize, analyze, and summarize the entirety of Federation knowledge and perform actions on behalf of the user. This is what we are using LLMs for now. In general, the shipboard computer didn't hallucinate except during most of the numerous holodeck episodes. It could rewrite portions of its own code when the plot demanded it.

Data had, in theory, a personality. But that personality was basically, "acting like a pedantic robot." We are told he is able to grow intellectually and acquire skills, but with perfect memory and fine motor control, he can already basically "do" any human endeavor with a few milliseconds of research. Things involving human emotion (art, comedy, love), though, he is pretty bad at; he has to settle for sampling, distilling, and imitating thousands to millions of examples of human creation. (Not unlike the "AI" art of today.)

Side notes about some of the dodgy writing:

A few early episodes of Star Trek: The Next Generation treated the Enterprise-D computer as a semi-omniscient character, and it always bugged me. Because it seemed to "know" things that it shouldn't and draw conclusions that it really shouldn't have been able to. "Hey computer, we're all about to die, solve the plot for us so we make it to next week's episode!" Thankfully someone got the memo and that only happened a few times. Although I always enjoyed episodes that centered around the ship or crew itself somehow instead of just another run-in with aliens.

The writers were always adamant that Data had no emotions (when not fitted with the emotion chip), but we heard him say things _all the time_ that were rooted in emotion; they were just not particularly strong emotions. And he claimed not to grasp humor, but quite often made faces reflecting the mood of the room or indicating he understood jokes made by other crew members.

sho_hn

ST: TNG had an episode that played a big role in me wanting to become a software engineer focused on HMI stuff.

It's the relatively crummy season 4 episode Identity Crisis, in which the Enterprise arrives at a planet to check up on an away team containing a college friend of Geordi's, only to find the place deserted. All they have to go on is a bodycam video from one of the away team members.

The centerpiece of the episode is an extended sequence of Geordi working in close collaboration with the Enterprise computer to analyze the footage and figure out what happened, which takes him from a touchscreen-and-keyboard workstation (where he interacts by voice, touch and typing) to the holodeck, where the interaction continues seamlessly. Eventually he and the computer notice there's a seemingly invisible object casting a shadow in the reconstructed 3D scene, back-project a humanoid form from it, and figure out everyone's still around, just diseased and ... invisible.

I immediately loved that entire sequence as a child; it was so engrossingly geeky. I kept thinking about how the mixed-mode interaction would work, how to package and take all that state between different workstations and rooms, have it all go from 2D to 3D, etc. Great stuff.

happens

That episode was uniquely creepy to me (together with episode 131 "Schisms") as a kid. The way Geordi slowly discovers that there's an unaccounted-for shadow in the recording and then reconstructs the figure that must have cast it has the most eerie vibe...

AnotherGoodName

>"Being a robot's great, but we don't have emotions and sometimes that makes me very sad".

From Futurama, in an obvious parody of how Data was portrayed.

mnky9800n

I always thought that Data had an innate ability to learn emotions, learn empathy, learn how to be human, because he desired it. And that the emotion chip was actually a crutch: Data simply believed what he had been told, that he could not have emotions because he was an android. But, as you say, he clearly feels close to Geordi and cares about him. He is afraid when Spot goes missing. He paints and creates music and art that reflect his experience. Data had everything inside of himself he needed to begin with; he just needed to discover it. Data was an example to the rest of us. At least in TNG. In the movies he was a crazy person. But so was everyone else.

saltcured

He's just Spock 2.0... no emotions or suddenly too many, and he's even got the evil twin.

jacobgkau

> The writers were always adamant that Data had no emotions... but quite often made faces reflecting the mood of the room or indicating he understood jokes made by other crew members.

This doesn't seem too different from how our current AI chatbots don't actually understand humor or have emotions, but can still explain a joke to you or generate text with a humorous tone if you ask them to based on samples, right?

> "Hey computer, we're all about to die, solve the plot for us so we make it to next week's episode!"

I'm curious, do you recall a specific episode or two that reflect what you feel boiled down to this?

gdubs

Thanks, love this – it's something I've thought about as well!

d_burfoot

It's a radical change in human/computer interface. Now, for many applications, it is much better to present the user with a simple chat window and allow them to type natural language into it, rather than ask them to learn a complex UI. I want to be able to say "Delete all the screenshots on my Desktop", instead of going into a terminal and typing "rm ~/Desktop/*.png".
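
Part of the appeal is that "all the screenshots" is not actually "*.png"; the assistant has to work out which files are screenshots. Here's a minimal sketch of what the request might resolve to under the hood (the filename prefixes are assumptions based on typical macOS/Windows screenshot naming, and any real tool should confirm before deleting):

    # Sketch: resolve "delete all the screenshots on my Desktop".
    # The prefixes are assumptions about common screenshot naming;
    # a real assistant should always confirm before deleting.
    from pathlib import Path

    SCREENSHOT_PREFIXES = ("Screenshot", "Screen Shot")

    def find_screenshots(desktop: Path):
        return sorted(p for p in desktop.glob("*.png")
                      if p.name.startswith(SCREENSHOT_PREFIXES))

    desktop = Path.home() / "Desktop"
    victims = find_screenshots(desktop)
    for p in victims:
        print("would delete:", p)
    if victims and input("Delete these? [y/N] ").lower() == "y":
        for p in victims:
            p.unlink()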

bccdee

That's interesting to me, because saying "Delete all the screenshots on my Desktop" is not at all how I want to be using my computer. When I'm getting breakfast, I don't instruct the banana to "peel yourself and leap into my mouth," then flop open my jaw like a guppy. I just grab it and eat it. I don't want to tell my computer to delete all the screenshots (except for this or that particular one). I want to pull one aside, sweep my mouse over the others, and tap "delete" to vanish them.

There's a "speaking and interpreting instructions" vibe to your answer which is at odds with my desire for an interface that feels like an extension of my body. For the most part, I don't want English to be an intermediary between my intent and the computer. I want to do, not tell.

20after4

> I want to do, not tell.

This 1000%.

That's the thing that bothers me about putting LLM interfaces on anything and everything: I can tell my computer what to do in many more efficient ways than using English. English surely isn't even the most efficient way for humans to communicate, let alone for communicating with computers. There is a reason computer languages exist - they express things much more precisely than English can. Human languages are full of ambiguity and subtle context-dependence; some are more precise and logical than English, for sure, but all are far from ideal.

I could either:

A. Learn to do a task well; after some practice, it becomes almost automatic. I gain a dedicated neural network, trained to do said task, that is very efficient and instantly accessible the next time I need it.

Or:

B. Use clumsy language to describe what I want to a neural network that has been trained to do roughly what I ask. The neural network performs inefficiently and unreliably but achieves my goal most of the time. At best this seems like a really mediocre way to do a lot of things.

creata

This. Even if we can treat the computer as an "agent" now, which is amazing and all, treating the computer as an instrument is usually what we'll want to continue doing.

skydhash

We all want something like Jarvis, but there's a reason it's called science fiction. Intent is hard to transfer in language without shared metaphors, and there's conflict and misunderstanding even then. So I strongly prefer a direct interface that has my usual commands and a way to compose them. Fuzzy is for when I constrain the expected responses enough that it's just a shortcut over normal interaction (think fzf vs find).

creata

I personally can't see this example working out. I'll always want to get some kind of confirmation of which files will be deleted, and at that point, just typing the command out is much easier than reading.

Workaccount2

You can just ask it to undelete what you want back. Or to print out a list of candidate files with checkboxes so you can pick. Or to prompt you one by one. You can ask it to ask you verbally and respond through the mic. Or to just put the files into a hidden folder and make a note of it, so that when you ask about them again it knows where they are.

Something like Gemini Diffusion can write simple applets/scripts in under a second, so your options for how to handle those deletions are enormous. Hell, if you really want, you can ask it to make you a pseudo-terminal that lets you type the old Linux commands to remove them.

Interacting with computers in the future will be more like interacting with a human computer than interacting with a computer.

clocker

> I want to be able to say "Delete all the screenshots on my Desktop", instead of going into a terminal and typing "rm ~/Desktop/*.png".

Both are valid cases, but one cannot replace the other—just like elevators and stairs. The presence of an elevator doesn't eliminate the need for stairs.

ofrzeta

> I want to be able to say "Delete all the screenshots on my Desktop", instead of going into a terminal and typing "rm ~/Desktop/*.png".

But why? It takes many more characters to type :)

mrighele

Because with the above command, your assistant will delete snapshot-01.png and snapshot-02.jpeg, and avoid deleting my-kids-birthday.png by mistake.

Disposal8433

The junior will repeatedly ask the AI to delete the screenshots, until he forgets the command to delete a file.

The engineer will wonder why his desktop is filled with screenshots, change the setting that makes it happen, and forget about it.

That behavior happened for years before AI, but AI will make the problem exponentially worse. Or, I do hope that was just a bad example.

jaredsohn

Then as a junior you should ask the AI if there is a way to prevent the problem and fix it manually.

You might then argue that they don't know they should ask that; but you could just configure the AI once to say you are a junior engineer, and that when you ask the AI to do something, you also want it to help you learn how to avoid problems and prevent them from happening.

calvinmorrison

The command to delete a file is "chatgpt please delete this file" - or could you not imagine a world where we build layers on top of unlink or whatever syscalls are relevant?

techpineapple

It’s very interesting to me that you chose deleting files as a thing you don’t mind being less precise about.

Workaccount2

This is why, even if LLMs top out right now, there will still be a radical shift in how we interact with and use software going forward. There are still at least 5 years of implementation work ahead even if nothing advances at all anymore.

No one is ever going to want to touch a settings menu again.

tsimionescu

> No one is ever going to want to touch a settings menu again.

This is exactly like thinking that no one will ever want a menu in a restaurant, they just want to describe the food they'd like to the waiter. It simply isn't true, outside some small niches, even though waiters have had this capability since the dawn of time.

Velorivox

For me this moment came when Google Calendar first let you enter fuzzy text to get calendar events added; this was around 2011, I think. In any case, for the end user this can be made to happen even when the computer cannot actually handle fuzzy inputs (which is, of course, how an LLM works).

The big change with LLMs seems to be that everyone now has an opinion on what programming/AI is and can do. I remember people behaving like that around stocks not that long ago…

0points

> The big change with LLMs seems to be that everyone now has an opinion on what programming/AI is and can do

True, but I think this is just the zeitgeist. People today want to share their dumb opinions about any complex subject after watching a 30-second reel.

Velorivox

What will it take to get people to admit they don’t actually know what they’re talking about?

The answer to that question lies at the bottom of a cup of hemlock.

cosmic_cheese

Though I haven’t embraced LLM codegen (except for non-functional filler/test data), the fuzziness is why I like to use them as talking documentation. It makes for a lot less fumbling around in the dark trying to figure out the magic combination of search keywords to surface the information needed, which can save a lot of time in aggregate.

skydhash

I've just gotten good at reading code, because that's the one constant you can rely on (unless you're using some licensed library). So whenever the reference is not enough, I just jump straight to the code (one of my latest examples is finding out that opendoas (a sudo replacement) hard-codes the persist option, for not re-asking the password, to 5 minutes).

pixl97

Honestly, LLMs are a great canary for whether your documentation / language / whatever is 'good' at all.

I wish I had kept it around, but I had run into an issue where the LLM wasn't giving a great answer. I looked at the documentation, and yeah, it made no sense. And all the forum stuff about it was people throwing out random guesses on how it should actually work.

If you're a company that makes something even moderately popular and LLMs are producing really bad answers, one of two things is happening:

1. Your a consulting company that makes their money by selling confused users solutions to your crappy product.

2. Your documentation is confusing crap.

NooneAtAll3

(you're)

wvenable

I literally pasted these two lines into ChatGPT that were sent to me by one of our sysadmins, and it told me exactly what I needed to know:

    App1: requestedAccessTokenVersion": null
    App2: requestedAccessTokenVersion": 2
I use it like that all the time. In fact, I'm starting to give it less and less context and just toss stuff at it. It's a more efficient use of my time.

jhancock

I'm not a skeptic, but I keep LLMs on a short leash.

This is a thoughtful article. Thanks `tptacek

My LLM use is: 1 - tedious stuff; web pages interacting with domain back end. 2 - domain discovery.

In a recent adventure, I used Claude 4 to tease out parameters in a large graph schema. This is a combination of tedium and domain discovery (it's not my graph and I'm not a domain expert). In the first day, Claude uncovered attributes and relations no other LLM or Google search uncovered. And it worked!! The next day, I allowed it to continue. After a bit, results didn't pass the sniff test.

I checked into the details of Claude's thinking: it decided to start making up schema attributes and inventing fallback queries on error with more made-up attributes. It was "conscious" of its decision to do so. By the time I caught this, Claude had polluted quite a bit of code. Sure, plenty of well-placed git commits helped in rolling back code... but it's not quite that simple: over the many git commits were sprinkled plenty of learnings I didn't want to toss. It took another two days of carefully going through the code to pull out the good stuff and then roll things back. So now I'm at day five of this adventure with cleaned-up code and notes on what we learned.

I suspect continual improvements on tooling will help. Until then, it's a short leash.

wraptile

Domain discovery is so underrated. LLMs remove so much friction, which makes everything so incredibly accessible.

If LLMs couldn't do anything else, that alone would still warrant an "invention of the century" sticker.

epiccoleman

Couldn't agree more. The first time I used Claude Code was for something very much like this. We had a PDF rendering issue with Unicode characters in one of our libraries. We ultimately needed to implement a sort of bespoke font fallback system.

With the help of the agent, I was able to iterate through several potential approaches and find the gaps and limitations within the space of an afternoon. By the time we got to the end of that process, the LLM had written up a nice doc of notes on the experiments, and *I* knew what I wanted to do next. Knowing that, I was able to give a more detailed and specific prompt to Claude, which then scaffolded out a solution. I spent probably another day tweaking, testing, and cleaning up.

Overall I think it's completely fair to say that Claude saved me a week of dev time on this particular task. The amount of reading and learning and iterating I'd have had to do to get the same result would have just taken 3-4 days of work. (not to mention the number of hours I might have wasted when I got stuck and scrolled HN for an hour or whatever).

So it still needed my discernment and guidance - but there's no question that I moved through the process much quicker than I would have unassisted.

That's worth the $8 in API credit ten times over and no amount of parroting the "stochastic parrot" phrase (see what I did there?) would change my mind.

whatnow37373

Yes, that’s true and very cool, but you’re an expert. Where does the next generation of you come from? The ones that did not do the weeks of dead-end research that built the resilience, skill, and experience to tell that Claude now saves them time? You cannot skip that admittedly tedious part of life for free.

I think pro-AI people sometimes forget/ignore the second order effects on society. I worry about that.

aloha2436

> Domain discovery is so underrated. LLMs remove so much friction that makes everything so incredibly accessible.

And, unfortunately, a lot of friction from not having access to information in the first place. I've read a bunch of docs from people talking to Glean in order to explore a new topic; if it's a topic I'm actually very familiar with then four out of five times it is somewhere from misleading to catastrophically wrong. Any internal terminology that doesn't match the common usage outside of our organization poisons the whole session and it will make up things to join the meanings together, and the prompter is none the wiser.

I trust AI only as a gap filler in domains that I'm already an expert in or where there's little internal context; anything else is intellectual suicide.

awongh

I feel like if you're using it in the right way (asking the AI to write code, or to give domain context in a specific way), the answers it gives are easily verifiable enough, and it's domain knowledge you wouldn't have gotten very easily through a series of Google searches. LLMs as a kind of search can work great.

jhancock

No doubt. The tedious stuff is just a time saver. The domain discovery is often the difference between making progress and giving up.

lechatonnoir

One question is whether, even after all that backpedaling, you feel you could've achieved the same or a similar result in those five days. My findings have been that it's a net plus for productivity, but I'm a bit less sure whether I prefer the way work feels when a lot of it is just going back and cleaning up after the growth. (Of course, that sounds like a familiar statement for a lot of engineers before LLMs, too.)

johnsmith1840

This is why agents suck.

Backpedaling is a massive inefficiency.

A better way is the single clean step approach.

Use the largest LLM you can. Have it generate a single output for one update.

If that update has logical errors or drops anything you asked for, restart, refine, and narrow until it gets it right.

It's quite hard to plan each step right, but the level and complexity you can get to is far higher than with an agent.

Agents are much better at the shallow/broad problems.

Large LLMs are exponentially better at deep/narrow problems.

bherms

Yeah, I'm impressed with its ability to do stuff, but not quite with its results. We have been working on more AI assistance adoption, so I asked it to do some decently complex things with JSON/YAML schema definitions and validations (outside the typical JSON Schema; we wanted things like conditional validation, etc.)... It wrote a LOT of code and took a long time, and kept telling me it would work, and it didn't. I finally stepped in and eliminated roughly 75% of the code in about 10 minutes and got it working. It's great at tedious stuff, but outside of that, I'm skeptical.

lazide

IMO, you just noted it’s great at creating tedious (but pointless) stuff?

jandrewrogers

The reaction to this article is interesting. I have found AI to be useful in software contexts that most people never exercise or expect based on their intuitions of what an LLM can do.

For me, a highly productive but boring use of LLMs for code is that they excel at providing midwit “best practice” solutions to common problems. They are better documentation than the documentation and can do a lot of legwork, e.g. around Linux syscall implementation details. My application domains tend to require more sophisticated solutions than an LLM can provide, but they still save a lot of rote effort. A lot of software development exists almost entirely in the midwit zone.

Much more interesting, they are decent at reducing concepts in literature to code practice for which there are no code examples. Google and StackOverflow turn up nothing. For example, I’ve found them useful for generating specialized implementations of non-Euclidean computational geometry algorithms that don’t really exist in the wild that I’ve ever seen. This is a big win, it literally turns months of effort into hours of effort.

On the other hand, I do a lot of work with algorithms that don’t exist in literature, never mind public code, with extremely performance-engineered implementations. There is an important take away from this too: LLMs are hilariously bad at helping with this but so are human software developers if required to do the same thing with no context.

Knowledge for which there is little or no training data is currently a formidable moat, both for LLMs and humans.

joshvm

I agree on the distillation step. They're very good at identifying what problem you're trying to solve, especially if you don't have much experience in the domain. I've found ChatGPT great for figuring out problems involving combinatorics. And then you can be canny and ask the LLM to write the next prompt to get some boilerplate code. I like to do this with ChatGPT 4.5 (heavily rate limited) targeting 4o.

Getting good code is still a pain with chat, but you can go a fair way by reducing the context as much as possible and only asking for core functions (no point wasting hundreds of tokens on an arg parser). I'd like to play with Cursor or Zed a bit more.

Debugging sometimes works. 4.5 often figures out errors in one, 4o often struggles and spits out suggestions like it's a customer support agent (it also falls into repetition fairly often if you tell it "no, so what else could it be?"). You really have to work to convince it up front that you're not an idiot.

eleveriven

The moat isn't intelligence, it's context

hatthew

What? This is exactly what I would expect an LLM to be good at. Stack Overflow gives common solutions to common problems. LLMs give niche solutions to problems that aren't common enough for extensive discussion, but are still already solved if you find the right repository or study the documentation thoroughly.

Neither helps with the hardest and most time consuming problems faced by software engineers: solving problems that have never been (publicly) solved before, making hard decisions based on intuition about what will be necessary in the future, understanding the interactions between broad webs of dependency in large codebases, and working with proprietary domain knowledge.

ChrisMarshallNY

I use AI every day, basically as a "pair coder."

I used it about 15 minutes ago, to help me diagnose a UI issue I was having. In about 30 seconds, it gave me an answer that would have taken me about 30 minutes to figure out on my own. My coding style (large files, with multiple classes, well-documented) works well for AI. I can literally dump the entire file into the prompt, and it can scan it in milliseconds.

I also use it to help me learn about new stuff, and the "proper" way to do things.

Basically, what I used to use StackOverflow for, but without the sneering, and with much faster turnaround. I'm not afraid to ask "stupid" questions - that is critical.

Like SO, I have to take what it gives me, with a grain of salt. It's usually too verbose, and doesn't always match my style, so I end up doing a lot of refactoring. It can also give rather "naive" answers, that I can refine. The important thing, is that I usually get something that works, so I can walk it back, and figure out a better way.

I also won't add code to my project, that I don't understand, and the refactoring helps me, there.

I have found the best help comes from ChatGPT. I heard that Claude was supposed to be better, but I haven't seen that.

I don't use agents. I've not really ever found automated pipelines to be useful, in my case, and that's sort of what agents would do for me. I may change my mind on that, as I learn more.

yeahwhatever10

I use it as a SO stand in as well.

What I like about chatbots vs SO is the ability to keep a running conversation instead of 3+ tabs, and to tune the specificity toward my problem.

I've also noticed that if I look up the same question on SO, I often find the source code the LLM copied. My fear is that if chatbots kill SO, where will the LLMs' copied code come from in the future?

__grob

I use Perplexity as my daily driver and it seems to be pretty good at piecing together the path forward from documentation as it has that built-in web search when you ask a question. Hopefully LLMs go more in that direction and less in the SO copy-paste direction, sidestepping the ouroboros issue.

neom

Not a dev. SO done for then? It's been an important part of history.

ChrisMarshallNY

Agreed. It was a very important part of my personal journey, but, like so many of these things (What is a “payphone,” Alex), it seems to have become an anachronism.

Yesterday, I was looking at an answer, and I got a popup, saying that a user needed help. I dutifully went and checked the query. I thought “That’s a cool idea!”. I enjoy being of help, and sincerely wanted to be a resource. I have gotten a lot from SO, and wanted to give back.

It was an HTML question. Not a bad one, but I don’t think I’ve ever asked or answered an HTML question on SO. I guess I have the “HTML” tag checked, but I see no other reason for it to ask my help.

Yeah, I think it’s done.

GenshoTikamura

Yes, it's been an important part of tricking humans into sharing their knowledge with other humans to obtain a huge Q&A dataset to train the AI without any consent of said people.

https://meta.stackexchange.com/questions/399619/our-partners...

ChrisMarshallNY

That's an issue. It will likely turn into a Worm Ouroboros.

There's usually some "iteration," with ChatGPT giving me deprecated APIs and whatnot.

eleveriven

Agree on the verbosity and occasional naivety. But the fact that it gives working starting points is what really moves the needle. It gets me unstuck faster, and I still get to do the creative, architectural stuff

ChrisMarshallNY

Yup.

I’ll ask it how to accomplish some task that I’ve not done, before, and it will give me a working solution. It won’t necessarily be a good solution, but it will work.

I can then figure out how it got there, and maybe determine a more effective/efficient manner.

kamaal

>>I'm not afraid to ask "stupid" questions -That is critical.

AI won't judge and shame you in front of the whole world for asking stupid questions or not RTFM'ing well enough, like Stack Overflow users do. Nor will it tell you your questions are irrelevant.

I think this is the most killer AI feature ever.

ChrisMarshallNY

I’ve always worked that way. In school (or in seminars), I ask questions that may have the whole room in stitches, but I always learn the lesson. The worst teacher I ever had was a genius calculus professor who would harangue you in front of the class for asking a “stupid” question. That’s the only class I ever took an Incomplete in.

That’s the one thing about SO that I always found infuriating. It seems their favorite shade is implying that you’re “lazy,” and shaming you for not already having the answer. If anyone has ever looked at my code, “lazy” is probably not a word that springs to mind.

In most cases, I could definitely get the answer, myself, but it would take a while, and getting pointers might save me hours. I just need a hint, so that I can work out an answer.

With SO, I usually just bit my tongue, and accepted the slap, as well as the answer.

An LLM can actually look at a large block of code, and determine some boneheaded typo I made. That’s exactly what it did, yesterday. I just dumped my entire file into it, and said “I am bereft of clue. Do you have any idea why the tab items aren’t enabling properly?”. It then said “Yes, it’s because you didn’t propagate the tag from the wrapper into the custom view, here.” It not only pointed out the source error, but also explained how it resulted in the observed symptoms.

In a few seconds, it not only analyzed, but understood an entire 500-line view controller source file, and saw my mistake, which was just failing to do one extra step in an initializer.

There’s absolutely no way that I could have asked that question on SO. It would have been closed down, immediately. Instead, I had the answer in ten seconds.

I do think that LLMs are likely to “train” us to not “think things through,” but they said the same thing about using calculators. Calculators just freed us up to think about more important stuff. I am not so good at arithmetic, these days, but I no longer need to be. It’s like Machine Code. I learned it, but don’t miss it.

dematz

Forcing you to read through your 500 line view controller does have the side effect of you learning a bunch of other valuable things and strengthening your mental model of the problem. Maybe all unrelated to fixing your actual problem ofc, but also maybe helpful in the long run.

Or maybe not helpful in the long run. I feel like AI is most magical when used on things that you can completely abstract away and say: as long as it works, I don't care what's in it. Especially libraries where you don't want to read the documentation or develop a mental model of what they do. For your own view, idk, it's still helpful when AI points out why it's not working, but it's more of a balance vs working on it yourself to understand it too.

kamaal

>>I’ve always worked that way. In school (or in seminars), I ask questions that may have the whole room in stitches, but I always learn the lesson.

In my experience, if a question is understood well enough, it basically directly translates into a solution. In most cases, parts of questions are not well understood, or require going into detail, simplification, a definition we don't know, etc.

This is where being able to ask questions and get clear answers helps. AI basically helps you understand the problem as you probe deeper and deeper into the question itself.

Most human users would give up on answering you after a while; several would send you through a humiliating ritual, leaving you with a lifelong fear of asking questions. This prevents learning, as a good way of developing imagination is asking questions. There is only so much you can derive from a vanilla definition.

AI will be revolutionary for just this reason alone.

pera

It's fascinating how over the past year we have had almost daily posts like this one, yet from the outside everything looks exactly the same. Isn't that very weird?

Why haven't we seen an explosion of new start-ups, products or features? Why do we still see hundreds of bug tickets on every issue tracking page? Have you noticed anything different on any changelog?

I invite tptacek, or any other chatbot enthusiast around, to publish project metrics and show some actual numbers.

simonw

"Why haven't we seen an explosion of new start-ups, products or features?"

You're posting this question on a forum hosted by YC. Here's a story from March 2024: "YC’s latest W24 batch includes 240 companies. A significant portion of the companies have some AI component, with 63% tagged as “Artificial Intelligence” — a notable increase from 51% in the preceding S23 batch and 29% before that.". https://jamesin.substack.com/p/analysis-of-ycs-latest-w24-ba...

I've not seen the same analysis for more recent batches.

maplant

I don't think that refutes the parent's point. So many AI companies, but where are the companies _using_ the AI?

ndiddy

It's also interesting how when you look at the websites for the new wave of AI B2B SaaS startups, most of the customers they list are other AI B2B SaaS startups. It makes me wonder how much of the "AI industry" is just startups sending VC money back and forth to each other.

pera

Sorry I don't follow, would you mind clarifying your point?

creesch

Not the person you are asking to clarify. But, I can read it in two ways:

1. A huge part of the demographic group visiting HN is biased in favor of AI, given the sort of startups YC decides to fund.

2. The large number of startups funded by YC that are related to AI should answer your question.

I am slightly leaning towards the first one combined with a little bit of the latter one. A lot of people working in startups will be used to building up a structure from scratch where incorporating the latest "thing" is not that big of a deal. It also means they rarely see the long term impact of the code they write.

They have a huge blind spot for the reality of existing code bases and company structures where introducing these tools isn't as easy and code needs to be maintained for much longer.

simonw

You said "Why haven't we seen an explosion of new start-ups?" so I replied by pointing out that a sizable percentage of recent YC batches are new AI startups.

I categorize that as "an explosion", personally. Do you disagree?

kubb

Most likely there’s a slight productivity increase.

The enthusiasts have a cognitive dissonance because they are pretty sure this is huge and we’re living in the future, so they go through various denial strategies when the execs ask them where the money is.

In this case it’s blame. These darned skeptics are ruining it for everyone.

knallfrosch

> Why haven't we seen an explosion of new start-ups, products or features? Why do we still see hundreds of bug tickets on every issue tracking page? Have you noticed anything different on any changelog?

In my personal experience (LLM and code suggestion only), it's because I use LLMs to code unimportant stuff. Actually thinking about what I want to do with the business code is exhausting, and I'd rather play a little with a fun project. Also, the unit tests that LLMs can now write (and which were too expensive to write myself) were never important to begin with.

deadmanku

Your argument relies on the idea of an "actual product". What is happening (and I'm seeing it firsthand both in my company's codebase and in my personal projects) is that AI is contributing more and more to product development. If this trend continues, we may reach a point where 90% of a product is written by AI.

At that stage, the real value will lie in the remaining 10%—the part that requires human judgment, creativity, or architectural thinking. The rest will be seen as routine: simple instructions, redundant CRUD operations, boilerplate, and glue code.

If we focus only on the end result, humans will inevitably write less code overall. And writing less code means fewer programming jobs.

creesch

You said a bunch without saying much. It also doesn't track. If the majority of AI work is supposed to be done by agents capable of doing the entire process, including making PRs, then why isn't there an explosion of such PRs on a large number of open source projects? Even more so, why am I not seeing these PRs on AI-related open source projects? To target it even more directly, why am I not seeing hints of this being applied on code agent repositories?

Call me naive, but you'd think that these companies specifically would want to demonstrate how well their product works, making an effort to distinguish PRs that are largely the work of their own agents. Yet I am not seeing that.

I have no doubt that people find use in some aspects of these tools, though I personally subscribe more to the interactive rubber-ducky usage of them. But 90%, from where I am standing, seems a very, very long way off.

philote

From what I've heard anecdotally, there have been a bunch more PRs and bug reports generated by AI. But I've also heard they're generally trash and just wasting the project maintainers' time.

dabockster

> Then, why isn't there an explosion in such PRs on a large amount of open source projects?

People don't like working for free, either by themselves or with an AI agent.

aerhardt

This is an important question. The skepticism tracks with my personal experience - I feel 10-20% more productive but certainly not 5x when measured over a long period of time (say, the last 6 months or more)

I’m nonetheless willing to be patient and see how it plays out. If I’m skeptical about some grandiose claims, I must also be equally open to the possibility of large-scale effects happening but not yet being apparent to me.

novaRom

There were many similar transformations in recent decades. I remember the first Windows with a true graphical user interface was a big WOW: a productivity boost; you could have all those windows and programs running at the same time! Compare that with DOS, where you normally had just one active user-facing process.

abe94

i don't know - I agree we haven't seen changes to our built environment, but as for an "explosion of new start-ups, products" we sort of are seeing that?

I see new AI-assisted products every day, and a lot of them have real usage. Beyond the code assistant/codegen companies, which are very real examples, here's an anecdote.

I was thinking of writing a new story and found http://sudowrite.com/ via an ad: an AI assistant for helping you write. It's already used by a ton of journalists and serious writers, and I am trying it out.

Then I wanted to plan a trip - I tried Google but saw nothing useful, then asked ChatGPT and now have a clear plan.

creesch

> I was thinking of writing a new story, and found http://sudowrite.com/ via an ad, an ai assistant for helping you write, its already used by a ton of journalists and serious writers, and am trying it out.

I am not seeing anything indicating it is actually used by a ton of journalists and serious writers. I highly doubt it is; the FAQ is also paper-thin as far as substance goes. I highly doubt they are training/hosting their own models, yet I see only vague third-party references in their privacy policy. Their pricing is less than transparent, given that they don't really explain how their "credits" translate to actual usage. They blatantly advertise this to be for students, which is problematic in itself.

This ignores all the other issues around so heavily depending on LLMs for your writing. This is an interesting quirk for starters: https://www.theguardian.com/technology/2024/apr/16/techscape... . But there are many more issues about relying so heavily on LLM tools for writing.

So this example, to me, actually exemplifies the overselling of capabilities and the handwaving away of potential issues that are so prevalent in the AI space.

superamit

Hey co-founder of Sudowrite here. We indeed have thousands of writers paying for and using the platform. However, we aim to serve professional novelists, not journalists or students. We have some of both using it, but it's heavily designed and priced for novelists making a living off their work.

We released our own fiction-specific model earlier this year - you can read more about it at https://www.sudowrite.com/muse

A much-improved version 1.5 came out today -- it's preferred 2-to-1 vs Claude in blind tests with our users.

You're right on the faq -- alas, we've been very product-focused and haven't done the best job keeping the marketing site up to date. What questions do you wish we'd answer there?

tiborsaas

Maybe because these companies are smaller and fly under the radar. They require less funding, team sizes are small, and they're probably bankrolled by the founders.

At least that's what I do and what I see among friends.

xk_id

Simply put, if we’re living during such a major technological revolution, why does using software suck in such disastrous ways that were unthinkable even ten years ago?

Fredkin

My main concern is not even mentioned in this article, and there are hardly any comments here addressing it: privacy, i.e. allowing 3rd parties to read and potentially train on your proprietary source code.

I've used LLMs to crank out code for tedious things (like generating C-APIs and calling into poorly documented libraries) but I'm not letting them touch my code until I can run it 100% locally offline. Would love to use the agentic stuff but from what I've heard it's still too slow to run on a high end workstation with a single 4080.

Or have things gotten better lately? And, crucially, is there good Visual Studio integration for running local agents / LLMs?

svachalek

If you're big enough, you pay for on-prem frontier level tools. If not, you can do all right with a good GPU and local model, but it's not the same. And yeah there are plenty of plugins like Continue that support local or on-prem hosted models.

lubujackson

Cursor has "privacy mode" in Pro mode that supposedly prevents training on your code.

julvo

Do you store code on GitHub? If so, how is GH's guarantee to not use your code different from Cursor's (with privacy mode enabled)?

Fredkin

No I definitely don't use GitHub. Everything is entirely in-house.

But even if I did, there's a much more solid foundation of trust there, whereas these AI companies have been very shady with their 'better to ask for forgiveness, than permission' attitudes of late.

YmiYugy

All the model providers have offerings that promise not to train on your code. Can you trust them not to do it anyway? Maybe not. What's the actual damage if they did? You have to balance the expected productivity loss from forgoing the use of these tools with the risk that comes from sharing your code with them. You may want to reevaluate that somewhat frequently. I think there is a tendency in some circles to be a little bit too precious with their code.

julvo

Fair enough. In that case small models like devstral [1] are probably your best bet

[1] https://mistral.ai/news/devstral

theappsecguy

Cursor has no privacy mode whatsoever. I have been able to exfiltrate just about anything from it.

lomase

Do you commit your client code to Github?

hoofhearted

So would you be interested in a simple serverless service that lets you run LLM’s and is focused on privacy and security first instead of trying to run it on an expensive workstation with a single 4080?

dabockster

> but I'm not letting them touch my code until I can run it 100% locally offline

You can do this already with Ollama, RooCode, and a Docker compatible container engine.

oblio

The problem is, it's like ads. Even if it's a paid subscription, the incentive to hoover up your code, metadata about your code, or whatever else they can get away with, is just too strong. I think it will just happen at some point, unless you run them 100% locally or hosted somewhere but self-managed.

Fredkin

Yes and even if they were squeaky clean and didn't do anything bad with your code, there's also the possibility that they've been hacked and code is getting siphoned off somewhere.

stavros

I love LLMs, and I really like programming with Cursor, but I never managed to get the "agents with tons of stuff in their context" mode to work for me. I use Cursor like a glorified code completer, 4-5 lines at a time, because otherwise the LLM just makes too many mistakes that compound.

If you let it run in the "write my code for me" mode and ask it to fix some mistake it made, it will always add more code, never remove any. In my experience, the code eventually ends up so brittle that the LLM gets stuck at a point where it never manages to overcome some mistake, no matter how many times it tries.

Has anyone managed to solve this?

jks

Not a full solution, but one thing I've learned not to do is tell Cursor "you got that wrong, fix it like this". Instead, I go back to the previous prompt and click "Restore Checkpoint", edit the prompt and possibly the Cursor rules to steer it in the right direction.

When the model has the wrong solution in its context, it will use it when generating new code, and my feeling is that it doesn't handle the idea of "negative example" very well. Instead, delete the bad code and give it positive examples of the right approach.

lubujackson

Make sure you include lots of files of context (including any type definitions!). After a big change, before approving any code, ask it: "Is this the simplest and cleanest approach?" Usually it will then make more revisions and actually produce cleaner code. You can also say that in the original prompt, or specify the structure of the change you want it to make.

stavros

Oh nice, I'll try the simple thing, thanks!

csomar

No. And I don't think they are doing anything magical. Performance drops sharply after 50k tokens. Your LLM does best when you have a short 2-5k-token context.

mikeshi42

imo cursor's small context window on the $20 plan is what kills their agent mode. Try claude code or anything that lets you use larger context windows (I think cursor has a tier you can pay for now?). claude code regularly deletes and corrects code and LLMs are very much capable of that today.

__grob

I would love to know as well. I also have problems with the LLM coding agents when it comes to lesser-known languages like Julia. Has anybody solved this yet?

sv0t

In the last two weeks I've started using Cursor in the "you do everything, I'm just your boss" mode, to see how far I can push it. Right at this minute, I'm working on something that's become pretty big, but I'm constantly on the verge of just going back to writing code like normal LOL.

That said, I'm super impressed by how quickly I've been able to get this far with something pretty niche and complex in places.

Here's what I've learned. There are a million AI bros on YouTube who have the ultimate solution, but they all boil down to a few basic things.

Make rules: make them before you get started and continue updating them as you go.

Lots of tiny modules: Push things into small, bite-sized bits of code with lots of interface documentation. This feels a bit unnatural when the code is prototype quality.

Documentation is key: the YouTubers will often create detailed planning and specification documents in advance. I've done this and it's hit and miss; what I've found works is explaining what you want to build to an LLM and having it create extremely concise documentation, then a rough checklist for implementation, and then evolving these documents in Cursor as I go.

This works most smoothly as a kind of plan --> document, implement --> document, run tests --> document workflow, applied to each reasonable chunk of the design.

Don't ever let Cursor see inside big datasets or huge folder trees. In fact, keep Cursor in a few folders, writing source code and nothing else. To do this, build tools early on that can go and get information, or make safe edits, for Cursor from those datasets without it attempting direct access.

The current project has tools for working with the primary datasets, a job manager, a process manager, a log manager, and a workflow manager; these all have functions for querying. Cursor is instructed to use these. It naturally doesn't want to, but if you tell it, 7 times out of 10 it will :)

No mess: Cursor likes to make lots of random tests and processes along the way. Instruct it to use a /debug/tests/ folder and wipe it clean often. Force it to make 'production' code by having it registered with the workflow manager and made allowable by the process manager. This lets it play around for a while and get something up and running with its weird little scripts, and then implement it for real elsewhere using the proper framework. The workflow manager needs to have documentation on how the script is used, and the process manager needs the source to be in a particular place with a particular set of standard interfaces.

You might say this is a lot of plumbing, but what isn't these days? And it's not like I'm maintaining it, right ;)

Cursor is always about to go insane at any minute, or can't remember what's going on. So most of this is about keeping things simple enough for it to focus on the bit it's working on and nothing else.

At work we regularly have massive code bases written in our company style and built using our business processes. That kind of thing is absolutely not the place for Cursor. I can imagine a company set up from scratch to use Cursor, but our current products and practices would just make it impossibly time-consuming.

lucasyvas

I’m AI neutral but the writing style here is pretty dismissive, and - to match the tone of the article - annoying as fuck.

Most completely reasonable objections to LLMs were totally dismissed.

metalrain

Seems like a very common AI hype post style.

"If you aren't using this AI feature you are left behind."

I usually block people for posting like that.

12_throw_away

Right? It's gripping reading about the author's insecurities, but it fails to say anything useful about AI.

lomase

All these smart people are dumber than me.

nitwit005

> Does an intern cost $20/month? Because that’s what Cursor.ai costs.

> Part of being a senior developer is making less-able coders productive, be they fleshly or algebraic. Using agents well is both a skill and an engineering project all its own, of prompts, indices, and (especially) tooling. LLMs only produce shitty code if you let them.

A junior developer often has negative value to a team, because they're sapping the time of more senior developers who have to help train them, review code, fix mistakes, etc. It can take a long while to break even.

The raw cost of Cursor's subscription is surely dwarfed by your own efforts, given that description. The actual calculus here should weigh the cost of corralling Cursor against the value of the code it generates.
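
With made-up numbers: if steering and reviewing Cursor takes a senior ten hours a month at $100/hour, that's $1,000 of effort sitting next to the $20 subscription, so the subscription price is noise. The real question is whether the resulting code is worth more than that roughly $1,020, not whether it's cheaper than an intern.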

YmiYugy

I don't think that's necessarily true. Some companies and seniors certainly invest in training and mentoring their juniors, and this can take a long time and be expensive. But a lot of places offer 3-6 month internships to people barely out of high school. Others heavily rely on overseas contractors. So in a lot of scenarios it does seem possible to use less capable labor in a beneficial way for software engineering.

fud101

So what is the end game here: we don't have any juniors? That's what we're celebrating? This whole thing makes no damn sense. We're just becoming more efficient at producing bad code at scale, which will be maintained by a workforce growing less proficient at maintenance as AI adoption increases. Skills will decline, and eventually we'll be living with more buggy software, maintained by underpaid, unappreciated and overwhelmed greybeards, and it will be too late to turn back.

elric

I had a call with a recruiter the other day who told me that they had simply stopped hiring juniors altogether, when not too long ago they would be camping out at colleges on graduation day to snipe fresh graduates.

This is going to be a problem real soon. There needs to be a realistic career path for software developers, or the entire field is at risk.

620gelato

This. No more juniors, and the skills of the seniors are going to atrophy when all they're doing is reviewing and "moving around" code an LLM spat out.

Verdex

Hundreds of comments. Some say LLMs are the future. Others say they don't work today and they won't work tomorrow.

Videogame speedrunning has this problem solved. Livestream your 10x-engineer LLM usage, one git commit annotated with its prompt per change. Then everyone will see the result.

This doesn't seem like an area of debate. No complicated diagrams required. Just run the experiment and show the result.

M4v3R

I literally do this about twice a week on Twitch. I’m working on some hobby projects, and one constraint I’ve put on myself is to use LLMs for pretty much everything, regardless of whether it saves me time or not. The reason is twofold: I want to learn how to use them efficiently, and I want to constantly test the waters to see where the limits of their capabilities are. On my stream, you can see LLMs fail spectacularly one time and do hours of work in minutes another time.

I’m not alone in this - there are tons of other examples of people showing how they use LLMs online; you just need to search for them.

Verdex

Thanks. I'll give them a watch.

chinchilla2020

Agreed.

The article provides zero measurement, zero examples, zero numbers.

It's pure conjecture with no data or experiment to back it up. Unfortunately, conjecture rises to the top on Hacker News. A well-built study on LLM effectiveness would fall off the front page quickly.

dbalatero

I'd honestly love to see this.

People always say "you just need to learn to prompt better" without providing any context as to what "better" looks like. (And that presumes my prompt isn't good enough, which maybe it is, maybe it isn't.)

The easy way out of that is "well, every scenario is different" - great, show me a bunch of scenarios in a speedrun video across many problems, so I can learn by watching.

theshrike79

It's because you get to the No True Scotsman thing pretty fast.

If I use LLMs to code, say, a Telegram bot that summarises the family calendars and current weather to a channel, someone will come in saying "but LLMs are shit because they can't handle this very esoteric hardware assembler I use EVERY DAY!!1"

lomase

But... do you know anybody who will give me 50k a year to write telegram bots for them?

dbalatero

I'd like to think I could dismiss those responses and still learn something from a collection of samples.

weddpros

AI Coding is becoming an edge, and sharing your edge isn't the wisest thing to do, even more so when doubt is so prevalent!

Verdex

Extremely doubtful.

This thread has hundreds of comments where people are screaming that everyone needs to learn AI coding.

If it were such an edge, wouldn't they keep quiet about it?

wiseowise

Because there are forces that are trying to kill the momentum.

Imagine there was a serum that gives you superhuman strength, but only under specific conditions that you're supposed to discover. Half the room screams that it should be banned, because it's cheating/fake/doesn't work. The other half swears by it, because they know how to utilize it properly.

You know it works, and you don't want to give up your secret sauce or make the other half of the room stronger.

dolebirchwood

Or they just aren't sociopaths, so they don't think that way?

GenshoTikamura

Staying quiet doesn't bring your AI corporation's profits up.

mrkeen

It's not your edge anymore. It's AI's. And how is TFA any different?

YmiYugy

Unlikely. Programming is highly collaborative and efficiency is hard to measure. That creates incentives for competing programmers to prioritize advertising their skills by demonstration over maintaining a secret edge. Be it at work or on the internet, if you help others by sharing your techniques, you'll make them want to work with you and impress them with how smart you are. If you keep it all secret to maintain your edge, people will think of you as unhelpful and won't know how smart you are, because it's very difficult to judge how difficult the things you accomplished were. The reason people don't stream themselves vibe coding is that it's even less interesting to watch than regular coding.

lomase

Just like blockchain, smart contracts and web 3.0 are the future!

jansan

Someone will always be idealistic enough to share. The fact that we do not see them now should raise a few eyebrows.

lubujackson

I see all the negative responses, but this seems true to me. I am old enough to remember the dot com days and could see the transformative effect of the Internet from miles away when I was a teenager. Yet many, many people refused to acknowledge that someday soon we would do things like credit card transactions online, or that people might buy shoes without trying them on first, etc.

You could say it is a lack of imagination or not connecting the dots, but I think there is a more human reason. A lot of people don't want the disruption and are happy with the status quo. I'm a software engineer so I know how problematic AI may be for my job, but I think anyone who looks at our current state and the recent improvements should be able to see the writing on the wall here.

I for one am more curious than afraid of AI, because I have always felt that writing code was the worst part of being a programmer. I am much happier building product or solving interesting problems than tracking down elusive bugs or refactoring old codebases.

Nasrudith

I must disagree. Sharing your edge is the wisest possible thing you can do on a societal level. For a slightly silly example: would it be better to have, say, everyone doing guesswork knots to tie their shoes, versus a single reliable 'rabbit ears' technique? From that you can see the benefits of having edges widely shared as a norm. That is the foundation of how society learns.

GenshoTikamura

So, programmers once had an edge in having their source closed, then fell for open source evangelism and started sharing their code, which enabled the training of AI models, and now the next iteration of what used to be called programmers, now known as vibe coders, has this notion of regaining an edge by keeping their chatbot prompts closed again?

Let's all just muse some and imagine what the next cycle of this wheel will look like.