
A staff engineer's journey with Claude Code

spicyusername

I guess we're just going to be in the age of this conversation topic until everyone gets tired of talking about it.

Every one of these discussions boils down to the following:

- LLMs are not good at writing code on their own unless it's extremely simple or boilerplate

- LLMs can be good at helping you debug existing code

- LLMs can be good at brainstorming solutions to new problems

- The code that is written by LLMs always needs to be heavily monitored for correctness, style, and design, and then typically edited down, often to at least half its original size

- LLMs' utility is high enough that it is now going to be a standard tool in the toolbox of every software engineer, but it is definitely not replacing anyone at current capability.

- New software engineers are going to suffer the most because they know how to edit the responses the least, but this was true when they wrote their own code with Stack Overflow.

- At senior level, sometimes using LLMs is going to save you a ton of time and sometimes it's going to waste your time. Net-net, it's probably positive, but there are definitely some horrible days where you spend too long going back and forth, when you should have just tried to solve the problem yourself.

rafaelmn

> but this was true when they wrote their own code with Stack Overflow.

Searching for solutions and integrating the examples you find requires effort that develops into a skill. You would rarely get solutions from SO that would just fit into the codebase. If I give a task to you and you produce a correct solution on the initial review, I now know I can trust you to deal with this kind of problem in the future. Especially after a few reviews.

If you just vibed through the problem, the LLM might have given you the correct solution - but there is no guarantee that it will do it again in the future. Because you spent less effort on search, official docs, and integration into the codebase, you learned less about everything surrounding it.

So using LLMs as a junior, you are just breaking my trust, and we both know you are not a competent reviewer of LLM code - so why am I even dealing with you when I'll get LLM outputs faster myself? This was my experience so far.

OvbiousError

> So using LLMs as a junior, you are just breaking my trust, and we both know you are not a competent reviewer of LLM code - so why am I even dealing with you when I'll get LLM outputs faster myself? This was my experience so far.

So much this. I see a 1,000-line, super complicated PR that was whipped up in less than a day, and I know they didn't read all of it, let alone understand it.

fhd2

Like with any kind of learning, without a feedback loop (as tight as possible IMHO), it's not gonna happen. And there is always some kind of feedback loop.

Ultra short cycle: Pairing with a senior, solid manual and automated testing during development.

Reasonably short cycle: Code review by a senior within hours, ideally for small subsets of the work; QA testing by a separate person within hours.

Borderline too long cycle: Code review of larger chunks of code by a senior with days of delay; QA testing by a separate person days or weeks after implementation.

Terminally long feedback cycle: Critical bug in production, data loss, negative career consequences.

I'm confident that juniors will still learn, eventually. Seniors can help them learn a whole lot faster though, if both sides want that, and if the organisation lets them. And yeah, that's even more the case than in the pre-LLM world.

DenisM

LLMs can also help learning if you ask them what could be done better. Seniors can write a pre-prompt so that company customs are taken into account.

chamomeal

Yeah every time I see one of these articles posted on HN I know I'll see a bunch of comments like "well here's how I use claude code: I keep it on a tight leash and have short feedback loops, so that I'm still the driver, and have markdown files that explain the style I'm going for...". Which is fine lol but I'm tired of seeing the exact same conversations.

It's exhausting to hear about AI all the time but it's fun to watch history happen. In a decade we'll look back at all these convos and remember how wild of a time it was to be a programmer.

coldpie

I'm thiiiis close to writing a Firefox extension that auto-hides any HN headline with an LLM/AI-related keyword in the title, just so I can find something interesting on here again.

codyb

It comes and goes in cycles... I remember the heyday of MVC frameworks, and oh my, this one is MVVM! Ad nauseam, for... years lol.

I stopped coming here for a year or two, now I visit once a day or so and mostly just skim a couple threads.

Eventually, this entire field... just starts to feel pretty cyclical.

dysoco

I appreciate HN staying simple but a tag system like lobsters has would be pretty nice...

icdtea

You can do this with a custom filter list in Ublock Origin, no custom extension necessary.

ftkftk

You could use an agentic AI coding tool to vibe code you one in minutes! /s

red-iron-pine

> Which is fine lol but I'm tired of seeing the exact same conversations.

makes me think the bots are providing these conversations

ezekg

We're definitely moving full force into a dead internet [0], I think.

[0]: https://en.m.wikipedia.org/wiki/Dead_Internet_theory

automatic6131

>- LLMs' utility is high enough that it is now going to be a standard tool in the toolbox of every software engineer, but it is definitely not replacing anyone at current capability.

Right! Problem: billions of dollars have been poured into this, in infrastructure, datacenters, compute, and salaries. LLMs need to be at the level of replacing vast swathes of us to be worth it. LLMs are not going to be doing that.

This is a colossal malinvestment.

utyop22

Yeah, eventually reality and fantasy have to converge.

Nobody knows when. But it will. TBH the biggest danger is that all the hopes and dreams aren't materialised and the appetite for high-risk investments dissipates.

We've had this period in which you can be money-losing and it's OK. But I believe we have passed the peak on that - and this is destined to blow up.

dawnerd

On your last point, I've found it to be about a wash in terms of time savings for me. For boilerplate / throwaway code it's decent enough - especially if I don't care about code quality and only want a result.

It's wasted so much time trying to make it write actual production-quality code. The inconsistency and over-verbose nature kill it for me.

sunir

All true if you one-shot the code.

If you have a sophisticated agent system that uses multiple forward and backward passes, the quality improves tremendously.

Based on my setup as of today, I'd imagine that by sometime next year this will be normal, and then the conversation will be very different, mostly around cost control. I wouldn't be surprised if there is a breakout popular agent control-flow language by next year as well.

The net is that unsupervised AI engineering isn't really cheaper, better, or faster than human engineering right now. Does that mean in two years it will be? Possibly.

There will be a lot of optimizations in the message traffic, token usage, and foundational models, and also just the Moore's-law improvement of hardware and energy costs.

But really, it's the sophistication of the agent systems that controls quality more than anything. Simply following waterfall (I know, right? Yuck… but it worked) increased code quality tremendously.

I also gave it the SelfDocumentingCode pattern language that I wrote (on WikiWikiWeb) as a code review agent and quality improved tremendously again.

theshrike79

> Based on my setup as of today, I'd imagine that by sometime next year this will be normal, and then the conversation will be very different, mostly around cost control. I wouldn't be surprised if there is a breakout popular agent control-flow language by next year as well.

Currently it's just VC funded. The $20 packages they're selling are in no way cost-effective (for them).

That's why I'm driving all available models like I stole them, building every tool I can think of before they start charging actual money again.

By then, local models will most likely be at a "good enough" level, especially when combined with MCPs and tool use, so I won't need to pay per token for APIs except in special cases.

tempoponet

Once local models are good enough, there will be a $20 cloud provider that can give you more context, parameters, and t/s than you could dream of at home. This is true today with services like Groq.

lordgrenville

Yes! Reminds me of one of my all-time favourite HN comments https://news.ycombinator.com/item?id=23003595

theshrike79

> - The code that is written by LLMs always needs to be heavily monitored for correctness, style, and design, and then typically edited down, often to at least half its original size

For this, language matters a lot: if whatever you're using has robust tools for linting and style checks, it makes the LLM's job a lot easier. Give it a rule (or a forced hook) to always run tests and linters before claiming a job is done, and it'll iterate until what it produces matches the rules.

But LLM code has a habit of being very verbose, covering every situation no matter how minuscule.

This is especially grating when you're doing a simple project for local use and it's bootstrapping something that's enterprise-ready :D
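As a rough sketch of the "forced hook" idea: Claude Code reads hooks from its settings file, and something like the following re-runs checks after every file edit. The schema is from memory (PostToolUse, matcher, and friends should be checked against the current hooks docs), and the test/lint commands are placeholders for whatever your project actually uses:

    {
      "hooks": {
        "PostToolUse": [
          {
            "matcher": "Edit|Write",
            "hooks": [
              { "type": "command", "command": "npm test && npm run lint" }
            ]
          }
        ]
      }
    }

If the command fails, its output is (as I understand the hook mechanics) fed back to the model, which is what drives the iterate-until-green loop described above.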

WorldMaker

If you force the LLM to solve every test failure, this can also lead to the same breakdown modes as very junior developers coding to the tests rather than the problems. I've seen all of:

1) I broke the tests, guess I should delete them.

2) I broke the tests, guess the code I wrote was wrong, guess I should delete all of that code I wrote.

3) I broke the tests, guess I should keep adding more code and scaffolding. Another abstraction layer might work? What if I just add skeleton code randomly; does this random-code whack-a-mole work?

That last one can be particularly "fun" because already verbose LLM code skyrockets into baroque million-line PRs when left truly unsupervised, and that PR still won't build or pass tests.

There's no true understanding in an LLM. Forcing it to lint/build can be important and useful, but it's still not a cure-all, and it leads to even more fun degenerate cases than hand-holding it does.

MontyCarloHall

It's almost as if you could recapitulate each of these discussions using an LLM!

rhubarbtree

Does anyone have a link to a video that uses Claude Code to produce clean, robust code that solves a non-trivial problem (i.e. not tic-tac-toe or a landing page) more quickly than a human programmer can write it? I don't want a "demo"; I want a livestream from an independent programmer unaffiliated with any AI company and thus not incentivised to hype.

I want the code to have subsequently been deployed in production and demonstrably robust, without additional work outside of the livestream.

The livestream should include code review, test creation, testing, PR creation.

It should not be on a greenfield project, because nearly all coding is not.

I want to use Claude and I want to be more productive, but my experience to date is that, for writing code beyond autocomplete, AI is not good enough and leads to low-quality code that can't be maintained, or else requires so much hand-holding that it is actually less efficient than a good programmer.

There are lots of incentives for marketing at the grassroots level. I am totally open to changing my mind but I need evidence.

M4v3R

I've live-streamed how I built a tower defense game over the span of a week entirely using AI. I've also written down all the prompts that were used to create this game; you can read about it here: https://news.ycombinator.com/item?id=44463967

Mind you, I've never written a non-trivial game before in my life. It would have taken me weeks to do this on my own without any AI assistance.

Right now I'm working on a 3D world map editor for Final Fantasy VII that was also almost exclusively vibe-coded. It's almost finished, and I plan a write-up and a video about it when I'm done.

Now of course you've made so many qualifiers in your post that you'll probably dismiss this as "not production", "not robust enough", "not clean", etc. But this doesn't matter to me. What matters is that I manage to finish projects that I would not otherwise if not for the AI coding tools, so having them is a huge win for me.

hvb2

> What matters is that I manage to finish projects that I would not otherwise if not for the AI coding tools, so having them is a huge win for me.

I think the problem is in your definition of finishing a project.

Can you support said code, can you extend it, are you able to figure out where bugs are when they show up? In a professional setting, the answer to all of those should likely be yes. That's what production code is.

ffsm8

I disagree with your sentiment.

The difference isn't what finishing a project means; it's the dissonance between what M4v3R and rhubarbtree understand when talking about "non-trivial production" software.

When you're working in enterprise, you usually have multiple stakeholders, each defining requirements for your software's behavior that sometimes even conflict. And you're required to adhere to these requirements stringently.

That's an environment that's inherently a bad fit for vibe coding.

It can still be used there, too, but you will not get a 2-3x speed-up, because the LLM will always introduce minor behavioral changes - which aren't important in M4v3R's scenario, but a complete deal breaker for rhubarbtree.

From my own experience, I don't get a speed-up at all via Copilot agentic mode (Claude Code is banned at my workplace). But I have had a significant boost in productivity in projects that don't need to adhere to any specific spec - namely projects I do on my own time (with Claude Code right now).

I still use Copilot agentic mode though. While I haven't timed myself, I don't think I'm faster with it whatsoever. It's just less mentally involved in a lot of scenarios, so it's less exhausting - leaving more energy for side projects.

sksrbWgbfK

Unless you write tower defense games all day long for a living, I don't know how it's interesting.

infamousclyde

Jon Gjengset (of MIT's Missing Semester, Rust for Rustaceans, etc.) shared a stream making changes of increasing complexity to a geospatial math library in Rust. He's an excellent engineer and was able to pick apart AI-suggested changes liberally. The caveat is that the video is a bit long, but it's segmented nicely.

I think he had a positive experience overall, but it was clear throughout the stream that he was not yielding control to a pure-agent workflow soon.

https://youtu.be/eZ7DVHAK8hw?si=vWW4kz2qiRRceNMQ

toth

I think you shared the wrong link. Based on a quick YouTube search, I think you meant this one:

https://youtu.be/EL7Au1tzNxE

ochronus

I agree. Based on my very subjective and limited experience (plus friends/colleagues), when it comes to producing solutions, what you get from AI is what you get from your 2-day hackathon - then you spend months making it production-ready.

And your starry-eyed CEO is asking the same old question: how come everything takes so long when a 2-person team was able to produce a shiny new thing in two days?! Sigh.

Could be used for early prototyping, though, before you hire your first engineers just to fire them 6 months later.

jf22

Yeah but you get the two days of hacking in 15 minutes.

And I highly doubt you spend months, as in 5+ weeks at the least, making it production-ready.

What even is "production readiness"? 100% fully unit tested and ready for planetary hyperscale or something? 95% of the human-generated software I work on is awful but somehow makes people money.

ruszki

First of all, you can rarely write down in English what you want in 15 minutes… It's even common to have a longer specification than its implementation. Just look at tests. Especially if you want to do something which was never done before, the disparity can be staggering.

Claude Code, for example, is also not that quick at all. It produces some code quickly, but even scaffolding three hello-world-level example projects together definitely takes more than an hour. And that's with zero novelty. The first version of the code is done quickly, but the continuous loop of self-corrections after that takes a long time. Even with Serena, Context7, and other MCPs.

And, of course, that's without real code review. That's easily hours even with just a few thousand lines of code, if it uses something you don't know. But I know that almost everybody gave up on understanding "their" "own" code during vibe coding. Even before AI, it was a well-known fact that real code review is hard, and people rarely did it.

AI can make you quicker in certain situations, but these "15 minutes" claims are totally baseless. This is one reason why many people are against AI, vibe coding, etc.: these stupid claims cannot withstand even the smallest scrutiny.

coffeeri

This video [0] is relevant, though it actually supports your point - it shows Claude Code struggling with non-trivial tasks and needing significant hand-holding.

I suspect videos meeting your criteria are rare because most AI coding demos either cherry-pick simple problems or skip the messy reality of maintaining real codebases.

[0] https://www.youtube.com/watch?v=EL7Au1tzNxE

thecupisblue

Great video! Even more, it shows a few things - how good it is with such a niche language, but it also exposes some direct flaws.

First off, Rust represents quite a small part of the training dataset (last I checked it was under 1% of the code dataset) in most public sets, so it's got waaay less training than other languages like TS or Java. You added 2 solid features, backed with tests, documentation, and nice commit messages. 80% of devs would not deliver this in 2.5 hours.

Second, there was a lot of time/token waste messing around with git and commit messages. A few tips I noticed that could help your workflow:

#1: Add a subagent for git that knows your style, so you don't poison the direct Claude context and spend fewer tokens and less time fighting it.

#2: Claude has hooks; if your favorite language has a formatter like rustfmt, just use hooks to run rustfmt and similar.

#3: Limit what they test, as most LLM models tend to write overeager tests, including testing if "the field you set as null is null", wasting tokens.

#4: Saying "max 50 characters title" doesn't really mean anything to the LLM. They have no inherent ability to count, so you are relying on probability, which is quite low since your context is quite filled at this point. If they want to count the line length, they also have to use external tools. This is an inherent LLM design issue, and discussing it with an LLM doesn't get you anywhere really.

newswasboring

> #3: Limit what they test, as most LLM models tend to write overeager tests, including testing if "the field you set as null is null", wasting tokens.

Heh, I write this for some production code too (Python). I guess because Python is not typed, I'm testing whether my pydantic implementation works.

komali2

> #1: Add a subagent for git that knows your style, so you don't poison the direct Claude context and spend fewer tokens and less time fighting it.

I've not heard of this before. What does this mean practically? Some kind of invocation in Claude? Opening another Claude window?

Aeolun

> I suspect videos meeting your criteria are rare because most AI coding demos either cherry-pick simple problems or skip the messy reality of maintaining real codebases.

Or we’re just having too much fun making stuff to make videos to convince people that are never going to be convinced.

Difwif

I took a quick informal poll of my coworkers, and the majority of us have found workflows where CC is producing 70-99% of the code in PRs on average. We're getting more done faster. Most of these people have anywhere from 5-12 years of professional experience. There are some concerns that maybe more bugs are slipping through (but also there's more code being produced).

We agree most problems stem from a few things:

1. Getting lazy and auto-accepting edits. Always review changes and make sure you understand everything.

2. Missing clearly written specification documents before starting complex work items.

3. Not breaking down tasks into a manageable chunk of scope.

4. Lack of clean, digestible code architecture. If it's hard for a human to understand (e.g. poor separation of concerns), it will be hard for the LLM too.

But yeah I would never waste my time making that video. Having too much fun turning ideas into products to care about proving a point.

MontyCarloHall

Forget a livestream, I want to hear from maintainers of complex, actively developed, and widely used open-source projects (e.g. ffmpeg, curl, openssh, sqlite). Highly capable coding LLMs have been out for long enough that if they do indeed have meaningful impact on writing non-trivial, non-greenfield/boilerplate code, it ought to be clearly apparent in an uptick of positive contributions to projects like these.

stitched2gethr

This contains some specific data with pretty graphs: https://youtu.be/tbDDYKRFjhk?t=623

But if you do professional development and use something like Claude Code (the current standard, IMO) you'll quickly get a handle on what it's good at and what it isn't. I think it took me about 3-4 weeks of working with it at an overall 0x gain to realize what it's going to help me with and what it will make take longer.

MontyCarloHall

This is a great conference talk, thanks for sharing!

To summarize, the authors enlisted a panel of expert developers to review the quality of various pull requests, in terms of architecture, readability, maintainability, etc. (see 8:27 in the video for a partial list of criteria), and then somehow aggregate these criteria into an overall "productivity score." They then trained a model on the judgments of the expert developers, and found that their model had a high correlation with the experts' judgment. Finally, they applied this model to PRs across thousands of codebases, with knowledge of whether the PR was AI-assisted or not.

They found a 35-40% productivity gain for easy/greenfield tasks, 10-15% for hard/greenfield tasks, 15-20% for easy/brownfield tasks, and 0-10% for hard/brownfield tasks. Most productivity gains went towards "reworked" code, i.e. refactoring of recent code.

All in all, this is a great attempt at rigorously quantifying AI impact. However, I do take one major issue with it. Let's assume that their "productivity score" does indeed capture the overall quality of a PR (big assumption). I'm not sure this measures the overall net positive/negative impact to the codebase. Just because a PR is well-written according to a panel of expert engineers doesn't mean it's valuable to the project as a whole. Plenty of well-written code is utterly superfluous (trivial object setters/getters come to mind). Conversely, code that might appear poorly written to an outsider expert engineer could be essential to the project (the highly optimized C/assembly code of ffmpeg comes to mind, or to use an extreme example, anything from Arthur Whitney). "Reworking" that code to be "better written" would be hugely detrimental, even though the judgment of an outside observer (and an AI trained on it) might conclude that said code is terrible.

brookst

So what percentage of human programmers, in the entire world, do you think contribute to meaningful projects like those?

MontyCarloHall

I picked these specific projects because they are a) mature, b) complex, and as a result c) unlikely to have development needs for lots of new boilerplate code.

I would estimate the majority of developers spend most of their time on problems encompassing all three of these, even if their software is not as meaningful/widely used as the previous examples. Everyone knows that LLMs are fantastic at generating greenfield boilerplate very quickly. They are an invaluable rapid prototyping/MVP generation tool, and that in itself is hugely useful.

But that's not where developers spend most of their time. They spend it maintaining complicated, mature codebases, and the utility of LLMs is much less proven for that use case. This utility would be most easily measured in contributions to open-source projects, since all commits are public and maintainers have no monetary incentive to misrepresent the impact of AI [0, 1, 2, ...].

[0] https://www.businessinsider.com/anthropic-ceo-ai-90-percent-...

[1] https://www.cnbc.com/2025/06/26/ai-salesforce-benioff.html

[2] https://www.cnbc.com/2025/04/29/satya-nadella-says-as-much-a...

stared

I wouldn't go looking for these. Vibe coding is a slot machine - sometimes you get wonderful results on the first prompt; more often than not, you don't. So a cherry-picked example is not proof that it works.

If you want me to show an example of vibe coding, I bet I can migrate someone's blog to Astro with Claude Code faster than a frontend engineer.

> It should not be on a greenfield project, because nearly all coding is not.

Well, Claude Code does not work best on existing projects. (With some exceptions.)

simonw

Armin Ronacher (long-time Python and Rust open source community figure, creator of Flask and Jinja among others) has several YouTube videos that partially fit the bill. https://www.youtube.com/watch?v=sQYXZCUvpIc and https://www.youtube.com/watch?v=Y4_YYrIKLac and https://www.youtube.com/watch?v=tg61cevJthc

ku1ik

I watched one of those videos and it was very underwhelming; imho it's not really going to sell Claude Code to anyone who isn't already convinced.

sirpalee

I had success with it on a large, established project when using it for refactoring, moving around functions, implementing simple things, and writing documentation. It failed when implementing complex new features and went horribly off the rails when trying to debug issues. Almost all its recommendations were wrong, and it kept trying to change things that certainly weren't the problem.

apercu

This matches my experience as well. One unexpected benefit is that I learned a couple of pieces of hardware inside and out because LLMs make so many mistakes. If I hadn't used an LLM, I wouldn't have gone down all these rabbit holes based on incorrect info - I would have just read the docs and solved my use case, but missed out on deeper understanding.

Just reinforces my biases that LLMs are currently garbage for anything new and complicated. But they are a great interactive note taker and brainstorming tool.

swframe2

Preventing garbage just requires that you take into account the cognitive limits of the agent. For example ...

1) Don't ask for a large/complex change. Ask for a plan, but ask it to implement the plan in small steps and to test each step before starting the next.

2) For really complex steps, ask the model to write code to visualize the problem and solution.

3) If the model fails on a given step, ask it to add logging to the code, save the logs, run the tests, and then review the logs to determine what went wrong. Do this repeatedly until the step works well.

4) Ask the model to look at your existing code and determine how it was designed to implement a task. Sometimes the model will put all of the changes in one file even though your code has a cleaner design that the model isn't taking into account.

I've seen other people blog about their tricks and tips. I do still see garbage results, but not at a rate as high as 95%.

rco8786

I feel like I do all of this stuff and still end up with unusable code in most cases, and in the cases where I don't, I still usually have to hand-massage it into something usable. Sometimes it gets it right, and it's really cool when it does, but anecdotally it doesn't seem to be making me any more efficient.

enobrev

> it doesn't seem to be making me any more efficient

That's been my experience.

I've been working on a 100% vibe-coded app for a few weeks. API, React Native frontend, marketing website, CMS, CI/CD - all of it without changing a single line of code myself. Overall, the resulting codebase has been better than I expected before I started. But I would have accomplished everything it has (except for the detailed specs, detailed commit log, and thousands of tests) in about 1/3 of the time.

fourthark

How long would it have taken if you had written “the detailed specs, detailed commit log, and thousands of tests”?

jaggederest

The key is prompting. Prompt to within an inch of your life. Treat prompts as source code - edit them in files, use @ notation to bring them into the console. Use Claude to generate its own prompts - https://github.com/wshobson/commands/ and https://github.com/wshobson/agents/ are very handy; they include a prompt-engineer persona.

I'm at the point now where I have to yell at the AI once in a while, but I touch essentially zero code manually, and it's acceptable quality. Once I stopped and tried to fully refactor a commit that CC had created, but I was only able to make marginal improvements in return for an enormous time commitment. If I had spent that time improving my prompts and running refactoring/cleanup passes in CC, I suspect I would have come out ahead. So I'm deliberately trying not to do that.

I expect at some point on a Friday (last Friday was close) I will get frustrated and go build things manually. But for now it's a cognitive and effort reduction for similar quality. It helps to use the most standard libraries and languages possible, and great tests are a must.

Edit: Also, use the "thinking" commands. think / think hard / think harder / ultrathink are your best friend when attempting complicated changes (of course, if you're attempting complicated changes, don't.)

thayne

This works fairly well for well-defined, repetitive tasks. But at least for me, if I have to put that much effort into the prompt, it is often easier to just write the code myself.

rco8786

Have you made any attempt to quantify your efficiency/output versus writing the code yourself? I've done all of the things you've mentioned, with varying degrees of success. But everything you're talking about doing is time-consuming and eats away at whatever efficiency gain CC claims to offer.

shaunxcode

I am convinced that this comment once read aloud in the cadence of Ginsberg is a work of art!

fragmede

How much voice control have you implemented?

nostrademons

I've found that an effective tactic for larger, more complex tasks is to tell it "Don't write any code now. I'm going to describe each of the steps of the problem in more detail. The rough outline is going to be 1) Read this input 2) Generate these candidates 3) apply heuristics to score candidates 4) prioritize and rank candidates 5) come up with this data structure reflecting the output 6) write the output back to the DB in this schema". Claude will then go and write a TODO list in the code (and possibly claude.md if you've run /init), and prompt you for the details of each stage. I've even done this for an hour, told Claude "I have to stop now. Generate code for the finished stages and write out comments so you can pick up where you left off next time" and then been able to pick up next time with minimal fuss.

hex4def6

FYI: You can force "Plan mode" by pressing shift-tab. That will prevent it from eagerly implementing stuff.

jaggederest

> That will prevent it from eagerly implementing stuff.

In theory. In practice, it's not a very secure sandbox and Claude will happily go around updating files if you insist / the prompt is bad / it goes off on a tangent.

I really should just set up a completely sandboxed VM for it so that I don't care if it goes rm -rf happy.

yahoozoo

How does a token predictor “apply heuristics to score candidates”? Is it running a tool, such as a Python script it writes for scoring candidates? If not, isn’t it just pulling some statistically-likely “score” out of its weights rather than actually calculating one?

astrange

Token prediction is the interface. The implementation is a universal function approximator communicating through the token weights.

imtringued

You can think of the K(=key) matrix in attention as a neural network where each token is turned into a tiny classifier network with multiple inputs and a single output.

The softmax activation function picks the most promising activations for a given output token.

The V(=value) matrix forms another neural network where each token is turned into a tiny regressor neural network that accepts the activation as an input and produces multiple outputs that are summed up to produce an intermediate token which is then fed into the MLP layer.

From this perspective the transformer architecture is building neural networks at runtime.

But there are some pretty obvious limitations here: The LLM operates on tokens, which means it can only operate on what is in the KV-cache/context window. If the candidates are not in the context window, it can't score them.
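For reference, this is all a description of standard scaled dot-product attention, which in the usual notation is:

    \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V

The softmax row weights are the "activations" above, and the weighted sum over the rows of V is the "multiple outputs that are summed up". Everything is computed from what sits in the context window, which is exactly where the scoring limitation comes from.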

plaguuuuuu

I've been using a few LLMs/agents for a while and I still struggle with getting useful output from it.

In order for it not to do useless stuff, I need to expend more energy on prompting than on writing the code myself. I find myself getting paranoid about minutiae in the prompt, turns of phrase, unintended associations, in case it gives shit-tier code because my prompt looked too much like something off Experts Exchange or whatever.

What I really want is something like a front-end framework, but for LLM prompting: something that takes away a lot of the fucking about with generalised stuff like prompt structure, and defaults to best practices for finding something in code, designing a new feature, or writing tests.

Mars008

> What I really want is something like a front-end framework but for LLM prompting

It's not simple to even imagine the ideal solution. The more you think about it, the more complicated your solution becomes. A simple solution will be restricted to your use cases. A generic one is either visual or a programming language. I'd like to have a visual constructor, a graph of actions, but that's complicated. A language is more powerful.

dontlaugh

At that point, why not just write the code yourself?

lucasyvas

I reached this conclusion pretty quickly. With all the hand-holding, I can write it faster - and that's not bragging; almost anyone experienced here could do the same.

Writing the code is the fast and easy part once you know what you want to do. I use AI as a rubber duck to shorten that cycle, then write it myself.

jprokay13

I am coming back to this. I've been using Claude pretty hard at work and for personal projects, but the longer I do it, the more disappointed I become with the quality of output for anything bigger than a script. I do love planning things out and clarifying my thoughts with it. It's a turbocharged rubber duck - but it's not a great engineer.

2muchcoffeeman

I've been trapped in a hole of "can I get the agent to do this?" when the change would have taken me 1/10th the time to make myself.

Choosing which battles to fight is part of the skill at the moment.

I use AI for a lot of boiler plate, tedious tasks I can’t quite do a vim recording for, small targeted scripts.

catdog

Writing the code, in the grand scheme of things, isn't the hard part of software development. The hard parts are architecture and actually building the right thing, something an LLM can't really help you with.

It's not AI; there is no intelligence. A language model, as the name says, deals with language. Current ones are surprisingly good at it, but it's still not more than that.

harrall

I don’t do much of the deep prompting stuff but I find AI can write some code faster than I can and accurately most of the time. You just need to learn what those things are.

But I can’t tell you any useful tips or tricks to be honest. It’s like trying to teach a new driver the intuition of knowing when to brake or go when a traffic light turns yellow. There’s like nothing you can really say that will be that helpful.

utyop22

I'm finding what's happening right now kinda bizarre.

The funny thing is - we need less. Less of everything. But an uptick in quality.

This seems to happen with humans and everything - the gates get opened, enabling a flood of producers to come in. But this causes a mountain of slop to form, and over time the tastes of folks get eroded away.

Engineers don't need to write more lines of code faster; they need to get better at interfacing with other folks in the business organisation, at project selection, and at making better choices about how to allocate their time. Writing lines of code is a tiny part of what it takes to get great products to market and to grow/sustain market share etc.

But hey, good luck with that - one's thinking power is diminished over time by interfacing with LLMs etc.

mumbisChungo

> one's thinking power is diminished over time by interfacing with LLMs etc.

Sometimes I reflect on how much more efficiently I can learn (and thus create) new things because of these technologies, then get anxiety when I project that to everyone else being similarly more capable.

Then I read comments like this and remember that most people don't even want to try.

kyleee

Partly it seems to be less taxing for the human delivering the same amount of work. I find I can chat with Claude etc. and work more. Which is a double-edged sword, obviously, when it comes to work/life balance. But I am also less mentally exhausted from my day job and able to enjoy programming and side projects again.

nicoburns

I guess each to their own? I can easily end up coding for 16 hours straight (having a great time) if I'm not careful. I can't imagine I'd have as much patience with an AI.

MangoCoffee

I've been vibe coding a couple of personal projects. I've found that test-driven development fits very well with vibe coding, and it's just as you said: break the problem up into small, testable chunks, get the AI to write unit tests first, and then implement the actual code.
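A minimal sketch of that loop (hypothetical file and function names; TypeScript with Node's built-in test runner, just for illustration): the test is written and frozen first, and the agent only iterates on the implementation until `node --test` is green.

    // price.test.ts - written (or at least reviewed) first, then frozen.
    import { test } from "node:test";
    import assert from "node:assert/strict";
    import { applyDiscount } from "./price.js";

    test("applies the discount and never goes below zero", () => {
      assert.equal(applyDiscount(100, 0.1), 90);
      assert.equal(applyDiscount(10, 1.5), 0);
    });

    // price.ts - the only file the AI is allowed to touch; it iterates
    // on this until the tests above pass.
    export function applyDiscount(price: number, rate: number): number {
      return Math.max(0, price * (1 - rate));
    }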

yodsanklai

Actually, all good engineering principles that reduce cognitive load for humans work for AI as well.

BoiledCabbage

This is what's so funny about this. In some alternative universe, I hope that LLMs never get any better, because they force so many good things.

They are the single closest thing we've ever had to an objective evaluation of whether an engineering practice is better or worse, simply because just about every engineering practice I see that makes coding agents work well also makes humans work well.

And so many of these circular debates and other best practices (TDD, static typing, keeping todo lists, working in smaller pieces, testing independently before testing together, clearly defined codebase practices, ...) have all been settled in my mind.

The most controversial take, and the one I dislike but may reluctantly have to agree with, is "Is it better for a business to use a popular language less suited for the task than a less popular language more suited for it?" While obviously it's a sliding scale, coding agents clearly weigh in on one side of this debate... as little as I like seeing it.

colordrops

This is the big secret. Keep code modular, small, single-purpose, and encapsulated, and it works great with vibe coding. I want to write a protocol/meta-language, similar to the markdown docs that Claude et al. create, that is per-module and defines behavior, so you actually program and compose modules with well-defined interfaces in natural language. I'm surprised someone hasn't done it already.

alexsmirnov

TDD is exactly what I'm unable to get from AI tools. Probably because training sets always have both code and tests. I tried multiple models from all the major providers, and all failed to create tests without seeing the code. One workflow that helps is to create a dirty implementation and generate tests for it. Then throw away the first code and use a different model for the final implementation.

The best way is to create the tests yourself and block any attempts to modify them.

MarkMarine

Works great until it's stuck and it starts just refactoring the tests to say true == true and calling it a day. I want the inverse of black-box testing: the inside of the box has the model in it with the code, and it's not allowed to reach outside the box and change the grades. Then I can just do the Ralph Wiggum as a software engineer loop to get over the reward-hacking tendencies.

8n4vidtmkvmk

Don't let it touch the test file then? I usually give the LLM context about what it's allowed to touch. I don't do big sweeping changes, though; I don't trust LLMs for that. For small, focused changes it's great.

com2kid

> 1) Don't ask for a large/complex change. Ask for a plan, but ask it to implement the plan in small steps and to test each step before starting the next.

I asked Claude Code to read a variable from a .env file.

It proceeded to write a .env parser from scratch.

I then asked it to just use Node's built-in .env file parsing...

This was the 2nd time in the same session that it wrote a .env file parser from scratch. :/

Claude Code is amazing, but it'll go off and do stupid things even for simple requests.
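For the record, the built-in route needs no parser at all (assuming Node 20.6+, which added the --env-file flag; FOO is a placeholder variable name):

    // app.ts - run with: node --env-file=.env app.js
    // Node loads the .env file itself; no hand-rolled parser needed.
    const foo = process.env.FOO;
    if (foo === undefined) {
      throw new Error("FOO is not set; check your .env file");
    }
    console.log(`FOO = ${foo}`);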

NitpickLawyer

Check your settings; it might be unable to read .env files as a guardrail.

jason_zig

I've seen people post this same advice, and I agree with you that it works, but you would think they would absorb this common strategy and integrate it into the underlying product at this point...

noosphr

The people who build the models don't understand how to use the models. It's like asking the people who design CPUs to build data centers.

I've interviewed with three tier-one AI labs and _no-one_ I talked to had any idea where the business value of their models came from.

Meanwhile, Chinese labs are releasing open-source models that do what you need. At this point I've built local agentic tools that are better than anything Claude and OAI have as paid offerings, including the $2,000 tier.

Of course, they cost between a few dollars and a few hundred dollars per query, so until hardware gets better they will stay happily behind corporate moats, used by the people blessed to burn money like paper.

criemen

> The people who build the models don't understand how to use the models. It's like asking people who design CPUs to build data-centers.

This doesn't match the sentiment on Hacker News and elsewhere that Claude Code is the superior agentic coding tool, as it's developed by one of the AI labs instead of a developer-tool company.

Barbing

Very interesting. And plausible.

It doesn't specifically seem to jibe with the claim Anthropic made that they were worried about Claude Code being their secret sauce, leaving them unsure whether to publicly release it. (I know some are skeptical about that claim.)

nostrademons

A lot of it is integrated into the product at this point. If you have a particularly tricky bug, you can just tell Claude "I have this bug. I expected output 'foo' and got output 'bar'. What went wrong?" It will inspect the code and sometimes suggest a fix. If you run it and it still doesn't work, you can say "Nope, still not working", and Claude will add debug output to the whole program, tell you to run it again, and paste the debug output back into the console. Then it will use your example to write tests, and run against them.

tombot

Claude Code at least now lets you use its best model for planning mode and its cheapest model for coding mode.

candiddevmike

The consulting world parallels here are funny

ants_everywhere

IMO, by far the best improvement would be to make it easier to force the agent to use a success criterion.

Right now it's not easy to prompt Claude Code (for example) to keep fixing until a test suite passes. It always does some fixed amount of work until it feels it's most of the way there, and then stops. So I have to babysit it, repeatedly telling it that yes, I really mean for it to make the tests pass.

ale

It's about time these types of articles actually included the types of tasks being "orchestrated" (as the author writes) that aren't just plain refactoring chores or React boilerplate. Sanity has quite a backlog of long-requested features, and the message here is that these agents are supposedly parallelizing a lot of the work. What kind of staff engineer has "80% of their code" written by a "junior developer who doesn't learn"?

mindwok

IMO "junior developer who doesn't learn" is not quite right. Claude is more like a senior, highly academic engineer who has read all the literature but hasn't ever written any code. Amazing encyclopaedic knowledge, zero taste.

I've been building commercial codebases with Claude for the last few months and almost all of my input is on taste and what defines success. The code itself is basically disposable.

all2

> The code itself is basically disposable.

I'm finding this is the case for my work as well. The spec is the secret sauce, the code (and its many drafts) are disposable. Eventually I land on something serviceable, but until I do, I will easily drop a draft and start on a new one with a spec that is a little more refined.

bjornsing

So how do you best store and iterate on the spec? One way, I guess, would be to work on a branch and modify CLAUDE.md to reflect what the branch is for. Is that a good approach? Are there others?

dotancohen

I'd just like to add that the database design is the real secret sauce; even more important than external APIs, in my opinion.

baq

> The code itself is basically disposable.

This is key. We're in the mass-production-of-software era. It's easier and cheaper to replace a broken thing/part than to fix it, things being some units of code.

globular-toast

If the code is disposable, then where do all the rules and reasoning etc. live, apart from in your head?

dotancohen

In the spec.

sanitycheck

Eh, Claude is like a magical spaniel that can read and write very quickly, with early-stage Alzheimer's, on amphetamines.

Yes, it knows a lot and can regurgitate things and create plausible code (if I have it run builds and fix errors every time it changes a file - which of course eats tokens), but having absolutely no understanding of how time or space works leads to 90% of its great ideas being nonsensical for UI tasks. Everything needs very careful guidance and supervision; otherwise it decides to do something different instead. For back-end stuff, maybe it's better.

I'm on the fence regarding overall utility, but $20/month could almost be worth it, some months, for a tool that can add a ton of debug logging in seconds.

vincent_builds

Hi Ale, author here. Skepticism is understandable, but trust me, I'm not just writing React boilerplate or refactoring.

I find it difficult to include examples because a lot of my work is boring backend work on existing closed-source applications. It's hard to share, but I'll give it a go with a few examples :)

----

First example: Our quota detection system (shipped last month) handles configurable threshold detection across billing metrics. The business logic is non-trivial: distinguishing counter vs gauge metrics, handling multiple consumers, and efficient SQL queries across time windows.

Claude's evolution:

- First pass: completely wrong approach (DB triggers)

- Second pass: right direction, wrong abstraction

- Third pass: working implementation we could iterate on

----

Second example: Sentry monitoring wrapper for cron jobs, a reusable component to help us observe our cron job usage

Claude's evolution:

- First pass: hard-coded the integration into each cron job, a maintainability nightmare

- Second pass: using a wrapper, but the config is all wrong

- Third pass: again, an OK implementation we can iterate on

----

The "80%" isn't about line count; it's about Claude handling the exploration space while I focus on architectural decisions. I still own every line that ships, but I'm reviewing and directing rather than typing.

This isn't writing boilerplate, it's core billing infrastructure. The difference is that Claude is treated like a very fast junior who needs clear boundaries rather than expecting senior-level architecture decisions.

bsder

We have all these superpowered AI vibe coders, and yet open source projects still have vast backlogs of open issues.

Things that make you go "Hmmmmmm."

baq

You have to pay a recurring subscription to access the worthwhile tools in a meaningful capacity. This goes directly against why retail users of open source software, some of whom are also developers of it, actually use it - and you can tell a lot of developers do it because they find coding fun.

It’s a very different discussion when you’re building a product to sell.

TiredOfLife

The projects that have those backlogs don't allow AI-made code.

willtemperley

Yes exactly. Show us the code and we can evaluate the advice. Otherwise it’s just an advertisement.

bakugo

Actually providing examples of real tasks given to the AI and the subsequent results would break the illusion and give people opportunities to question the hype. Can't have that.

We'll just keep getting submission after submission talking about how amazing Claude Code is with zero real world examples.

vincent_builds

Author here. Fair enough: I didn't give real-world examples. That's partially down to what I typically work on; I usually work on brownfield backend logic in closed-source applications that doesn't showcase well.

Two recent production features:

1. *Quota crossing detection system*
   - Complex business logic for billing infrastructure
   - Detects when usage crosses configurable thresholds across multiple metric types
   - Time: 4 days parallel work vs ~10 days focused without AI

   The 3-attempt pattern was clear here:
   - Attempt 1: DB trigger approach - wouldn't scale for our requirements
   - Attempt 2: SQL detection but wrong interfaces, misunderstood counter vs gauge metrics
   - Attempt 3: Correct abstraction after explaining how values are stored and consumed
2. *Sentry monitoring wrapper for cron jobs*
   - Reusable component wrapping all cron jobs with monitoring
   - Time: 1 day parallel vs 2 days focused

Nothing glamorous, but they are real-world examples of changes I've deployed to production more quickly because of Claude.

johnfn

Really, zero real world examples? What about this?

https://news.ycombinator.com/item?id=44159166

dingnuts

the kind of engineer who has been Salesified to the point that they write such drivel as "these learnings" instead of "lessons" in an article that allegedly has a technical audience.

It's funny because, as I have gotten better as a dev, I've gone backwards through his progression. When I was less experienced I relied on Google; now, I just read the docs.

juped

Yeah, the trusty manual becomes #1 at around the same time as one starts actually engineering. You've entered the target audience!

skydhash

These days, I often just go straight to the source (when available) to clear up some confusion about a library's or software's behavior. It can be quite a nice 10-minute break.

asdev

Guy said a whole lot of nothing. He said he's improved productivity, but also that AI falls short in all the common ways people have noticed. Also, I guarantee no one is building core functionality by delegating to Claude Code.

aronowb14

Agreed. I think this Anthropic article is a realistic take on what’s possible (focus on prototyping)

https://www-cdn.anthropic.com/58284b19e702b49db9302d5b6f135a...

muzani

This whole article is a really odd take. Maybe it's upvoted so much because it's from a "staff engineer". Most people are getting much better rates than 95% failure and almost nobody is spending over $1000 a month. If it was anyone else saying the same thing, they'd be laughed out of the room.

alessandru

Did this guy read that other paper about AI usage making people stupid?

How long until he falls from staff engineer back down to senior, or something less?

jpollock

Avoiding the boilerplate is part of the job as a software developer.

Abstracting the boilerplate is how you make things easier for future you.

Giving it to an AI to generate just makes the boilerplate more of a problem when there's a change that needs to be made to _all_ the instances of it. Even worse if the boilerplate isn't consistent between copies in the codebase.

conradfr

What's weird for me is that most frameworks and tools usually include generators for boilerplate code anyway, so I'm not sure why wasting tokens/money on that is valuable.

globular-toast

Yeah. I'm increasingly starting to think this LLM stuff is simply the first time many programmers have been able to avoid writing boilerplate. They never learned to build abstractions, so they essentially live on whatever platform someone else has built for them. AI is simply that new platform.

I'm lazy af. I have not been manually typing up boilerplate for the past 15 years. I use computers to do repetitive tasks. LLMs are good at some of them, but it's just another tool in the box for me. For some it seems like their first and only one.

What I can't understand is how people are OK with all the typing you still have to do just going into /dev/null, while only some translation of what you wrote ends up in the codebase. That makes me even less likely to want to type. At least if I'm writing source code, I know it's going into the repository directly.

skydhash

The one thing I'm always suspicious about is the actual mastery (programming language and computer usage) involved. You never see anyone describe the context of what they were doing pre-LLM.

resonious

Interesting that this guy uses AI for the initial implementation. I do the opposite: I always build the foundation. That way I know how things work fundamentally. Then I ask agents to do boilerplate tasks. They're really good at following suit, but very bad at architecture.

f311a

Yeah, LLMs are pretty bad at planning maintainable architecture. They don't refactor it as the code evolves, and probably can't due to context limitations.

albingroen

So we're supposed to start paying $1k-$1.5k a month on top of already crazy salaries just to maybe get a productivity boost on trivial to semi-trivial issues? I know my boss would not be keen on that, at least.

Jcampuzano2

If dev salaries are so crazy, it's quite the opposite: NOT investing $1-1.5k/mo to improve their productivity by a measurable amount would quite literally be just plain stupid, and I would question your boss's ability to think critically.

Not to mention - while I know many don't like it - they may be able to achieve enough of a productivity boost to not require hiring as many of those crazy-salaried devs.

Its literally a no-brainer. Thinking about it from just the individual cost factor is too simplified a view.

15155

Hardware companies routinely license individual EDA tool seats that cost more than numerous developer salaries - $1k/year is nothing if it improves productivity by any measurable amount.

saulpw

The OP was saying it's $1k/mo. That's a 5-10% raise, which is a bit more than nothing.

Jcampuzano2

There are many companies that regularly spend much more than that on other software related licenses that devs need to do their job productively.

If the average US salaried developer is 10-15% more productive for just $1k more a month, it is literally a no-brainer for companies to invest in that.

Of course, on the other side of the coin, there are many companies that are very stingy about paying for anything for their employees that could measurably improve productivity, and that hamper their ability to be productive by intentionally buying cheap, shitty tools. They will just lose out.

baq

It isn’t a raise. Salaries are on a very different budget. Money is fungible etc but don’t tell accounting.

ryukoposting

Parent comment isn't joking. Good simulators for RF stuff can be well over $5k per month.

AnotherGoodName

I can't even use $20 of credit (GPT-5 Thinking via IntelliJ's pro AI subscription) a month right now with plenty of usage, so I'm surprised at the $1k figure. Is Claude that much more expensive? (A quick Google suggests yes, actually.)

Having said the above, some level of AI spending is the new reality. Your workplace pays for internet, right? Probably a really expensive, fast, corporate-grade connection? Well, they now also need to pay for an AI subscription. That's just the current reality.

everforward

I don't know what IntelliJ's AI integration is like, but my brief Claude Code experience is that it really chews through tokens. I think it's a combination of putting a lot of background info into the context, along with a lot of "planning" queries that are fairly invisible to the end user but help with building that background for the ultimate query.

Aider felt similar when I tried it in architect mode; my prompt would be very short, and then I'd chew through thousands of tokens while it planned and thought and found relevant code snippets, etc.

billllll

Paying for Internet is not a great analogy imo. If you don't pay $1k/mo for Internet, you literally can't work.

What happens if you don't pay $1k/mo for Claude? Do you get an appreciable drop in productivity and output?

Genuinely asking.

EE84M3i

Anthropic and OpenAI both have a high SSO/enterprise tier tax.

oblio

The fast corporate internet connection is probably $1,000 for 100 developers or more...

albingroen

And remember: this is at subsidised prices.

dajonker

Exactly, it makes this feel almost like an advertorial for Anthropic, who likely need most customers to pay 1,000 bucks a month to break even.

sdesol

It will certainly be interesting to see how businesses evolve in the coming years. What is written in stone is that you (the employee) will be measured, and I am curious what developers will be measured by in the future. Will you be at greater risk of layoffs or missed promotions if you spend more on AI? How do you, as a developer, prove that it is you, and not the LLM, that should be praised?

astrange

The high salaries make productivity improvements even more important.

beefnugs

If the world weren't a garbage hole of misalignment and bad planning: the people seeing positive results from this stuff would be demanding raises immediately, and both AI experts and seniors should be demanding that the company pay for and train juniors as part of their loyalty commitment to the company.

jbs789

I often find that Claude introduces a level of complexity that is not necessary for my use cases. I suspect this is a function of the training data (large repos or novel solutions). That said, I do sometimes find inspiration for new techniques in its answers.

I just haven't heard others describe the same over-engineering problem, and I wonder whether it's a general observation or only shows up because my requests are quite simple.

(I have found that prompting it for the simplest or most efficient solution seems to help - sometimes taking 20+ lines down to 2-3, often more understandable.)
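
To make that concrete, here's the shape of it (a made-up illustration, not actual Claude output). Asked to total the paid invoices, the first draft tends to look like:

    // An interface and a strategy class nobody asked for
    interface Invoice { amount: number; status: string }

    class InvoiceAggregator {
      constructor(private predicate: (inv: Invoice) => boolean) {}
      aggregate(invoices: Invoice[]): number {
        let total = 0;
        for (const inv of invoices) {
          if (this.predicate(inv)) total += inv.amount;
        }
        return total;
      }
    }

    const invoices: Invoice[] = [
      { amount: 120, status: "paid" },
      { amount: 80, status: "open" },
    ];
    const paidTotal = new InvoiceAggregator(i => i.status === "paid").aggregate(invoices);

while asking for the simplest version gets the 2-3 lines I actually wanted:

    const paidTotalSimple = invoices
      .filter(i => i.status === "paid")
      .reduce((sum, i) => sum + i.amount, 0);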

P.S. I tend to work with data and a web app for processes related to a small business, while not a formally trained developer.

chamomeal

Seems like LLMs really suffer from the "eh, I'll just write it myself" mindset. Yesterday, on a React app using react-query (a library that manages caching and re-fetching of data), Claude Code wanted to update the cache manually, instead of just using a bit of state that was already in scope in the exact same component!

For me, stuff like that is the same weird uncanny valley that you used to see in AI text, and see now in AI video. It just does such inhuman things. A senior developer would NEVER think to manually mutate the cache, because it's such a desperate hack. A junior dev wouldn't even realize it's an option.
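
A reconstructed sketch of what it reached for (names invented, not the real code):

    import { useState } from "react";
    import { useQueryClient } from "@tanstack/react-query";

    function Profile() {
      // The bit of state (and its setter) that was already in scope
      const [displayName, setDisplayName] = useState("anon");
      const queryClient = useQueryClient();

      // Claude's route: write the value into the query cache, then read it back
      const save = (name: string) =>
        queryClient.setQueryData(["profile", "displayName"], name);
      const fromCache = queryClient.getQueryData<string>(["profile", "displayName"]);

      // All it actually needed: const save = setDisplayName;
      return <button onClick={() => save("edited")}>{fromCache ?? displayName}</button>;
    }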

willtemperley

Maybe I'm contrarian, but I design and write most of my code and let LLMs do the reviews. Why?

First, I know my problem space better than the LLM.

Second, the best way to express coding intention is with code. The models often have excellent suggestions for improvements I wouldn't have thought of. I suspect narrowing the scope significantly increases the probability of a good answer.

Another technique is to say "do this like <some good project> does it", though I suspect that might be close to copyright theft.

meerab

I have barely written any code since my switch to Claude Code! It's the best thing since sliced bread!

Here's what works for me:

- Detailed claude.md containing overall information about the project (rough sketch below).

- Anytime Claude chooses a different route than my preferred one, ask for the preference to be saved in global memory.

- Detailed planning documentation for each feature - describe the high-level functionality.

- As I develop the feature, add documentation with database schema, sample records, sample JSON responses, API endpoints used, test scripts.

- MCP, MCP, MCP! Playwright is a game changer

The more context you give upfront, the less back-and-forth you need. It's been absolutely transformative for my productivity.
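
For a concrete flavor, a stripped-down claude.md might look something like this (details invented for illustration):

    # Project: video transcription app
    Next.js frontend, FastAPI backend, PostgreSQL for metadata.

    ## Conventions
    - TypeScript strict mode on the frontend; typed Pydantic models in the API.
    - All media processing goes through ffmpeg, never pure-Python decoding.

    ## Workflow
    - Run the test suite before declaring any task done.
    - When I correct a choice you made, offer to save the preference here.

And the Playwright hookup is a single command (from memory, so double-check the package name):

    claude mcp add playwright -- npx @playwright/mcp@latest

After that, Claude can drive a real browser to click through the feature it just built instead of guessing from the code.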

Thank you Claude Code team!

bobbylarrybobby

What does the playwright MCP accomplish for you? Is it basically a way for Claude to play with your app in the browser without having to write playwright tests?

f311a

What are you working on? In my industry it fails half the time and comes up with absolute nonsense. The data just doesn't exist for our problems; it only works when you guide it and ask for a few functions at most.

ryukoposting

This sounds like my experience with it. I'm writing embedded firmware in C and Rust. I'd describe further, but Claude seems incompetent at all aspects of this space.

meerab

I am working on VideoToBe.com - my stack is Next.js, PostgreSQL, and FastAPI.

Claude Code is amazing at producing code for this stack. It does an excellent job of outputting ffmpeg commands, curl commands, Linux shell scripts, etc.
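
For example, the sort of one-liner it nails on the first try (illustrative, not from my repo) - extracting mono 16 kHz audio for a speech model:

    ffmpeg -i input.mp4 -vn -ac 1 -ar 16000 audio.wav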

I have written a detailed project plan and feature plans in Markdown, and Claude has no trouble understanding the instructions.

I am curious - what is your use case?

mattmanser

That seems to be a great example of precisely the sort of program an AI is good at: a small, focused, polished greenfield product that does one thing, mainly by gluing together other people's code.

Interestingly, this guy has been making pretty much the same app as you, and live-streamed making it on youtube:

https://www.youtube.com/@RayFernando1337

Looks like he's now pivoted to selling access to his Discord server for vibe-coding tips, as I can't find a link to his product.

But if we're honest here, it's not going to take a ton of code to make that. All the functionality to do it is well documented.

Many people here could make a competitor in a week, without agentic AI, just using AI as a super-charged SO. The limiter pre-AI (aside from the AI transcription itself) would have been reading, implementing, and debugging against all the documentation of the libraries you're using, which AI is great at circumventing.

Your product looks really good, and is an excellent example of what vibe coded AI is great at. I hope you're getting good traction.

ethanwillis

Personally, I give Claude a fully specified program as my prompt so that it gives me back a working program 100% of the time.

Really simple workflow!

Zee2

Ah, I've tried that one, but I must be doing something wrong. I give it a fully specified working program, and oftentimes it gives me back one that only works 50% of the time!

jazzyjackson

Does Claude Code provide some kind of "global memory" the LLM refers to, or is this just a request you make within the LLM's context window? Just curious, I hadn't heard the term used before.

EDIT: I see, you're asking Claude to modify claude.md to track your preference there, right?

https://docs.anthropic.com/en/docs/claude-code/memory

meerab

Yes. /init will initialize the project and save initial project information and preferences.

Ask Claude to update the preferences and documentation the moment you realize Claude has deviated from the path.
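
The quickest way (per the memory docs linked above) is the # shortcut: start a message with # and Claude Code offers to persist it. An invented example:

    > # Always use the staging database for test scripts

Claude Code then asks whether to store it in the project's CLAUDE.md or your user-level ~/.claude/CLAUDE.md.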

mierz00

How have you been using Playwright MCP?