My experience with Claude Code after two weeks of adventures

98 comments

July 17, 2025

iambateman

Claude Code is hard to describe. It’s almost like I changed jobs when I started using it. I’ve been all-in with Claude as a workflow tool, but this is literally steroids.

If you haven’t tried it, I can’t recommend it enough. It’s the first time it really does feel like working with a junior engineer to me.

arealaccount

Weirdly enough I have the opposite experience where it will take several minutes to do something, then I go in and debug for a while because the app has become fubar, then finally realize it did the whole thing incorrectly and throw it all away.

And I reach for Claude quite a bit because if it worked as well for me as everyone here says, that would be amazing.

But at best it’ll get a bunch of boilerplate done after some manual debugging; at worst I spend an hour and some amount of tokens on a total dead end.

0x_rs

Some great advice I've found that seems to work very well: ask it to keep a succinct journal of all the issues and roadblocks found during the project development, and what was done to resolve or circumvent them. As for avoiding bloating the code base with scatterbrained changes, having a tidy architecture with good separation of concerns helps lead it toward working solutions, but you need to actively guide it. For someone who enjoys solving problems more than actually implementing the solutions, it's very fun.

taude

to continue on this, I wouldn't let claude or any agent actually create a project structure, i'd guide it in the custom system prompt. and then in each of the folders continue to have specific prompts for what you expect the assets to be coded like, and common behavior, libraries, etc....

taude

do you create the claude.md files at several levels of your folder structure, so you can teach it how to do different things? Configuring these default system prompts is required to get it to work well.

I'd definitely watch Boris's intro video below [1]

[1] Boris introduction: https://www.youtube.com/watch?v=6eBSHbLKuN0

[2] summary of above video: https://www.nibzard.com/claude-code/

dawnerd

By the time you do all of that you might as well just write code by hand.

libraryofbabel

Sigh. As others have commented, over and over again in the last 6 months we've seen discussions on HN with the same basic variation of "Claude Code [or whatever] is amazing" with a reply along the lines of "It doesn't work for me, it just creates a bunch of slop in my codebase."

I sympathize with both experiences and have had both. But I think we've reached the point where such posts (both positive and negative) are _completely useless_, unless they're accompanied with a careful summary of at least:

* what kind of codebase you were working on (language, tech stack, business domain, size, age, level of cleanliness, number of contributors)

* what exactly you were trying to do

* how much experience you have with the AI tool

* is your tool set up so it can get a feedback loop from changes, e.g. by running tests

* how much prompting did you give it; do you have CLAUDE.md files in your codebase

and so on.

As others pointed out, TFA also has the problem of not being specific about most of this.

We are still learning as an industry how to use these tools best. Yes, we know they work really well for some people and others have bad experiences. Let's try and move the discussion beyond that!

imiric

It's telling that you ask these details from a comment describing a negative experience, yet the top-most comment full of praises and hyperbole is accepted at face value. Let's either demand these things from both sides or from neither. Just because your experience matches one side, doesn't mean that experiences different from yours should require a higher degree of scrutiny.

I actually think it's more productive to just accept how people describe their experience, without demanding some extensive list of evidence to back it up. We don't do this for any other opinion, so why does it matter in this case?

> Let's try and move the discussion beyond that!

Sharing experiences using anecdotal evidence covers most of the discussion on forums. Maybe don't try to police it, and either engage with it, or move on.

state_less

Here's a few general observations.

Your LLM (CC) doesn't have your whole codebase in context, so it can run off and make changes without considering that some remote areas of the codebase are (subtly?) depending on the part that Claude just changed. This can be mitigated to some degree depending on the language and tests in place.

The LLM (CC) might identify a bug in the codebase, fix it, and then figure, "Well, my work here is done." and just leave it as is without considering ramifications or that the same sort of bug might be found elsewhere.

I could go on, but my point is to simply validate the issues people will be having, while also acknowledging those seeing the value of an LLM like CC. It does provide useful work (e.g. large tedious refactors, prototyping, tracking down a variety of bugs, and so on...).

dejavucoder

Fair point.

For context, I was using Claude Code on a large open source Ruby + TypeScript codebase. 50M+ tokens. They had specs and e2e tests so yeah I did have feedback when I was done with a feature - I could run specs and Claude Code could form a loop. I would usually advise it to fix specs one by one, using --fail-fast to find errors fast.
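A minimal sketch of the feedback loop described above, assuming the test command is supplied by the caller. The `run_and_filter` helper and its default pattern are hypothetical and not part of Claude Code or any test runner; the idea is only to run the suite and hand back the few lines that look like failures.

```python
import re
import subprocess
import sys


def run_and_filter(cmd, pattern=r"(?i)error|fail", max_lines=20):
    """Run a test command and return (exit_code, relevant_lines).

    cmd is a list such as the project's own spec command; nothing here
    assumes a particular test runner. Only output lines matching
    `pattern` are kept, capped at `max_lines`.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True)
    combined = proc.stdout.splitlines() + proc.stderr.splitlines()
    matcher = re.compile(pattern)
    relevant = [line for line in combined if matcher.search(line)]
    return proc.returncode, relevant[:max_lines]


# Usage (hypothetical command): run_and_filter(["your-test-runner", "--your-flags"])
```

A non-zero exit code together with a short list of failing lines is usually enough for the agent to decide what to fix next.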

Prior to Claude Code, I had been using Cursor for a year or so.

Sonnet is particularly good at NextJS and Typescript stuff. I also ran this on a medium sized Python codebase and some ML related work too (ranging from langchain to Pytorch lol)

I don't do a lot of prompting, just enough to describe my problem clearly. I try my best to identify the relevant context or direct the model to find it fast.

I made new claude.md files.

reactordev

Seconded: a summary description of your problem, codebase, and programming dialect in use should be included with any “<Model> didn’t work for me” response.

rstuart4133

> But I think we've reached the point where such posts (both positive and negative) are _completely useless_, unless they're accompanied with a careful summary of at least ...

I use Claude many times a day, I ask it and Gemini to generate code most days. Yet I fall into the "I've never included a line of code generated by an LLM in committed code" category. I haven't got a precise answer for why that is so. All I can come up with is the code generated lacks the depth of insight needed to write a succinct, fast, clear solution to the problem someone can easily understand in 2 years' time.

Perhaps the best illustration of this is someone proudly proclaimed to me that they committed 25k lines in a week, with the help of AI. In my world, this sounds like they are claiming they have a way of turning the sea into ginger beer. Gaining the depth of knowledge required to change 25k lines of well written code would take me more than a week of reading. Writing that much in a week is a fantasy. So I asked them to show me the diff.

To my surprise, a quick scan of the diff revealed what the change did. It took me about 15 minutes to understand most of it. That's the good news.

The bad news is that 25k lines added 6 fields to a database. Two thirds were unit tests, and perhaps two thirds of the remainder was comments (maybe more). The comments were glorious in their length and precision, littered with ASCII art tables showing many rows in the table.

Comments in particular are a delicate art. They are rarely maintained, so they can bit rot into downright misleading babble after a few changes. But the insight they provide into what the author was thinking, and in particular the invariants he had in mind, can save hours of divining it from the code. Ideally they concisely explain only the obscure bits you can't easily see from the code itself. Anything more becomes technical debt.

Quoting Woodrow Wilson on the amount of time he spent preparing speeches [0]:

    “That depends on the length of the speech,” answered the President. “If it is a ten-minute speech it takes me all of two weeks to prepare it; if it is a half-hour speech it takes me a week; if I can talk as long as I want to it requires no preparation at all. I am ready now.”

Which is a roundabout way of saying I suspect the usefulness of LLM generated code depends more on how often a human is likely to read it than on any of the things you listed. If it is write once, and the requirement is it works for most people in the common cases, LLM generated code is probably the way to go.

I used PayPal's KYC web interface the other day. It looked beautiful, completely in line with the rest of PayPal's styling. But sadly I could not complete it because of bugs. The server refused to accept one page, it just returned to the same page with no error messages. No biggie, I phoned support (several times, because they also could not get past the same bug), and after 4 hours on the phone the job was done. I'm sure the bug will be fixed by a new contractor. He will spend a few hours on it, getting an LLM to write a new version, throwing the old code away, just as his predecessor did. He will say the LLM provided a huge productivity boost, and PayPal will be happy because he cost them so little. It will be the ideal application for an LLM - got the job done quickly, and no one will read the code again.

I later discovered there was a link on the page that allowed me to skip past the problematic page, so I could at least enter the rest of the information. It was in a thing that looked confusingly like a "menu bar" on the left, although there was no visual hint that any of the items in the menu were clickable. I clicked on most of them anyway, but they did nothing. While on hold for phone support, I started reading the HTML and found one was a link. It was a bit embarrassing to admit to the help person I hadn't clicked that one. It sped the process up somewhat. As I said, the page did look very nice to the eye, probably partially because of the lack of clutter created by visual hints on what was clickable.

[0] https://quoteinvestigator.com/2012/04/28/shorter-letter/

0x457

There are some tasks where it can fail regardless, but a lot of "Claude Code [or whatever] is amazing" with a reply along the lines of "It doesn't work for me, it just creates a bunch of slop in my codebase." IMO comes down to "I know how to use it" vs "I don't know how to use it" with a side of "I have good test coverage" vs "tests?"

wyldfire

I have seen both success and failure. It's definitely cool and I like to think of it as another perspective for when I get stuck or confused.

When it creates a bunch of useless junk I feel free to discard it and either try again with clearer guidelines or switch to Opus.

tcdent

This has a lot to do with how you structure your codebase; if you have repeatable patterns that make conventions obvious, it will follow them for the most part.

When it drops in something hacky, I use that to verify the functionality is correct and then prompt a refactor to make it follow better conventions.

jm4

You can tell Claude to verify its work. I’m using it for data analysis tasks and I always have it check the raw data for accuracy. It was a whole different ballgame when I started doing that.

Clear instructions go a long way, asking it to review work, asking it to debug problems, etc. definitely helps.

vunderba

> You can tell Claude to verify its work

Definitely - with ONE pretty big callout. This only works when a clear and quantifiable rubric for verification can be expressed. Case in point, I put Claude Code to work on a simple react website that needed a "Refresh button" and walked away. When I came back, the button was there, and it had used a combination of MCP playwright + screenshots to roughly verify it was working.

The problem was that it decided to "draw" a circular arrow refresh icon and the arrow at the end of the semicircle was facing towards the circle centroid. Anyone (even a layman) would take one look at it and realize it looked ridiculous, but Claude couldn't tell even when I took the time to manually paste a screenshot asking if it saw any issues.

While it would also be unreasonable to expect a junior engineer to hand-write the coordinates for a refresh icon in SVG - they would never even attempt to do that in the first place realizing it would be far simpler to find one from Lucide, Font Awesome, emojis, etc.

bigiain

> Clear instructions go a long way, asking it to review work, asking it to debug problems, etc. definitely helps.

That's exactly what I learned. In the early 2000's, from three expensive failed development outsourcing projects.

yakz

I second this and would add that you really need an automated way to do it. For coding, automated test suites go a long way toward catching boneheaded edits. It will understand the error messages from the failed tests and fix the mistakes more or less by itself.

But for other tasks like generating reports, I ask it to write little tools to reformat data with a schema definition, perform calculations, or do other things that are fairly easy to then double-check with tests that produce errors that it can work with. Having it "do math in its head" is just begging for disaster. But, it can easily write a tool to do it correctly.
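A minimal sketch of the kind of "little tool" described above, assuming a made-up report schema (the field names, `validate_rows`, and `total_revenue` are all hypothetical). It checks rows against a schema, produces error messages an agent can act on, and does the arithmetic in code rather than "in its head".

```python
# Hypothetical schema for illustration only: field name -> expected type.
SCHEMA = {"region": str, "units": int, "unit_price": float}


def validate_rows(rows, schema=SCHEMA):
    """Return a list of human-readable errors; an empty list means the rows are valid."""
    errors = []
    for i, row in enumerate(rows):
        for field, expected in schema.items():
            if field not in row:
                errors.append(f"row {i}: missing field '{field}'")
            elif not isinstance(row[field], expected):
                actual = type(row[field]).__name__
                errors.append(
                    f"row {i}: field '{field}' expected {expected.__name__}, got {actual}"
                )
    return errors


def total_revenue(rows):
    """Sum units * unit_price, refusing to compute on invalid data."""
    errors = validate_rows(rows)
    if errors:
        raise ValueError("; ".join(errors))
    return sum(row["units"] * row["unit_price"] for row in rows)


rows = [
    {"region": "EU", "units": 3, "unit_price": 2.5},
    {"region": "US", "units": 2, "unit_price": 4.0},
]
print(total_revenue(rows))  # 15.5
```

Because invalid input raises an error naming the row and field, a failing run gives the model something concrete to fix.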

hnaccount_rng

Yeah that is kind of my experience as well. And - according to the friend who highly recommended it - I gave it a task that is "easily within its capabilities". Since I don't think I'm being gaslighted, I suspect it's me using it wrong. But I really can't figure out why. And I'm on my third attempt now..

ivanech

Just got it at work today and it’s a dramatic step change beyond Cursor despite using the same foundation models. Very surprising! There was a task a month ago where AI assistance was a big net negative. Did the same thing today w/ Claude Code in 20ish minutes. And for <$10 in API usage!

Much less context babysitting too. Claude code is really good at finding the things it needs and adding them to its context. I find Cursor’s agent mode ceases to be useful at a task time horizon of 3-5 minutes but Claude Code can chug away for 10+ minutes and make meaningful progress without getting stuck in loops.

Again, all very surprising given that I use sonnet 4 w/ cursor + sometimes Gemini 2.5 pro. Claude Code is just so good with tools and not getting stuck.

iambateman

Cool! If you're on pro, you can use a _lot_ of claude code without paying for API usage, btw.

pragmatic

Could you elaborate a bit on the tasks, languages, domain, etc. you’re using it with?

People have such widely varying experiences and I’m wondering why.

thegrim33

I find it pretty interesting that it's a roughly 2,500 word article on "using Claude Code" and they never once actually explain what they're using it for, what type of project they're coding. It's all just so generic. I read some of it then realize that there was absolutely no substance in what I just read.

It's also another in my growing list of data points towards my opinion that if an author posts meme pictures in their article, it's probably not an article I'm interested in reading.

kraftman

Yeah I got about halfway through before thinking "wow, there's no information in this" and giving up.

_se

It's always POC apps in js or python, or very small libraries in other popular languages with good structure from the start. There are ways to make them somewhat better in other cases (automated testing/validation/linting being a big one), but for the type of thing that 95% of developers are doing day to day (working on a big, sprawling code base where none of those attributes apply), it's not close to being there.

The tools really do shine where they're good though. They're amazing. But the moment you try to do the more "serious" work with them, it falls apart rapidly.

I say this as someone that uses the tools every day. The only explanation that makes sense to me is that the "you don't get it, they're amazing at everything" people just aren't working on anything even remotely complicated. Or it's confirmation bias that they're only remembering the good results - as we saw with last week's study on the impact of these tools on open source development (perceived productivity was up, real productivity was down). Until we start seeing examples to the contrary, IMO it's not worth thinking that much about. Use them at what they're good at, don't use them for other tasks.

LLMs don't have to be "all or nothing". They absolutely are not good at everything, but that doesn't mean they aren't good at anything.

Herring

I like them for refactoring and “explain this massive codebase please”. Basically polishing or investigating things that already work.

But I think we should expect the scope of LLM work to improve rapidly in the next few years.

https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com...

ants_everywhere

> They're amazing. But the moment you try to do the more "serious" work with them, it falls apart rapidly.

Sorry, but this is just not true.

I'm using agents with a totally idiosyncratic code base of Haskell + Bazel + Flutter. It's a stack that is so quirky and niche that even Google hasn't been able to make it work well despite all their developer talent and years of SWEs pushing for things like Haskell support internally.

With agents I'm easily 100x more productive than I would be otherwise.

I'm just starting on a C++ project, but I've already done at least 2 weeks worth of work in under a day.

criddell

I haven't had great luck with Claude writing Windows Win32 (using MFC) in C++. It invents messages and APIs all the time that read like exactly what I want it to do.

I'd think Win32 development would be something AIs are very strong at because it's so old, so well documented, and there's a ton of code out there for it to read. Yet it still struggles with the differences between Windows messages, control notification messages, and command messages.

kbuchanan

I've had the same experience, although I feel like Claude is far more than a junior to me. Its ability to propose options, make recommendations, and illustrate trade-offs is just unreal.

kobe_bryant

in what sense, instead of doing your job which I assume you've been doing successfully for many years you now ask Claude to do it for you and then have to review it?

apwell23

half the posts on hackernews are the same discussion over and over about coding agent usefulness or lack thereof

gjsman-1000

> It’s the first time it really does feel like working with a junior engineer to me.

I have mixed feelings; because this means there’s really no business reason to ever hire a junior; but it also (I think) threatens the stability of senior level jobs long term, especially as seniors slowly lose their knowledge and let Claude take care of things. The result is basically: When did you get into this field, by year?

I’m actually almost afraid I need to start crunching Leetcode, learning other languages, and then apply to DoD-like jobs where Claude Code (or other code security concerns) mean they need actual honest programmers without assistance.

However, the future is never certain, and nothing is ever inevitable.

Quarrelsome

> because this means there’s really no business reason to ever hire a junior

aren't these people your seniors in the coming years? It's healthy to model an inflow and outflow.

toomuchtodo

The pipeline dries up when orgs would rather get the upfront savings of gen AI productivity gains versus invest in talent development.

dejavucoder

Almost feels like a game as you level up!

erentz

There must at this point be lots and lots of actual walkthroughs of people coding using Claude Code, or whatever, and producing real world apps or libraries with them right? Would be neat to have a list because this is what I want to read (or watch), rather than people just continuously telling me all this is amazing but not showing me it’s amazing.

TheRoque

100% agree, I have been looking for a YouTube video or stream of someone leveraging AI to get their productivity boost, but I haven't found anything that made me think "okay, that really speeds up things"

graphememes

It's great for me. I have a claude.md at the root of every folder generally, outlined in piped text for minimal context addition about the rulesets for that folder. It always creates tests for what it's doing and is set to do so in a very specific folder in a very specific way, otherwise it tries to create debug files instead. I also have set rules for re-use so that it doesn't proliferate with "enhanced" class variants or structures and always tries to leverage what exists instead of bringing in new things unless absolutely necessary. The way I talk to it is very specific as well: I don't write huge prose, I don't set up huge PRDs, and often I will only do planning if it's something that I am myself unsure about. The only time I will do large text input is when I know that the LLM won't have context (it's newer than its knowledge window).

I generally get great 1-shot (one input and the final output after all tasks are done) comments. I have moved past Claude Code, though I am using the CLI itself with another model. I was using Claude Code, and my reason for switching isn't that Claude was a bad model; it's just that it was expensive and I have access to larger models for cheaper. The CLI is the real power, not the model itself per se. Opus does perform a little better than others.

It's totally made it so I can do the code that I like to do while it works on other things during that time. I have about 60-70 different agent streams going at a time atm. Codebases sizes vary, the largest one right now is about 200m tokens (react, typescript, golang) in total and it does a good job. I've only had to tell it twice to do something differently.

ipaddr

What I wonder is how is the interview process now? Are they testing you with AI or without? Is leet code being asked with AI providing the answer?

Is there a bigger disconnect on how you are judged in an interview vs the job now?

How are the AI only developers handling this?

ct0

The projects you work on and the impact that they had. Hopefully.

Imanari

PSA: you can use CC with any model via https://github.com/musistudio/claude-code-router

The recent Kimi-K2 supposedly works great.

chrismustcode

I’d just use sst/opencode if using other models (I use it for Claude through Claude pro subscription too)

dejavucoder

thanks!

ctoth

Sometimes, you'll just have a really productive session with Claude Code doing a specific thing that maybe you need to do a lot of.

One trick I have gotten some mileage out of was this: have Claude Code research slash commands, then make a slash command to turn the previous conversation into a slash command.

That was cool and great! But then, of course you inevitably will interrupt it and need to do stuff to correct it, or to make a change or "not like that!" or "use this tool" or "think harder before you try that" or "think about the big picture" ... So you do that. And then you ask it to make a command and it figures out you want a /improve-command command.

So now you have primitives to build on!

Here are my current iterations of these commands (not saying they are optimal!)

https://github.com/ctoth/slashcommands/blob/master/make-comm...

https://github.com/ctoth/slashcommands/blob/master/improve-c...

whatever1

I find amazing all the effort that people put into trying to program a non-deterministic black box. True courage.

ctoth

Oh do let me tell you how much effort I put into tending my non-deterministic garden or relationships or hell even the contractors I am using to renovate my house!

A few small markdown documents and putting in the time to understand something interesting hardly seems a steep price!

blub

The contractors working on my house sometimes paint a room bright pink for no particular reason.

When I point that out, they profusely apologize and say that of course the walls must be white and wonder why they even got the idea of making them pink in the first place.

Odd, but nice fellows otherwise. It feels like they’re 10x more productive than other contractors.

simlevesque

Our brains are non-deterministic black boxes. We just don't like to admit it.

ChuckMcM

Reading this I can see these tools as training tools for software engineering managers.

tortila

After reading and hearing rave reviews I’d love to try Claude Code in my startup. I already manage a Claude Team subscription, but AFAIK Code is not included, it only exists in Pro/Max which are for individual accounts. How do people use it as a subscription for a team (ideally with central billing)?

dukeyukey

You can use CC with AWS Bedrock, with all the centralised billing AWS offers. That's how my company handles it.

bluetidepro

How are people using this without getting rate limited non stop? I pay for Claude Pro and I sometimes can’t go more than 5 prompts in an hour without it saying I need to wait 4 hours for a cooldown. I feel like I’m using it wrong or something, it’s such a frustrating experience. How do you give it any real code context without using all your tokens so quickly?

tomashubelbauer

I have the same issue and in recent days I seem to have gotten an extra helping of overload errors which hit extra hard when I realize how much this thing costs.

Edit: I see a sibling comment mention the Max plan. I wanna be clear that I am not talking about rate limits here but actual models being inaccessible - so not a rate limit issue. I hope Anthropic figures this out fast, because it is souring me on Claude Code a bit.

SwiftyBug

I've been using it pretty heavily and never have I been rate limited. I'm not even on the Pro Max plan.

manmal

Try giving it a repomap, e.g. by including it in CLAUDE.md. It should pull in fewer files (context) that way. Telling it exactly which files you suspect need editing also helps. If you let it run scripts, make sure to tell it to grep out only the relevant output, or pipe to /dev/null.
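A minimal sketch of generating such a repo map, assuming a Python codebase and a one-line-per-file outline. The `build_repo_map` helper and the output format are hypothetical; the comment above does not specify what a repomap should look like, and other languages would need a different parser than Python's `ast` module.

```python
import ast
import os


def build_repo_map(root, extensions=(".py",)):
    """Return a compact outline: one line per source file with its top-level names."""
    lines = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories (e.g. .git) and keep traversal order stable.
        dirnames[:] = sorted(d for d in dirnames if not d.startswith("."))
        for name in sorted(filenames):
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            try:
                with open(path, encoding="utf-8") as handle:
                    tree = ast.parse(handle.read())
            except (SyntaxError, UnicodeDecodeError):
                lines.append(f"{rel}: (unparsed)")
                continue
            names = [
                node.name
                for node in tree.body
                if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
            ]
            summary = ", ".join(names) if names else "(no top-level defs)"
            lines.append(f"{rel}: {summary}")
    return "\n".join(lines)


# Usage: paste the result of build_repo_map("path/to/repo") into CLAUDE.md.
```

The outline stays small because it lists names only, which is the point: the model can pick which files to open instead of reading all of them.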

mbrumlow

No clue. I use it for hours on end. Longest run cost me $30 in tokens. I think it was 4 hours of back and forth.

Here is an example of ChatGPT, followed by mostly Claude, that finally solved a backlight issue with my laptop.

https://github.com/mbrumlow/lumd

singron

I haven't used Claude Code a lot, but I was spending about $2-$5/hour, though it varied a lot. If I used it 6 hours/day and worked a normal 21 workday month (126 hours), then I would rack up roughly $250-$630/month in API costs. I think I could be more efficient with practice (maybe $1-$3/hour?). If you think you are seriously going to use it, then the $100/month or $200/month subscriptions could definitely be worth it as long as you aren't getting rate limited.

If you aren't sure whether to pull the trigger on a subscription, I would put $5-$10 into an API console account and use CC with an API key.
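The estimate above can be checked with a few lines of arithmetic. This is only a sketch using the figures quoted in the comment (6 hours/day, 21 workdays, $2-$5/hour, and the $100 and $200 subscription prices); the function names are made up for illustration.

```python
HOURS_PER_DAY = 6
WORKDAYS_PER_MONTH = 21
MONTHLY_HOURS = HOURS_PER_DAY * WORKDAYS_PER_MONTH  # 126 hours


def monthly_api_cost(rate_per_hour):
    """API spend per month at a given hourly rate."""
    return rate_per_hour * MONTHLY_HOURS


def break_even_rate(subscription_price):
    """Hourly API spend above which a flat subscription is cheaper."""
    return subscription_price / MONTHLY_HOURS


print(monthly_api_cost(2), monthly_api_cost(5))  # 252 630
print(round(break_even_rate(100), 2))  # 0.79
print(round(break_even_rate(200), 2))  # 1.59
```

At this usage level, either subscription price works out cheaper than pay-per-token once hourly spend exceeds roughly $0.79 or $1.59, assuming rate limits don't get in the way.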

ndr_

I had success through Amazon Bedrock on us-east-1 during European office hours. Died 9 minutes before 10 a.m. New York time, though.

stavros

Are you using Opus?

terhechte

you need the max plan to break free of most rate limits

bluetidepro

I wish there was a Max trial (while on Pro) to test if this was the case or not. Even if it was just a 24 hour trial. Max is an expensive trigger to pull, and hope it just solves this.

cmrdporcupine

FWIW I went Claude Max after Pro, and the trick is to turn off Opus. If you do that you can pretty much use Sonnet all working day in a normal session. I don't personally find Opus that useful, and it burns through quota at 5x the speed of Sonnet.

cmrdporcupine

Claude Max, honestly. Worth it to me.

ToJans

Whenever I'm rate limited (pro max plan), I stop developing.

For anything but the smallest things I use claude code...

And even then...

For the bigger things, I ask it to propose to me a solution (when adding new features).

It helps when you give proper guidance: do this, use that, avoid X, be concise, ask to refactor when needed.

All in all, it's like a slightly autistic junior dev, so you need to be really explicit, but once it knows what to do, it's incredible.

That being said, whenever I'm stuck on an issue, or it keeps going in circles, I tend to roll back, ask for a proper analysis based on the requirements, and fill in the details if necessary.

For the non-standard things (e.g. detect windows on a photo and determine the measurement in centimetres), you still have to provide a lot of guidance. However, once I told it to use xyz and ABC it just goes. I've never written more than a few lines of PHP in my life, but have a full API server with an A100 running, thanks to Claude.

The accumulated hours saved are huge for me, especially front-end development, refactoring, or implementing new features to see if they make sense.

For me it's a big shift in my approach to work, and I'd be really sad if I had to go back to the pre-AI era.

Truth be told, I was a happy user of cline & Gemini and spent hundreds of dollars on API calls per month. But it never gave me the feeling Claude Code gives me; the reliability of this thing is saving me 80% of my time.

dontlaugh

I still don’t get why I should want that.

I’ve mentored and managed juniors. They’re usually a net negative in productivity until they are no longer juniors.

quesera

My current working theory is this:

People who enjoy mentoring juniors are generally satisfied with the ROI of iterating through LLM code generation.

People who find juniors sort-of-frustrating-but-part-of-the-job-sometimes have a higher denominator on that ROI calc, and ask themselves why they would keep banging their head against the LLM wall.

The first group is probably wiser and more efficient at multiplying their energies, in the long term.

I find myself in the second group. I run tests every couple months, but I'm still waiting for the models to have a higher R or a lower I. Any day now.

wrs

There is a VS Code extension for Claude Code. It's hardly more than a terminal window really, but that in itself is pretty handy. If you do /ide to connect up the extension it does a few things, but not yet anything resembling the Cursor diff experience (much less the Cursor tab experience, which is the reason I still use it).

mike1o1

Claude Code has pretty much replaced Copilot overnight for me, though I wish the VS Code plugin was a bit more integrated, as it's only a little bit more than a terminal, though I guess that's the point. I was hoping for syntax highlighting to match my editor and things like that (beyond just light/dark theme).

What I'd really want is a way to easily hide it, which I did quite frequently with Copilot as its own pane.

dejavucoder

I use Claude Code 50% of the time with Cursor now due to the diff and tab. The extension is just a bit buggy sometimes, otherwise I would use it much more. I hit some node related bugs today while searching stuff with it (forgot to report to Anthropic lol). Other bugs include scroll stuttering.

wahnfrieden

One approach to improving CC's search is to use Repo Prompt https://repoprompt.com

dejavucoder

Thanks, I will check this out