I've been using Claude Code for a couple of days

develoopest

I must be the dumbest "prompt engineer" ever: each time I ask an AI to fix something, or, even worse, create something from scratch, it rarely returns the right answer, and when asked for modifications it struggles even more.

All the incredible performance and success stories always come from these Twitter posts. I do find value in asking for simple but tedious tasks like a small refactor or generating commands, but this "AI takes the wheel" level does not feel real.

abxyz

I think it's probably the difference between "code" and "programming". An LLM can produce code, and if you're willing to surrender to the LLM's version of whatever it is you ask for, then you can have a great and productive time. If you're opinionated about programming, LLMs fall short. Most people (software engineers, developers, whatever) are not "programmers", they're "coders", which is why they have a positive impression of LLMs: they produce code, LLMs produce code... so LLMs can do a lot of their work for them.

Coders used to be more productive by using libraries (e.g., don't write your own function for finding the intersection of arrays; use intersection from Lodash), whereas now libraries have been replaced by LLMs. Programmers laughed at the absurdity of left-pad[1] ("why use a dependency for 16 lines of code?") whereas coders thought left-pad was great ("why write 16 lines of code myself?").
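
To make that concrete, here's the trade-off in miniature (sketched in Python rather than the original JavaScript): a hand-rolled left-pad next to the one-liner you'd lean on a library or builtin for.

    # Hand-rolled: trivial to write, but now it's yours to test and maintain.
    def left_pad(s: str, width: int, fill: str = " ") -> str:
        if len(s) >= width:
            return s
        return fill * (width - len(s)) + s

    # The "just use a dependency/builtin" version: one line, someone else's problem.
    assert left_pad("42", 5, "0") == "42".rjust(5, "0") == "00042"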

If you think about code as a means to an end, and focus on the end, you'll get much closer to the magical experience you see spoken about on Twitter, because their acceptance criterion is "good enough", not "right". Of course, if you're a programmer who cares about the artistry of programming, that feels like a betrayal.

[1] https://en.wikipedia.org/wiki/Npm_left-pad_incident

miki123211

Oh, this captures my experience perfectly.

I've been using Claude Code a lot recently, and it's doing amazing work, but it's not exactly what I want it to do.

I had to push it hard to refactor and simplify, as the code it generated was often far more complicated than it needed to be.

To be honest though, most of the code it generated I would accept if I was reviewing another developer's work.

I think that's the way we need to look at it. It's a junior developer that will complete our tasks, not always in our preferred way, but at 10x the speed, and it will frequently make mistakes that we need to point out in CR. It's not a tool which will do exactly what we would.

jmull

All I really care about is the end result and, so far, LLMs are nice for code completion, but basically useless for anything else.

They write as much code as you want, and it often sorta works, but it's a bug-filled mess. It's painstaking work to fix everything, on par with writing it yourself. Now, you can just leave it as-is, but what's the use of releasing software that crappy?

I suppose it's a revolution for that in-house crapware company IT groups create and foist on everyone who works there. But the software isn't better, it just takes a day rather than 6 months (or 2 years or 5 years) to create. Come to think of it, it may not be useful for that either… I think the end-purpose is probably some kind of brag for the IT manager/exec, and once people realize how little effort is involved it won't serve that purpose.

barbazoo

I love the subtle mistakes that get introduced, in strings for example, which then take all the time I saved to fix.

fallinditch

Have you tried using Cursor rules? [1]

Creating a standard library ("stdlib") of rules, potentially thousands of them, and then iteratively adding to and amending them as you go, is one of the best practices for successful AI coding.

[1] https://docs.cursor.com/context/rules-for-ai

icedchai

This aligns with my experience. I've seen LLMs produce "code" that the person requesting is unable to understand or debug. It usually almost works. It's possible the person writing the prompt didn't actually understand the problem, so they got a half baked solution as a result. Either way, they need to go to a human with more experience to figure it out.

beezlewax

I'm waiting for artisan programming to become a thing.

LeftHandPlane

Artisanal firmware is the future (or the past? or both?): https://www.youtube.com/watch?v=vBXsRC64hw4

discordance

by 100% organic, free range and fair trade programmers

jdmoreira

From before people even knew what llms were: https://handmade.network

dr_dshiv

Like, writing binaries directly? Is assembly code too much of an abstraction?

pydry

Artisanal code has been a thing for a long while.

If we're the luddite artisans, LLMs seem to represent the knitting frames which replaced their higher quality work with vastly cheaper, far crappier merchandise. There is a historical rhyme here.

roflyear

> LLMs version of whatever it is you ask for, then you can have a great and productive time

Sure, but man are there bugs.

BeetleB

Some hints for people stuck like this:

Consider using Aider. It's a great tool and cheaper to use than Claude Code.

Look at Aider's LLM leaderboard to figure out which LLMs to use.

Use its architect mode (although you can get quite fast without it - I personally haven't needed it).

Work incrementally.

I use at least 3 branches. My main one, a dev one and a debug one. I develop on dev. When I encounter a bug I switch to debug. The reason is it can produce a lot of code to fix a bug. It will write some code to fix it. That won't work. It will try again and write even more code. Repeat until fixed. But in the end I only needed a small subset of the new code. So you then revert all the changes and have it fix it again telling it the correct fix.

Don't debug on your dev branch.
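
To make the branch dance above concrete, a rough sketch in plain git (branch names are just the ones from the comment):

    git checkout -b dev main      # day-to-day LLM-assisted work happens here
    git checkout -b debug dev     # hit a bug: let the model flail on a throwaway branch
    # ...several rounds of generated "fixes" pile up until the real cause is clear...
    git checkout dev              # back on dev, treat the debug branch as scratch paper
    git branch -D debug           # discard the pile of exploratory changes
    # ...then ask the LLM (or do it yourself) to apply only the minimal fix you learned about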

Aider's auto committing is scary but really handy.

Limit your context to 25k.

Only add files that you think are necessary.

Combining the two: Don't have large files.

Add a Readme.md file. It will then update the file as it makes code changes. This can give you a glimpse of what it's trying to do and if it writes something problematic you know it's not properly understanding your goal.

Accept that it is not you and will write code differently from you. Think of it as a moderately experienced coder who is modifying the codebase. It's not going to follow all your conventions.

https://aider.chat/

https://aider.chat/docs/leaderboards/

tptacek

The three-branch thing is so smart.

BeetleB

It took a while for me to realize it, and frankly, it's kind of embarrassing that I didn't think of it immediately.

It is, after all, what many of us would do in our manual SW development. But when using an LLM that seems pretty good, we just assume we don't need to follow all the usual good practices.

branko_d

I have the same experience.

Where AI shines for me is as a form of a semantic search engine or even a tutor of sorts. I can ask for the information that I need in a relatively complex way, and more often than not it will give me a decent summary and a list of "directions" to follow-up on. If anything, it'll give me proper technical terms, that I can feed into a traditional search engine for more info. But that's never the end of my investigation and I always try to confirm the information that it gives me by consulting other sources.

mentalgear

Exactly the same experience: since the early-access GPT-3 days, I have played out various scenarios, and the most useful case has always been to use generative AI as semantic search. Its generative features are just lacking in quality (for anything other than a toy project), and the main issue since the early GPT days remains: even though it gets better, it's still too unreliable for serious work on mid-complex systems. Also, if you don't pay attention, it messes up other parts of the code.

jofzar

Yeah, I have had some "magic" moments where I knew "what" I needed, had an idea of "how it would look", but no idea how to do it, and AI helped me understand how I should do it instead of the hacky, very stupid way I would have done it.

Yoric

Same here. In some cases, brainstorming even kinda works – I mean, it usually gives very bad responses, but it serves as a good duck.

Code? Nope.

matt_heimer

LLMs are replacing Google for me when coding. When I want to get something implemented, let's say make a REST request in Java using a specific client library, I previously used Google to find examples of using that library.

Google has gotten worse (or the internet has more garbage), so finding a code example is more difficult than it used to be. Now I ask an LLM for an example. Sometimes I have to ask for a refinement, and usually something is broken in the example, but it takes less time to get the LLM-produced example to work than it does to find a functional example using Google.

But the LLM has only replaced my previous Google usage, I didn't expect Google to develop my applications and I don't with LLMs.

ptmcc

This has been my experience of successful usage as well. It's not writing code for me, but pulling together the equivalent of a Stack Overflow example and some explaining sentences that I can follow up on. Not perfect and I don't blindly copy paste it, same as Stack Overflow ever was, but faster and more interactive. It's helpful for wayfinding, but not producing the end result.

layer8

In order to use a library, I need to (this is my opinion) be able to reason about the library’s behavior, based on a specification of its interface contract. The LLM may help with coming up with suitable code, but verifying that the application logic is correct with respect to the library’s documented interface contract is still necessary. It’s therefore still a requirement to read and understand the library’s documentation. For example, for the case of a REST client, you need to understand how the possible failure modes of the HTTP protocol and REST API are translated by the library.
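
As a concrete illustration (hypothetical endpoint and function names, using Python's requests library), this is the kind of failure-mode handling that still has to be checked against the library's and the API's documented contract, whoever wrote the happy path:

    import requests

    def fetch_user(base_url: str, user_id: int) -> dict:
        try:
            resp = requests.get(f"{base_url}/users/{user_id}", timeout=5)
            resp.raise_for_status()       # turn 4xx/5xx into exceptions instead of silently bad data
            return resp.json()
        except requests.Timeout:
            raise RuntimeError("user service timed out")   # not the same thing as "user not found"
        except requests.HTTPError as e:
            if e.response.status_code == 404:
                raise KeyError(user_id)                    # API-level "not found"
            raise                                          # other HTTP failures propagate
        except requests.ConnectionError:
            raise RuntimeError("user service unreachable")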

slooonz

I decided to seriously try Sonnet 3.7. I started with a simple prompt on claude.ai ("Do you know claude code? Can you do a simple implementation for me?"). After minimal tweaking from me, it gave me this: https://gist.github.com/sloonz/3eb7d7582c33e95f2b000a0920016...

After interacting with this tool, I decided it would be nice if the tool could edit itself, so I asked (him? it?) to create its next version. It came up with a non-working version of this: https://gist.github.com/sloonz/3eb7d7582c33e95f2b000a0920016.... I fixed the bug manually, but it started an interactive loop: I could now describe what I wanted, describe the bugs, and the tool would add the features/fix the bugs itself.

I decided to rewrite it in TypeScript (by that I mean: "can you rewrite yourself in TypeScript"). And then add other tools (by that: "create tools and unit tests for the tools"). https://gist.github.com/sloonz/3eb7d7582c33e95f2b000a0920016... and https://gist.github.com/sloonz/3eb7d7582c33e95f2b000a0920016... have been created by the tool itself, without any manual fix from me. Setting up the testing/mock framework? Done by the tool itself too.

In one day (and $20), I had essentially recreated claude-code, which I could improve just by asking "Please add feature XXX". $2 a feature, with unit tests, on average.

WD-42

So you’re telling me you spent 20 dollars and an entire day for 200 lines of JavaScript and 75 lines of python and this to you constitutes a working re-creation of Claude Code?

This is why expectations are all out of whack.

slooonz

2200 lines. Half of them unit tests I would probably have been too lazy to write myself even for a "more real" project. Yes, I consider $20 cheap for that, considering:

1. It's a learning experience.

2. Looking at the chat transcripts, many of those dollars are burned for stupid reasons (Claude often fails with the insertLines/replaceLines functions and breaks files due to off-by-one offsets) that are probably fixable.

3. Remember that Claude started from a really rudimentary base with few tools — the bootstrapping was especially inefficient.

Next experiment will be on an existing codebase, but that’s probably for next weekend.

BeetleB

That amount of output is comparable to what many professional engineers produce in a given day, and they are a lot more expensive.

Keep in mind this is the commenter's first attempt. And I'm surprised he paid so much.

Using Aider and Sonnet I've on multiple occasions produced 100+ lines of code in 1-2 hours, for under $2. Most of that time is hunting down one bug it couldn't fix by itself (reflective of real world programming experience).

There were many other bugs, but I would just point out the failures I was seeing and it would fix it itself. For particularly difficult bugs it would at times even produce a full new script just to aid with debugging. I would run it and it would spit out diagnostics which I fed back into the chat.

The code was decent quality - better than what some of my colleagues write.

I could probably have it be even more productive if I didn't insist on reading the code it produced.

Silhouette

Thanks for writing up your experience and sharing the real code. It is fascinating to see how close these tools can now get to producing useful, working software by themselves.

That said - I'm wary of reading too much into results at this scale. There isn't enough code in such a simple application to need anything more sophisticated than churning out a few lines of boilerplate that produce the correct result.

It probably won't be practical for the current state of the art in code generators to write large-scale production applications for a while anyway just because of the amount of CPU time and RAM they'd need. But assuming we solve the performance issues one way or another eventually it will be interesting to see whether the same kind of code generators can cope with managing projects at larger scales where usually the hard problems have little to do with efficiently churning out boilerplate code.

NitpickLawyer

aider has this great visualisation of "self written code" - https://aider.chat/HISTORY.html

escapecharacter

I've found AI to be useful on precisely-scoped tasks I might assign to a junior programmer to take a day to do, like "convert this exact bash script to a Powershell script".

But in my own work, those tasks are pretty rare, like 3 times a month? Often I start working on something, and the scope and definition of success changes while I'm in the midst of it. Or it turns out to be harder than expected and it makes sense to timebox it and do a quick search for workarounds.

As much as we joke about StackOverflow commenters sometimes telling a question-asker they shouldn't be doing what they're trying to do, you do actually want that (soft) pushback some of the time. Most modern LLMs will gleefully come up with a complete plan for how we're gonna get the square peg in the round hole.

philipswood

> you do actually want that (soft) pushback some of the time. Most modern LLMs will gleefully come up with a complete plan for how we're gonna get the square peg in the round hole.

I once asked my local DeepSeek R1 distilled model (deepseek-r1:7b) to do the wrong thing by accidentally copy-pasting the wrong variable name. It saw I was trying to do something stupid (I was working with the wrong variable). It told me how to do what I asked, and then asked:

> _Is this modification part of a larger change you're making to the code? I'd like to make sure we're not modifying something important that might have side effects._

Looking at its thought process:

> _The user wants to modify this string by replacing "-input" with "-kb". But looking at the ARN structure, I recall that AWS S3 ARNs are usually in the form arn:aws:s3:::bucket_name/RegionalPart/path. The part after the bucket name is typically s3:// followed by the object key._

> _Wait, maybe the user has a specific reason to make this change. Perhaps they're formatting or structuring something for a different purpose. They might not realize that directly modifying ARNs can cause issues elsewhere if not done correctly._

smallerfish

I've done code interviews with hundreds of candidates recently. The difference between those who are using LLMs effectively and those who are not is stark. I honestly think engineers who think like OP are going to get left behind. Take a weekend to work on getting your head around this by building a personal project (or learning a new language).

A few things to note:

a) Use the "Projects" feature in Claude web. The context makes a significant amount of difference in the output. Curate what it has in the context; prune out old versions of files and replace them. This is annoying UX, yes, but it'll give you results.

b) Use the project prompt to customize the response. E.g. I usually tell it not to give me redundant code that I already have. (Claude can otherwise be overly helpful and go on long riffs spitting out related code, quickly burning through your usage credits).

c) If the initial result doesn't work, give it feedback and tell it what's broken (build messages, descriptions of behavior, etc).

d) It's not perfect. Don't give up if you don't get perfection.

triyambakam

Hundreds of candidates? That's significant if not an exaggeration. What are the stark differences you have seen? Did you inquire about the candidate's use of language models?

smallerfish

Yes. I do async video interviews in round 1 of my interview process in order to narrow the candidate funnel. Candidates get a question at the start of the interview, with a series of things to work through in their own IDE while sharing their screen. I review all recordings (though I will skip around, and if candidates don't get very far I won't spend a lot of time watching at 1x speed.) The question as laid out encourages them to use all of the tools they usually rely on while coding (including google, stackoverflow, LLMs, ...).

Candidates who use LLMs generally get through 4 or 5 steps in the interview question. Candidates who don't are usually still on step 2 by the end of the interview (with rare exceptions), without their code quality being significantly better.

(I end up in 1:1 interviews with perhaps 10-15% of candidates who take round 1).

nsonha

If it's real, that person interviewed at least one candidate per day last year. Idk what kind of engineering role in what kind of org you even do that in.

jacobedawson

I'd add to that that the best results are with clear spec sheets, which you can create using Claude (web) or another model like ChatGPT or Grok. Telling them what you want and what tech you're using helps them create a technical description with clear segments and objectives, and in my experience works wonders in getting Claude Code on the right track, where it has full access to the entire context of your code base.

crabl

What I've noticed from my extensive use over the past couple weeks has been Claude Code really sucks at thinking things through enough to understand the second and third order consequences of the code that it's writing. That said, it's easy enough to work around its deficiencies by using a model with extended thinking (Grok, GPT4.5, Sonnet 3.7 in thinking mode) to write prompts for it and use Claude Code as basically a dumb code-spewing minion. My workflow has been: give Grok enough context on the problem with specific code examples, ask it to develop an implementation plan that a junior developer can follow, and paste the result into Claude Code, asking it to diligently follow the implementation plan and nothing else.

simonw

"Claude Code really sucks at thinking things through enough to understand the second and third order consequences of the code that it's writing"

Yup, that's our job as software engineers.

TylerLives

This has been my experience as well. Breaking problems into smaller problems where you can easily verify correctness works much better than having it solve the whole problem on its own.

WD-42

you just described how a good developer works.

cglace

In all of these posts I fail to see how this is engineering anymore. It seems like we are one step away from taking ourselves out of the picture completely.

bckr

I don’t write binaries, assembly, or C. If I don’t have to write an application, I’m okay with that.

I still have to write the requirements, design, and acceptance criteria.

I still have to gather the requirements from stakeholders, figure out why those will or will not work, provision infra, figure out how to glue said infra together, test and observe and debug the whole thing, get feedback from stakeholders…

I have plenty of other stuff to do.

And if you automate 99% of the above work?

Then the requirements are going to get 100Xed. Put all the bells and whistles in. Make it break the laws of physics. Make it never ever crash and always give incredibly detailed feedback to the end users. Make it beautiful and faster than thought itself.

I’m not worried about taking myself out of the loop.

chaosprint

It seems the original poster hasn't extensively tried various AI coding assistants like Cursor or Windsurf.

Just a quick heads-up based on my recent experience with agent-based AI: while it's comfortable and efficient 90% of the time, the remaining 10% can lead to extremely painful debugging experiences.

In my view, the optimal scenarios for using LLM coding assistants are:

- Architectural discussions, effectively replacing traditional searches on Google.

- Clearly defined, small tasks within a single file.

The first scenario is highly strategic, the second is very tactical. Agents often fall awkwardly between these two extremes. Personally, I believe relying on an agent to manage multiple interconnected files is risky and counterproductive for development.

hashmap

This has been my experience as well. I find that the copy/paste workflow with a browser LLM still gets me the most bang for the buck in both those cases. The cli agents seem to be a bit manic when they get hold of the codebase and I have a harder time corralling them into not making large architectural changes without talking through them first.

For the moment, after a few sessions of giving it a chance, I find myself using "claude commit" but not asking it to do much else outside the browser. I still find o1-pro to be the most powerful development partner. It is slow though.

sbszllr

> In my view, the optimal scenarios for using LLM coding assistants are:

> - Architectural discussions, effectively replacing traditional searches on Google.

> - Clearly defined, small tasks within a single file.

I think you're on point here, and it has been my experience too. Also, not limited to coding but general use of LLMs.

intrasight

> extremely painful debugging experiences.

I'd claim that if you're debugging the code - or even looking at it for that matter - that you're using AI tools the wrong way.

chaosprint

I'd be very interested to know of a way to make it work with AI that doesn't require debugging if you can illustrate.

collingreen

This is exactly my impression of the summary of these kinds of posts and, I'm speculating here, maybe where there is such a stark difference.

I'm guessing that the folks who read the output and want to understand it deeply and want to "approve" it like a standard pull request are having a very different perspective and workflow than those who are just embracing the vibe.

I do not know if one leads to better outcomes than the other.

esafak

Are you serious? Why not just vibe work with your human coworkers and merge to master then? Let's see what the outcome is!

tomnipotent

The author works on Cody at Sourcegraph so I'll give him the benefit of the doubt that he's tried all the major players in the game.

finolex1

He literally says in his post "It might look antiquated but it makes Cursor, Windsurf, Augment and the rest of the lot (yeah, ours too, and Copilot, let's be honest) FEEL antiquated"

rs186

A single tweet with lots of analogy, with no screenshot/screen recording/code examples whatsoever. These are just words. Are we just discussing programming based on vibe?

frankc

I think the interest has more to do with who is doing the tweeting, don't you think?

delusional

It's influencer culture. It's like when people watch those "software developer" youtubers and pretend it's educational. It's reality television for computer people.

mpalmer

Reality television plus cooking show, exactly.

macNchz

Cooking shows are a perfect analogy for this stuff. For some reason I never connected the highly-edited-mass-appeal "watch someone do skilled work" videos on YouTube with Food Network style content until just now, but you're right they're totally scratching the same basic itch. They make people feel like they're learning something just by watching, while there is really no substitute for actually just doing the thing.

tylerrobinson

> reality television for computer people

Complete with computer people kayfabe!

kleiba

What, someone cannot utter an opinion anymore?

h4ny

I find that question ironic.

kleiba

But isn't that the point?

bob1029

I find that maintaining/developing code is not an ideal use case for LLMs and is distracting from the much more interesting ones.

Any LLM application that relies more-or-less on a single well-engineered prompt to get things done is entry level and not all that impressive in the big picture - 99% of the heavy lifting is in the foundation model and next token prediction. Many code assistants are based on something like this out of necessity of needing to support anybody's code. You can't rely on too many clever prompt chaining patterns to build optimizations for Claude Code because everyone takes different approaches to their codebase and has wildly differing expectations for how things should go down. Because the range of expectations is so vast, there is a lot of room to get disappointed.

The LLM applications that are most interesting have the model integrated directly with the product experience and rely on deep domain expertise to build sophisticated chaining of prompts, tool calling and nesting of conversations. In these applications, the user's experience and outcomes are mostly predetermined with the grey areas intended to be what the LLM is dealing with. You can measure things and actually do something about it. What was the probability of calling one tool over the other in a specific context of use? Placing these prompts and statistics alongside domain requirements will enable you to see and make a difference.
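
A toy sketch of that second kind of application (everything here, including call_llm and the tool names, is made up rather than any particular vendor's API): the application owns the dispatch and the domain logic, the model only fills in the grey area, and every decision is something you can count and compare against domain expectations.

    import collections
    import json

    # Stand-ins for real domain logic the application fully controls.
    TOOLS = {
        "lookup_order": lambda args: {"status": "shipped"},
        "refund_order": lambda args: {"refunded": True},
    }

    tool_counts = collections.Counter()   # e.g. compare per prompt version or per context of use

    def handle(user_message: str, call_llm) -> dict:
        # call_llm is an injected function assumed to return JSON like
        # {"tool": "...", "args": {...}}, constrained to the tools we expose.
        decision = json.loads(call_llm(user_message, tools=list(TOOLS)))
        tool_counts[decision["tool"]] += 1   # measurable: was this the tool we expected here?
        return TOOLS[decision["tool"]](decision["args"])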

phartenfeller

I tried it too and tasked it with a bigger migration (one web framework to another). It failed badly enough that I stopped the experiment. It still gave me a head start from which I can take parts and continue the migration manually. But the worst thing was that it did things I didn't ask for, like changing the HTML structure and CSS of pages and changing hand-picked hex color codes...

More about my experience on my blog: https://hartenfeller.dev/blog/testing-claude-code

raylad

I tried it on a small Django app and was not impressed in the end.

It looks like it's doing a lot, and at first I was very impressed, but after a while I realized that when it ran into a problem it kept trying non-working strategies it had already tried before, even though I had added instructions to claude.md to keep track of strategies and not reuse failing ones.

It was able to make a little progress, but not get to the end of the task, and some of its suggestions were completely insane. At one point there was a database issue and it suggested switching to an entirely different database than the one already used by the app, which was working in production.

A couple of hours and $12 later, it had created 1200 lines of partially working code and rather a mess. I ended up throwing away all the changes and going back to using the web UI.

babyent

Now take your $12 and multiply it by 100k people or more trying it.

Even if you won’t use it again, that’s booked revenue for the next fundraise!

hleszek

I must have been a little too ambitious with my first test with Claude Code.

I asked it to refactor a medium-sized Python project to remove duplicated code by using a dependency injection mechanism. That refactor is not really straightforward as it involves multiple files and it should be possible to use different files with different dependencies.
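
For context, the shape of refactor being described is roughly this (a generic sketch, not the poster's actual code): the duplicated logic collapses into one implementation that receives its dependencies instead of hard-coding them, so different entry points, and the tests, can wire in different ones.

    # Before: several near-identical modules, each hard-coding its own backend.
    # After: one implementation with the varying piece injected.

    class Pipeline:
        def __init__(self, storage):      # the injected dependency
            self.storage = storage

        def run(self, records):
            cleaned = [r.strip() for r in records if r.strip()]
            self.storage.save(cleaned)    # same logic no matter which backend
            return len(cleaned)

    class FileStorage:
        def __init__(self, path):
            self.path = path

        def save(self, rows):
            with open(self.path, "w") as f:
                f.write("\n".join(rows))

    class MemoryStorage:                  # convenient for a fully-covered test suite
        def __init__(self):
            self.rows = []

        def save(self, rows):
            self.rows = rows

    # Different files/entry points wire up different dependencies:
    prod = Pipeline(FileStorage("out.txt"))
    test = Pipeline(MemoryStorage())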

Anyway, I explain the problem in a few lines and ask for a plan of what to do.

At first I was extremely impressed, it automatically used commands to read the files and gave me a plan of what to do. It seemed it perfectly understood the issue and even proposed some other changes which seemed like a great idea.

So I just asked him to proceed and make the changes and it started to create folders and new files, edit files, and even run some tests.

I was dumbfounded, it seemed incredible. I did not expect it to work with the first try as I had already some experience with AI making mistakes but it seemed like magic.

Then once it was done, the tests (which covered 100% of the code) were not working anymore.

No problem, I isolate a few tests failing and ask Claude Code to fix it and it does.

Now, a few more times, I find some failing tests and ask it to fix them, slowly trying to clean up the mess, until there is a test with a small problem: it succeeds (with pytest) but freezes at the end of the test.

I ask Claude Code again to fix it and it tries to add code to solve the issue, but nothing works now. Each time it adds some bullshit code and each time it fails, adding more and more code to try to fix and understand the issue.

Finally, after $7.50 spent and 2000+ lines of code changed, it's not working, and I don't know why, as I did not make the changes myself.

As you know, it's easier to write code than to read code, so in the end I decided to scrap everything and do all the changes myself little by little, checking that the tests keep succeeding as I go along. I did follow some of the recommended changes it proposed, though.

Next time I'll start with something easier.

jpc0

Really, you nearly got the correct approach there.

I generally follow the same approach these days: ask it to develop a plan, then execute, but importantly have it execute each step in as small increments as possible and do a proper code review of each step. Ask it for the changes you want it to make.

There are certainly times I need to do it myself, but this has definitely improved my productivity to some degree.

It's just pretty tedious, so I generally write a lot of the "fun" code myself, and almost always do the POC myself, then have the AI do the "boring" stuff that I know how to do but really don't want to do.

Same with docs: the modern reasoning models are very good at docs and, when guided to a decent style, can really produce good copy. Honestly, R1/4o are the first AI I would actually consider pulling into my workflow, since they make fewer mistakes and actually help more than they harm. They still need to be babysat though, as you noticed with Claude.

elcomet

I'm wondering if you can prompt it to work like this - make minimal changes, and run the tests at each step to make sure the code is still working

darkerside

I'm curious for the follow up post from Yegge, because this post is worthless without one. Great, Claude Code seems to be churning out bug fixes. Let's see if it actually passes tests, deploys, and works as expected in production for a few days if not weeks before we celebrate.

pchristensen

He posts a few times a year at https://sourcegraph.com/blog

UncleEntity

> ...do all the changes myself little by little, checking that the tests keep succeeding as I go along.

Or... you can do that with the robots instead?

I tried that with the last generation of Claude, only adding new functionality when the previously added functionality was complete, and it did a very good job. Well, Claude for writing the code and Deepseek-R1 for debugging.

Then I tried a more involved project with apparently too many moving parts for the stupid robots to keep track of and they failed miserably. Mostly Claude failed since that's where the code was being produced, can't really say if Deepseek would've fared any better because the usage limits didn't let me experiment as much.

Now that I have an idea of their limitations and had them successfully shave a couple yaks I feel pretty confident to get them working on a project which I've been wanting to do for a while.

noisy_boy

The trick is not to get sucked into making it do 100% of the task, and to have a judgement of where the sweet spot is. Provide it proper details upfront along with the desired overall structure - that should settle in about 10-15 mins of back and forth. This must include tests that you have to review manually - again you will find issues and lose time (say about 30-45 mins). Cut your losses and close the loose ends of the test code. Now run the tests and start giving it discrete tasks to fix them. This is easily 20-40 mins. Now take over and go through the whole thing yourself, because this is where you will find more issues upon in-depth checking (the LLM has done most of what it could), and this is where you must understand the code you need to support.

credit_guy

I'm using Copilot for writing documentation Jupyter notebooks. I do lots of matplotlib plots. Setting up these plots takes lots of repetitive lines of code, like plt.legend(). With Copilot these lines just show up, and you press tab and move on. Sometimes it is freaky how it guesses what I want to do. For this type of work, Copilot increases my productivity by a factor of 5, easily.
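
Typical of the boilerplate being autocompleted, something like this (dummy data just so the snippet runs):

    import matplotlib.pyplot as plt

    xs = list(range(10))                        # placeholder data
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.plot(xs, [x ** 2 for x in xs], label="quadratic")
    ax.plot(xs, [2 * x for x in xs], label="linear")
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.set_title("The kind of setup Copilot fills in line by line")
    ax.legend()                                 # the plt.legend()-style line from the comment
    ax.grid(True, alpha=0.3)
    fig.tight_layout()
    plt.show()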

There are other types of work where Copilot is useless. But it's up to me to take the good parts, and ignore the bad parts.

bglazer

Yeah copilot is very good for matplotlib. Clunky interface with lots of repetitive code, but also tons of examples on the internet means that I almost never write matplotlib code by hand anymore.

dabinat

I used GitHub Copilot and Udemy to teach myself Rust. Copilot was especially helpful at resolving obtuse compiler error messages.

But as I’ve improved at Rust I have noticed I am using Copilot a lot less. For me now it has mainly become a tool for code completion. It’s not helping me solve problems now, it’s really just about saving time. I have estimated (unscientifically) that it probably improves my productivity 2-4x.

macrolime

Can Claude Code also be a devops agent or is it only for coding?

I currently use Cursor as a devops agent: I use the remote SSH extension to ssh into a VM, then Cursor sets up everything, and I make snapshots along the way in case it fucks up. It's been really great to quickly be able to set up and try out different infrastructures and backends in no time at all. It works well enough that I now do all my development using remote dev with ssh or remote containers on a server. Having a virtualized dev environment is a great addition to just having git for the code.

mtlynch

This is particularly interesting, as Steve Yegge works on (and I think leads) Sourcegraph Cody[0], which is a competitor to Claude Code.

Cody does use Claude Sonnet, so they do have some aligned interests, but it's still surprising to see Yegge speak so glowingly about another product that does what his product is supposed to do.

[0] https://sourcegraph.com/cody

esafak

Cody lets you pick your model.

manojlds

Rising tide lifts all the boats and all that.

Claude Code didn't feel that different to me, and maybe they have something that is better and when they do release it they can say hey look, we pushed hard and have something that's better than even Claude Code.

mechanicum

I mean, doing that is pretty much what made him (semi-)famous in the first place (https://gist.github.com/chitchcock/1281611).

mtlynch

Yeah, but it's pretty different complaining from the position of a rank and file engineer at what was then like a 50k-person org as opposed to praising a competitor's product when you're at a small company, and you're the public face of your product.

istjohn

Thanks, I never read that one. Yegge's writing is just delicious. He could write a guide to watching paint dry and I would savor every word.