How I program with LLMs
307 comments · January 7, 2025
dewitt
gopalv
> That he says LLMs make him more productive at all as a hands-on developer, especially around first drafts on a new idea, means a lot to me personally.
There is likely to be a great rift in how very talented people look at sharper tools.
I've seen the same division pop up with CNC machines, 3d printers, IDEs and now LLMs.
If you are good at doing something, you might find the new tool's output to be sub-par compared to what you can achieve yourself, but often the lower quality output comes much faster than you could produce it.
That causes the people who are deliberate & precise about their process to hate the new tool completely - expressing the idea directly in the actual code (or paint, or marks on wood) is much better than trying to explain it in a less precise language partway through. The only exception I've seen is that engineering folks often use a blueprint & refine it on paper.
There's a double translation overhead which is wasteful if you don't need it.
If you have dealt with a new hire while being the senior of the pair, there's that familiar feeling of wanting to grab their keyboard instead of explaining how to build that regex - being able to do more things than you can explain or just having a higher bandwidth pipe into the actual task is a common sign of mastery.
The incrementalists on the other hand, tend to love the new tool as they tend to build 6 different things before picking what works the best, slowly iterating towards what they had in mind in the first place.
I got into this profession simply because I could Ctrl-Z to the previous step much more easily than I could in chemical engineering, my then-favourite path. In chemistry, if you get a step wrong, you go back to the start & start over. Plus even when things work, yield is just a pain there (prove it first, then you scale up ingredients etc).
Just from the name of sketch.dev, it appears that this author is of the 'sketch first & refine' model where the new tool just speeds up that loop of infinite refinement.
liotier
> If you are good at doing something, you might find the new tool's output to be sub-par over what you can achieve yourself, but often the lower quality output comes much faster than you can generate. That causes the people who are deliberate & precise about their process to hate the new tool completely
Wow, I've been there! Years ago we dragged a GIS system kicking and screaming from its nascent era of a dozen ultrasharp dudes with the whole national fiber optics network in their heads, full of clever optimizations, to three thousand mostly clueless users churning out industrial-scale spaghetti... The old hands wanted a dumb fast tool that did their bidding - they hated the slower wizard-assisted handholding, which turned out to be essential to the new population's productivity.
Command line vs. GUI again... Expressivity vs. discoverability, all the choices vs. don't make me think. Know your users!
numpad0
I can't relate to this comment at all. Doesn't feel like what's said in GP either.
IMO, LLMs are super fast predictive input and hallucinatory unzip; files to be decompressed don't have to exist yet, but input has to be extremely deliberate and precise.
You have to have a valid formula for the resultant array, one that requires no more than 100 IQ to comprehend, and then they unroll it for you into the whole code.
They don't reward trial and error that much. They don't seem to help outsiders like 3D printers did, either. It is indeed a discriminatory tool as in it mistreats amateurs.
And, by the way, it's also increasingly obvious to me that adopting a more pro-AI posture than a purely rational and utilitarian standpoint would justify triggers a unique mode of insanity in humans. People seem to contract a lot of negativity doing it. Don't do that.
jprete
This is a good characterization. I'm precision-driven and know what I need to do at any low level. It's the high-level definition that is uncertain. So it doesn't really help to produce a dozen prototypes of an idea and pick one, nor does it help to fill in function definitions.
harrall
I believe it’s more that people hate trying new tools because they’ve already made their choice and made it their identity.
However, there are also people who love everything new and jump onto the latest hype too. They try new things but then immediately advocate for them without merit.
Where are the sane people in the middle?
wvenable
> Where are the sane people in the middle?
They are the quiet ones.
dns_snek
As an experienced software developer, I paid for ChatGPT for a couple of months, I trialed Gemini Pro for a couple of months, and I've used the current version of Claude.
I'd be happy if LLMs could produce working code as often and as quickly as the evangelists claim, but whenever I try to use an LLM on my day to day tasks, I almost always walk away frustrated and disappointed - and most of my work is boring on technical merits, I'm not writing novel comp-sci algorithms or cryptography libraries.
Every time I say this, I'm painted as some luddite who just hates change when the reality is that no, current LLMs are just not fit for many of the purposes they're being evangelized for. I'd love nothing more than to be a 2x developer on my side projects, but it just hasn't happened and it's not for the lack of trying or open mindedness.
edit: I've never actually seen any LLM-driven developers work in real time. Are there any live coding channels that could convince us skeptics that we're missing out on something revolutionary?
evilfred
Middle Ground Fallacy
tikkun
Interesting.
So engineers that like to iterate and explore are more likely to like LLMs.
Whereas engineers that have a more rigid, specific process are more likely to dislike LLMs.
godelski
I frequently iterate and explore when writing code. Code gets written multiple times before being merged. Yet, I still haven't found LLMs to be helpful in that way. The author gives "autocomplete", "search", and "chat-driven programming" as 3 paradigms. I get the most out of search (though a lot of this is due to the decreasing value of Google), autocomplete is pretty weak to me especially as I macro or just use contextual complete, and I've failed miserably at chat-driven programming on every attempt. I spend more time debugging the AI than it would take to debug things myself. Albeit it __feels__ faster because I'm doing more typing + waiting rather than continuous thinking (but the latter has extra benefits).
erosivesoul
FWIW I find LLMs almost useless for writing novel code. Like it can spit out a serviceable UUID generator when I need it, but try writing something with more than a layer or two of recursion and it gets confused. I turn copilot on for boilerplate and off for solving new problems.
travisporter
> I got into this profession simply because I could Ctrl-Z to the previous step much more easily than my then favourite chemical engineering goals.
That is interesting. Asking as a complete ignoramus - is there not a way to do this now? Like start off with 100 units of reagent, use a bit at every step, and discard it if the step goes wrong?
ssivark
But for every step that turns out to be "correct" you now have to go back and redo that in your held-out sample anyways. So it's not like you get to save on repeating the work -- IIUC you just changed it from depth-first execution order to breadth-first execution order.
numpad0
That likely ends up with 100 failed results all attributed to the same set of causes
dboreham
Calculators vs slide rules.
antirez
I also have many years of programming experience and find myself strongly "accelerated" by LLMs when writing code. But, if you think about it, it makes sense that many seasoned programmers are using LLMs better. LLMs are a helpful tool, but also a hard-to-use tool, and in general it's fair to think that better programmers can make better use of an assistant (human or otherwise): better understanding its strengths, identifying the good and bad output faster, providing better guidance to correct the approach...
Other than that, what correlates more strongly with the ability to use LLMs effectively is, I believe, language skills: the ability to describe problems very clearly. LLM reply quality changes very significantly with the quality of the prompt. Experienced programmers who can also communicate effectively give the model many design hints, details on where to focus, ..., basically escaping many local minima immediately.
mhalle
I completely agree that communication skills are critical in extracting useful work or insight from LLMs. The analogy for communicating with people is not far-fetched. Communicating successfully with a specific person requires an understanding of their strengths and weaknesses, their tendencies and blind spots. The same is true for communicating with LLMs.
I have actually found that from a documentation point of view, querying LLMs has made me better at explaining things to people. If, given the documentation for a system or API, a modern LLM can't answer specific questions about how to perform a task, a person using the same documentation will also likely struggle. It's proving to be a good way to test the effectiveness of documentation, for humans and for LLMs.
bsenftner
Communication skills are the keys to using LLMs. Think about it: every type of information you want is in them, in fact it is there multiple times, with multiple levels of seriousness in the treatment of the idea. If one is casual in their request, using casual language, then the LLM will reply with a casual reply, because that matched your request best. To get a hard, factual answer from those that are experts in a subject, use the formal term, use the expert's language, and you'll get back a reply more likely to be correct because it's in the same level of formal treatment as correct answers.
psychoslave
>every type of information you want is in them
Actually, I'm afraid not. It won't give us the step-by-step, scalable processes to make humanity as a whole enter an indefinitely long period of world peace, with each of us enjoying life in our own thriving manner. That would be great information to broadcast, though.
It is also equally able to produce large piles of completely delusional answers that mimic genuinely sincere statements just as well. Of course, we can also receive that kind of misguiding answer from humans. But the amount of output that mere humans can throw out in such a form is far more limited.
All that said, it's great to be able to experiment with it, and there are a lot of nice and fun things to do with it. It can be a great additional tool, but it won't be a self-sufficient panacea of an information source.
gen220
Hey! Asking because I know you're a fellow vimmer [0]. Have you integrated LLMs into your editor/shell? Or are you largely copy-pasting context between a browser and vim? This context-switching of it all has been a slight hang-up for me in adopting LLMs. Or are you asking more strategic questions where copy-paste is less relevant?
[0] your videos on writing systems software were part of what inspired me to make a committed switch into vim. thank you for those!
rudiksz
> "seasoned programmers are using LLMs better".
I do not remember a single instance when code provided to me by an LLM worked at all. Even if I ask for something small that can be done in 4-5 lines of code, it is always broken.
From a fellow "seasoned" programmer to another: how the hell do you write the prompts to get back correct working code?
HappMacDonald
I'd ask things like "which LLM are you using", and "what language or APIs are you asking it to write for".
For the standard answers of "GPT-4 or above" or "Claude Sonnet or Haiku" (or models of similar power), well-known languages like Python, JavaScript, Java, or C, and assuming no particularly niche or unheard-of APIs or project contexts, the failure rate of 4-5-line scripts in my experience is less than 1%.
mordymoop
I write the prompt as if I’m writing an email to a subordinate that clearly specifies what the code needs to do.
If what I’m requesting is an improvement to existing code, I paste the whole code if practical, or if not, as much of the code as possible, as context before making the request for additional functionality.
Often these days I add something like “preserve all currently existing functionality.” Weirdly, as the models have gotten smarter, they have also gotten more prone to delete stuff they view as unnecessary to the task at hand.
If what I’m doing is complex (a subjective judgement) I ask it to lay out a plan for the intended code before starting, giving me a chance to give it a thumbs up or clarify its understanding of what I’m asking for if its plan is off base.
numpad0
dc: not a seasoned dev, with <b> and <h1> tags on "not".
They can't think for you. All intelligent thinking you have to do.
First, give them a high-level requirement that can be clarified into indented bullet points that look like code. Or give them such a list directly. Don't give them the half-open questions usually favored by talented and autonomous individuals.
Then let them further decompress those pseudocode bullet points into code. They'll give you back code that resembles a digitized paper test answer. Fix obvious errors and you get B-grade compiling code (see the sketch below).
They can't do non-conventional structures, Quake-style performance-optimized code, realtime robotics, cooperative multithreading, etc., just good old it-takes-what-it-takes GUI app, API, and data manipulation code.
For those use cases with these points in mind, it's a lot faster to let the LLM generate tokens than to type `int this_mandatory_function_does_obvious (obvious *obvious){ ...` manually on a keyboard. That should arguably be a productivity boost in the sense that the user of the LLM is effectively typing faster.
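A minimal sketch of that bullet-points-to-code flow (an illustration, not numpad0's code; the task, function name, and CSV layout are made up):

    # Prompt given to the model, as indented bullet points:
    #   - read a CSV of (name, score) rows
    #   - drop rows with a non-numeric score
    #   - return the top 3 names by score, highest first
    import csv

    def top_three(path: str) -> list[str]:
        rows = []
        with open(path, newline="") as f:
            for row in csv.reader(f):
                if len(row) != 2:
                    continue
                name, score = row
                try:
                    rows.append((name, float(score)))
                except ValueError:
                    continue  # drop non-numeric scores
        rows.sort(key=lambda r: r[1], reverse=True)
        return [name for name, _ in rows[:3]]

The bullet points double as the prompt and as the comment block you keep in the file; the body is the part the model "unzips" and you fix by hand.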
antirez
Check my YouTube channel if you have a few minutes. I just published a video about adding a complex feature (UTF-8) to the Kilo editor, using Claude.
jkaptur
The story from the article matches my experience. The LLM's first answer is often a little broken, so I tweak it until it's actually correct.
wvenable
I rarely get back non-working code, but I've also internalized its limitations, so I no longer ask it for things it's not going to be able to do.
As other commenters have pointed out, there's also a lot of variation between different models and some are quite dumb.
I've had no issues with 10-20 line coding problems. I've also had it build a lot of complete shell scripts and had no problem there either.
kragen
That's really interesting. What are the most important things you've learned to do with the LLMs to get better results? What do your problem descriptions look like? Are you going back and forth many times, or crafting an especially-high-quality initial prompt?
antirez
I'm posting a set of videos on my YT channel where I'll show the process I follow. Thanks!
LouisSayers
> the ability to describe problems very clearly
Yes, and to provide enough context.
There's probably a lot that experience is contributing to the interaction as well, for example - knowing when the LLM has gone too far, focusing on what's important vs irrelevant to the task, modularising and refactoring code, testing etc
ignoramous
> [David, Former staff engineer at Google ... CTO of Tailscale,] doesn't need LLMs. That he says LLMs make him more productive at all as a hands-on developer, especially around first drafts on a new idea, means a lot to me...
Don't doubt for a second the pedigree of founding engs at Tailscale, but David is careful to point out exactly why LLMs work for them (but might not for others):
I am doing a particular kind of programming, product development, which could be roughly described as trying to bring programs to a user through a robust interface. That means I am building a lot, throwing away a lot, and bouncing around between environments. Some days I mostly write typescript, some days mostly Go. I spent a week in a C++ codebase last month exploring an idea, and just had an opportunity to learn the HTTP server-side events format. I am all over the place, constantly forgetting and relearning.
If you spend more time proving your optimization of a cryptographic algorithm is not vulnerable to timing attacks than you do writing the code, I don't think any of my observations here are going to be useful to you.
big_youth
> If you spend more time proving your optimization of a cryptographic algorithm is not vulnerable to timing attacks than you do writing the code, I don't think any of my observations here are going to be useful to you.
I am not a software dev, I am a security researcher. LLMs are great for my security research! It is so much easier and faster to iterate on code like fuzzers to do security testing. Writing code to do a padding oracle attack would have taken me a week+ in the past. Now I can work with an LLM to write code, learn, and break things within the day.
It has accelerated my security research 10-fold, just because I am able to write code and parse and interpret logs at a level above what I was able to a few years ago.
pplonski86
I'm in a similar situation: I jump between many environments, mainly Python and TypeScript, though currently I'm testing a new learning-algorithm idea in C++, and I simply don't always remember all the syntax. I was very skeptical about LLMs at first. Now, I'm using LLMs daily. I can focus more on thinking rather than searching Stack Overflow. Very often I just need a simple function, and it is much faster to create it with chat.
JKCalhoun
And if anyone remembers: before Stack Overflow you more or less had to specialize in a domain, become good with a handful of frameworks/APIs, on one platform. Learning a new language, a new API (god forbid a new platform) was to sail, months long, into seas unknown.
In this regard, with first Stack Overflow and now LLMs, the field has improved mightily.
greenyouse
That approach sounds similar to the Idris programming language with Type Driven Development. It starts by planning out the program structure with types and function signatures. Then the function implementations (aka holes) can be filled in after the function signatures and types are set.
I feel like this is a great approach for LLM assisted programming because things like types, function signatures, pre/post conditions, etc. give more clarity and guidance to the LLM. The more constraints that the LLM has to operate under, the less likely it is to get off track and be inconsistent.
I've taken a shot at doing some little projects for fun with this style of programming in TypeScript and it works pretty well. The programs are written in layers with the domain design, types, schema, and function contracts being figured out first (optionally with some LLM help). Then the function implementations can be figured out towards the end.
It might be fun to try Effect-TS for ADTs + contracts + compile time type validation. It seems like that locks down a lot of the details so it might be good for LLMs. It's fun to play around with different techniques and see what works!
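A minimal Python analogue of that layered, types-first style (a sketch only; the Order/Invoice domain and names are invented for illustration): pin down the domain types and function signatures first, and leave the bodies as holes to fill in later, with or without LLM help.

    from dataclasses import dataclass

    # Layer 1: domain types agreed on up front.
    @dataclass(frozen=True)
    class Order:
        order_id: str
        amount_cents: int

    @dataclass(frozen=True)
    class Invoice:
        order_id: str
        total_cents: int
        tax_cents: int

    # Layer 2: function signatures / contracts; bodies are holes for now.
    def compute_tax(order: Order, tax_rate: float) -> int:
        """Return the tax in cents, rounded to the nearest cent."""
        raise NotImplementedError  # hole: fill in last

    def make_invoice(order: Order, tax_rate: float) -> Invoice:
        """Combine an order and its computed tax into an invoice."""
        raise NotImplementedError  # hole: fill in last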
lysecret
100% this is what I do in python too!
brabel
I am not a genius but have a couple of decades experience and finally started using LLMs in anger in the last few weeks. I have to admit that when my free quota from GitHub Copilot ran out (I had already run out of Jetbrains AI as well!! Our company will start paying for some service as the trials have been very successful), I had a slight bad feeling as my experience was very similar to OP: it's really useful to get me started, and I can finish it much more easily from what the AI gives me than if I started from scratch. Sometimes it just fills in boilerplate, other times it actually tells me which functions to call on an unfamiliar API. And it turns out it's really good at generating tests, so it makes my testing more comprehensive as it's so much faster to just write them out (and refine a bit usually by hand). The chat almost completely replaced my StackOverflow queries, which saves me much time and anxiety (God forbid I have to ask something on SO as that's a time sink: if I just quickly type out something I am just asking to be obliterated by the "helpful" SO moderators... with the AI, I just barely type anything at all, leave it with typos and all, the AI still gets me!).
EagnaIonat
Have you tried using Ollama? You can download and run an LLM locally on your machine.
You can also pick the right model for the right need and it's free.
brabel
Yes. If the AI is not integrated with the IDE, it's not as helpful. If there were an IDE plugin that let you use a local model, perhaps that would be an option, but I haven't seen that (GitHub Copilot allows selecting different models, but I didn't check more carefully whether that also includes a local one - does anyone know?).
mentos
I’m using ChatGPT4o to convert a C# project to C++. Any recommendation on what Ollama model I could use instead?
devjab
I'm genuinely curious, but what did you use StackOverflow for before? With a couple of decades in the industry I can't remember when the last time I "Google programmed" anything was. I always go directly to the documentation for whatever it is I'm working with, because where else would I find out how it actually works? It's not like I haven't "Google programmed" when I was younger, but it's just such a slow process based on trusting strangers on the internet that it never really made much sense once I started knowing what I was doing. I sort of view LLMs in a similar manner. Why would you go to them rather than the actual documentation? I realize this might sound arrogant or rude, and I really hope you believe me when I say that I don't mean it like this. The reason I'm curious is because we're really struggling to get junior developers to look at the documentation first instead of everywhere else. Which means they often actually don't know how what they build works. Which can be an issue when they load every object of a list into memory instead of using a generator...
As far as using LLMs in anger goes, I would really advise anyone to use them. GitHub Copilot hasn't been very useful for me personally, but I get a lot of value out of running my thought process by an LLM. I think better when I "think out loud" and that is obviously challenging when everyone is busy. Running my ideas by an LLM helps me process them in a similar (if not better) fashion; often it won't even really matter what the LLM conjures up, because simply describing what I want to do often gives me new ideas, like "thinking out loud".
As far as coding goes, I find it extremely useful to have LLMs write CLI scripts to auto-generate code. The code the LLM produces is going to be absolute shite, but that doesn't matter if the output is perfectly fine. It's reduced my personal reliance on third-party tools by quite a lot. Because why would I need a code generator for something (and in that process trust a bunch of 3rd party libraries) when I can have an LLM write a similar tool in half an hour?
brabel
I believe you don't mean to be rude, but you just sound completely naive to me. To think that documentation includes everything is just, like, have you actually been coding anything at all that goes even slightly off the happy path? Example from yesterday: I have a modular JavaFX application (i.e. it uses Java JPMS modules, not just Maven/Gradle modules). I introduced a call to `url()` in JavaFX CSS. That works when running using the classpath, but not when using the module path. I spent half an hour reading docs to see what they say about modular applications. They didn't mention anything at all. Especially because in my case, I was not just doing `getClass().getResource`... I was using the CSS directive to load a resource from the jar. This is exactly when I would likely go on SO and ask if anyone had seen this before. It used to be highly likely someone who's an expert on JavaFX would see and answer my question, sometimes even people who directly worked on JavaFX!
StackOverflow was not really meant for juniors, as juniors usually can indeed find answers on documentation, normally. It was, like ExpertsExchange before it, a place for veterans to exchange tribal knowledge like this. If you think only juniors use SO, you seem to have arrived at the scene just yesterday and just don't know what you're talking about.
wiseowise
> Why would you go to them rather than the actual documentation?
Not all documentation is made equal. For example: Android docs are royal shit. They cover some basic things, e.g. showing a button, but good luck finding esoteric Bluetooth information or package management details, etc. Most of it is a mix of experimentation and historical knowledge (baggage).
Vox_Leone
I have been using LLMs to generate functional code from *pseudo-code* with excellent results. I am starting to experiment with UML diagrams, using both LLMs and computer vision to actually generate code from the diagrams; for example, a simple activity diagram could be the prompt to an LLM, and might look like:
Start -> Enter Credentials -> Validate -> [Valid] -> Welcome Message -> [Invalid] -> Error Message
Corresponding Code (Python Example):
    class LoginSystem:
        def validate_credentials(self, username, password):
            if username == "admin" and password == "password":
                return True
            return False

        def login(self, username, password):
            if self.validate_credentials(username, password):
                return "Welcome!"
            else:
                return "Invalid credentials, please try again."
*Edited for clarity
jonvk
This example illustrates one of the risks of using LLMs without subject expertise though. I just tested this with claude and got that exact same validation method back. Using string comparison is dangerous from a security perspective [1], so this is essentially unsafe validation, and there was no warning in the response about this.
1. https://sqreen.github.io/DevelopersSecurityBestPractices/tim...
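For reference, the usual mitigation for this class of issue is a constant-time comparison. A minimal Python sketch of the idea (illustrative only; a real system would verify a salted password hash such as argon2 or bcrypt rather than a SHA-256 of a hard-coded value):

    import hashlib
    import hmac

    # Hypothetical stored value; real systems load a salted password hash
    # (argon2/bcrypt) from storage instead of hashing a constant.
    STORED_HASH = hashlib.sha256(b"password").hexdigest()

    def validate_credentials(username: str, password: str) -> bool:
        candidate = hashlib.sha256(password.encode()).hexdigest()
        # compare_digest takes time independent of where the strings differ,
        # unlike ==, which can short-circuit on the first mismatched character.
        return username == "admin" and hmac.compare_digest(candidate, STORED_HASH)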
jpc0
Are you talking about the timing based attacks on that website which fails miserably at rendering a useable page on mobile?
jpc0
Could you add to the prompt that the password is stored in an sqlite database, hashed with argon2, with the hashing parameters stored as environment variables.
You would like it to avoid timing-based attacks as well as DoS attacks.
It should also generate the functions as pure functions so that state is passed in and passed out and no side effects (printing to the console) happen within the function.
Then also confirm for me that it has handled all error cases that might reasonably happen.
While you are doing that, just think about how much implicit knowledge I just had to type into this comment, and that is still ignoring a ton of other knowledge that needs to be considered, like whether that password was salted before being stored, all the error conditions for the sqlite implementation in Python, the argon2 implementation in the library.
TLDR: that code is useless and would have taken me the same amount of time to write as your prompt.
HarHarVeryFunny
> His post reminds me of an old idea I had of a language where all you wrote was function signatures and high-level control flow
Regardless of language, that's basically how you approach the design of a new large project - top-down architecture first, then split the implementation into modules, design the major data types, write function signatures. By the time you are done, what is left is basically the grunt work of implementing it all, which is the part that LLMs should be decent at, especially if the functions/methods are documented to a level (input/output assertions as well as functionality) where it can also write good unit tests for them.
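As a hedged illustration of that level of documentation (my example, not the parent's): in Python, the docstring can carry the functionality, the input/output assertions, and doctest-style examples, so whoever (or whatever) fills in the body also has something concrete to test against.

    def clamp(value: float, low: float, high: float) -> float:
        """Clamp value into the closed range [low, high].

        Precondition: low <= high (otherwise ValueError).
        Postcondition: low <= result <= high, and result == value when value is in range.

        >>> clamp(5.0, 0.0, 10.0)
        5.0
        >>> clamp(-3.0, 0.0, 10.0)
        0.0
        >>> clamp(42.0, 0.0, 10.0)
        10.0
        """
        if low > high:
            raise ValueError("low must be <= high")
        return max(low, min(value, high))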
dingnuts
> the grunt work of implementing it all
you mean the fun part. I can really empathize with digital artists. I spent twenty years honing my ability to write code and love every minute of it and you're telling me that in a few years all that's going to be left is PM syncs and OKRs and then telling the bot what to write
if I'm lucky to have a job at all
HarHarVeryFunny
I think it depends on the size of the project. To me, the real fun of being a developer is the magic of being able to conceive of something and then conjure it up out of thin air - to go from an idea to reality. For a larger more complex project the major effort in doing this is the solution conception, top-down design (architecture), and design of data structures and component interfaces... The actual implementation (coding), test cases and debugging, then does become more like drudgework, not the most creative or demanding part of the project, other than the occasional need for some algorithmic creativity.
Back in the day (I've been a developer for ~45 years!) it was a bit different as hardware constraints (slow 8-bit processors with limited memory) made algorithmic and code efficiency always a primary concern, and that aspect was certainly fun and satisfying, and much more a part of the overall effort than it is today.
dekhn
I think what you're describing is basically "interface driven development" and "test driven development" taken to the extreme: where the formal specification of an implementation is defined by the test suite. I suppose a cynic would say that's what you get if you left an AI alone in a room with Hyrum's Law.
highfrequency
> A lot of the value I personally get out of chat-driven programming is I reach a point in the day when I know what needs to be written, I can describe it, but I don’t have the energy to create a new file, start typing, then start looking up the libraries I need... LLMs perform that service for me in programming. They give me a first draft, with some good ideas, with several of the dependencies I need, and often some mistakes. Often, I find fixing those mistakes is a lot easier than starting from scratch.
This to me is the biggest advantage of LLMs. They dramatically reduce the activation energy of doing something you are unfamiliar with. Much in the way that you're a lot more likely to try kitesurfing if you are at the beach standing next to a kitesurfing instructor.
While LLMs may not yet have human-level depth, it's clear that they already have vastly superhuman breadth. You can argue about the current level of expertise (does it have undergrad knowledge in every field? PhD level knowledge in every field?) but you can't argue about the breadth of fields, nor that the level of expertise improves every year.
My guess is that the programmers who find LLMs useful are people who do a lot of different kinds of programming every week (and thus are constantly going from incompetent to competent in things that other people already know), rather than domain experts who do the same kind of narrow and specialized work every day.
otteromkram
I think your biggest takeaway should be that the person writing the blog post is extremely well versed in programming and has labored over code for hours, along with writing tests, debugging, etc. He knows what he would like because it's second nature. He was able to get the best from the LLM because his vision of what the code should look like helped craft a solid prompt.
Newer people coming into programming might not have as good a time, because they may skip actually learning the fundamentals and rely on LLMs as a crutch. Nothing wrong with that, I suppose, but there might be a point when everything goes up in smoke and the LLM is out of answers.
No amount of italic font is going to change that.
highfrequency
My experience is the opposite - I get the most value out of LLMs for topics that I have less expertise in. It’s become vastly easier to get up to speed in a new field because you can immediately answer basic questions, have the holes in your understanding pointed out, and be directed to the concepts you are missing.
mlepath
The first rule of programming with LLMs is don't use them for anything you don't know how to do. If you can look at the solution and immediately know what's wrong with it, they are a time saver otherwise...
I find chat for search is really helpful (as the article states)
IanCal
That seems like a wild restriction.
You can give them more latitude for things you know how to check.
I didn't know how to set up the right gnarly TypeScript generic type to solve my problem, but I could easily verify it's correct.
kccqzy
If you merely know how to check, would you also know how to fix it after you find that it's wrong?
If you are lucky to have the LLM fix it for you, great. If you don't know how to fix it yourself and the LLM doesn't either, you've just wasted a lot of time.
IanCal
It did fix it; I iterated, passing in the type and linter errors, until it passed all the requirements I had.
> If you merely know how to check, would you also know how to fix it after you find that it's wrong?
Probably? I'm capable of reading documentation, learning and asking others.
> If you don't know how to fix it yourself and the LLM doesn't either, you've just wasted a lot of time.
You may be surprised by how little time, but regardless it would have taken more time to hit that point without the tool.
Also sometimes things don't work out, that's OK. As long as overall it improves work, that's all we need.
fastball
If you don't understand what the generic is doing, there might be edge-cases you don't appreciate. I think Typescript types are fairly non-essential so it doesn't really matter, but for more important business logic it definitely can make a difference.
IanCal
I understand what it's doing, and could easily set out the cases I needed.
billmcneale
That's the wrong approach.
I use chat for things I don't know how to do all the time. I might not know how to do it, but I sure know how to test that what I'm being told is correct. And as long as it's not, I iterate with the chat bot.
WhiteNoiz3
A better way to phrase it might be don't use it for something that you aren't able to verify or validate.
sdesol
I agree with this. I keep harping on this, but we are sold automation instead of a power tool. If you have domain knowledge in the problem that you are solving, then LLMs can become an extremely valuable aid.
zmmmmm
I think it's just a broader definition of "know how to do". If you can write a test for it then I'm going to argue you know "how" to do it in a bigger picture sense. As in, you understand the requirements and inherent underlying technical challenges behind what you are asking to be done.
The issue is, there are always subtle aspects to problems that most developers only know by instinct. Like, "how is it doing the unicode conversion here" or "what about the case when the buffer is exactly the same size as the message, is there room for the terminating character?". You need the instincts for these to properly construct tests and review the code it did. If you do have those instincts, I argue you could write the code, it's just a lot of effort. But if you don't, I will argue you can't test it either and can't use LLMs to produce (at least) professional level code.
bityard
I feel like that's a good option ONLY if the code you are writing will never be deployed to an environment where security is a concern. Many security bugs in code are notoriously difficult to spot and even frequently slip through reviews from humans who are actively looking for exactly those kinds of bugs.
I suppose we could ask the question: Are LLMs better at writing secure code than humans? I'll admit I don't know the answer to that, but given what we know so far, I seriously doubt it.
photon_collider
"Trust but verify" is still useful especially when you ask LLMs to do stuff you don't know. I've used LLMs to help me get started on tasks where I wasn't even sure of what a solution was. I would then inspect the code and review any relevant documentation to see if the proposed solution would work. This has been time consuming but I've learned a lot regardless.
itsgrimetime
IMO this is a bad take. I use LLMs for things I don’t know how to do myself all the time. Now, I wouldn’t use one to write some new crypto functions because the risk associated with getting it wrong is huge, but if I need to write something like a wrapper around some cloud provider SDK that I’m unfamiliar with, it gets me 90% of the way there. It also is way more likely to know at least _some_ of the best practices where I’ll likely know none. Even for more complex things getting some working hello world examples from an LLM gives me way more threads to pull on and research than web searching ever has.
zmmmmm
> write something like a wrapper around some cloud provider SDK that I’m unfamiliar with
you're equating "unfamiliar" with "don't know how to do", but I will claim you do know how to do it, you would just be slow because you have to reference documentation and learn which functions do what.
Retr0id
> if I need to write something like a wrapper around some cloud provider SDK that I’m unfamiliar with
But "writing a wrapper" is (presumably) a process you're familiar with, you can tell if it's going off the rails.
joemazerino
Writing a wrapper is easier to verify because of the context of the API or SDK you're wrapping. Seems wrong? Check the docs. Doesn't work? Curl it yourself.
Barrin92
>It also is way more likely to know at least _some_ of the best practices
What's way more likely to know the best practices is the documentation. A few months ago there was a post that made the rounds about how the Arc browser introduced a really severe security flaw by misconfiguring their Firebase ACLs despite the fact that the correct way to configure them is outlined in the docs.
This to me is the sort of thing (although maybe not necessarily in this case) that comes out of LLM programming. 90% isn't good enough, it's the same as Stack Overflow pasting. If you're a serious engineer and you are unsure about something, it is your task to go to the reference material, or you're at some point introducing bugs like this.
In our profession it's not just crypto libraries, one misconfigured line in a yaml file can mean causing millions of dollars of damage or leaking people's most private information. That can't be tackled with a black box chatbot that may or may not be accurate.
tnvmadhav
I'd like to rephrase it as: "don't deploy LLM-generated code if you don't know how it works (or what it does)"
This means it's okay to use an LLM to try something new that you're on the fence about. Learn it, and then once you've understood the concept or the idea, you can go ahead and use the same code if it's good enough.
JKCalhoun
"don't deploy ̶L̶L̶M̶ ̶g̶e̶n̶e̶r̶a̶t̶e̶d̶ code if you don't know how it works (or what it does)"
(Which goes for StackOverflow, etc.)
switchbak
I've seen a whole flurry of reverts due to exactly this. I've also dabbled in trusting it a little too much, and had the expected pain.
I'm still learning where it's usable and where I'm over-reaching. At present I'm at about break-even on time spent, which bodes well for the next few years as they iron out some of the more obvious issues.
ignoramous
> ... don't use them for anything you don't know how to do ... I find chat for search is really helpful (as the article states)
Not really. I often use chat to understand codebases. Instead of trying to navigate mature, large-ish FOSS projects (like, say, the Android Runtime) by looking at them file by file, method by method, field by field (all too laborious), I just ask... Copilot. It is way, way faster than I am and mostly directionally correct with its answers.
qianli_cs
Exactly, you have to (vaguely) know what you’re looking for and have some basic ideas of what algorithms would work. AI is good at helping with syntax stuff but not really good at thinking.
j45
You can ask the LLM to teach it to you step by step, and then validate it by doing it yourself as you go - still quicker than skipping the learning and not knowing how to debug it.
Learning how something works is critical or it's far worse than technical debt.
Balgair
I'm not a 'programmer'. At best, I'm a hacker, at best. I don't work in a team. All my code is mostly one time usage to just get some little thing done, sometimes a bit of personal stuff too. I mostly use Excel anyways, and then python, and even then, I hate python because half the time I'm just dealing with library issues (not a joke, I measured it (and, no, I'm not learning another language, but thank you)). I'm in biotech, a very non code-y section of it too.
LLMs are just a life saver. Literally.
They take my code time down from weeks to an afternoon, sometimes less. And they're kind.
I'm trying to write a baseball simulator on my own, as a stretch goal. I'm writing my own functions now, a step up for me. The code is to take in real stats, do Monte Carlo, get results. Basic stuff. Such a task was impossible for me before LLMs. I've tried it a few times. No go. Now with LLMs, I've got the skeleton working and should be good to go before opening day. I'm hoping that I can use it for some novels that I am writing to get more realistic stats (don't ask).
I know a lot of HN is very dismissive of LLMs as code help. But to me, a non programmer, they've opened it up. I can do things I never imagined that I could. Is it prod ready? Hell no, please God no. But is it good enough for me to putz with and get just working? Absolutely.
I've downloaded a bunch of free ones from huggingface and Meta just to be sure they can't take them away from me. I'm never going back to that frustration, that 'Why can't I just be not so stupid?', that self-hating, that darkness. They have liberated me.
wdutch
I no longer work in tech, but I still write simple applications to make my work life easier.
I frequently use what OP refers to as chat-driven programming, and I find it incredibly useful. My process starts by explaining a minimum viable product to the chat, which then generates the code for me. Sometimes, the code requires a bit of manual tweaking, but it’s usually a solid starting point. From there, I describe each new feature I want to add—often pasting in specific functions for the chat to modify or expand.
This approach significantly boosts what I can get done in one coding session. I can take an idea and turn it into something functional on the same day. It allows me to quickly test all my ideas, and if one doesn’t help as expected, I haven’t wasted much time or effort.
The biggest downside, however, is the rapid accumulation of technical debt. The code can get messy quickly. There's often a lot of redundancy and after a few iterations it can be quite daunting to modify.
prettyblocks
I have a similar approach, but the mess can be contained by asking for optimizations and refactors very frequently and only asking for very granular features.
chii
> The code can get messy quickly. There's often a lot of redundancy and after a few iterations it can be quite daunting to modify.
I foresee in the future an LLM that has sufficient context length for (automatic) refactoring and tech debt removal, by pasting large portions of the existing code in.
scarface_74
Even without LLMs, at least with statically typed languages like C#, ReSharper can do solution-wide refactorings that are guaranteed correct as long as you don’t use reflection.
https://www.jetbrains.com/help/resharper/Refactorings__Index...
I don’t see any reason it couldn’t do more aggressive refactors with LLMs and either correct itself or skip the refactor if it fails static code checking. Visual Studio can already do real-time type checking for compile-time errors.
Aeolun
Cursor has recently added something like this, a ‘Bug Finder’. It told me that finding bugs in my entire codebase would cost me $21 or so, so I never actually tried it, but it sounds cool.
trash_cat
> The biggest downside, however, is the rapid accumulation of technical debt. The code can get messy quickly. There's often a lot of redundancy and after a few iterations it can be quite daunting to modify.
What stops you from using o1 or sonnet to refactor everything? It sounds like a typical LLM task.
SkyBelow
>The biggest downside, however, is the rapid accumulation of technical debt.
Is that really related to the LLM?
Even in pre-LLM times, anytime I've scrapped together some code to solve some small immediate problem, it grows tech debt at an amazing rate. Getting a feel for when a piece of code is going to be around long enough that it needs to be refactored, cleaned up, documented, etc. is a skill I developed over time. Even now it isn't a perfect guess, as there is an ongoing tug of war between wasting time today refactoring something I might not touch again and wasting time tomorrow having to pick up something I didn't clean up.
nunez
I definitely respect David's opinion given his caliber, but pieces like this make me feel strange that I just don't have a burning desire to use them.
Like, yesterday I made some light changes to a containerized VPN proxy that I maintain. My first thought wasn't "how would Claude do this?" Same thing with an API I made a few weeks ago that scrapes a flight data website to summarize flights in JSON form.
I knew I would need to write some boilerplate and that I'd have to visit SO for some stuff, but asking Claude or o1 to write the tests or boilerplate for me wasn't something I wanted or needed to do. I guess it makes me slower, sure, but I actually enjoy the process of making the software end to end.
Then again, I do all of my programming on Vim and, technically, writing software isn't my day job (I'm in pre-sales, so, best case, I'm writing POC stuff). Perhaps I'd feel differently if I were doing this day in, day out. (Interestingly, I feel the same way about AI in this sense that I do about VSCode. I've used it; I know what's it capable of; I have no interest in it at all.)
The closest I got to "I'll use LLMs for something real" was using it in my backend app that tracks all of my expenses to parse pictures of receipts. Theoretically, this will save me 30 seconds per scan, as I won't need to add all of the transaction metadata myself. Realistically, this would (a) make my review process slower, as LLMs are not yet capable of saying "I'm not sure" and I'd have to manually check each transaction at review time, (b) make my submit API endpoint slower since it takes relatively-forever for it to analyze images (or at least it did when I experimented with this on GPT4-turbo last year), and (c) drive my costs way up (this service costs almost nothing to run, as I run it within Lambda's free tier limit).
ge96
I'm an avg dev; I was never into LLMs/Copilot etc. and mocked prompt engineering, but... my current job is working with an LLM framework, so idk... it future-proofs me I guess. I do like computer vision and ML on datasets, e.g. training handwriting or IMU gesture recognition, that's cool.
The embeddings, I feel like there is something there even if it doesn't actually understand. My journey has just begun.
I scoff every time someone says "this + AI". AI is this thing they just throw in there. Last time I didn't want to work with some tech I quit my job, which was not a good move, not being financially independent. Anyway yeah, I'll keep digging into this. I still don't use Copilot right now, but I'm reading up more on the embedding stuff for cross-training or some use case like RAG.
uludag
I think there's a big selection bias on Hacker News that you wouldn't get elsewhere. There are still "elite" software developers I see who really aren't into the whole LLM tooling space. I found use in the autocomplete and search workflows that the author mentioned, but I stopped using these tools out of curiosity for how things were before. It turns out I don't need it to be productive, and I, too, probably enjoy working more without it.
nemothekid
I think "Chat driven programming" is the most common type of the most hyped LLM-based programming I see on twitter that I just can't relate to. I've incorporated LLMs mainly as auto-complete and search; asking ChatGPT to write a quick script or to scaffold some code for which the documentation is too esoteric to parse.
But having the LLM do things for me, I frequently run into issues where it feels like I'm wasting my time with an intern. "Chat-based LLMs do best with exam-style questions" really speaks to me, however I find that constructing my prompts in such a way where the LLM does what I want uses just as much brainpower as just programming the thing my self.
I do find ChatGPT (o1 especially) really good at optimizing existing code.
tptacek
There's an art to cost-effectively coaxing useful answers (useful drafts of code) from an LLM, and there's an art to noticing the most productive questions to put to that process. It's a totally different way of programming than having an LLM looking over your shoulder while you direct, function by function, type by type, the code you're designing.
If you feel like you're wasting your time, my bet is that you're either picking problems where there isn't enough value to negotiate with the LLM, or your expectations are too high. Crawshaw mentions this in his post: a lot of the value of this chat-driven style is that it very quickly gets you unstuck on a problem. Once you get to that point, you take over! You don't convince the LLM to build the final version you actually commit to your branch.
Generating unit test cases --- in particular, generating unit test cases that reconcile against unsophisticated, brute-force, easily-validated reference implementations of algorithms --- is a perfect example of where that cost/benefit can come out nicely.
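A small sketch of that reconcile-against-a-reference pattern (an illustration; the top-k task and names are made up): the test compares the implementation under test against a brute-force version that is trivially easy to validate, across a pile of random inputs.

    import heapq
    import random

    def top_k_fast(xs: list[int], k: int) -> list[int]:
        # Stand-in for the implementation under test (e.g. an LLM draft).
        return heapq.nlargest(k, xs)

    def top_k_reference(xs: list[int], k: int) -> list[int]:
        # Brute force, easy to validate by inspection.
        return sorted(xs, reverse=True)[:k]

    def test_top_k_matches_reference():
        rng = random.Random(0)
        for _ in range(1000):
            xs = [rng.randint(-50, 50) for _ in range(rng.randint(0, 30))]
            k = rng.randint(0, 10)
            assert top_k_fast(xs, k) == top_k_reference(xs, k)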
sibeliuss
My technique is to feed it a series of intro questions that prepare it for the final task. Chat the thing into a proper comfort level, and then from there, with the context at hand, ask it to help solve the real problem. Def feels like a new kind of programming model because it's still very programming-esque.
throwup238
> "Chat-based LLMs do best with exam-style questions" really speaks to me, however I find that constructing my prompts in such a way where the LLM does what I want uses just as much brainpower as just programming the thing my self.
It speaks to me too because my mechanical writing style (as opposed to creative prose) could best be described as what I learned in high school AP English/Literature and the rest of the California education system. For whatever reason that writing style dominated the training data and LLMs just happens to be easy to use because I came out of the same education system as many of the people working at OpenAI/Anthropic.
I’ve had to stop using several generic turns of phrase like “in conclusion” because it made my writing look too much like ChatGPT.
Aeolun
I’ve found that everything just works (more or less) since switching to Cursor. Agent based composer mode is magical. Just give it a few files for context, and ask it to do what you want.
AlotOfReading
It's interesting that you find it useful for optimization. I've found that they're barely capable of anything more than shallow optimization in my stuff without significant direction.
What I find useful is that I can keep thinking at one abstraction level without hopping back and forth between algorithm and codegen. The chat is also a written artifact I can use the faster language parts of my brain on instead of the slower abstract thought parts.
LouisSayers
The use of LLMs reminds me a bit of how people use search engines.
Some years ago I gave a task to some of my younger (but intelligent) coworkers.
They spent about 50 minutes searching in google and came back to me saying they couldn't find what they were looking for.
I then typed in a query, clicked one of the first search results and BAM! - there was the information they were unable to find.
What was the difference? It was the keywords / phrases we were using.
notjoemama
Our company has a no AI use policy. The assumption is zero trust. We simply can’t know whether a model or its framework could or would send proprietary code outside the network. So it’s best to assume all LLMs/AI is or will send code or fragments of code. While I applaud the incredible work by their creators, I’m not sure how a responsible enterprise class company could rely on “trust us bro” EULAs or repo readmes.
codebje
The same way responsible enterprise class companies rely on "trust us bro" EULAs for financial systems, customer databases, payroll, and all the other systems it would be very expensive and error prone to build custom for every business.
ryanobjc
Pretty much this.
OpenAI poisoned the well badly with their "we train off your chats" nonsense.
If you are using any API service, or any enterprise ChatGPT plan, your tokens are not being logged and recycled into new training data.
As for why trust them? Like the parent said: EULAs. Large companies trust EULAs and terms of service for every single SAAS product they use, and they use tons and tons of them.
OpenAI, in a clumsy attempt to create a regulatory moat by doing sketchy shit and waving wild "AI will kill us all" nonsense, has created a situation where the usefulness of these transformative generative solutions is automatically rejected by many.
BBosco
The vast majority of Fortune 500s already have legal frameworks in place for dealing with internal AI use, because the reality is employees are going to use it regardless of internal policy. Assuming every employee will act in good faith just because a blanket AI ban is in place is extremely optimistic at best, and isn’t a good substitute for actual understanding.
sulam
Internal policies at these companies are rarely subject to the level of faith that you're implying. Instead, external access to systems is logged, internal systems are often sandboxed or otherwise constrained in how you interact with them, and anything that looks like exfiltration sets off enough alarms to have your manager talking to you that same day, if not that same hour.
pama
Your company could locally host LLMs; you won't get ChatGPT or Claude quality, but you can get something that would have been SOTA a year ago. You can vet the public inference codebases (they are only of moderate complexity), and you control your own firewalls.
Kostchei
You can get standalone/isolated versions of ChatGPT, if your org is large enough, in partnership with OpenAI. And others. They run on the same infra but in accounts you set up, cost the same, but you have visibility on the compute and control of data exfil - i.e. there is none.
CubsFan1060
You can run Claude on both AWS and Google Cloud. I’m fairly certain they don’t share data, but would need to verify to be sure.
evilduck
You can also run Llama 405B and the latest (huge) DeepSeek on your own hardware and get LLMs that trade blows with Claude and ChatGPT, while being fully isolated and offline if needed.
attentive
So, you're asking how enterprise class companies are using github for repos and gmail for all the enterprise mail? What's next, zoom/teams for meetings?
cudgy
Does your company develop software overseas where legal action is difficult? Or where their ip could be nationalized or secretly stolen? Where network communications are monitored and saved?
lm28469
> I’m not sure how a responsible enterprise class company could rely on “trust us bro” EULAs or repo readmes.
Isn't that what we do with operating systems, internet providers, &c. ?
aulin
How is that related? We're talking about continuously sending proprietary code and related IP to a third party, which seems a pretty valid concern to me.
I, for one, work every day with plenty of proprietary vendor code under very restrictive NDAs. I don't think they would be very happy knowing I let AIs crawl our whole code base and send it to remote language models just to have fancy autocompletion.
lm28469
Do you read every single line of code of every single dependency you have? I don't see how LLMs are more of a threat than a random compromised npm package or something from an OS package manager. Chances are you're already relying on tons and tons of "trust me bro" and "it's open source bro, don't worry, just read the code if you feel like it".
ryanobjc
"Continuously sending proprietary code and related IP to a third party"
Isn't this... github?
Companies and people are doing this all day every day. LLM APIs are really no different. Only when you magic it up as "the AI is doing thinking" ... but in reality text -> tokens -> math -> tokens -> text. It's a transformation of numbers into other numbers.
The EULAs and ToS say they don't log or retain information from API requests. This is really no different than Google Drive, Atlassian Cloud, Github, and any number of online services that people store valuable IP and proprietary business and code in.
bongodongobob
Ok, the LLM crawls your code. Then what? What is the exfiltration scenario?
mbesto
> proprietary code outside the network
Thought exercise: what would seriously happen if you did let some of your proprietary code outside your network? Oddly enough, 75% of the people writing code on HN probably have their company's code stored in GitHub. So there already is an inherent trust factor with GH/MSFT.
As another anecdote - Twitch's source code got leaked a few years back. Did Twitch lose business because of it?
switchbak
The other consideration: your company's code probably just isn't that good.
I think many people over-value this giant pile of text. That's not to say IP theft doesn't exist, but I think the actual risk is often overblown. Most of an organization's value is in the team's collective knowledge and teamwork ability, not in the source code.
aulin
> Thought exercise: what would seriously happen if you did let some of your proprietary code outside your network
Lawsuits? Lawful terminations? Financial damages?
mbesto
Huh? No, I'm saying: what potential damage does the organization suffer? Not the individual who may leak data outside the network.
janalsncm
You can run pretty decent models on your laptop these days. Works in airplane mode.
Ozzie_osman
One mode I felt was missed was "thought partner", especially while debugging (aka rubber ducking).
We had an issue recently with a task queue seemingly stalling at random. We got to the root cause much more quickly than we would have otherwise thanks to a back-and-forth brainstorming session with Claude: describing the issue we were seeing, pasting in code from the library to ask questions, asking it to write some code to add missing telemetry, and then probing it for ideas on what might be going wrong. An issue that might have taken days to debug took about an hour to identify.
Think of it as rubber ducking with a very strong generalist engineer who knows about basically any technical concept.
vendiddy
I found myself doing this with o1 recently for software architecture.
I will evaluate design ideas with the model, express concerns on trade-offs, ask for alternative ideas, etc.
Some of the benefit is having someone to talk to, but with proper framing it is surprisingly good at giving balanced takes.
mmahemoff
The new video and screen-share capabilities in ChatGPT and Gemini should make rubber-ducking smoother.
I feel like I've worn out my computer’s clipboard and alt-tab keys at this stage of the LLM experience.
fragmede
You may want to try any of the tools that can write to the filesystem so you're at least not copy-pasting code from a chat window: Copilot, Cursor, Aider, Tabnine, etc.
bangaladore
The killer feature of LLMs for programming, in my opinion, is autocomplete (the simple Copilot feature). I can probably be 2-3x more productive because I'm not typing (or thinking) as much. It does a fairly good job of pulling in nearby context to help it, and that's even without a language server.
Using it to generate blocks of code in a chat-like manner, in my opinion, just never works well enough in the domains I use it on. I'll try to get it to generate something, and by the time I get a functional result I realize I could have done it faster and more effectively myself.
Funny enough, other commenters here hate autocomplete but love chat.
switchbak
I had to turn autocomplete off. I value it when I want it, but otherwise it's such a distraction that it both slows me down, and actively irritates me.
Perhaps I'm just an old man telling the LLM to get off my lawn, but I find it does bad things to my ability to concentrate on hard things.
Having a good sense of when it would be useful, and invoking it on demand seems to be a decent enough middle ground for me. Much of it boils down to UX - if it could be present but not actively distracting, I'd probably be ok with it.
imhoguy
Both autocomplete and chat are halfway UX solutions. What I really need is some kind of mix of in-place chat and completion.
For context, very often I have to put a comment on the line above just to give the completion the right expectations.
Instead, the editor should let me influence the completion with some kind of in-place suggestion input available under a keyboard shortcut: I type what I want, hit Enter or Tab, and the completion proposal appears. Even better if it let me undo or modify that input, with shortcuts like "show me a different option" and "go back to the previous one".
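For now the comment-before-the-line trick is the workaround. A contrived sketch (the function, comment, and resulting body are all made up for illustration) of the kind of prompt-comment that reliably steers a completion:

    # parse "HH:MM:SS" into total seconds, raising ValueError on bad input
    def parse_duration(value: str) -> int:
        hours, minutes, seconds = (int(part) for part in value.split(":"))
        if not (0 <= minutes < 60 and 0 <= seconds < 60):
            raise ValueError(f"invalid duration: {value!r}")
        return hours * 3600 + minutes * 60 + seconds

The comment is really the prompt; the function body is what you hope the model fills in after Tab.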
m3kw9
The autocomplete is mostly a nuisance; it only gets things right a low percentage of the time.
jghn
I thought so too. Until I worked with a client who doesn't allow the use of LLM tools, and I had to turn my Copilot off. That's when I realized how much I'd grown to rely on it despite the headaches.
LVB
The biggest nuisance for me is when it tries to do things the LSP can already do 100% correctly. It's almost surely my tooling setup, with the LLM squashing the LSP's suggestions, but seeing Copilot (or even Cursor) suggest methods or parameters that don't exist is really annoying. Just stand down and let the LSP answer those basic questions, TYVM.
throwup238
Cursor ostensibly has a config setting to run a "shadow" workspace [1], aka a headless copy of the window you're working in that gets feedback from linters and LSPs, but they've been iterating so fast I'm not sure it's still working (or ever did much, really).
It really feels like we're at the ARPANET stage: there's so much obvious low-hanging fruit, it's just going to take companies a while to perfect it.
ahoka
The industry-standard acceptance rate was 40% the last time I checked. The share that's actually correct could be a bit lower, so maybe 1/3?
It’s like having to delete the auto-closed parenthesis more often than not.
tptacek
Yeah, I don't like it either. I think it speaks to the mindset difference Crawshaw is talking about here. When I'm writing code, I don't want things getting in my way. I have a plan. I'm actually pretty Zen about all the typing. It's part of my flow-state. But when I'm exploring code in a dialog with a chatbot, I'm happy for the help.
switchbak
I think we're going to be considered dinosaurs pretty soon. Much like how it's getting harder to buy a manual transmission, programming 'the old way' will probably just fade away over time.
LeftHandPath
I’ve never used it, simply because I hate autocomplete in emails.
Gmail autocomplete saves me maybe 2-5s per email: the recipient's name, a comma, and a sign-off. Maybe a quarter or half sentence here or there, but never exactly what I would've typed.
In code bases, I’ve never seen the appeal. It’s only reliably good at stuff that I can easily find on Google. The savings are inconsequential at best, and negative at worst when it introduces hard-to-pinpoint bugs.
LLMs are incredible technology, but when applied to code, they act more like non-deterministic macros.
switchbak
"negative at worst when it introduces hard-to-pinpoint bugs" - this is actually very true. I've had it recreate patterns _partially_, and paste in the wrong thing in a place that was very hard to discern.
It probably saved me 40 minutes, then proceeded to waste 2 hours of my time hunting for that issue. I'm probably at about break-even on the whole. The ultimate promise is very compelling, but my current use isn't particularly amazing. I do use a niche language, though, so I'm outside the global optimum.
LeftHandPath
Exactly! I expect that some are able to put it to good use. I am not one of those people.
My experiences with ChatGPT and Gemini have included lots of confident but wrong answers, e.g. "What castle was built at the highest altitude?". That's what gives me pause.
Gemini spits out a great 2D A* implementation no problem. That is awesome. Actually, contrary to my original comment, I probably will use AI for that sort of thing going forward.
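For the curious, the shape of what it produces is roughly the following: a compact hand-written sketch of A* on a 4-connected grid with a Manhattan-distance heuristic, not Gemini's actual output.

    import heapq

    def astar(grid, start, goal):
        """Shortest path on a 2D grid of 0 (free) and 1 (wall), 4-connected."""
        rows, cols = len(grid), len(grid[0])
        manhattan = lambda a, b: abs(a[0] - b[0]) + abs(a[1] - b[1])
        frontier = [(manhattan(start, goal), 0, start)]        # (f = g + h, g, cell)
        came_from, best_g = {start: None}, {start: 0}
        while frontier:
            _, g, cell = heapq.heappop(frontier)
            if cell == goal:                                   # reconstruct the path
                path = []
                while cell is not None:
                    path.append(cell)
                    cell = came_from[cell]
                return path[::-1]
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):  # 4-connected moves
                nxt = (cell[0] + dr, cell[1] + dc)
                if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols and grid[nxt[0]][nxt[1]] == 0:
                    ng = g + 1
                    if ng < best_g.get(nxt, float("inf")):
                        best_g[nxt], came_from[nxt] = ng, cell
                        heapq.heappush(frontier, (ng + manhattan(nxt, goal), ng, nxt))
        return None                                            # no path exists

    print(astar([[0, 0, 0],
                 [1, 1, 0],
                 [0, 0, 0]], (0, 0), (2, 0)))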
Despite that, I don’t want it in my IDE. Maybe I’m just a bit of a Luddite.
charlieyu1
I'm a hobby programmer who never worked a programming job. Last week I was bored, so I asked o1 to help me write a Solitaire card game in React, because I'm very rusty with web development.
The first few steps were great. It guided me through installing things and setting up a project structure, and even generated code for a few files.
Then something went wrong: the model kept telling me what to do in vague terms but stopped outputting code. When I asked for further help it started contradicting itself, rewriting business logic that had already been implemented in the first response, producing 3-4 incompatible snippets of the same file, etc., and it all fell apart.
mocamoca
LLM contexts are quick to overload, as the article states. That's why the author writes smaller, specific packages one at a time and uses a web UI instead of something like Cursor.
I had the same issue as you a few days ago. By separating the problem into smaller parts and addressing each part one by one, it got easier.
In your specific case I would try to fully complete the business logic on one side, reset the context, then provide the logic to a new context and ask for an interface. Difficulty will arise when you discover the logic is wrong or not suited to the UI, but I would keep using the same process to edit the code. Maybe two different contexts, one for logic and one for UI?
How did it go for you?
jarsin
My first program ever was a Windows calculator. My roommates would sit down and find bugs after I thought I had perfected it. I learned so much spending weeks trying to get that damn thing working.
I'm not too optimistic about the future of software development if juniors are turning to AI to do those early projects for them.
cpursley
Yeah, you wanna use Claude for code. That's the problem. Try Cursor or Bolt.
One interesting bit of context is that the author of this post is a legit world-class software engineer already (though probably too modest to admit it). Former staff engineer at Google and co-founder / CTO of Tailscale. He doesn't need LLMs. That he says LLMs make him more productive at all as a hands-on developer, especially around first drafts on a new idea, means a lot to me personally.
His post reminds me of an old idea I had of a language where all you wrote was function signatures and high-level control flow, and maybe some conformance tests around them. The language was designed around filling in the implementations for you. 20 years ago that would have been from a live online database, with implementations vying for popularity on the basis of speed or correctness. Nowadays LLMs would generate most of it on the fly, presumably.
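Purely as a toy illustration (all names invented), the skeleton of that idea fits in a few lines of Python: write the signature plus conformance cases, and let something else, whether a shared implementation database back then or an LLM now, supply the body.

    def conformance(*cases):
        """Attach (args, expected) pairs to a signature-only function."""
        def wrap(sig_fn):
            sig_fn.cases = cases
            return sig_fn
        return wrap

    def resolve(sig_fn, candidates):
        """Return the first candidate implementation that passes the signature's tests."""
        for impl in candidates:
            if all(impl(*args) == expected for args, expected in sig_fn.cases):
                return impl
        raise LookupError("no candidate satisfies the conformance tests")

    @conformance(((2, 3), 5), ((0, 0), 0), ((-1, 1), 0))
    def add(a: int, b: int) -> int: ...

    # Candidates could be fetched from a shared database or generated by an LLM.
    add = resolve(add, [lambda a, b: a * b, lambda a, b: a + b])
    print(add(2, 3))  # 5 -- the multiplying candidate fails the tests and is rejected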
Most ideas are unoriginal, so I wouldn't be surprised if this has been tried already.