
I use Cursor daily - here's how I avoid the garbage parts

walthamstow

Eng leadership at my place are pushing Cursor pretty hard. It's great for banging out small tickets and improving the product incrementally kaizen-style, but it falls down with anything heavy.

I think it's weakening junior engineers' reasoning and coding abilities as they become reliant on it without having lived for long, or at all, in the before times. I think it may be doing the same to me too.

Personally, and quietly, I have a major concern about the conflict of interest of Cursor deciding which files to add to context then charging you for the size of the context.

As with so many products, it's cheap to start with, you become dependent on it, then one day it's not cheap and you're fucked.

rco8786

I’ve been a paying Cursor user for 4-5 months now and I'm feeling the same. A lot more mistakes are leaking into my PRs. I feel a lot faster, but there’s been a noticeable decrease in the quality of my work.

Obviously I could just review my own code better, but that’s proving easier said than done, to the point where I’m considering going back to vanilla VS Code.

ljm

Same result - I tried it for a while out of curiosity but the improvements were a false economy: time saved in one PR is time lost to unplanned work afterwards. And it is hard to spot the mistakes because they can be quite subtle, especially if you've got it generating boilerplate or mocks in your tests.

Makes you look more efficient but it doesn't make you more effective. At best you're just taking extra time to verify the LLM didn't make shit up, often by... well, looking at the docs or the source.. which is what you'd do writing hand-crafted code lol.

I'm switching back to emacs and looking at other ways I can integrate AI capabilities without losing my mental acuity.

geoduck14

Can you elaborate on the mistakes you see? What languages are you working with?

KronisLV

> Personally, and quietly, I have a major concern about the conflict of interest of Cursor deciding which files to add to context then charging you for the size of the context.

> As with so many products, it's cheap to start with, you become dependent on it, then one day it's not cheap and you're fucked.

If it gets too expensive, then I guess the alternative becomes using something like Continue.dev or Cline with one of the providers like Scaleway that you can rent GPUs from or that have managed inference… either that, or having a pair of L4 cards in a closet somewhere (or a fancy Mac, or anything else with a decent amount of memory).

Whereas if there are no well-priced options anywhere (e.g. if the upfront investment for a company to buy its own GPUs to run Ollama or something else is too high), then that just means that running LLM-based systems is, for now, economically infeasible for many.
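
As a very rough sketch of that fallback, something like Ollama plus the Continue extension can be wired up in a few lines. The model name and config keys below are just examples, not a recommendation, and Continue's config format changes over time, so check their docs:

    # pull a local coding model; Ollama serves it on localhost:11434 by default
    ollama pull qwen2.5-coder:14b

    // ~/.continue/config.json (older JSON-style config)
    {
      "models": [
        { "title": "Local Qwen Coder", "provider": "ollama", "model": "qwen2.5-coder:14b" }
      ]
    }

Not as good as the frontier hosted models, but it keeps working if the pricing or the vendor changes.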


theshrike79

What do you consider "heavy"? Is it optimising an algorithm or "rewrite this whole codebase in <a different language>"?

walthamstow

Refactoring a critical function that is very large and complex. Yeah, maybe it shouldn't be so large and so complex, but it is and that's the codebase we have. I'm sure many other companies do too.

kamaal

That's not how apex-productivity folks have used any IDE productivity leap, including this one.

You don't outsource your thinking to the tool. You do the thinking and let the tool type it for you.

malux85

Then only use it for the small tasks? There's one button you have to click to turn it off.

onion2k

> I think it's weakening junior engineers' reasoning and coding abilities as they become reliant on it without having lived for long, or at all, in the before times.

As someone old enough to have built websites in Notepad.exe it's totally reasonable that I ask my teams to turn off syntax highlighting, brace matching, and refactoring tools in VSCode. I didn't have them when I started, so they shouldn't use them today. Modern IDE features are just making them lazy.

/s

timothygold

I think you're confusing programmer productivity with AI that generates all the code for you, allowing you to switch off your brain completely. Prompt-engineering code is not the same skill as programming, and being good at it does not mean you actually understand how code or software works.

walthamstow

Is this supposed to be funny?

brookst

Funny / sad. GP is just highlighting the all too common attitude of people who grew up using new tech (graphing calculators, Wikipedia, etc) who reach a certain age and suddenly new tech is ruining the youth of today.

It’s just human nature, you can decide if it’s funny or sad or whatever

moron4hire

Not all change is progress.

Change comes with pros and cons. The pros need to outweigh the cons (and probably significantly so) for change to be considered progress.

Syntax highlighting has the pro of making code faster to visually parse for most people at the expense of some CPU cycles and a 10 second setting change for people for whom color variations are problematic. It doesn't take anything away. It's purely additive.

AI code generation tools provide a dubious boost to short term productivity at the expense of extra work in the medium term and skill atrophy in the long term.

My junior developers think I don't know they are using AI coding tools. I discovered it about 2 months into them doing it, and I've been tracking their productivity both before and after. In one case, one might be committing to the repository slightly more frequently. But in all cases, they still aren't completing assignments on time. Or at all. Even basic things have to be rewritten because they aren't suitable for purpose. And in our pair programming sessions, I see them frozen up now, where they weren't before they started using the tools. I can even see them habitually attempt to use the AI, but then remember I'm sitting with them, and halt.

I tried to use AI code generation once to fill in some ASP.NET Core boilerplate for setting up authentication. Should be basic stuff. Should be 3 or 4 lines of code. I've done it before, but I forgot the exact lines and had been told AI was good for this kind of lazy recall of common tasks. It gave me a stub that had a comment inside, "implement authentication here". Tried to coax the AI into doing what I wanted and easily spent 10x more time than it would have taken to look up the documentation. And it still wasn't done. I haven't touched AI code gen since.

So IDK. I'm very skeptical of the claims that AI is writing significant amounts of working code for people, or that it at all rivals even a moderately smart junior developer (say nothing of actually experienced senior). I think what's really happening is that people are spending a lot of time spinning the roulette wheel, always betting on 00, and then crowing they're a genius when it finally lands.

onion2k

> AI code generation tools provide a dubious boost to short term productivity at the expense of extra work in the medium term and skill atrophy in the long term.

At the moment, sure. They've only been available for about 5 minutes in the grand scheme of dev tools. If you believe that AI assistants are going to be put back in the box then you are just flat out wrong. They'll improve significantly.

> I'm very skeptical of the claims that AI is writing significant amounts of working code for people

You may be right, but people write far too much code as it is. Software development should be about thinking more than typing. Maybe AI's most useful feature will be writing something that's slightly wrong in order to get devs to think about a good solution to their problem and then they can just fix the AI's code. If that results in better software then it's a huge win worth billions of dollars.

The belief that AI is worthless unless it's writing the code that a good dev would write is a trap that you should avoid.

kamaal

>>In one case, one might be committing to the repository slightly more frequently. But in all cases, they still aren't completing assignments on time.

Most people are using it to finish work sooner, rather than using it to do more work. As a senior engineer, your job must not be to stop the use of LLMs, but to create opportunities to build newer and bigger products.

>>I can even see them habitually attempt to use the AI, but then remember I'm sitting with them, and halt.

I understand you and I grew up in a different era. But life getting easier for the young isn't exactly something we must resent. Things are only getting easier with time and have been for a few centuries. None of this is wrong.

>>Tried to coax the AI into doing what I wanted and easily spent 10x more time than it would have taken to look up the documentation.

Honestly this largely reads like how my dad would describe technology from the 2000s. It was always that he was better off without it. Whether that was true or false is up for debate, but the world was moving on.

kamaal

>>but it falls down with anything heavy.

If you are using LLMs to write anything more than an if/else or for block at a time, you are doing it wrong.

>>I think it's weakening junior engineers' reasoning and coding abilities as they become reliant on it without having lived for long, or at all, in the before times.

When I first started work, my employer didn't provide internet access to employees. Their argument was always: how would you code if there was no internet connection, out there in the real world? As it turns out, they were not only worried about the wrong problem, they got the whole paradigm of this new world wrong.

In short, it was not worth building anything at all for a world where the internet doesn't exist.

>>then one day it's not cheap ...

Again, you are worried about the wrong thing. Your worry should not be what happens when it's no longer cheap, but what happens when it, as a matter of fact, gets cheaper. Which it will.

bluefirebrand

> If you are using LLMs to write anything more than an if/else or for block at a time, you are doing it wrong

Then what value are they actually adding?

If this is all they are capable of, surely you could just write this code yourself much faster than trying to describe what you want to the LLM in natural language?

I cannot imagine any decently competent programmer gaining productivity from these tools if this is how limited they still are

Why are people so bullish on them?

galbar

This is how I feel. I mentioned this to a couple of friends over a beer and their answer was that there are many not "decently competent programmer"s in the industry currently and they benefit immensely from this technology, at the expense of the stability and maintainability of the system they are working on.

kamaal

English to Code translation.

That said, they are fairly context-aware as to what you are asking, so they can save a lot of RTFM and code/test cycles. At times they can look at the functions that are already built and write new ones for you, if you can describe the function well.

But if you want to write a good function, one written to fit tightly to specifications, it's too much English. You need to describe in steps what is to be done, plus exceptions, and at some point you are just doing logic programming (https://en.wikipedia.org/wiki/Logic_programming), in the sense that the whole English text looks like a list of and/or conditions plus exceptions.

So you have to go one atomic step (a decision statement or a loop) at a time. But that's a big productivity boost too, the reason being that you can put lots of text in place without having to manually type it out.

>>you could just write this code yourself much faster than trying to describe what you want to the LLM in natural language?

Honestly speaking, most coding is manually laborious if you don't know touch typing. And even if you do, it's a chore.

I remember when I started using Copilot with React, it was doing a lot of the typing work I'd otherwise have to do.

>>I cannot imagine any decently competent programmer gaining productivity from these tools if this is how limited they still are

In my opinion, my brain at least has, over the years, seen so many code patterns, debugging situations, and things to anticipate and assemble as I go, that having an intelligent typing assistant is a major productivity boost.

>>Why are people so bullish on them?

Eventually newer programming languages will come along and people will build larger things.

ModernMech

Honestly, a lot of the problems people have with programming that they use AI to solve can be solved with better language design and dev tools.

For example, I like LLMs because they take care of a lot of the boilerplate I have to write.

But I only have to write that boilerplate because it's part of the language design. Advances in syntax and programming systems can yield similar speedups in programming ability. I've seen a 100x boost in productivity that came down to switching to a DSL versus C++.

Maybe we need more DSLs, better programming systems, better debugging tools, and we don't really need LLMs the way LLM makers are telling us? LLMs only seem so great because our computer architecture, languages and dev tooling and hardware are stuck in the past.

Instead of being happy with the Von Neumann architecture, we should be exploring highly parallel computer architectures.

Instead of being happy with imperative languages, we should be investing heavily in studying other programming systems and new paradigms.

Instead of being happy coding in a 1D text buffer, we should be investing more in completely imaginative ways of building programs in AR, VR, 3D, 2D.

LLMs are going to play a part here, but I think really they are a band-aid for a larger problem, which is that we've climbed too high in one particular direction (von Neumann/imperative/text) and we are at a local maximum. We've been there since 2019, maybe.

There are many other promising peaks to climb; avenues of research that were opened up in the 60s/70s/80s/90s have been left to atrophy over the past 30 years as the people who were investigating those paths refocused or are now gone.

I think all these billions invested in AI are going to vaporize, and maybe then investors will focus back on the fundamentals.

LLMs are like the antenna at the top of the Empire State Building. Yes, you can keep going up if you climb up there, but it's unstable and eventually there really is a hard limit.

If we want to go higher than that, we need to build a wider and deeper foundation first.

laborcontract

Cursor's current business model produces a fundamental conflict between the well-being of the user and the financial well-being of the company. We're starting to see these cracks form as LLM providers are relying on scaling through inference-time compute.

Cursor has been trying to do things to reduce the costs of inference, especially through context pruning. For instance, if you "attach" files to a conversation, Cursor no longer stuffs the code from those files into the prompt. Instead, it'll run function calls to open those files and read bits and pieces of the code until the model feels it has enough information. This seems like a perfectly reasonable strategy until you realize you cannot do the same thing with reasoning models, if you're limiting the reasoning to just the initial prompt.

If you prune context out of the initial prompt, then instead of reasoning on richer context, the LLM reasons only on the prompt itself (with no access to the attached files). After the thinking process, Cursor runs function calls to retrieve more context, which entirely defeats the point of "thinking" and induces the model to create incoherent plans and speculative edits in its thinking process, thus explaining Claude's bizarre over-editing behavior. I suspect this is why so many Cursor users are complaining about Claude 3.7.
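
To make that failure mode concrete, here's a rough Python sketch of the kind of loop being described. The names (llm.chat, read_file, the message shapes) are hypothetical and for illustration only, not Cursor's actual implementation:

    # Hypothetical "think first, fetch context later" loop.
    def read_file(path: str, start: int = 0, end: int = 200) -> str:
        # Tool the model can call to pull in a slice of a file on demand.
        with open(path) as f:
            return "".join(f.readlines()[start:end])

    def run_turn(llm, user_prompt: str) -> str:
        messages = [{"role": "user", "content": user_prompt}]  # attached files are NOT inlined here
        reply = llm.chat(messages, reasoning_effort="high")    # all the "thinking" happens blind
        while reply.tool_calls:                                # only now does any code get read
            for call in reply.tool_calls:
                messages.append({"role": "tool", "name": call.name,
                                 "content": read_file(**call.arguments)})
            reply = llm.chat(messages)                         # follow-up turns patch up the blind plan
        return reply.content

The reasoning budget is spent before the model has seen a single line of the attached code, so the plan it commits to is a guess.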

On top of this, Cursor has every incentive to keep the thinking effort for both o3-mini and Claude 3.7 to the very minimum so as to reduce server load.

Cursor is being hailed as one of the greatest SaaS growth stories, but their $20/mo all-you-can-eat business model puts them in such a bad place.

NitpickLawyer

> This seems like a perfectly reasonable strategy until you realize you cannot do the same thing with reasoning models, if you're limiting the reasoning to just the initial prompt.

Keep in mind that what we call "reasoning" models today are the first iteration. There's no fundamental reason why you can't do what you stated. It's not done now, but it can be done.

There's nothing stopping you from running "thinking" in "chunks" of 1-2 paragraphs, doing some search, adding more context (maybe from a pre-reasoned cache), and continuing the reasoning from there.

There's also work being done on think / summarise / think / summarise loops, and on various "RAG"-like approaches to thinking.

rafaelmn

>Cursor has been trying to do things to reduce the costs of inference, especially through context pruning. For instance, if you "attach" files to a conversation, Cursor no longer stuffs the code from those files into the prompt. Instead, it'll run function calls to open those files and read bits and pieces of the code until the model feels it has enough information. While that seems like a perfectly reasonable strategy, it starts to fall apart when integrating reasoning models.

In general I feel like this was always the reason automatic context detection could not be good in fixed fee subscription models - providers need to constrain the context to stay profitable. I also saw that things like Claude Code happily chew through your codebase, and bank account, since they are charging by token - so they have the opposite incentive.

Roritharr

This is only surface-level deep. Cursor already has quotas for their paid plans and usage-based pricing for their larger models; I run into the quota and fall over to their usage-based model every month.

Imo most of their incentive for context pruning comes not just from reducing the token count, but from the perception that you only have to find "the right way"™ to build that context window automatically to reach coding panacea. They just aren't there yet.

laborcontract

If you’re going to pay on the margin, why not use those incremental dollars running the same requests on Cline? I’m assuming cost is the deciding factor here because, quality-wise, plugging directly into provider APIs with Cline always does a much better job for me.

IanCal

> Instead, it'll run function calls to open those files and read bits and pieces of the code until the model feels it has enough information. This seems like a perfectly reasonable strategy until you realize you cannot do the same thing with reasoning models, if you're limiting the reasoning to just the initial prompt.

There's nothing about this that conflicts with reasoning models, I'm not sure what you mean here.

laborcontract

What I mean is that their implementation (thinking only on the first response) yields zero benefit because the model doesn't see the code itself. They run multiple function calls to analyze your codebase in increments. If they ran the thinking model on the output of those function calls, performance would be great, but so far this is not what they are doing (yet). It would also dramatically increase the cost of running the same operation.

IanCal

But the way those models work is to run everything once the function calls come in. Are you saying Cursor is not using the model you selected on function call responses?

throwaway314155

This sounds like a Cursor issue, not something that affects reasoning models in general.

edit: Ah, I see what you mean now.

namaria

Reflecting on your comment, I realized that using a huge number of GPUs is akin to a Turing machine approaching infinite speed. So I think the promise of LLMs writing code is basically saying: if we add a huge number of reading/writing heads with an unbounded number of rules, we can solve decidability. Because what is the ability to generate arbitrarily complex code if not solving the halting problem? Maybe there's a more elegant or logical way to postulate this, or maybe I'm just confused or plain wrong, but it seems to me that it is impossible to generate a program that is guaranteed to terminate unless you can solve decidability. And throwing GPUs at a huge tape is just saying that the tape approaches infinite size and the Turing machine approaches infinite speed...

Or put another way, isn't the promise of software that is capable of generating any software given a natural language description in finite time basically assuming P=NP? Because unless the time can be guaranteed to be finite, throwing GPU farms and memory at this most general problem (isn't the promise of using software to generate arbitrary software the same as the promise that any possible problem can be solved in polynomial time?) is not guaranteed to solve it in finite time.

MrBuddyCasino

> Cursor has been trying to do things to reduce the costs of inference, especially through context pruning.

You can also use cline with gemini-2.0-flash, which supports a huge context window. Cline will send it the full context and not prune via RAG, which helps.

laborcontract

I love Cline but I’ve never tried the Gemini models with it. I’ll give it a shot tonight, thanks for the tip!


cyprx

I had been using Cursor for a month until a day when my house had no internet; then I realized that I had started forgetting how to write code properly.

risyachka

I had the exact same experience. Pretty sure this happens in most cases; people just don’t realize it.

ant6n

Just get a Mac Studio with 512GB RAM and run a local model when the internet is down.

jjude

Which local model would you recommend that comes close to cursor in response quality? I have tried deepseek, mistral, and few others. None comes close to quality of cursor. I keep coming back to it.

itsabackupplan

It's a backup plan, who cares if the quality matches? If it did, Cursor would not be in question.

pknerd

Can one run cursor with local LLMs only?

_puk

Back up your $20 a month subscription with a $2000 Mac Studio for those days your internet is down.

Peak HN.

automatic6131

Lol he suggested a $10k Mac Studio

But you can at least resell that $10k Mac Studio, theoretically.

timothygold

Maybe this "backup" solution.. developed into commodity hardware as an affordable open source solution that keeps the model and code locally and private at all times is the actual solution we need.

Let's say a cluster of Raspberry Pis / low-powered devices producing results as good as Claude 3.7 Sonnet. Would it be completely infeasible to create a custom model that is trained on your own codebase and might not be a fully fledged LLM but provides similar features to Cursor?

Have we all gone bonkers sending our code to third parties? The code is the thing you want to keep secret unless you're working on an open source project.

yohannesk

Even more absurd is that a Mac Studio with 512GB RAM costs around $9.5K.

rullopat

$2000? You wish!

eadmund

But then I’d be using a Mac, and that would slow my development down and be generally miserable.

jillesvangurp

The UX of tools like these is largely constrained by how good they are with constructing a complete context of what you are trying to do. Micromanaging context can be frustrating.

I played with aider a few days ago. Pretty frustrating experience. It kept telling me to "add files" that are in the damn directory that I opened it in. "Add them yourself" was my response. Didn't work; it couldn't do it somehow. Probably once you dial that in, it starts working better. But I had a rough time with it creating commits with broken code, not picking up manual file changes, etc. It all felt a bit flaky and brittle. Half the problem seems to be simple cache coherence issues and me having to tell it things that it should be figuring out by itself.

The model quality seems less important than the plumbing to get the full context to the AI. And since large context windows are expensive, a lot of these tools are cutting corners all the time.

I think that's a short-term problem. Not cutting those corners is valuable enough that a logical end state is tools that don't cut them and cost a bit more. Just load the whole project. Yes, it will make every question cost $2-3 or something like that. That's expensive now, but if it drops by 20x we won't care.

Basically, large models that support huge context windows of millions/tens of millions of tokens cost something like the price of a small car and use a lot of energy. That's OK. Lots of people own small cars. Because they are kind of useful. AIs that have a complete, detailed context of all your code, requirements, intentions, etc. will be able to do a much better job than one that has to guess all of that from a few lines of text. That would be useful. And valuable to a lot of people.

Nvidia is rich because they have insane margins on their GPUs. They cost a fraction of what they sell them for. That means that price will crash over time. So, I'm optimistic that a lot of this stuff will improve rapidly.

mbeex

> aider [...] It kept telling me to "add files" that are in the damn directory that I opened it in.

That's intentional, and I like it. It limits the context dynamically to what is necessary (of course it makes mistakes). You can also add files with placeholders and in a number of other ways, but most of the time I let Aider decide. It has a repomap (https://aider.chat/docs/repomap.html), gradually building up knowledge, and makes proposals based on this and other information it has gathered, also with token costs and the context window limit in mind.
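
A few of the in-chat commands that help with that (from memory, so double-check against the aider docs):

    /add src/auth.py          # pull a file fully into the chat context
    /read-only docs/spec.md   # let aider see a file without being able to edit it
    /drop src/auth.py         # take it back out when you're done
    /tokens                   # report what the current context is costing
    /undo                     # revert the last commit aider made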

As for manual changes: aider is opinionated regarding the role of Git in your workflow. At first glance, this repels some people, and some stick to this opinion. For others, it is exactly one of the advantages, especially in combination with the shell-like nature of the tool. But the standard Git handling can still be overridden. For me personally, the default behavior has become more and more smooth and second nature. And the whole thing is scriptable; I have only begun to explore the possibilities.

In general: Tools have to be learned, impatient one-shot attempts are simply not enough anymore.

jampekka

> Nvidia is rich because they have insane margins on their GPUs. They cost a fraction of what they sell them for. That means that price will crash over time. So, I'm optimistic that a lot of this stuff will improve rapidly.

OTOH currently the LLM companies are probably taking a financial loss with each token. Wouldn't be surprised if the price doesn't even cover the electricity used in some cases.

Also e.g. Gemini already runs on Google's custom hardware, skipping the Nvidia tax.

_heimdall

> Nvidia is rich because they have insane margins on their GPUs. They cost a fraction of what they sell them for. That means that price will crash over time. So, I'm optimistic that a lot of this stuff will improve rapidly.

That still leaves us with an ungodly amount of resources used both to build the GPUs and to run them for a few years before having to replace them with even more GPUs.

It's pretty amazing to me how quickly the big tech companies pivoted from making promises to "go green" to buying as many GPUs as possible and burning through entire power plants' worth of electricity.

myflash13

Try Claude Code. It figures out context by itself. I’m having a lot of success with it for a few days now, whereas I never caught on with Cursor due to the context problem.

rhodescolossus

I've tried Cursor a couple of times, but my complaint is always the same: why fork VS Code when all this functionality could just be an extension, the same as Copilot does?

Some VSCode extensions don't work, you need to redo all your configuration, add all your workspaces... and the gain vs Copilot is not that high

andrewl-hn

> why fork VS Code when all this functionality could just be an extension, the same as Copilot does?

Have you programmed extensions for VSCode before? While it seems like a fairly extensible system overall, the editor component in particular is very restrictive. You can add text (that's what extensions like ErrorLens and GitLens are doing), inlay hints, and on-hover popup overlays (those can only trigger on words, and not on punctuation). What Cursor does: the automatic diff-like views of AI suggestions with graphic outlines, floating buttons, and whatnot right on top of the text editing view - is not possible in vanilla VSCode.

This was originally driven by the necessity of tighter control over editor performance. In its early days VSCode was competing with Atom - another extensible JS-powered editor from GitHub - and while Atom had an early lead due to a larger extension catalog, VSCode ultimately won the race because they managed to maintain lower latency in their text editor component. Nowadays they still don't want to introduce extra extension points to it, because newer, faster editors pop up all the time, too.

frereubu

> and the gain vs Copilot is not that high

I think that's (at least part of) your answer. More friction to move back from an entirely separate app rather than disabling an extension.

2sk21

I read this point in the article with bafflement:

"Learn when a problem is best solved manually."

Sure, but how? This is like the vacuous advice for investors: buy low and sell high

dkersten

By trying things and seeing what it’s good and bad at. For example, I no longer let it make data modelling decisions (both for client local data and database schemas), because it had a habit of coding itself into holes it had trouble getting back out of, eg duplicating data that it then has difficulty keeping in sync, where a better model from the start might have been a more normalised structure.

But I came to this conclusion by first letting it try to do everything and observing where it fell down.

blainm

I've found tools like Cursor useful for prototyping and MVP development. However, as the codebase grows, they struggle. It's likely due to larger files or an increased number of them filling up the context window, leading to coherence issues. What once gave you a speed boost now starts to work against you. In such cases, manually selecting relevant files or snippets from them yields better results, but at that point it's not much different from using the web interface to something like Claude.

Semaphor

I had that same experience with Claude Code. I tried to do a 95% "Idle Development RPG" approach to developing music release organization software. At the beginning, I was really impressed, but with more and more complexity, it becomes increasingly incoherent, forgetting about approaches and patterns used elsewhere and reinventing the wheel, often badly.

blitzar

Or the context window not being large enough for all the obscure functions and files to fit. I am too basic to have dug deep enough, but a simple (automatic) documentation context for the entire project would certainly improve things for me.

turnsout

Agreed. One useful tip is to have Cursor break up large files into smaller files. For some reason, the model doesn't do this naturally. I've had several Cursor experiments grow into 3000+ line files because it just keeps adding.

Once the codebase is reasonably structured, it's much better at picking which files it needs to read in.

yard2010

How can I stop Cursor from sending .env files with secrets as plain text? Nothing I tried from the docs works.

M4v3R

This is a huge issue that was already raised on their forums, and it's very surprising they haven't addressed it yet.

[0] https://forum.cursor.com/t/environment-secrets-and-code-secu...

timrichard

I have been adding .env files to .cursorignore so far.

I can see from that thread that the approach hasn’t been perfect, but it seems that the last two releases have tried to address that :

“0.46.x : .cursorignore now blocks files from being added in chat or sent up for tab completions, in addition to ignoring them from indexing.”
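
For reference, it takes .gitignore-style patterns, so something like this covers the usual secret files (adjust to your layout):

    # .cursorignore
    .env
    .env.*
    **/secrets/
    *.pem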


Amekedl

Adding to the opinions of other commenters, I feel that using Cursor is a bad idea. It's a closed-source SaaS, and with these components involved, service quality can swing wildly from day to day, which is not something I'm particularly keen on.

rco8786

This is true of every single service provider outside of fully OSS solutions, which are a teeny tiny fraction of the world's service providers.

turnsout

There's always Aider with local models!

stared

Other useful things I've discovered:

- Push for DRY principles ("make code concise," "ensure good design").

- Swap models strategically; sometimes it's beneficial to design with one model and implement with another. For example, use DeepSeek R1 for planning and Claude 3.5 (or 3.7) for execution. GPT-4.5 excels at solving complex problems that other models struggle with, but it's expensive.

- Insist on proper typing; clear, well-typed code improves autocompletion and static analysis.

- Certain models, particularly Claude 3.7, overly favor nested conditionals and defensive programming. They frequently introduce nullable arguments or union types unnecessarily. To mitigate this, keep function signatures as simple and clean as possible, and validate inputs once at the entry point rather than repeatedly in deeper layers (see the sketch after this list).

- Emphasize proper exception handling. Some models (again, notably Claude 3.7) have a habit of wrapping everything in extensive try/catch blocks, resulting in nested and hard-to-debug code reminiscent of legacy JavaScript, where undefined values silently pass through multiple abstraction layers. Allowing code to fail explicitly is a blessing for debugging purposes; masking errors is like replacing a fuse with a nail.
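
A minimal Python sketch of the "validate once at the entry point" idea (not from any real codebase, names are made up):

    from dataclasses import dataclass

    @dataclass
    class Order:
        customer_id: int
        quantity: int

    def parse_order(raw: dict) -> Order:
        # Validate once, loudly, at the boundary...
        if "customer_id" not in raw or "quantity" not in raw:
            raise ValueError(f"order is missing fields: {raw}")
        if int(raw["quantity"]) <= 0:
            raise ValueError("quantity must be positive")
        return Order(int(raw["customer_id"]), int(raw["quantity"]))

    def price_order(order: Order, unit_price: float) -> float:
        # ...so inner layers can rely on clean, non-nullable types instead of
        # re-checking for None / missing keys (and try/except-ing) at every level.
        return order.quantity * unit_price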

stared

Some additional thoughts on GPT-4.5: it provides a BFG-9000 experience - it eats e̶n̶e̶r̶g̶y̶ ̶c̶e̶l̶l̶s̶ budget ($2 per call!) like there is no tomorrow, but removes bugs with a blast.

In my experience, the gap between Claude 3.7 and GPT-4.5 is substantial. Claude 3.7 behaves like an overzealous intern on stimulants. It delivers results but often includes unwanted code changes, resulting in spaghetti code with deeply nested conditionals and redundant null checks. Although initial results might appear functional, the resulting technical debt makes subsequent modifications increasingly difficult, often leaving the codebase in disarray. GPT-4.5 behaves more like a mid-level developer, thoughtfully applying good programming patterns.

Unfortunately, the cost difference is significant. For practical purposes, I typically combine models. GPT-4.5 is generally reserved for planning, complex bug fixes, and code refinement or refactoring.

In my experience, GPT-4.5 consistently outperforms thinking models like o1. Occasionally, I'll use o3-mini or DeepSeek R1, but GPT-4.5 tends to be noticeably superior (at least, on average). Of course, effectiveness depends heavily on prompts and specific problems. GPT-4.5 often possesses direct knowledge about particular libraries (even without web searching), whereas o3-mini frequently struggles without additional context.

kikimora

Wouldn’t it be easier instead of juggling with models and their quirks to just write the code the old way?

stared

Depends.

Sometimes I could solve in 15 minutes a bug I had been chasing for days. In other cases, it is simpler to write code by hand - as AI either does not solve the problem (even a simple one), or does, but at the cost of tech debt - or it takes longer than doing things manually.

AI is just one more tool in our arsenal. It is up to us to decide when to use it. Just because we have a hammer does not mean we need to use it for screws.

> Wouldn’t it be easier instead of juggling with [something] and their quirks to just write the code the old way?

This phrase, when taken religiously, would keep us writing purely in assembly - as there is always "why this new language", "why this framework", "why LLMs".

mrlowlevel

Do any of these tools use the rich information from the AST to pull in context? Coupled with semantic search for entry points into the AST, it feels like you could do a lot…

zarathustreal

Don’t they all do this? Surely they’re not just doing naive text, n-gram, regex, embeddings, etc, right?