I tried vibe coding in BASIC and it didn't go well
47 comments
·July 16, 2025
manca
I literally had the same experience when I asked the top code LLMs (Claude Code, GPT-4o) to rewrite an Erlang/Elixir codebase in Java. They got some things right, but most things wrong, and it required a lot of debugging to figure out what went wrong.
It's absolute proof that they are still dumb prediction machines, fully relying on the type of content they've been trained on. They can't generalize (yet), and if you want to use them for novel things, they'll fail miserably.
abrookewood
Clearly the issue is that you are going from Erlang/Elixir to Java, rather than the other way around :)
Jokes aside, they are pretty different languages. I imagine you'd have much better luck going from .NET to Java.
h4ck_th3_pl4n3t
I just wish LLM providers would realize this and provide specialized LLMs for each programming language. The results would likely be better.
chuckadams
The local models JetBrains IDEs use for completion are specialized per language. For more general problems, I'm not sure over-fitting to a single language is any better for an LLM than it is for a human.
hammyhavoc
They'll never be fit for purpose. They're a technological dead-end for anything like what people are usually throwing them at, IMO.
motorest
> They'll never be fit for purpose. They're a technological dead-end for anything like what people are usually throwing them at, IMO.
This comment is detached from reality. LLMs have been proven effective at creating complete, fully working, fully featured projects from scratch. You need to provide the necessary context and use popular technologies with a large enough corpus for the LLM to know what to do. If a one-shot approach fails, a few iterations are all it takes to bridge the gap. I know that to be a fact because I do it on a daily basis.
zer00eyz
I will give you an example of where you are dead wrong, and one where the article is spot on (without diving into historic artifacts).
I run Home Assistant, and I don't get to play with it every day. Here, LLMs excel at filling in the legion of blanks in both the manual and end-user devices. There is a large body of work for them to summarize and work against.
I also play with SBCs. Many of these are "fringe" at best. Here, LLMs are, as you say, "not fit for purpose".
What kind of development you use LLMs for will determine your experience with them. The tool may or may not live up to the hype depending on how common, well-documented, and frequent your issue is. Once you start hitting these walls, you realize that real reasoning, leaps of inference, and intelligence are still far away.
recipe19
I work on niche platforms where the amount of example code on GitHub is minimal, and this definitely aligns with my observations. The error rate is way too high to make "vibe coding" possible.
I think it's a good reality check for the claims of impending AGI. The models still depend heavily on being able to transform other people's work.
winrid
Even with TypeScript, Claude will happily break basic business logic to make tests pass.
motorest
> Even with TypeScript, Claude will happily break basic business logic to make tests pass.
It's my understanding that LLMs change code to meet a goal, and if you prompt them with vague instructions such as "make the tests pass" or "fix the tests", they apply the minimum changes necessary and sufficient for that goal to be met. If you don't explicitly instruct them, they can't and won't tell project code apart from test code, so they will change your project code to make the tests pass.
This is not a bug. Changing project code to make tests pass is a fundamental approach to refactoring projects, and the whole basis of TDD. If that's not what you want, you need to prompt them accordingly.
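A minimal TypeScript sketch of that ambiguity (the `applyDiscount` function and its test are hypothetical examples, not from the article):

```typescript
// pricing.ts (hypothetical project code)
// Intended business rule: a discount can never exceed 50%.
export function applyDiscount(price: number, discount: number): number {
  return price * (1 - Math.min(discount, 0.5));
}

// pricing.test.ts (a stale test that still encodes the old, uncapped rule)
import { test, expect } from "vitest";
import { applyDiscount } from "./pricing";

test("applies the discount", () => {
  // Fails against the capped implementation: expects 20, gets 50.
  expect(applyDiscount(100, 0.8)).toBe(20);
});
```

Given only "make the tests pass", removing the `Math.min` cap and correcting the assertion to 50 are equally valid moves; the prompt has to say which side is authoritative.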
chuckadams
Fixing bugs is also changing project code to make tests pass. The assistant is pretty good at knowing which side to change when it’s working from documentation that describes the correct behavior.
CalRobert
That seems like the tests don’t work?
vineyardmike
Completely agree. I'm a professional engineer, but I like to get some ~vibe~ help on personal projects after work, when I'm tired and just want my personal project to go faster. I've had a ton of success with Go, JavaScript, Python, etc. I had mixed success writing idiomatic Elixir roughly a year ago, but I'd largely assumed that would be resolved by now, since every model maker has started aggressively filling training data with code now that we've found the PMF of LLM code assistance.
Last night I tried to build a super basic "barely above hello world" project in Zig (a language whose syntax I don't know), and it took trying a few different LLMs to find one that could actually write anything that would compile (Gemini with search enabled). I really wasn't expecting that, considering how good my experience has been with mainstream languages.
Also, I think OP did rather well considering BASIC is hardly used anymore.
gompertz
Yep, I program in some niche languages like Pike, SNOBOL4, and Unicon. Vibe coding is out of the question for these languages. Forced to use my brain!
andsoitis
> The models
The models don’t have a model of the world. Hence they cannot reason about the world.
hammyhavoc
"reason" is doing some heavy-lifting in the context of LLMs.
jjmarr
I've noticed the error rate doesn't matter if you have good tooling feeding into the context. The AI hallucinates, sees the bug, and fixes it for you.
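A rough sketch of what that feedback loop can look like (the `llm` client is a hypothetical stand-in for whatever model API or agent you use; `tsc` is the real TypeScript compiler):

```typescript
import { execSync } from "node:child_process";
import { writeFileSync } from "node:fs";

// Hypothetical LLM client; swap in your actual model API or agent harness.
declare const llm: { complete(prompt: string): Promise<string> };

async function generateUntilItCompiles(task: string, maxTries = 5) {
  let prompt = task;
  for (let i = 0; i < maxTries; i++) {
    const code = await llm.complete(prompt);
    writeFileSync("attempt.ts", code);
    try {
      // Type-check only; throws with the compiler diagnostics on failure.
      execSync("npx tsc --noEmit attempt.ts", { stdio: "pipe" });
      return code; // compiles: good enough to hand back for review
    } catch (err: any) {
      // Feed the real diagnostics back instead of letting the model guess.
      prompt = `${task}\n\nYour last attempt failed to compile:\n${err.stdout}\nFix it.`;
    }
  }
  throw new Error("no compiling attempt within the retry budget");
}
```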
empressplay
I don't know if you're working with modern models. Grok 4 doesn't really know much about assembly language on the Apple II, but I gave it all of the architectural information it needed in the first prompt of a conversation, and it built compilable, executable code. Most of the issues I encountered were due to me asking for too much in one prompt. Still, it built a complete, albeit simple, assembly-language game in a few hours of back and forth. Obviously I know enough about the Apple II to steer it when it goes awry, but it's definitely able to write "original" code for a language/platform it doesn't inherently comprehend.
timschmidt
This matches my experience as well. Poor performance usually means I haven't provided enough context or have asked for too much in a single prompt. Modifying the prompt accordingly and iterating usually results in satisfactory output within the next few tries.
ofrzeta
It didn't go well? I think it went quite well. It even produced an almost working drawing program.
abrookewood
Yep, thought the same thing. I guess people have very different expectations.
Radle
I had way better results. I'd assume the same would have happened for the author if he had provided the LLM with full documentation of Atari BASIC and some example programs.
In particular, when asking the LLM to create a drawing program and a game, the author would probably have received working code if he had supplied the AI with documentation for the graphics functions and sprite rendering in Atari BASIC.
fcatalan
I had more luck with a little experiment a few days ago: I took phone pics of one of the shorter BASIC listings from Tim Hartnell's "Giant Book of Computer Games" (I learned to program out of those back in the early 80s, so I treasure my copy) and asked Gemini to translate it to plain C. It compiled and played just fine on the first go.
xiphias2
4o is not even a coding model, and it's very far from the best coding models OpenAI has. I seriously don't understand why these articles get upvoted so much.
ilaksh
I think it's a fair article.
However, I will just mention a few things. When you write an article like this, please note the particular language model used and acknowledge that they aren't all the same.
Also, realize that the context window is pretty large, and you can help the model by giving it information from manuals etc., so you don't need to rely entirely on its intrinsic knowledge.
If they had used o3 or o3 Pro and given it a few sections of the manual, it might have gotten farther. Also, if someone finds a way to connect an agent to a retro computer, like an Atari BASIC MCP that can enter text and take screenshots, "vibe coding" could work better, since the agent could see errors and self-correct.
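To make that agent idea concrete, here is a hypothetical sketch of the tool surface such an MCP server might expose (the names and the emulator bridge are invented for illustration; this is not an existing package):

```typescript
// Hypothetical tool surface for an "Atari BASIC MCP" server. Everything here
// is invented for illustration; no such package is assumed to exist.
interface AtariBasicTools {
  /** Type one line into the emulator, e.g. "10 GRAPHICS 8" or "RUN". */
  typeLine(line: string): Promise<void>;
  /** Capture the emulator display so the agent can inspect output visually. */
  screenshot(): Promise<{ pngBase64: string }>;
  /** Read the screen back as text (Atari BASIC prints errors on-screen). */
  readScreenText(): Promise<string>;
}

// The self-correction loop the comment describes: write, run, look, fix.
async function runAndCheck(tools: AtariBasicTools, program: string[]) {
  for (const line of program) await tools.typeLine(line);
  await tools.typeLine("RUN");
  const screen = await tools.readScreenText();
  // If the screen shows an error, the agent feeds it back and retries.
  return /ERROR/.test(screen) ? { ok: false, screen } : { ok: true, screen };
}
```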
serf
please just include the prompts rather than saying "So I said X.."
There is a lot of nuance in how X is said.
CMay
This does kind of make me wonder.
It's believable that we might see an increase in the number of new programming languages, since making new languages is becoming more accessible; or we could see fewer new languages, as the problems of the existing ones are worked around more reliably with LLMs.
Yet, what happens to adoption? Perhaps getting people to adopt new languages will be harder as generations come to expect LLM support. Would you almost need to use LLMs to synthesize tons of code examples that convert into the new language to prime the inputs?
Once conversational intelligence machines reach a sort of godlike generality, then maybe they could very quickly adapt languages from much fewer examples. That still might not help much with the gotchas of any tooling or other quirks.
So maybe we'll all snap to a new LLM super-language in 20 years, or we could be concreting ourselves into the most popular languages of today for the next 50 years.
hammyhavoc
Fantasy.
docandrew
Maybe other folks’ vibe coding experiences are a lot richer than mine have been, but I read the article and reached the opposite conclusion of the author.
I was actually pretty impressed that it did as well as it did in a largely forgotten language and outdated platform. Looks like a vibe coding win to me.
sixothree
Here's an example of a recent experience.
I have a website that is sort of a CMS. I wanted users to be able to add a list of external links to their items. When a user adds a link to an entry, the site should go out and fetch a cached copy of it. If there are errors, it should retry a few times. It should also capture an MHTML single-file snapshot as well as a full-page screenshot. The user should be able to refresh the cache, and the site should keep all past versions. The cached copy should be viewable in a modal. The task also involves creating database entities, DTOs, CQRS handlers, etc.
I asked Claude to implement the feature, went and took a shower, and when I came out it was done.
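For context on why a model handles this kind of task well: the capture step maps almost one-to-one onto Playwright's documented API. A sketch of the retry/screenshot/MHTML piece (the retry policy and file naming are my assumptions, not the commenter's actual code):

```typescript
import { chromium } from "playwright";

// Sketch of the link-caching step: retry a few times, then save a full-page
// screenshot and an MHTML single-file snapshot via the CDP session.
async function cacheLink(url: string, attempts = 3): Promise<string> {
  const browser = await chromium.launch();
  try {
    for (let i = 1; i <= attempts; i++) {
      const page = await browser.newPage();
      try {
        await page.goto(url, { waitUntil: "networkidle" });
        await page.screenshot({ path: "capture.png", fullPage: true });
        const cdp = await page.context().newCDPSession(page);
        const { data } = await cdp.send("Page.captureSnapshot", { format: "mhtml" });
        return data; // store alongside past versions in the CMS
      } catch (err) {
        if (i === attempts) throw err; // out of retries
      } finally {
        await page.close();
      }
    }
  } finally {
    await browser.close();
  }
  throw new Error("unreachable");
}
```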
nico
I'm pretty new to CC, and I've been using it in a very interactive way.
What settings are you using to get it to just do all of that without your feedback or approval?
Are you also running it inside a container, or setting some sort of command restrictions, or just yoloing it on a regular shell?
hammyhavoc
Let us know how the security audit by human beings on the output goes.
catmanjan
The auditors are using llms too!
firesteelrain
Not surprised; there were so many variations of BASIC, and unless you train ChatGPT on a bunch of code examples and context, it can only get so close.
Try a local LLM, then train it.
clambaker117
Wouldn’t it have been better to use Claude 4?
sixothree
I'm thinking Gemini CLI because of the large context window. He could add some information about the programming language itself to the project. I think that would help immensely.
4b11b4
Even though the max token limit is higher, it's more complicated than that. As the context length increases, undesirable things happen: models get noticeably worse at using what's in the context well before the limit is reached.