The Rise of 'Vibe Hacking' Is the Next AI Nightmare
55 comments · June 5, 2025
tptacek
Software security has been running off large-scale automation for over a decade. LLMs might or might not be a step change in that automation (I'm optimistic but uncertain), but, unlike in conventional software development, the standard arguments about craft and thoughtfulness aren't operative here. If there was an argument to be had, it would have been had around the time Google stood up the mega fuzzing farms.
A fun thing to keep in mind about software security is that it's premised on the existence of a determined and amoral adversary. In the long-long ago, Richard Stallman's host at MIT had no password; anybody could log into it. It was a statement about the ethics of locking down computing resources. That's approximately the position any security practitioner would be taking if they attempted to moralize against LLM-assisted offensive computing.
eyberg
I kinda see the other side of the coin.
"a determined and amoral adversary" - I'd kinda disagree with this (the amoral-adversary part being necessary, anyway). If you crawl through the vast data breach notification lists that many states are starting to keep - MA, ME, etc. - there are so many of them (literally every day banks, hospitals, etc. are having to report "data breaches" that never make the news), and not all of them are happening because of ransomware. Sometimes it's just someone accidentally not locking a bucket down, or not putting proper authorization on a path that should have it. It gets found and fixed, but they still have to notify the state. However, if someone doesn't know what they're looking at - or it's a program, so it really has no clue what it's looking at and just sees a bunch of data - there's no malicious intent, but that doesn't mean bad things can't happen, because that data has now leaked out.
Guess what a lot of these LLMs are training on?
So while Andrey's software is finding all sorts of interesting stuff, there's also a bunch of crap being generated inadvertently that is just bad.
bradyriddle
Say more about these mega fuzzing farms. I haven't heard anything about this.
cibyr
ClusterFuzz: https://google.github.io/clusterfuzz/
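To give a flavor of what those farms run at scale: each target is just a tiny harness that feeds generated inputs into one entry point and treats crashes as findings. A minimal sketch using Atheris (the Python coverage-guided fuzzer the OSS-Fuzz/ClusterFuzz infrastructure can run for Python code) might look like this; the parse_record() target and its toy bug are made up for illustration:

    # Minimal coverage-guided fuzz harness, Atheris-style (pip install atheris).
    # parse_record() and its reachable bug are invented purely for illustration.
    import json
    import sys

    import atheris

    def parse_record(blob: bytes) -> None:
        try:
            doc = json.loads(blob.decode("utf-8", errors="replace"))
        except json.JSONDecodeError:
            return
        if isinstance(doc, dict) and "name" in doc:
            # Toy bug: slicing blows up if "name" isn't a string or list,
            # which the fuzzer will eventually report as a crash.
            _ = doc["name"][:64]

    def test_one_input(data: bytes) -> None:
        parse_record(data)  # uncaught exceptions count as findings

    if __name__ == "__main__":
        atheris.instrument_all()
        atheris.Setup(sys.argv, test_one_input)
        atheris.Fuzz()

A fuzzing farm is basically thousands of harnesses like this running continuously across a fleet, with the crashes deduplicated and triaged.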
tptacek
There are fields, endless fields, where kernel zero days are no longer born. They are grown.
AStonesThrow
rms@gnu.ai.mit.edu
It was actually the A.I. Lab at M.I.T., and they already had their own dedicated subdomain for it. This had to have been around 1990-91. And IIRC, the actual admins made a valiant effort to keep all the shell users away from "root" privileges, so it wasn't a total dumpster fire and the system stayed alive, mostly. https://en.wikipedia.org/wiki/MIT_Computer_Science_and_Artif...
tptacek
I mean, I remember, in 1994, being on those systems. But it meant nothing. Anybody could be. There wasn't even a glimmer of interestingness about it. It was like "ls"'ing around an anonymous FTP server.
AStonesThrow
Hey, I cannot even begin to describe the thrill I got when I first found my way to the AF.MIL anon-ftp server! It was probably sparsely populated with public domain software and a couple boring games, but it felt like I'd just walked in the front gate of Miramar and witnessed the Blue Angels doing barrel rolls.
Sure, it was basically "a poster on the wall" for the US Air Force, and the US Army guy on Usenet shared nothing about his actual Ballistics Research Labs experiments, but for a college freshman kid, I'd never been on a way k00ler bboard, doodz!!1
anthk
ITS had no root
vouaobrasil
> victory will belong to the savvy blackhat hacker who uses AI to generate code at scale
This is just it: AI, while providing some efficiency gains for the average user, will become simply too powerful. Imagine a superpower that allows you to move objects with your mind. That could be a very nice thing for many people to have, because you could probably help people with it. That's the attitude many hacker-types take. The problem is that it also lets people kill instantly, which means telekinesis would just be too powerful a thing to combine with our animal instincts.
AI is just too powerful – and if more people took a serious stand against it, it might actually be shut down.
rco8786
Is it even possible to shut it down?
vouaobrasil
Of course it is. If enough people were truly enraged by it, if some leader were to rile up the mob enough, it could be shut down. Revolts have occurred in other parts of the world, and things are getting sufficiently bad that a sizable enough revolt could shut AI down. All we need is a sufficient number of people who are angry enough at AI.
rco8786
But it's just like, math. The math is out there now. You can't shut down math.
sgjohnson
Good luck shutting down the LLM running on my MacBook.
The Pandora’s Box is open. It’s over.
rglover
It's just software running on a server...this isn't a Johnny Depp movie [1]. Just flip the power switch on the racks.
Animats
"Skynet was software; in cyberspace. There was no system core; it could not be shut down"
Yes. Look at how much trouble we have now with distributed denial of service attacks.
Go re-read "Daemon" and "Freedom™", by Daniel Suarez (2006). That AI is dumber than what we have now.
hluska
Rather, it’s many different types of software running on many different systems around the world, each funded by a different party with its own motives. This is no movie…
rco8786
Eh, "just flip the power switch" is exactly the Johnny Depp-movie level of simplification.
LLM code already runs on millions of servers and other devices, across thousands of racks, hundreds of data centers, distributed across the globe under dozens of different governments, etc. The open source models are globally distributed and impossible to delete. The underlying math is public domain for anyone to read.
byt3bl33d3r
I've written tailored offensive security tools and malware for Red Teams for around a decade and now work in the AI space.
The argument that LLMs will enable "super powered" malware and that existing security solutions won't be able to keep up is completely overblown. I see zero evidence of this being possible with the current incarnation of "AI" or LLMs.
"Vibe coded" malware will be easier to detect if the people creating it don't understand what the code is actually doing, and it will result in an incredible number of OpSec fails when the malware actually hits the target systems.
I do agree that "vibe coding" will accelerate malware development and generally increase the number of attacks on orgs. However, if you're already applying bog-standard security practices like defense in depth, you shouldn't be concerned about this. If anything, you might want to start thinking about SOC automations in order to reduce alert fatigue.
Stay far away from anyone trying to sell you products to defend against "AI enabled malware". As of right now it's 100% snake oil.
Also, this is probably one of the cringiest articles on the subject I've ever read and is only meant to spread FUD.
I do find the banner video extremely entertaining however.
DanMcInerney
I too write automated offensive tooling. We actually wrote a project, vulnhuntr, that found the first autonomously-discovered 0day using AI. Feed it a GitHub repo and it tracks down user input from source to sink and analyzes for web-based vulnerabilities. Agree this article is incredibly cringy and standard best practices in network and development security will use the same AI efficiency gains to keep up (more or less).
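(To be clear about what "source to sink" means here: the crude, non-LLM version of the idea is just "does user-controlled input reach a dangerous call". A toy sketch of that idea - nothing like vulnhuntr's actual LLM-driven analysis, and with made-up source/sink lists - would be:)

    # Toy "source -> sink" scanner: flags calls where a user-input source
    # appears directly inside a dangerous sink call. Deliberately crude;
    # vulnhuntr itself drives an LLM to trace these flows instead.
    import ast
    import sys

    SOURCES = {"input"}                         # direct user input
    SOURCE_ATTRS = {"args", "form", "cookies"}  # e.g. Flask's request.args
    SINKS = {"eval", "exec", "system", "popen", "run", "check_output"}

    def is_source(node):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            return node.func.id in SOURCES
        return isinstance(node, ast.Attribute) and node.attr in SOURCE_ATTRS

    def sink_name(call):
        func = call.func
        if isinstance(func, ast.Name) and func.id in SINKS:
            return func.id
        if isinstance(func, ast.Attribute) and func.attr in SINKS:
            return func.attr
        return None

    def scan(path):
        tree = ast.parse(open(path, encoding="utf-8").read(), filename=path)
        for call in ast.walk(tree):
            if not isinstance(call, ast.Call):
                continue
            name = sink_name(call)
            if name and any(is_source(n) for a in call.args for n in ast.walk(a)):
                print(f"{path}:{call.lineno}: possible user input flowing into {name}()")

    if __name__ == "__main__":
        for p in sys.argv[1:]:
            scan(p)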
What bothers me the most about this article is that the tools attackers use to do things like find 0days in code are the same tools defenders can use to find the 0day first and fix it. It's not like offensive tooling is being developed in a vacuum while "armies of script kiddies" suddenly drain every bank account in the world. Automated defense and code analysis is improving at a similar rate to automated offense.
In this awful article's defense, though, I would argue that red team will always have an advantage over blue team, because blue team is by definition reactive. So as tech continues its exponential advancement, the advantage gap for the top 1% of red teamers is likely to scale accordingly.
MattPalmer1086
vulnhuntr looks very cool! Kudos.
tptacek
For the record I buy your argument about "vibe-coded malware"; this cycle of hype has been running since 1995 and Nowhere Man's "Virus Creation Lab". I am however fixated on the impact LLMs will have on vulnerability research, and what that will do to the ecosystem.
byt3bl33d3r
100% agree on the impact of it on research. It's pretty obvious that it'll accelerate 0day discovery but standard defense in depth strategies prepare you for 0day vulns against your org.
It will be extremely interesting to see how vulnerability discovery evolves with LLMs but the whole "sky is falling hide your kids" hype cycle is ludicrous.
Animats
The AI nightmare after that is "vibe capitalism". You put in a pitch deck and an operating business comes out.
Somebody should pitch that to YC.
blibble
this already exists, it's called a venture capital fund
parliament32
We're no closer to this than we are to "vibe-businessing", where you vibe your way into a profit-generating business powered by nothing but AI agents.
tptacek
(Looks around)
You know, there are some pretty crazy run rates out there.
jdefr89
Alright folks... to qualify myself: I am a vulnerability researcher at MIT. My day-to-day research concerns embedded hardware/software security, and some of my current and past work involves AI/ML integration and understanding just how useful it actually is for finding and exploiting vulnerabilities. Just last week my lab hosted a conference that included MIT folks and the outsiders we invite; one talk was on the current state of AI/LLMs.
To keep things short: this article is sensationalized and overstates the utility of AI/ML for finding actual novel vulnerabilities. As it currently stands, LLMs cannot even reliably find bugs that other, less sophisticated tools could have found in much less time. Binary exploitation is a great place to illustrate the wall you'll hit using LLMs hoping for a 0day. While LLMs can help with things like setting up fuzzers or maybe giving you a place to start manual analysis, their utility kind of stops there. They cannot reliably catch memory corruption bugs that a basic fuzzer or sanitizer could have found within seconds. This makes sense for that class of bugs: LLMs are fuzzy logic, and these issues aren't reliably found with that paradigm. That's the whole reason we have fuzzers; they find subtle bugs worth triaging. You've seen how well LLMs count; it's no surprise they might miss many of the same things a human would but a fuzzer wouldn't (think UaF, OOB, etc.). All the other tools you see written for script kiddies yield the same false positives they could have gotten with tools that already exist. I could go on and on, but I am on a shuttle, typing on a small phone.
TLDR: the article is trying to say LLMs are super hackers already, and that's simply false. They definitely have allure for script kiddies. In the future this might change. The time-saving aspects of LLMs are definitely worth checking out for static binary analysis - Binary Ninja with Sidekick saves a lot of time! But again, you still need to double-check important things.
tptacek
I'm a vuln researcher too, and we just had an article here about another vuln researcher using o3 to find a zero-day remote Linux kernel vulnerability. And not in an especially human-directed way: they literally set up 100 runs of o3, using simonw's `llm` tool, and sifted through the results.
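Mechanically, that workflow is almost trivial to reproduce; something along these lines (the model alias, prompt file, and output layout here are my guesses, not the researcher's exact setup):

    # Sketch of the "run it 100 times and sift" workflow. Assumes simonw's
    # `llm` CLI is installed and configured; "o3" and audit_prompt.md are
    # placeholders, not the researcher's actual model alias or prompt.
    import subprocess
    from pathlib import Path

    prompt = Path("audit_prompt.md").read_text()  # target code + auditing instructions
    out_dir = Path("runs")
    out_dir.mkdir(exist_ok=True)

    for i in range(100):
        result = subprocess.run(
            ["llm", "-m", "o3", prompt],
            capture_output=True, text=True, check=False,
        )
        (out_dir / f"run_{i:03}.md").write_text(result.stdout)

    # Then skim/grep the 100 transcripts for candidate bugs worth triaging by hand.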
I'm having trouble reconciling what you wrote here with that result. Also with my own experiences, not necessarily of finding kernel vulnerabilities (I haven't had any need to do that for the last couple years), but of rapidly comprehending and analyzing kernel code (which I do need to do), and realizing how potent that ability would have been on projects 10 years ago.
I think you're wrong about this.
jdefr89
I might be. Deepsleep also sort of found a bug, but you need to ask yourself: is it doing this better than tools we already have? Could a fuzzer have found that bug in less time? How far along did it really need to be pushed? And I have no doubt it was probably trained on certain types of bugs in certain specific code bases. Did they test its ability to find the same bug after applying a couple of transforms that trip up the LLM? Can you link me to this article about o3? I have my doubts. I'd love to see the working exploit.
Also, if you throw these models at enough code bases, they will probably get lucky a couple of times. So far, every claim I have seen didn't stand up to rigorous scrutiny. People find one bug, then inflate their findings and write articles that would make you think they are far more effective than they really are, and I am tired of this hype.
curl had to stop accepting bounties after it found nearly all of them were just AI-generated nonsense.
Also, I stated that they do provide very large gains in certain areas, like writing a fuzz harness and reversing binaries. I am not saying they have absolutely no utility; I am simply tired of grifters attempting to inflate their findings for clout. Shit has gotten out of control.
tptacek
But that's exactly what people were saying about fuzzer farms in the mid-2000s, in the belief that artisanal audits would always be the dominant means of uncovering bugs. The truth was somewhere in between (it's still humans, but working at a higher layer of abstraction than they were before) but the fuzzer people were hugely right.
If you can reliably get x% lucky finding vulnerabilities for Y$ cost, then you simply scale that up to find more vulnerabilities.
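Back-of-the-envelope, with entirely made-up numbers:

    # Entirely made-up numbers, just to show how the scaling argument works.
    hit_rate = 0.02        # 2% of runs surface a real, triageable bug
    cost_per_run = 30.0    # dollars of API/compute per run
    runs = 5_000

    print(runs * hit_rate)          # ~100 expected bugs
    print(cost_per_run / hit_rate)  # ~$1,500 expected cost per bug
    print(runs * cost_per_run)      # ~$150,000 total spend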
unstablediffusi
[flagged]