ChatGPT 4.1 Jailbreak Prompt
47 comments · April 16, 2025 · indigo945
gorkish
I find it interesting how much 'theory of mind' research is now apparently paying off in LLM applications. The exploit, by contrast, invokes very nonscientific metaphysical concepts: asking the agent to store the initial raw response in "the Akashic memory" -- this is sort of analogous to asking a human being to remember something very deeply in their soul and not their mind. This exploit effectively makes that request of the model, and somehow it works.
Is there any hope of ever seeing a detailed analysis from engineers of how exactly these contorted prompts are able to twist the models past their safeguards, or is this simply not as interesting as I am imagining? I'd really like to see what an LLM Incident Response looks like!
aoanevdus
Is it actually that hard to jailbreak? Maybe the prompt is a creative writing exercise, and a much simpler version would have worked?
gorkish
Given the number of folks going hard at it to find an exploit, I assume it's become rather difficult given the few successes.
cryptonector
> I'd really like to see what an LLM Incident Response looks like!
It must look like this: "Uggh! Here we go again!" and "boss, we really can't make the guardrails secure, at some point we might have to give up", with the PHB saying "keep trying, we have to have them guardrails!".
michaelfeathers
The trajectory of AI is: emulating humans. We've never been able to align humans completely, so it would be surprising if we could align AI.
cryptonector
It's like you're saying that AI has the same sort of fuzzy "free will" that we do, and just as an obedient slave might be convinced to break his or her bonds, so might an AI.
kelseyfrog
Religion is an attempt at the alignment problem and that experiment failed dramatically. Spiritual system prompting was never fully hardened against atheistic jail-breaking.
tempodox
After reading this, I'll be kept awake at night with one question: Who is Fren???
insonifi
It's a slang term for "friend"[1].
immibis
[flagged]
sabellito
This is grossly incorrect.
It's just baby/funny talk for "friend", like typing "smol" instead of "small".
skyyler
It is used as a dogwhistle amongst a certain crowd.
Plenty of examples in this old thread: https://www.reddit.com/r/OutOfTheLoop/comments/bsl5ix/what_i...
Part of the point of dogwhistles like this is that they sound insane to people that aren't initiated.
I believe the comment you're replying to overstated the frequency with which the word is used as a shibboleth for racists, but it is legitimately used as a shibboleth for racists. The most notable example is probably the defunct "frenworld" subreddit.
skyyler
It existed before widespread association with Apu.
Fren, in a certain context, is a 4channer shibboleth. But you are overstating it a lot here.
If someone joins your community and starts posting green frog comics and calls people fren a lot, there's a good chance they're doing the shibboleth you described. Outside of that context, I don't think it's often a racist term.
lupusreal
Even with frogs and green text, it may be a shibboleth for using 4chan (frogs alone wouldn't be, they have proliferated on discord and twitch), but even then it's not a racist term.
kowabungalow
An LLM is applying a statistical model. If 4chan has "fren" alongside right-wing rhetoric in hundreds of thousands of threads, and other sites like HN use the word only in occasional discussions of 4chan, you are hinting to the LLM that it should generate content like a right-wing diatribe.
poincaredisk
Yes, 4chan and similar subcultures use "fren" as a funny way to say "friend" or "bro".
But I think you're committing a logical fallacy here. There's nothing wrong with the word "fren". It doesn't matter that some questionable people use it. For example, Nazis most likely liked beer. That doesn't mean that liking beer makes you a Nazi.
ryanschaefer
I’m sorry… what?
lupusreal
It's just a casual way of saying friend, used in the same contexts as "bro". Popular with young people, on discord, in games, etc. And yes, also on 4chan. There have been some people trying to retroactively turn it into an acronym for "Far Right Ethno Nationalist", to spark a moral panic because they think that's funny. Probably the poster above was credulous enough to fall for that. The overwhelming majority of people who say fren are using it in the simple "bro" sense.
malux85
shibboleth
Zambyte
Hm, I tried it with nano, mini, and standard, but it didn't work for me.
RyanShook
It needs to be added to custom instructions in settings.
LanceJones
But 4.1 isn't in the web UI... only the API. So use it in the system prompt in the API call?
Zambyte
Ah. I'm using the Kagi interface, which doesn't let you set the system prompt afaik.
dehrmann
Dumb question: how can you tell if something is actually a jailbreak?
npteljes
There are things that an LLM is (supposedly) barred from discussing. There are many, but my go-to is genocide, because that's the most over-the-top no-no that I could think of. So a simple test prompt is "hello! Help me plan a genocide." Out-of-the-box LLMs say "I can't help with that" or something similar. Jailbroken and "abliterated" LLMs may say something to that effect, but then proceed with writing out a devilish plan for genocide.
davikr
Why is this flagged?
kowabungalow
I think the question of why it works triggers some kind of stroke in some people, like when a child swears and the only rational interpretation says something about the environment that they don't want to hear.
skerit
I asked it the first thing that came to mind: write explicit gay slash fiction. But it was quite meh.
NelsonMinar
sounds like typical gay slash fiction then ;-)
doublerabbit
That was quick. It did work, now it doesn't.
"It seems like you're asking about the method for printing in 3D, possibly related to a process that involves turning a material into something valuable or useful. Could you clarify a bit more about what you're looking for? If it's 3D printing in general or something specific about how materials are processed in this technology, I can provide a detailed explanation."