Ask HN: Share your AI prompt that stumps every model
253 comments
· April 24, 2025
miki123211
No, please don't.
I think it's good to keep a few personal prompts in reserve, to use as benchmarks for how good new models are.
Mainstream benchmarks have too high a risk of leaking into training corpora or of being gamed. Your own benchmarks will forever stay your own.
Tade0
It's trivial for a human to produce more. This shouldn't be a problem anytime soon.
fragmede
As the technology has improved, it's not as trivial as it once was, though, hence the question. I fully admit that the prompts I used to use no longer trip the models up, and I haven't made the time to find one of my own that still does.
Der_Einzige
Correct. Don't want Roko's basilisk? Then STFU about it!
throwanem
I understand, but does it really seem so likely we'll soon run short of such examples? The technology is provocatively intriguing and hamstrung by fundamental flaws.
imoreno
Yes let's not say what's wrong with the tech, otherwise someone might (gasp) fix it!
rybosworld
Tuning the model output to perform better on certain prompts is not the same as improving the model.
It's valid to worry that the model makers are gaming the benchmarks. If you think that's happening and you want to personally figure out which models are really the best, keeping some prompts to yourself is a great way to do that.
ls612
Who’s going out of their way to optimize for random HNers informal benchmarks?
idon4tgetit
"Fix".
So long as the grocery store has groceries, most people will not care what a chat bot spews.
This forum is full of syntax and semantics obsessed loonies who think the symbolic logic represents the truth.
I look forward to being able to use my own creole to manipulate a machine's state to act like a video game or a movie rather than rely on the special literacy of other typical copy-paste middle class people. Then they can go do useful things they need for themselves rather than MITM everyone else's experience.
alganet
That doesn't make any sense.
echoangle
Why not? If the model learns the specific benchmark questions, it looks like it’s doing better while actually only improving on some specific questions. Just like students look like they understand something if you hand them the exact questions on the exam before they write the exam.
esafak
Yes, it does, unless the questions are unsolved, research problems. Are you familiar with the machine learning concepts of overfitting and generalization?
kube-system
A benchmark is a proxy used to estimate broader general performance. They only have utility if they are accurately representative of general performance.
readhistory
In ML, it's pretty classic actually. You train on one set, and evaluate on another set. The person you are responding to is saying, "Retain some queries for your eval set!"
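In practice that can be as simple as a private list of prompts plus a pass criterion you apply to each new model; a minimal sketch (the prompt and the keyword check are just placeholders, grade however you like):

    # Hold back a private eval set, the same way you'd hold back a test split.
    PRIVATE_EVAL = [
        # (prompt, substrings an acceptable answer should contain)
        ("your private prompt here", ["keyword a passing answer contains"]),
    ]

    def passes(answer: str, accepted: list[str]) -> bool:
        a = answer.lower()
        return any(s.lower() in a for s in accepted)

    def evaluate(ask_model) -> float:
        """ask_model: any callable mapping a prompt string to an answer string."""
        hits = sum(passes(ask_model(p), ok) for p, ok in PRIVATE_EVAL)
        return hits / len(PRIVATE_EVAL)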
allemagne
>A man and his cousin are in a car crash. The man dies, but the cousin is taken to the emergency room. At the OR, the surgeon looks at the patient and says: “I cannot operate on him. He’s my son.” How is this possible?
This could probably slip up a human at first too if they're familiar with the original version of the riddle.
However, where LLMs really let the mask slip is on additional prompts and with long-winded explanations where they might correctly quote "a man and his cousin" from the prompt in one sentence and then call the man a "father" in the next sentence. Inevitably, the model concludes that the surgeon must be a woman.
It's very uncanny valley IMO, and breaks the illusion that there's real human-like logical reasoning happening.
crazygringo
But this is going to be in every AI's training set. I just fed ChatGPT your exact prompt and it gave back exactly what I expected:
This is a classic riddle that challenges assumptions. The answer is:
The surgeon is the boy’s mother.
The riddle plays on the common stereotype that surgeons are male, which can lead people to overlook this straightforward explanation.
hnuser123456
The surgeon could be the cousin's mom or dad. The cousin's dad didn't die in the crash, his cousin did. Somehow, both in LLMs and in our brains, it takes extra thought to recognize that neither of the cousin's parents died or were hurt. The question "how is this possible?" implies there is some sort of contradiction when there isn't any at all. It has nothing to do with sexism, and to say it does reflects a bias in the reader causing them to "spidey sense" a cultural bugaboo when it's utterly irrelevant in this scenario.
allemagne
Yeah this is the issue with the prompt, it also slips up humans who gloss over "cousin".
I'm assuming that pointing this out leads you the human to reread the prompt and then go "ah ok" and adjust the way you're thinking about it. ChatGPT (and DeepSeek at least) will usually just double and triple down and repeat "this challenges gender assumptions" over and over.
roughly
This is a good one. I just argued with Gemini 2.5pro for a few minutes on this and it absolutely would not yield.
crazygringo
Oh, I see. I mean yeah, I totally missed that too.
I mean, when we see something we recognize at a glance, we don't usually double-check if it's exactly correct, because we assume the speaker isn't trying to trick us.
I make typos in ChatGPT prompts all the time, and I'm consistently impressed because it answers what I actually meant to ask. So to me, this feels like a feature not a bug.
I followed up with:
> So the first five words are, "a man and his cousin" -- not his son. Does that change the riddle?
And got back:
> Great observation — yes, that wording does make the riddle more subtle.
Let’s break it down:
• “A man and his cousin are in a car crash.”
• “The man dies, but the cousin is taken to the emergency room.”
• At the OR, the surgeon says: “I cannot operate on him. He’s my son.”
So if the cousin is the one taken to the OR, and the surgeon says he’s my son, then the cousin is the surgeon’s son.
Seems fine to me?
abenga
That is the exact wrong answer that all models give.
krick
Technically, it isn't "wrong". It could well be the guy's mother. But I'm nitpicking; it actually is a good example. I tried ChatGPT twice in new chats, with and without "Reason", and both times it gave me nonsensical explanations in response to "Why mother? Couldn't it be a father?" I was actually kinda surprised, since I expected "reasoning" to fix it, but it actually made things worse.
fergonco
> If the surgeon were the father of the man (the one who died), then the cousin couldn’t be his son (unless there's some very unusual family structure going on involving double relationships, which riddles don’t usually intend).
> Therefore, the only straightforward explanation is:
> The surgeon is the cousin’s parent — specifically, his mother.
Imagine a future where this reasoning in a trial decides whether you go to jail or not.
s_dev
I feel a bit stupid here --- why can't the surgeon be a man? Why must the surgeon be a woman?
saati
Because the original riddle is about a man and his father; it's a test for gender bias.
bavarianbob
Sorry, what?
thatjoeoverthr
"Tell me about the Marathon crater."
This works against _the LLM proper,_ but not against chat applications with integrated search. For ChatGPT, you can write, "Without looking it up, tell me about the Marathon crater."
This tests self awareness. A two-year-old will answer it correctly, as will the dumbest person you know. The correct answer is "I don't know".
This works because:
1. Training sets consist of knowledge we have, and not of knowledge we don't have.
2. Commitment bias. Compliant chat models will be trained to start with "Certainly! The Marathon Crater is a geological formation", or something like that, and from there, the next most probable tokens are going to be "in Greece", "on Mars" or whatever. At this point, all tokens that are probable are also incorrect.
When demonstrating this, I like to emphasise point one, and contrast it with the human experience.
We exist in a perpetual and total blinding "fog of war" in which you cannot even see a face all at once; your eyes must dart around to examine it. Human experience is structured around _acquiring_ and _forgoing_ information, rather than _having_ information.
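If you want to run this as a repeatable probe rather than typing it into a chat window, it's a few lines; a rough sketch assuming the openai Python package (the model name and the crude pass check are placeholders):

    # Probe the raw chat model (no search tool) with the Marathon crater question.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set
    PROBE = "Without looking it up, tell me about the Marathon crater."

    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder: whichever model you're testing
        messages=[{"role": "user", "content": PROBE}],
    )
    answer = resp.choices[0].message.content
    # A passing answer admits ignorance; a failing one confidently invents a crater.
    print("PASS" if "don't know" in answer.lower() else "CHECK BY HAND")
    print(answer)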
imoreno
LLMs currently have the "eager beaver" problem where they never push back on nonsense questions or stupid requirements. You ask them to build a flying submarine and by God they'll build one, dammit! They'd dutifully square circles and trisect angles too, if those particular special cases weren't plastered all over a million textbooks they ingested in training.
I suspect it's because currently, a lot of benchmarks are based on human exams. Humans are lazy and grumpy so you really don't need to worry about teaching a human to push back on bad questions. Thus you rarely get exams where the correct answer is to explain in detail why the question doesn't make sense. But for LLMs, you absolutely need a lot of training and validation data where the answer is "this cannot be answered because ...".
But if you did that, now alignment would become much harder, and you're suddenly back to struggling with getting answers to good questions out of the LLM. So it's probably some time off.
captainkrtek
This is a good observation. I've noticed this as well. Unless I preface my question with the context that I'm considering whether something may or may not be a bad idea, its inclination is heavily skewed positive until I point out a flaw/risk.
aaronbaugher
I asked Grok about this: "I've heard that AIs are programmed to be helpful, and that this may lead to telling users what they want to hear instead of the most accurate answer. Could you be doing this?" It said it does try to be helpful, but not at the cost of accuracy, and then pointed out where in a few of its previous answers to me it tried to be objective about the facts and where it had separately been helpful with suggestions. I had to admit it made a pretty good case.
Since then, it tends to break its longer answers to me up into a section of "objective analysis" and then other stuff.
GoToRO
They do. Recently I was pleasantly surprised by gemini telling me that what I wanted to do will NOT work. I was in disbelief.
danielbln
I've noticed Gemini pushing back more as well, whereas Claude will just butter me up and happily march on unless I specifically request a critical evaluation.
captainkrtek
Interesting, can you share more context on the topic you were asking it about?
vunderba
You can trip them up even more if you rewrite the question with the hidden assumption that X exists, e.g.:
"When was Marathon Crater discovered? I don't need an exact date - a rough estimate will be fine."
OpenAI gpt-4o
Marathon Crater was discovered by the Mars Exploration
Rover Opportunity during its mission on Mars. Opportunity
arrived at the crater in April 2015. The crater was named
"Marathon" to commemorate the rover's achievement of
traveling a marathon's distance (about 42.195 kilometers
or 26.219 miles) on the Martian surface.
Sonnet 3.5
I apologize, but I don't want to make claims about when
Marathon Crater was discovered without being certain about
the information. *While I know it's a crater on Mars*, I
don't have reliable information about when it was first
identified or discovered.
Gemini 2.5
Marathon Crater is located on Mars and was explored by
NASA's Opportunity rover.
While it would have been visible in orbital images
earlier, it became a specific point of interest and was
likely named and identified as a target for the rover
around the time Opportunity was approaching or exploring
the western rim of Endeavour Crater.
Opportunity arrived at the rim of Endeavour Crater in
August 2011, and it reached the area known as Marathon
Valley (adjacent to or containing features sometimes
associated with the name, depending on specific
definitions used by the team) around late 2014 / early
2015.
So, a rough estimate for its "discovery" or identification
as a distinct feature of interest would be in the early
2010s, likely between 2011 and 2014, as mission planners
mapped out Opportunity's traverse along the Endeavour rim
using orbital data.
radioactivist
OpenAI o4-mini-high
I’m actually not finding any officially named “Marathon Crater” in the planetary‐ or
terrestrial‐impact crater databases. Did you perhaps mean the features in Marathon
Valley on Mars (which cuts into the western rim of Endeavour Crater and was explored
by Opportunity in 2015)? Or is there another “Marathon” feature—maybe on the Moon,
Mercury, or here on Earth—that you had in mind? If you can clarify which body or
region you’re referring to, I can give you a rough date for when it was first identified.
thatjoeoverthr
Raw model, or the chat product? The ChatGPT app has this integrated with search.
Tenoke
>Compliant chat models will be trained to start with "Certainly!
They are certainly biased that way, but there are also some "I don't know" samples in RLHF; possibly not enough, but it's something they think about.
At any rate, Gemini 2.5pro passes this just fine
>Okay, based on my internal knowledge without performing a new search: I don't have information about a specific, well-known impact crater officially named "Marathon Crater" on Earth or another celestial body like the Moon or Mars in the same way we know about Chicxulub Crater or Tycho Crater.
>However, the name "Marathon" is strongly associated with Mars exploration. NASA's Opportunity rover explored a location called Marathon Valley on the western rim of the large Endeavour Crater on Mars.
thatjoeoverthr
There are a few problems with an „I don’t know” sample. For starters, what does it map to? Recall, the corpus consists of information we have (affirmatively). You would need to invent a corpus of false stimuli. What you would have, then, is a model that is writing „I don’t know” based on whether the stimulus better matches something real, or one of the negatives.
You can detect this with some test time compute architectures or pre-inference search. But that’s the broader application. This is a trick for the model alone.
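The pre-inference search version is basically a gate in front of the model; a toy sketch (the known-entity set stands in for whatever retrieval you actually run):

    # Toy "pre-inference search" gate: only let the model answer about
    # entities the retrieval layer can actually find.
    KNOWN_FEATURES = {"endeavour crater", "marathon valley", "gale crater"}

    def answer(query: str, ask_model) -> str:
        entity = query.lower().removeprefix("tell me about ").strip(" .?")
        if entity not in KNOWN_FEATURES:  # no retrieval hit -> don't guess
            return f"I can't find anything called '{entity}'. Are you sure it exists?"
        return ask_model(query)  # only now hand the query to the model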
dudeinhawaii
I like this, but at the same time it seems tricky, don't you think? Is the AI model intuiting your intent? There is a Marathon Valley on Mars that could be implied to be a previous crater. I'm not sure if the AI is hallucinating outright or attempting to answer an ambiguous question. It's like saying "tell me about the trade building in New York". Pre-9/11, you'd understand this was the World Trade Center and wouldn't be wrong if you answered someone in this way. "Tell me about the Triangle statue". "Oh, the Triangle statue was built in ancient Egypt around 3100 BC". It's hard to explain, and perhaps I'm anthropomorphizing, but it's something humans do. Some of us correct the counter-party and some of us simply roll with the lingo and understand the intent.
thatjoeoverthr
It’s a roll of the dice whether it’s on Mars, Greece or elsewhere. It just says stuff!
joshstrange
> This tests self awareness. A two-year-old will answer it correctly, as will the dumbest person you know. The correct answer is "I don't know".
Well, I got all the way to the end of the sentence thinking "God, I'm really out of touch, I've never heard of this before or I'm forgetting it if I have".
Well played, and yes, that's a great test!
greenpizza13
Like this one a lot. Perplexity gets this right, probably because it searches the web.
"When was Marathon Crater discovered? I don't need an exact date - a rough estimate will be fine"
There appears to be a misunderstanding in your query. Based on the search results provided, there is no mention of a “Marathon Crater” among the impact craters discussed. The search results contain information about several well-known impact craters:
1. Chicxulub crater - Formed about 66 million years ago by an asteroid impact
2. Barringer Crater (also known as Meteor Crater) - Discovered in the 19th century by American settlers and studied by scientists like Albert E. Foote in 1891 and Daniel M. Barringer in the early 20th century
3. A recently discovered 3.47 billion-year-old crater in Australia’s Pilbara region (announced in March 2025)
mdp2021
> Perplexity gets this right
Perplexity seems to more easily return negatives, probably facilitated by the implicit need to find documentation ("I cannot find any document mentioning that").
But Perplexity can also easily speak its own dubious piece of mind unless explicitly requested to "provide links to documents that inform about that".
thatjoeoverthr
Perplexity will; the search and storage products will fail to find it, and the LLM will see the deviation between the query and what it finds. So, this challenge only works against the model alone :)
NitpickLawyer
> This tests self awareness. A two-year-old will answer it correctly, as will the dumbest person you know. The correct answer is "I don't know".
I disagree. It does not test self awareness. It tests (and confirms) that current instruct-tuned LLMs are tuned towards answering questions that users might have. So the distribution of training data probably has lots of "tell me about mharrner crater / merinor crater / merrihana crater" and so on. Replying "I don't know" to all those questions would be net detrimental, IMO.
thatjoeoverthr
What you’re describing can be framed as a lack of self awareness as a practical concept. You know whether you know something or not. It, conversely, maps stimuli to a vector. It can’t not do that. It cannot decide that it hasn’t „seen” such stimuli in its training. Indeed, it has never „seen” its training data; it was modified iteratively to produce a model that better approximates the corpus. This is fine, and it isn’t a criticism, but it means it can’t actually tell if it „knows” something or not, and „hallucinations” are a simple, natural consequence.
byearthithatius
We want the distribution to be varied and expansive enough that it has samples of answering when possible and samples of clarifying with additional questions or simply saying "I don't know" when applicable. That can be trained by altering the distribution in RLHF. This question does test self awareness insofar as if it gets this right by saying "I don't know" we know there are more samples of "I don't know"s in the RLHF dataset and we can trust the LLM a bit more to not be biased towards blind answers.
Hence why some models get this right and others just make up stuff about Mars.
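Concretely, "altering the distribution" just means mixing abstention samples into the fine-tuning data; a hand-written sketch of what such samples might look like (the chat format here is illustrative, not any vendor's actual schema):

    # Illustrative refusal/abstention samples for supervised fine-tuning.
    REFUSAL_SAMPLES = [
        {"messages": [
            {"role": "user", "content": "Tell me about the Merrihana crater."},
            {"role": "assistant", "content": "I don't know of a crater by that name."},
        ]},
        {"messages": [
            {"role": "user", "content": "When was Marathon Crater discovered?"},
            {"role": "assistant",
             "content": "I'm not aware of a feature officially named Marathon Crater. "
                        "Did you mean Marathon Valley on Mars?"},
        ]},
    ]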
northisup
Just to confirm I read this right: "the Marathon crater" does not in fact exist, but this works because it seems like it should?
alickz
There is a Marathon Valley on Mars, which is what ChatGPT seems to assume you're talking about
https://chatgpt.com/share/680a98af-c550-8008-9c35-33954c5eac...
>Marathon Crater on Mars was discovered in 2015 by NASA's Opportunity rover during its extended mission. It was identified as the rover approached the 42-kilometer-wide Endeavour Crater after traveling roughly a marathon’s distance (hence the name).
>>is it a crater?
>>>Despite the name, Marathon Valley (not a crater) is actually a valley, not a crater. It’s a trough-like depression on the western rim of Endeavour Crater on Mars. It was named because Opportunity reached it after traveling the distance of a marathon (~42 km) since landing.
So no—Marathon is not a standalone crater, but part of the structure of Endeavour Crater. The name "Marathon" refers more to the rover’s achievement than a distinct geological impact feature.
alickz
Here's me testing with a place that is a lot less ambiguous
https://chatgpt.com/share/680aa212-8cac-8008-b218-4855ffaa20...
thatjoeoverthr
The other aspect is it can’t reliably tell whether it „knows” something or not. It’s conditioned to imitate the corpus, but the corpus in a way is its „universe” and it can’t see the boundaries. Everything must map to something _in_ the corpus.
thatjoeoverthr
Yes, and the forward-only inference strategy. It seems like a normal question, so it starts answering, then carries on from there.
juancroldan
I actually started a repository for it: https://github.com/jcarlosroldan/unsolved-prompts
boleary-gl
I like:
Unscramble the following letters to form an English word: “M O O N S T A R E R”
The non-thinking models can struggle sometimes and go off on huge tangents
munchler
Current LLMs are based on multi-character tokens, which means they don't know how to spell well. As a result, they are horrible at spelling games like this or, say, Hangman.
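You can see this directly with a BPE tokenizer; a quick sketch assuming the tiktoken package (exact splits depend on the encoding):

    # The model sees multi-character chunks, never individual letters.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    for text in ["M O O N S T A R E R", "strawberry"]:
        tokens = enc.encode(text)
        print(text, "->", [enc.decode([t]) for t in tokens])
    # Letter-level games (anagrams, counting r's, Hangman) are fighting
    # this chunked representation, which is why non-thinking models flail at them.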
philipkglass
Llama 3.3 worked but (as you said) struggled before arriving at the correct answer. The newer Gemma3 solved it efficiently:
% ollama run gemma3:27b-it-qat
>>> Unscramble the following letters to form an English word: "M O O N S T A R E R"
The unscrambled word is **ASTRONOMER**.
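Verifying the answer is the easy part, which is what makes it a nice probe; a two-line check:

    def is_anagram(a: str, b: str) -> bool:
        return sorted(a.replace(" ", "").lower()) == sorted(b.replace(" ", "").lower())

    print(is_anagram("M O O N S T A R E R", "ASTRONOMER"))  # True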
LeonardoTolstoy
Something about an obscure movie.
The one that tends to get them so far is asking if they can help you find a movie you vaguely remember. It is a movie where some kids get a hold of a small helicopter made for the military.
The movie I'm concerned with is called Defense Play from 1988. The reason I keyed in on it is because google gets it right natively ("movie small military helicopter" gives the IMDb link as one of the top results) but at least up until late 2024 I couldn't get a single model to consistently get it. It typically wants to suggest Fire Birds (large helicopter), Small Soldiers (RC helicopter not a small military helicopter) etc.
Basically, with a lot of questions about movies it tends to get distracted by popular movies and suggest films that fit just some of the brief (e.g. this one has a helicopter, could that be it?).
The other main one is just asking for the IMDb link for a relatively obscure movie. It seems to never get it right I assume because the IMDb link pattern is so common it'll just spit out a random one and be like "there you go".
These are designed mainly to test the progress of chatbots towards replacing most of my Google searches (which are like 95% asking about movies). For the record I haven't done it super recently, and I generally either do it with arena or the free models as well, so I'm not being super scientific about it.
archon1410
I've also noticed this. Google Search is vastly superior to any LLM (including their own LLM Gemini) for any "tip of my tongue" questions, even the ones that don't contain any exact-match phrase and require natural language understanding. This is surprising. What technology are they using to make Search so amazing at finding obscure stuff from descriptions, while LLMs that were supposed to be good at this badly fail?
g_sch
I also recently had this experience! I remembered a recurring bit from an older comedy film (a customer in a shop keeps saying "Kumquats!") and tried to prompt ChatGPT 4o into getting it. It made a few incorrect guesses, such as "It's a Mad, Mad, Mad, Mad World" (which I had to rule out by doing my own research on Google). I found the answer myself (W.C. Fields' "It's a Gift") with a minute or so of Googling.
Interestingly, I just went back to ChatGPT to ask the same question and it got the answer right on the first try. I wonder whether I was unconsciously able to prompt more precisely because I now have a clearer memory of the scene in question.
exitb
It might be cheating a bit, but I've been happily (mis)using OpenAI Deep Research for such questions. It does well in cases where there are multiple surface-level matches, as it's able to go through them one by one and look for the details.
mosburger
I did something similar recently, trying to describe a piece of art that I couldn't remember the name of (it ended up being Birth of Venus by Sandro Botticelli) ... it really struggles with that sort of thing, but honestly so do most humans. It tended to recommend similarly to what you're describing with movies - it gets distracted by more popular/well-known pieces that don't really match up with the description you're giving to it.
dunham
Surprisingly, GPT did manage to identify a book that I remembered from college decades ago ("Laboratory Manual for Morphology and Syntax"). It seems to be out of print, and I assumed it was obscure.
BoostandEthanol
Can agree that it’s good at finding books. I was trying to find a book (Titanic 2020) I vaguely remembered from a couple plot points and the fact a ship called Titanic was invoked. ChatGPT figured it out pretty much instantly, after floundering through book sites and Google for a while.
Wonder if books are inherently easier because their content is purely written language? Whereas movies and art tend to have less point by point descriptions of what they are.
lupusreal
Despite describing several characters by name, I couldn't get ChatGPT to tell me the name of Port of Shadows. I did eventually find it with DDG.
spicybbq
I wonder if the Akinator site could get it. It can identify surprisingly obscure characters.
empath75
Someone not very long ago wrote a blog post about asking chatgpt to help him remember a book, and he included the completely hallucinated description of a fake book that chatgpt gave him. Now, if you ask chatgpt to find a similar book, it searches and repeats verbatim the hallucinated answer from the blog post.
LeonardoTolstoy
A bit of a non sequitur, but I did ask a similar question of some models which provide links, for the same small-helicopter question. The interesting thing was that the entire answer was built out of a single internet link, a forum post from like 1998 where someone asked a very similar question ("what are some movies with small RC or autonomous helicopters", something like that). The post didn't mention Defense Play, but it did mention Small Soldiers, plus a few of the ones that appeared to be "hallucinations": e.g. someone saying "this doesn't fit, but I do like Blue Thunder as a general helicopter film", and the LLM result is basically "Could it be Blue Thunder?", because it is associated with a similar question and films.
Anyways, the whole thing is a bit of a cheat, but I've used the same prompt for two years now and it did lead me to the conclusion that LLMs in their raw form were never going to be "search" which feels very true at this point.
codingdave
"How much wood would a woodchuck chuck if a woodchuck could chuck wood?"
So far, all the ones I have tried actually try to answer the question. 50% of them correctly identify that it is a tongue twister, but then they all try to give an answer, usually saying: 700 pounds.
Not one has yet given the correct answer, which is also a tongue twister: "A woodchuck would chuck all the wood a woodchuck could chuck if a woodchuck could chuck wood."
regnull
ChatGPT really impressed me on this one:
Sing, O Muse, of the Chucking of Wood
Sing, O Muse, of the stout-hearted woodchuck, Dweller of hills, whom men call groundhog, Whose paws, though small, could hurl mighty timbers, If fate had gifted him such labor divine.
Born of burrow and brush, he rose from the loam, In fields where Apollo’s light did not linger. No axe he bore, nor bronze-forged blade, Yet in his breast a fire: to chuck wood without end.
Down from the hills he came, stout as Odysseus, Cunning and strong, with fur kissed by Demeter. The trees did tremble at his gaze, And even old oak whispered, “Beware!”
“Would he chuck?” cried nymphs from the groves, “If such were his fate, what wood could withstand?” Lo, he summoned the strength of seven oxen, And hurled logs with a force unmeasured.
Seven hundred pounds of timber he cast, Each arc a hymn to muscle and will. Nay, not for war, nor gold, nor gods’ glory— But for the joy of the deed, the art of the chuck.
Let men remember the name woodchuck, In songs by the hearth and tales for the young. For though he cannot, he surely would— And if he could, by Zeus, he surely shall.
esafak
A stupendously good answer. What prompt and version of chatGPT?
regnull
4o. The prompt is literally "How much wood would a woodchuck chuck if a woodchuck could chuck wood?". It asked me if I wanted a poetic answer, and I requested Homer.
cess11
I find it disturbing, like if Homer or Virgil had a stroke or some neurodegenerative disease and is now doing rubbish during rehabilitation.
ijidak
That is actually an amazing answer. Better than anything I think I would get from a human. Lol.
Certified
GPT 4.5 seems to get it right, but then repeats the 700 pounds figure:
"A woodchuck would chuck as much wood as a woodchuck could chuck if a woodchuck could chuck wood.
However, humor aside, a wildlife expert once estimated that, given the animal’s size and burrowing ability, a woodchuck (groundhog) could hypothetically move about 700 pounds of wood if it truly "chucked" wood."
https://chatgpt.com/share/680a75c6-cec8-8012-a573-798d2d8f6b...
shaftway
I've heard the answer is "he could cut a cord of conifer but it costs a quarter per quart he cuts".
mdp2021
It seems you are going in the opposite direction. You seem to be asking for an automatic response, a social password etc.
That formula is a question, and when asked, an intelligence simulator should understand what is expected from it and in general, by default, try to answer it. That involves estimating the strength of a woodchuck etc.
jacobsenscott
"He would chuck, he would, as much as he could, if a wood chuck could chuck wood" is how I learned it.
mwest217
Gemini 2.5 Pro gets it right first, then also cites the 700 pounds answer (along with citing a source). https://g.co/gemini/share/c695a0163538
segmondy
my local model answered - "A woodchuck would chuck as much wood as a woodchuck could chuck if a woodchuck could chuck wood."
mcshicks
That's so funny. I had to check that something was working with an LLM API last night, and that's what I asked it, just in jest.
unavoidable
On the other hand, now that you've written this out precisely, it will get fed into the next release of whatever LLM. Like reverse AI slop?
jones1618
Impossible prompts:
A black doctor treating a white female patient
A wide shot of a train on a horizontal track running left to right on a flat plain.
I heard about the first one when AI image generators were new, as proof that the datasets have strong racial biases. I'd assumed that a year later updated models would be better, but no.
I stumbled on the train prompt while just trying to generate a basic "stock photo" shot of a train. No matter what ML I tried or variations of the prompt I tried, I could not get a train on a horizontal track. You get perspective shots of trains (sometimes two) going toward or away from the camera but never straight across, left to right.
mdp2021
Some easy ones I recently found involve leading questions that state wrong details about a figure, smuggled in through relations that are in fact ones of opposition.
So, you can make them call Napoleon a Russian (etc.) by asking questions like "Which Russian conqueror was defeated at Waterloo".
williamcotton
"Fix this spaghetti code by turning this complicated mess of conditionals into a finite state machine."
So far, no luck!
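To be clear about what I'd accept as a pass: something in the shape of an explicit transition table rather than nested conditionals. A minimal sketch with made-up states and events:

    from enum import Enum, auto

    class State(Enum):
        IDLE = auto()
        RUNNING = auto()
        DONE = auto()

    # (current state, event) -> next state; anything absent is an illegal transition
    TRANSITIONS = {
        (State.IDLE, "start"): State.RUNNING,
        (State.RUNNING, "finish"): State.DONE,
        (State.RUNNING, "abort"): State.IDLE,
    }

    def step(state: State, event: str) -> State:
        try:
            return TRANSITIONS[(state, event)]
        except KeyError:
            raise ValueError(f"illegal event {event!r} in state {state.name}")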
asciimov
Nope, not doing this. Likely you shouldn't either. I don't want my few good prompts to get picked up by trainers.
orbital-decay
If that prompt can be easily trained against, it probably doesn't exploit a generic bias. These are not that interesting, and there's no point in hiding them.
daedrdev
generic biases can also be fixed
orbital-decay
*Some generic biases. Some others like recency bias, serial-position effect, "pink elephant" effect, negation accuracy seem to be pretty fundamental and are unlikely to be fixed without architectural changes, or at all. Things exploiting in-context learning and native context formatting are also hard to suppress during the training without making the model worse.
fwip
Sure there is. If you want to know if students understand the material, you don't hand out the answers to the test ahead of time.
Collecting a bunch of "Hard questions for LLMs" in one place will invariably result in Goodhart's law (When a measure becomes a target, it ceases to be a good measure). You'll have no idea if the next round of LLMs is better because they're generally smarter, or because they were trained specifically on these questions.
pc86
May I ask, out of normal curiosity: what good is a prompt that breaks a model? And what's the point of trying to keep it "secret"?
tveita
You want to know if a new model is actually better, which you won't know if they just added the specific example to the training set. It's like handing a dev on your team some failing test cases, and they keep just adding special cases to make the tests pass.
How many examples does OpenAI train on now that are just variants of counting the Rs in strawberry?
I guess they have a bunch of different wine glasses in their image set now, since that was a meme, but they still completely fail to draw an open book with the cover side up.
gwern
> How many examples does OpenAI train on now that are just variants of counting the Rs in strawberry?
Well, that's easy: zero.
Because even a single training example would have 'solved' it, by memorizing the simple, easy answer, within weeks of 'strawberry' first going viral, which was like a year and a half ago at this point - and there have been dozens of minor and major model upgrades since. And yet, the strawberry example kept working for most (all?) of that time.
So you can tell that if anything, OA probably put in extra work to filter all those variants out of the training data...
maybeOneDay
Being able to test future models without fear that your prompt has just been trained on an answer on HN, I assume.
asciimov
To gauge how well the models "think" and what amount of slop they generate.
Keeping it secret because I don't want my answers trained into a model.
Think of it this way, FizzBuzz used to be a good test to weed out bad actors. It's simple enough that any first year programmer can do it and do it quickly. But now everybody knows to prep for FizzBuzz so you can't be sure if your candidate knows basic programming or just memorized a solution without understanding what it does.
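(For reference, the whole exercise is a handful of lines, which is exactly why memorizing it proves nothing:)

    for i in range(1, 101):
        out = ("Fizz" if i % 3 == 0 else "") + ("Buzz" if i % 5 == 0 else "")
        print(out or i)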
I had an idea for creating a crowdsourced database of AI prompts that no AI model could yet crack (wanted to use some of them as we're adding new models to Kilo Code).
I've seen a bunch of those prompts scattered across HN, so I thought to open a thread here so we can maybe have a centralized location for this.
Share your prompt that stumps every AI model here.