LLMs as Unbiased Oracles
10 comments
May 4, 2025
bluefirebrand
MarcoDewey
You are correct that the notion of LLMs being completely unbiased or neutral does not make sense due to how they are trained. Perhaps my title is even misleading if taken at face value.
When I talk about "unbiased oracles", I am speaking in the context of black-box testing. I'm not suggesting they are free from all forms of bias. Instead, the key distinction I'm trying to draw is their lack of implementation-level bias towards the specific code they are testing.
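To make that distinction concrete, here is a rough sketch (the spec, function name, and pytest usage are made up for illustration, not from the post) of tests written only against a stated spec, never against the implementation:

    import pytest

    def normalize(xs):
        # Implementation under test; in the black-box setup the test author
        # (human or LLM) never sees this body, only the spec below.
        total = sum(xs)
        if total == 0:
            raise ValueError("cannot normalize a zero-sum input")
        return [x / total for x in xs]

    # Spec (external view only): normalize(xs) returns xs scaled so its values
    # sum to 1.0, and raises ValueError when the input sums to 0.

    def test_sums_to_one():
        assert abs(sum(normalize([2.0, 2.0, 4.0])) - 1.0) < 1e-9

    def test_preserves_proportions():
        assert normalize([1.0, 3.0]) == [0.25, 0.75]

    def test_rejects_zero_sum():
        with pytest.raises(ValueError):
            normalize([0.0, 0.0])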
Jensson
> An LLM, specifically trained for test generation, consumes this specification. Its objective is to generate a diverse and comprehensive test suite that probes the specified behavior from an external perspective.
If one of these tests is wrong, though, it will ruin the whole thing. And an LLM is much more likely to make a math error (which would result in a faulty test) than to implement a math function the wrong way, so this probably won't make it better at generating code.
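For example (a made-up sketch, not from the post), one arithmetic slip in a generated expected value fails even a correct implementation, and the whole suite reads as a failure:

    import math

    def mean(xs):
        # Correct implementation.
        return sum(xs) / len(xs)

    def test_mean_ok():
        assert mean([2, 4, 6]) == 4  # correct expectation, passes

    def test_mean_faulty():
        # Arithmetic slip in the generated expectation: (1 + 2 + 4) / 3 is
        # about 2.33, not 2.5, so this test fails against correct code.
        assert math.isclose(mean([1, 2, 4]), 2.5)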
TazeTSchnitzel
Is this a blogpost that's incomplete or a barely disguised ad?
brahyam
The amount of time it would take to write a formal spec for the code I need is more than it would take to generate the code, so this doesn't sound like something that will go mainstream, except in those industries where formal code specs are already in place.
MarcoDewey
Yes, this test-driven approach will likely increase generation time upfront. However, the payoff is more reliable generated code, which leads to less debugging and fewer reprompts overall and saves time in the long run.
Also agreed on specification formality. Even a less formal spec provides a clearer boundary for the LLM during code generation, which should improve the results.
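For instance, even a property-style spec can act as that boundary. A rough sketch (assumes the hypothesis library; the run-length-encoding task and names are made up for illustration):

    from hypothesis import given
    from hypothesis import strategies as st

    def rle_encode(s):
        # Candidate implementation (e.g. LLM-generated): run-length encode a
        # string into (char, count) pairs.
        out = []
        for ch in s:
            if out and out[-1][0] == ch:
                out[-1] = (ch, out[-1][1] + 1)
            else:
                out.append((ch, 1))
        return out

    def rle_decode(pairs):
        return "".join(ch * n for ch, n in pairs)

    @given(st.text())
    def test_roundtrip(s):
        # The informal spec: decoding an encoding returns the original string.
        assert rle_decode(rle_encode(s)) == s

    @given(st.text(min_size=1))
    def test_counts_positive(s):
        assert all(n >= 1 for _, n in rle_encode(s))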
satisfice
If your premises and assumptions are sufficiently corrupted, you can come to any conclusion and believe you are being rational. Like those dreams where you walk around without pants and are more worried about not having pants than about how your pants could keep going missing. Your brain is not present enough to find the root of the problem.
An LLM is not unbiased, and you would know that if you tested LLMs.
Apart from biases, an LLM is not a reliable oracle; you would know that if you tested LLMs.
The reliabilities and unreliabilities of LLMs vary in discontinuous and unpredictable ways from task to task, model to model, and within the same model over time. You would know this if you tested LLMs. I have. Why haven’t you?
Ideas like this are promoted by people who don’t like testing, and don’t respect it. That explains why a concept like this is treated as equivalent to a tested fact. There is a name for it: wishful thinking.
walterbell
> wishful thinking
Given the economic component of LLM wishes, we can look at prior instances of wishing-at-scale, https://en.wikipedia.org/wiki/Tulip_mania
troupo
There's a more recent one: https://blog.mollywhite.net/blockchain/
neuroelectron
Yeah, that would be cool.
LLMs are absolutely biased.
They are biased by the training dataset, which probably also reflects the biases of the people who select the training dataset.
They are biased by the system prompts that are embedded into every request to keep them on the rails.
They are even biased by the prompt that you write, which can lead them to incorrect conclusions if you design it to lead them there.
I think it is a very careless mistake to think of LLMs as unbiased or neutral in any way.