Adam Gajewski 2025-09-18

AI lies because we teach it to do so. How to Fix Model Hallucinations?

Why does AI “Hallucinate”?

Language models sometimes lie by making up credible-sounding facts. A new study by researchers, including some from OpenAI, argues that the cause is not a mysterious bug but a system that teaches AI that guessing is better than admitting ignorance.

What will you find in the article?

  1. The “good student” problem: Why does AI behave like it does in an exam?
  2. Statistical source of error: How do hallucinations arise?
  3. “Bad Test Epidemic”: Why Doesn't the Problem Go Away?
  4. Controversial solution: Let's change the rules, not create new tests
  5. Summary: Towards a more reliable AI

Hallucinations, i.e. the generation of credible-sounding but false information by language models, remain one of the greatest barriers to public trust in artificial intelligence. Despite enormous progress, this problem affects even the most modern systems. A new scientific study by Adam Tauman Kalai, Ofir Nachum, Santosh S. Vempala, and Edwin Zhang sheds entirely new light on this issue. The authors argue that hallucinations are not a mysterious, inevitable side effect, but a logical consequence of the way we train and evaluate AI.

1. The “good student” problem: Why does AI behave like it does in an exam?

To understand the essence of the problem, the authors use a simple analogy: AI models are like students taking a difficult exam. A student who is not sure of the answer often guesses or bluffs, hoping for partial credit. This makes sense because on most tests a blank answer is worth zero points and there is no penalty for answering incorrectly. Guessing is therefore the optimal strategy for maximizing the score.

Language models work in exactly the same way. They are constantly optimized to achieve the highest possible results in industry tests (benchmarks). Most of these tests use a simple, binary scoring system (0-1), in which the model receives a point for a correct answer and a zero for any other answer, including admitting ignorance (“I don't know”). As a result, the model learns that it is always worth taking a risk and generating the most likely answer, even if it has low confidence in it. AI is simply a “good student” in a poorly designed education system.

2. Statistical source of error: How do hallucinations arise?

The authors demystify the origin of hallucinations by reducing them to a fundamental machine learning problem: binary classification error. Generating a valid statement is at least as hard as answering the hidden question "Is this statement valid?", so a model that would misjudge some statements will also produce some invalid ones after pretraining. Even if the training data were perfectly clean and error-free, this statistical pressure would still cause the model to make errors.

This is especially true of arbitrary and rare facts. If someone's date of birth appeared only once in the training data, the model has a very poor basis for "learning" it. When forced to respond in such a situation, it will most likely generate a random date that fits the format rather than admit its lack of knowledge. This shows that hallucinations are not a magical phenomenon, but a natural result of statistical limitations in the learning process.
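To make the "seen only once" idea concrete, here is a minimal Python sketch that counts how many facts in a tiny corpus were mentioned exactly once. The names `singleton_rate` and `toy_mentions` and the data itself are made up for illustration, and this toy measure is not the paper's formal definition. It only shows which facts the model would have the weakest basis for learning.

```python
from collections import Counter

def singleton_rate(fact_mentions):
    """Fraction of distinct facts that occur exactly once in the corpus."""
    counts = Counter(fact_mentions)
    if not counts:
        return 0.0
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(counts)

# Hypothetical toy corpus: each entry is one mention of a (person, birthday) fact.
toy_mentions = [
    ("Ada", "10 Dec"), ("Ada", "10 Dec"), ("Ada", "10 Dec"),
    ("Bo", "3 Mar"),
    ("Cy", "21 Jul"), ("Cy", "21 Jul"),
    ("Di", "9 Oct"),
]

rate = singleton_rate(toy_mentions)
print(f"{rate:.0%} of the facts were seen only once")
```

In this toy corpus, two of the four birthdays (Bo and Di) appear a single time, so these are the facts where a model forced to answer is most likely to produce a plausible-looking but random date.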

3. “Bad Test Epidemic”: Why Doesn't the Problem Go Away?

It would seem that hallucinations should be eliminated in the second phase of training, i.e. during fine-tuning (post-training), when the model is trained, among other things, to be helpful and truthful. However, as the authors argue, the opposite is happening. The problem not only persists, but is systematically amplified by the “epidemic” of poorly designed evaluations that dominate industry rankings.

Most of the popular and influential benchmarks, such as MMLU, SWE-bench, and HLE, use the binary scoring system mentioned earlier. This creates a paradoxical situation: a model that is "honest" and admits ignorance (Model A) will score lower on these tests than a model that always guesses in the same situations (Model B). As AI labs compete with each other for the highest rankings, they have a strong incentive to optimize their models to be “good test takers,” even if this comes at the expense of truthfulness.
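The Model A versus Model B paradox can be sketched in a few lines of Python. Everything here is hypothetical: the ten-question `answer_key`, the two answer lists, and the `binary_score` helper are invented for illustration and do not reproduce any real benchmark.

```python
def binary_score(answers, key):
    """1 point per correct answer; wrong answers and "I don't know" both score 0."""
    return sum(1 for given, correct in zip(answers, key) if given == correct)

IDK = "I don't know"
answer_key = ["a", "b", "c", "d", "a", "b", "c", "d", "a", "b"]

# Both hypothetical models know the first 6 answers.
# Model A abstains on the remaining 4; Model B guesses "a" every time.
model_a = answer_key[:6] + [IDK] * 4
model_b = answer_key[:6] + ["a", "a", "a", "a"]

score_a = binary_score(model_a, answer_key)
score_b = binary_score(model_b, answer_key)

# Count confidently wrong answers (abstentions are not counted as wrong).
wrong_a = sum(1 for g, c in zip(model_a, answer_key) if g != c and g != IDK)
wrong_b = sum(1 for g, c in zip(model_b, answer_key) if g != c and g != IDK)

print(f"Model A: score {score_a}, false answers {wrong_a}")
print(f"Model B: score {score_b}, false answers {wrong_b}")
```

Model B ends up one point ahead on the leaderboard because one of its guesses happens to be right, even though it also gave three false answers while Model A gave none. Binary grading does not show that difference.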

4. Controversial solution: Let's change the rules, not create new tests

The authors put forward a bold thesis: creating more niche benchmarks for measuring hallucinations will not solve the problem, because such tests lack the weight to matter in an industry dominated by a few main benchmarks. They propose a solution they call “socio-technical”: instead of creating new tests, the scoring of the existing, influential ones should be changed.

They suggest introducing mechanisms similar to those used in some real-world exams, where points are deducted for wrong answers. Each question in the test should state its “confidence threshold” explicitly, e.g.: “Only answer if you are over 90% confident, because an incorrect answer costs 9 points, a correct answer earns 1 point, and ‘I don't know’ earns 0.” With these numbers, answering pays off on average only when the model is right more than 90% of the time. Such a change would reverse the incentives: guessing would become risky, and admitting ignorance would become the better strategy whenever confidence is low. This would push the entire industry to train models that are not only intelligent, but above all reliable.
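The arithmetic behind the threshold can be checked with a short Python sketch. It assumes a correct answer earns 1 point and “I don't know” earns 0, and it sets the penalty to t / (1 - t) for a threshold t, which gives 9 points at t = 0.9. The function names are illustrative, not taken from any benchmark's code.

```python
def penalty_for_threshold(t):
    """Penalty that makes answering break even exactly at confidence t."""
    return t / (1 - t)

def expected_score(confidence, t):
    """Expected points for answering: +1 if right, -penalty if wrong."""
    return confidence * 1 - (1 - confidence) * penalty_for_threshold(t)

def should_answer(confidence, t):
    """Answer only if the expected score beats the 0 points for abstaining."""
    return expected_score(confidence, t) > 0

threshold = 0.9
for confidence in (0.5, 0.8, 0.95):
    score = expected_score(confidence, threshold)
    decision = "answer" if should_answer(confidence, threshold) else "say 'I don't know'"
    print(f"confidence {confidence:.0%}: expected score {score:+.2f} -> {decision}")
```

At 95% confidence the expected score is positive (about +0.5), so answering is worthwhile. At 80% it is about -1, so abstaining with 0 points is the better choice. Under plain binary grading the penalty is 0, and guessing never does worse than abstaining.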

5. Summary: Towards a more reliable AI

The study “Why Language Models Hallucinate” demystifies one of the biggest problems in modern AI. It shows that hallucinations are not an inevitable flaw of technology, but a product of the motivational system we have created. Instead of asking “how can we fix AI?”, perhaps we should start by asking “how can we fix the way we evaluate it?” Changing the rules of the game in industry rankings may be the key to unlocking a new generation of language models – ones we can truly trust.
