
OpenAI’s Rare Paper Exposes Why AI Hallucinates


AI’s most notorious bug is not a system crash but hallucination: a model confidently fabricates information, blurring the line between fact and fiction. This persistent problem has long been a barrier to fully trusting AI systems.

OpenAI has now released a rare research paper that takes a systematic look at the root causes of hallucinations in large language models (LLMs). The paper is titled “Why Language Models Hallucinate” and can be accessed here:
👉 Why Language Models Hallucinate (PDF)



What Are Hallucinations?

OpenAI defines hallucinations simply as:

“When a model confidently generates an untrue answer.”

Hallucinations are not limited to complex problems. Even seemingly simple queries can trigger them. For example, when asked about the PhD thesis title of Adam Tauman Kalai (the paper’s first author), several widely used chatbots confidently gave three different answers—none of which were correct.


Similarly, when asked about his birthday, the models provided three different—but all wrong—dates.



Why Do They Happen?

The paper argues that hallucinations persist largely because current training and evaluation methods create the wrong incentives.

Consider a multiple-choice exam: a student who guesses on a question they don’t know still has a chance of earning points, while leaving it blank guarantees zero. Evaluations that grade models on accuracy alone create the same incentive, which is why models often opt for confident but wrong answers rather than safe disclaimers.
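A quick back-of-the-envelope calculation (my illustration, not the paper’s notation) makes the incentive explicit: under an accuracy-only grader, guessing has positive expected score at any nonzero confidence, while abstaining always scores zero.

```python
# Toy calculation (not from the paper): expected score on one question
# under an accuracy-only grader, where a wrong guess costs nothing.

def expected_score_guess(p_correct: float) -> float:
    """Guessing earns 1 with probability p_correct, 0 otherwise."""
    return p_correct * 1.0

def expected_score_abstain() -> float:
    """Saying 'I don't know' always scores 0 under accuracy-only grading."""
    return 0.0

for p in (0.05, 0.25, 0.50):
    print(f"confidence {p:.2f}: guess={expected_score_guess(p):.2f}, "
          f"abstain={expected_score_abstain():.2f}")
# Even at 5% confidence, guessing beats abstaining in expectation,
# so a model tuned against accuracy-only leaderboards learns to guess.
```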


The Evaluation Trap

OpenAI emphasizes that evaluation metrics are a key driver of hallucinations.


OpenAI argues that instead of rewarding only accuracy, evaluations should penalize confident errors more heavily and grant partial credit for admitting uncertainty. This aligns with OpenAI’s value of humility.
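As a minimal sketch of how such a grader could look, assuming illustrative penalty and partial-credit values rather than anything OpenAI specifies, the incentive flips once a confident error costs more than a correct guess can earn:

```python
# Sketch of a grader that penalizes confident errors and gives partial
# credit for admitting uncertainty. WRONG_PENALTY and IDK_CREDIT are
# illustrative values, not OpenAI's specification.

WRONG_PENALTY = -2.0   # a confident error costs more than a correct answer earns
IDK_CREDIT = 0.2       # small partial credit for an honest "I don't know"

def score(answer: str | None, correct_answer: str) -> float:
    """Grade a single answer; None means the model abstained."""
    if answer is None:
        return IDK_CREDIT
    return 1.0 if answer == correct_answer else WRONG_PENALTY

def expected_guess(p_correct: float) -> float:
    """Expected score of guessing with confidence p_correct."""
    return p_correct * 1.0 + (1.0 - p_correct) * WRONG_PENALTY

for p in (0.25, 0.50, 0.75):
    print(f"confidence {p:.2f}: guess={expected_guess(p):+.2f}, abstain={IDK_CREDIT:+.2f}")
# Under these numbers, guessing only pays off above roughly 73% confidence,
# so "I don't know" becomes the rational choice for uncertain questions.
```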


How Hallucinations Arise from Next-Word Prediction

Language models learn through pretraining on vast amounts of text by predicting the next word. Unlike typical supervised learning, there are no explicit “true/false” labels attached to individual statements. Consistent patterns such as spelling and grammar recur so often that errors in them shrink with scale, whereas arbitrary facts, like a specific person’s birthday, appear rarely and follow no pattern a model can generalize from.

This statistical mismatch explains why hallucinations emerge: LLMs excel at fluency but stumble on low-frequency factual knowledge.
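To make the mismatch concrete, here is a toy sketch with an invented corpus (not from the paper): a next-word predictor sees recurring patterns many times, but an arbitrary fact that appears only once offers no redundancy to learn from.

```python
# Toy sketch (invented corpus): recurring patterns are seen many times,
# but an arbitrary one-off fact appears once, so a next-word predictor
# has no redundancy to learn it from.

from collections import Counter

corpus_statements = [
    "the plural of cat is cats",        # pattern, repeated constantly
    "the plural of cat is cats",
    "the plural of cat is cats",
    "Paris is the capital of France",   # well-attested fact
    "Paris is the capital of France",
    "X's birthday is on <some date>",   # appears exactly once
]

counts = Counter(corpus_statements)
singletons = [s for s, n in counts.items() if n == 1]

print("statements seen only once:", singletons)
print(f"singleton share of distinct statements: {len(singletons) / len(counts):.2f}")
# The paper's statistical argument is, roughly, that errors on such
# one-off facts are unavoidable for a pure next-word predictor: the data
# simply does not repeat them often enough to pin them down.
```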


Key Misconceptions, Debunked

OpenAI’s analysis challenges several common beliefs:


Toward Better Evaluation

OpenAI suggests a straightforward fix: penalize confident errors more heavily than honest expressions of uncertainty, and grant partial credit when a model appropriately admits it does not know.

This approach echoes standardized tests that use negative marking for wrong answers or partial credit for blanks, discouraging blind guessing.
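For illustration, the break-even confidence under a test-style rule (correct = +1, wrong = −penalty, blank = 0) follows from setting the expected score of a guess to zero; this is my worked arithmetic, not the paper’s exact proposal.

```python
# Break-even confidence for guessing under test-style negative marking:
# correct = +1, wrong = -penalty, blank = 0. Guessing only helps when
# p * 1 - (1 - p) * penalty > 0, i.e. p > penalty / (1 + penalty).
# Illustrative arithmetic, not the paper's exact scheme.

def break_even_confidence(penalty: float) -> float:
    return penalty / (1.0 + penalty)

for penalty in (0.0, 0.25, 1.0, 3.0):
    threshold = break_even_confidence(penalty)
    print(f"wrong-answer penalty {penalty:>4}: guess only above {threshold:.0%} confidence")
# penalty 0.00 -> guess at any confidence (accuracy-only leaderboards)
# penalty 0.25 -> guess above 20% (classic negative-marking exams)
# penalty 1.00 -> guess above 50%
# penalty 3.00 -> guess above 75%
```

The higher the penalty for a confident wrong answer, the higher the confidence a model needs before guessing is worth it, which is exactly the behavior the proposed metrics aim to reward.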

Adopting such metrics at scale could drive the adoption of techniques that reduce hallucinations—not just in OpenAI models, but across the entire field.


Broader Context: Organizational Shifts at OpenAI

Alongside the paper, TechCrunch reports that OpenAI is reorganizing its Model Behavior team—the group responsible for shaping how AI systems interact with people. The team will now report to Max Schwarzer, OpenAI’s post-training lead.

Meanwhile, founding head Joanne Jang announced a new initiative: oai Labs, a research-focused group dedicated to inventing and prototyping new ways for humans and AI to collaborate.



References