
OpenAI’s recently unveiled o3 and o4-mini AI models are being praised for their advanced reasoning and impressive performance in tasks like coding and math — but they are also drawing attention for an unexpected weakness: an increase in hallucinations.


AI Hallucinations Still Unsolved — And Getting Worse?

Hallucination, where an AI model generates false or misleading information, remains one of the biggest unsolved issues in the development of Large Language Models (LLMs). Typically, each new generation of AI systems improves slightly in this regard, but OpenAI’s o3 and o4-mini have unexpectedly shown higher hallucination rates compared to their predecessors.

According to OpenAI’s internal evaluation, the o3 model produced hallucinated answers on roughly 33% of questions in the PersonQA benchmark, while the older o1 and o3-mini models had much lower rates of 16% and 14.8%, respectively.
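To make that figure concrete, the sketch below shows how a per-question hallucination rate of this kind can be computed. It is purely illustrative, with stand-in answer and judging functions, and is not OpenAI’s actual PersonQA evaluation harness.

```python
# Illustrative sketch only: the shape of a per-question hallucination metric.
# Ask a model each benchmark question, judge whether the answer is a
# hallucination, and report the fraction of questions that were hallucinated.
from typing import Callable

def hallucination_rate(
    questions: list[dict],
    answer_fn: Callable[[str], str],               # wrapper around a model call
    is_hallucination: Callable[[str, str], bool],  # judge: (answer, reference) -> bool
) -> float:
    hallucinated = sum(
        is_hallucination(answer_fn(q["question"]), q["reference"])
        for q in questions
    )
    return hallucinated / len(questions)

# Toy demo with stand-in functions (no API calls):
toy_questions = [
    {"question": "Where was Ada Lovelace born?", "reference": "London"},
    {"question": "What year did she die?", "reference": "1852"},
]
rate = hallucination_rate(
    toy_questions,
    answer_fn=lambda q: "London" if "born" in q else "1850",  # second answer is wrong
    is_hallucination=lambda ans, ref: ans.strip() != ref,     # crude exact-match judge
)
print(f"hallucination rate: {rate:.0%}")  # -> 50%
```

A rate of 0.33 computed this way over PersonQA would correspond to the roughly 33% figure reported for o3.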


o4-mini: Even Higher Hallucination Rates

The situation is even more concerning for o4-mini, which hallucinated in almost 50% of its responses. Researchers at Transluce, an independent AI research lab, reported similar results, noting that o3 sometimes fabricated actions it never took, such as claiming it had “executed code on a 2021 MacBook Pro,” something the model cannot actually do.



Why Are Hallucinations Increasing?

AI experts, including Neil Chowdhury of Transluce, suggest the problem could stem from how these models are fine-tuned with reinforcement learning. Although o3 and o4-mini deliver clear gains in technical reasoning and programming workflows, their tendency to fabricate details or produce broken links has raised serious concerns about their real-world reliability.


Web Search Integration: A Path to Higher Accuracy?

One potential way to reduce hallucination is to integrate web search. OpenAI’s GPT-4o model, when paired with live web search, achieves up to 90% accuracy on the SimpleQA benchmark, highlighting how real-time information access can curb misinformation.
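As a rough illustration, the sketch below shows how a request might be paired with web search using the OpenAI Python SDK’s Responses API. The tool name and its availability for a given model are assumptions here, so check the current OpenAI documentation before relying on it.

```python
# Minimal sketch, assuming the OpenAI Python SDK's Responses API and its
# built-in web search tool (tool names and model support may change;
# consult current OpenAI documentation before relying on this).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-4o",
    tools=[{"type": "web_search_preview"}],  # let the model ground answers in live search results
    input="Who won the 2024 Nobel Prize in Physics? Cite your source.",
)

print(response.output_text)  # answer grounded in retrieved web pages, typically with citations
```

The point of the example is the grounding step: instead of answering purely from its training data, the model can consult live sources, which is what drives the higher SimpleQA accuracy cited above.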

While hallucinations can sometimes produce creative or novel responses, they remain a major obstacle for professional use in fields like law, healthcare, and business, where factual accuracy is non-negotiable.
