
OpenAI’s recently unveiled o3 and o4-mini AI models are being praised for their advanced reasoning and impressive performance in tasks like coding and math — but they are also drawing attention for an unexpected weakness: an increase in hallucinations.


AI Hallucinations Still Unsolved — And Getting Worse?

Hallucination, where an AI model generates false or misleading information, remains one of the biggest unsolved issues in the development of Large Language Models (LLMs). Typically, each new generation of AI systems improves slightly in this regard, but OpenAI’s o3 and o4-mini have unexpectedly shown higher hallucination rates compared to their predecessors.

According to OpenAI’s internal evaluation, the o3 model produced hallucinated answers on roughly 33% of questions in the PersonQA benchmark, while the older o1 and o3-mini models had much lower rates of 16% and 14.8%, respectively.
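To make that figure concrete, the sketch below shows how a per-question hallucination rate of this kind can be computed. It is purely illustrative, with stand-in answer and judging functions, and is not OpenAI’s actual PersonQA evaluation harness.

```python
# Illustrative sketch only: the shape of a per-question hallucination metric.
# Ask a model each benchmark question, judge whether the answer is a
# hallucination, and report the fraction of questions that were hallucinated.
from typing import Callable

def hallucination_rate(
    questions: list[dict],
    answer_fn: Callable[[str], str],               # wrapper around a model call
    is_hallucination: Callable[[str, str], bool],  # judge: (answer, reference) -> bool
) -> float:
    hallucinated = sum(
        is_hallucination(answer_fn(q["question"]), q["reference"])
        for q in questions
    )
    return hallucinated / len(questions)

# Toy demo with stand-in functions (no API calls):
toy_questions = [
    {"question": "Where was Ada Lovelace born?", "reference": "London"},
    {"question": "What year did she die?", "reference": "1852"},
]
rate = hallucination_rate(
    toy_questions,
    answer_fn=lambda q: "London" if "born" in q else "1850",  # second answer is wrong
    is_hallucination=lambda ans, ref: ans.strip() != ref,     # crude exact-match judge
)
print(f"hallucination rate: {rate:.0%}")  # -> 50%
```

A rate of 0.33 computed this way over PersonQA would correspond to the roughly 33% figure reported for o3.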


o4-mini: Even Higher Hallucination Rates

The situation is even more concerning for o4-mini, which hallucinated in almost 50% of its responses. Researchers at Transluce, an independent AI research lab, reported similar results, noting that o3 sometimes fabricated actions it never took, such as claiming it had “executed code on a 2021 MacBook Pro,” something the model cannot actually do.



Why Are Hallucinations Increasing?

AI experts, including Neil Chowdhury of Transluce, suggest the problem could stem from how these models are fine-tuned with reinforcement learning. Although o3 and o4-mini deliver clear gains in technical reasoning and programming workflows, their tendency to fabricate details or produce broken links has raised serious concerns about their real-world reliability.


Web Search Integration: A Path to Higher Accuracy?

One potential way to reduce hallucination is to integrate web search. OpenAI’s GPT-4o model, when paired with live web search, achieves up to 90% accuracy on the SimpleQA benchmark, highlighting how real-time information access can curb misinformation.
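As a rough illustration, the sketch below shows how a request might be paired with web search using the OpenAI Python SDK’s Responses API. The tool name and its availability for a given model are assumptions here, so check the current OpenAI documentation before relying on it.

```python
# Minimal sketch, assuming the OpenAI Python SDK's Responses API and its
# built-in web search tool (tool names and model support may change;
# consult current OpenAI documentation before relying on this).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-4o",
    tools=[{"type": "web_search_preview"}],  # let the model ground answers in live search results
    input="Who won the 2024 Nobel Prize in Physics? Cite your source.",
)

print(response.output_text)  # answer grounded in retrieved web pages, typically with citations
```

The point of the example is the grounding step: instead of answering purely from its training data, the model can consult live sources, which is what drives the higher SimpleQA accuracy cited above.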

While hallucinations can sometimes produce creative or novel responses, they remain a major obstacle for professional use in fields like law, healthcare, and business, where factual accuracy is non-negotiable.
