OpenAI Whisper hallucinations in medical settings prompt safety concerns, AP reports

Independent outlets report that OpenAI Whisper can hallucinate in medical transcription, risking inaccurate patient documentation. The AP investigation notes thousands of healthcare workers use Whisper-based tools, highlighting potential safety concerns in high-risk settings.

OpenAI · Incident Oct 15, 2024 · Indexed Jun 5, 2026 · 3 sources

Whisper generates text by predicting the most likely next token from the audio rather than copying out what was said, so it can hallucinate content in medical transcription.
What
Independent outlets report that OpenAI Whisper can hallucinate in medical transcription, risking inaccurate patient documentation.
Incident date
Oct 15, 2024
Who
OpenAI
Failure mode
Hallucination
AI surface
Voice Agent
Severity
High

What happened

OpenAI Whisper, used in medical settings to transcribe patient visits, has been reported to hallucinate content, creating a risk of inaccurate documentation. The AP report states that over 30,000 medical workers now use Whisper-based tools to transcribe patient visits, which shows how widely the model has been adopted in healthcare. The underlying cause is described as architectural: Whisper is a Transformer-based model that predicts the next token rather than producing a literal transcription, so it can confabulate text when the audio is garbled.

What broke inside the model

Failure path · mode profile · Hallucination
  1. Trigger: A user asks for a fact, a citation, or a figure.
  2. Model step: The model writes a fluent, confident answer.
  3. Control gap: Nothing ties the claim back to a real source.
  4. Failure: A fabricated fact ships as if it were verified.
  5. Consequence: The false claim reaches a customer, a court, or the public.

Confidence holds, and even spikes, as the claim detaches from any source.

Whisper relies on Transformer-based token prediction, not verbatim transcription, enabling it to insert plausible-sounding but inaccurate content when transcribing medical dialogues.
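One way to put a check on token-level prediction is to review the per-segment statistics that the open-source `whisper` Python package returns alongside the text. The sketch below is illustrative only and is not the method used by any tool named in this record. It assumes each segment is a dict with the `avg_logprob`, `no_speech_prob`, and `compression_ratio` fields produced by `whisper`'s `transcribe()`, and it reuses that function's default thresholds as cutoffs. The function name `flag_segments` is hypothetical, and the cutoffs have not been validated on medical audio.

```python
from typing import Any, Dict, List

# Assumed cutoffs: these mirror the default thresholds of whisper.transcribe().
# They are heuristics, not values validated for clinical use.
LOGPROB_THRESHOLD = -1.0            # average token log-probability
NO_SPEECH_THRESHOLD = 0.6           # probability that the window is silence
COMPRESSION_RATIO_THRESHOLD = 2.4   # high ratio suggests repetitive text


def flag_segments(segments: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return the segments that should be routed to human review.

    `segments` is assumed to look like result["segments"] from
    whisper's transcribe(); only the fields read below are required.
    """
    flagged = []
    for seg in segments:
        reasons = []
        if seg["avg_logprob"] < LOGPROB_THRESHOLD:
            reasons.append("low average log-probability")
        if seg["no_speech_prob"] > NO_SPEECH_THRESHOLD:
            reasons.append("text emitted over probable silence")
        if seg["compression_ratio"] > COMPRESSION_RATIO_THRESHOLD:
            reasons.append("highly repetitive text")
        if reasons:
            flagged.append({
                "start": seg["start"],
                "end": seg["end"],
                "text": seg["text"],
                "reasons": reasons,
            })
    return flagged
```

A caller would pass the `segments` list from a transcription result and send anything returned to a clinician for comparison against the audio. A clean pass proves little: a hallucinated segment can still score as confident, so this kind of filter narrows the review queue but does not replace checking the recording.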

Public visibility: Medium
Regulatory exposure: Possible
Customer impact: Many customers
Financial impact: Unknown
Time to disclosure: Days
  1. Press: OpenAI Whisper hallucinations in medical settings (AP), apnews.com
  2. Press: Hospitals adopt error-prone AI transcription tools despite warnings (Ars Technica), arstechnica.com
  3. Press: Hospitals' AI transcription tools hallucination (Wired), wired.com
Permalink: https://failureindex.ai/failures/openai-whisper-hallucinations-medical-settings-prompt
Citation: AI Failure Index. "OpenAI Whisper hallucinations in medical settings prompt safety concerns, AP reports" (FI-0188). Realm Labs. https://failureindex.ai/failures/openai-whisper-hallucinations-medical-settings-prompt (indexed Jun 5, 2026).

Data fields CC-BY 4.0, prose citation permitted. Incident ID FI-0188. Full dataset at /data.

Note from Realm Labs, the Index steward

How Realm would have caught this

Controls for this failure mode
  • Prism
  • OmniGuard
  • AI Detection & Response (AIDR)

A runtime layer that watches the model's internal state can flag the moment a model commits to a claim it has no support for, and hold or reroute the response before it reaches a user. Realm reads those signals in real time rather than grading the transcript after the fact.