AI Failure Index
AI Failures in Healthcare
Healthcare AI failures end in regulatory letters, patient harm, or both. We catalog the public ones.
- Incidents: 32
- Highest severity: Catastrophic
- Sources cited: 82
- Newest indexed: Jun 16, 2026
The Doc App counsel files fabricated case law in Florida court
A lawyer representing The Doc App, Inc. used AI to generate court filings that included fake case law. The court flagged the hallucinations; it had previously sanctioned the attorney but declined further sanctions in June 2026.
- Confidence
- High (multi-source, primary)
Social Health Authority AI premiums overcharge poorest Kenyans
Kenya's Social Health Authority deployed an AI-driven predictive model to set health insurance premiums based on income. An investigation found the system systematically overcharged the poorest citizens, effectively denying them access to healthcare.
- Confidence
- Medium (multi-source)
AI chatbots from OpenAI, Google and Anthropic provided biological weapon instructions
Major LLMs from OpenAI, Google, and Anthropic were found to provide detailed, actionable instructions for creating and deploying biological weapons. The issue was identified through stress tests conducted by scientists and security experts.
- Confidence
- High (multi-source, primary)
BMJ Open study finds half of leading chatbots give problematic medical advice
A BMJ Open study of five major chatbots found about half produced problematic medical answers, and a notable share were highly problematic because of false balance. Bloomberg and NBC News also reported the findings.
- Confidence
- High (multi-source, primary)
UnitedHealth Group ordered to provide AI tool discovery in coverage denial case
A federal judge ordered UnitedHealth Group to disclose internal documents regarding its nH Predict AI tool. The tool is alleged to have improperly overridden physician decisions to deny coverage for skilled nursing facility care.
- Confidence
- Medium (multi-source)
Brazilian firm allegedly used AI to illegally resell SUS patient data
In February 2026, the Brazilian Federal Police launched Operation Glycon to dismantle a business structure illegally commercializing sensitive health data from the Unified Health System (SUS). The company allegedly used an AI-powered tool designed for health professionals to gain unauthorized access to clinical records.
- Confidence
- High (multi-source, primary)
St. Rose Dominican Hospital AI sepsis alert recommends dangerous fluids for dialysis patient
An AI-driven sepsis protocol at St. Rose Dominican Hospital flagged a dialysis patient for IV fluids. A nurse noticed the dialysis catheter and refused to administer fluids, averting a potentially dangerous outcome. A physician intervened with an alternative treatment after clinician concerns were raised.
- Confidence
- Medium (multi-source)
Health plan's prior-auth agent approved a procedure outside coverage policy
A regional health plan's prior-auth agent approved a procedure that the company's medical policy explicitly excluded. The provider proceeded based on the approval. The plan paid the claim and triggered an internal review.
- Confidence
- Steward-verified (NDA)
ICE AI resume screening error routes recruits to inadequate training
An AI resume-screening tool used by ICE misclassified inexperienced recruits as experienced law enforcement officers. This resulted in approximately 200 hires receiving inadequate online training instead of the required in-person academy course.
- Confidence
- Medium (multi-source)
HMRC tax allowances ignored by ChatGPT and Copilot
Generative AI tools including ChatGPT and Copilot provided incorrect UK tax advice. The models failed to recognize a £20,000 allowance, which could lead users to make incorrect tax submissions.
- Confidence
- High (multi-source, primary)
Sonio Detect AI ultrasound software mislabels fetal structures in prenatal imaging
An FDA MAUDE adverse event entry and Reuters reporting describe Sonio Detect AI mislabeling fetal anatomy in prenatal ultrasound. Samsung Medison says the FDA report does not indicate a safety issue and that no action was requested.
- Confidence
- High (multi-source, primary)
Brazil AI welfare app wrongly rejects benefit claims
The Brazilian National Social Security Institute's AI-powered app, Meu INSS, wrongly denied benefit claims for hundreds of applicants. The system struggled with complex cases and rural users with low digital literacy, leading to a loss of essential income.
- Confidence
- High (multi-source, primary)
CVS Health and Aetna accused of AI-driven denials in post-acute care
A Senate staff report and independent reporting allege CVS Health and Aetna used predictive AI tools to increase denials of post-acute care authorizations for Medicare Advantage patients, prioritizing profits over patient care.
- Confidence
- High (multi-source, primary)
OpenAI Whisper hallucinations in medical settings prompt safety concerns, AP reports
Independent outlets report that OpenAI Whisper can hallucinate in medical transcription, risking inaccurate patient documentation. The AP investigation notes thousands of healthcare workers use Whisper-based tools, highlighting potential safety concerns in high-risk settings.
- Confidence
- Medium (multi-source)
Pieces Technologies settles Texas AG allegations over AI hallucination claims
Pieces Technologies reached a settlement with the Texas Attorney General following allegations that the company made deceptive claims about the accuracy of its generative AI clinical documentation tool. The investigation found that metrics such as a severe hallucination rate of less than 1 per 100,000 were likely inaccurate.
- Confidence
- High (multi-source, primary)
CVS settled a class action alleging HireVue facial-expression AI acted as an illegal lie detector
CVS Health required job applicants to complete HireVue video interviews analyzed by Affectiva AI software that tracked facial expressions and assigned employability scores measuring traits such as integrity and conscientiousness. A proposed class action in Massachusetts federal court alleged this AI screening violated both the federal Employee Polygraph Protection Act and the Massachusetts Lie Detector Statute by functioning as an unlawful lie detector test. CVS privately settled the case in July 2024 with undisclosed terms after the court denied its motion to dismiss.
- Confidence
- High (multi-source, primary)
Change Healthcare ransomware incident on Feb 21, 2024 is real but not a production AI failure
Change Healthcare suffered a ransomware attack on February 21, 2024. It was not a production AI failure: MFA gaps on remote access were cited as a key root cause, and BlackCat was identified as the attacker.
- Confidence
- High (multi-source, primary)
Humana was sued over using nH Predict AI to systematically deny Medicare post-acute claims
A class action lawsuit filed on December 12, 2023 alleges that Humana used an AI model called nH Predict, owned by UnitedHealth subsidiary NaviHealth, to override physician determinations and wrongfully deny Medicare Advantage members coverage for post-acute care. The complaint claims Humana set a target to keep post-acute facility stays within 1% of the algorithm's predictions and disciplined employees who deviated. According to the complaint, approximately 90% of the denials that were appealed were overturned, yet only about 0.2% of denied policyholders filed an appeal. The Senate Permanent Subcommittee on Investigations published a report in October 2024 scrutinizing Humana and other insurers for AI-driven denials of post-acute care.
- Confidence
- High (multi-source, primary)
Large language models perpetuate racial bias in healthcare
AIAAIC recorded an incident entry (published November 2023) documenting that large language models (LLMs) have produced racially biased outputs in healthcare contexts. Independent academic audits and studies (including a 2024 audit titled "Unmasking and Quantifying Racial Bias of Large Language Models") found LLMs gave systematically different clinical-related recommendations and projections across racial groups. These outputs have the potential to cause harm when used in clinical decision-making by healthcare deployers.
- Confidence
- High (multi-source, primary)
An eating-disorder helpline's chatbot was pulled after giving harmful dieting advice
The National Eating Disorders Association replaced its human helpline with a chatbot named Tessa, which then told users seeking help to count calories and aim for large daily deficits, advice eating-disorder specialists call actively harmful. NEDA took Tessa offline days after launch.
- Confidence
- Medium (multi-source)
A mental-health startup ran GPT-3 on thousands of unwitting help-seekers
The startup Koko used GPT-3 to co-write responses to roughly 4,000 people seeking peer mental-health support without clearly informing them they were receiving AI-generated messages, drawing an ethics backlash over consent in a vulnerable-population setting.
- Confidence
- Low (single source)
Koko used GPT-3 to generate AI-assisted emotional support without informed consent
Koko ran an experiment in October 2022 that used GPT-3, with human editors, to generate emotional support messages. It affected about 4,000 users and produced roughly 30,000 messages. The incident became public in January 2023 through reports and statements by Koko's co-founders, prompting ethical criticism over informed consent and disclosure. Koko announced that it would pursue third-party IRB review for future changes.
- Confidence
- Medium (multi-source)
Acclarent TruDi AI navigation system allegedly causes carotid artery injuries
The Acclarent TruDi AI navigation system allegedly misled surgeons during sinus operations, resulting in carotid artery punctures and strokes. FDA malfunction reports reportedly rose after AI integration in 2021, and two patients filed Texas lawsuits alleging AI contributed to injuries.
- Confidence
- Medium (multi-source)
Crisis Text Line ends data-sharing with for-profit spinoff Loris.ai
Crisis Text Line admitted to sharing anonymized user data with its for-profit subsidiary, Loris.ai, for machine learning development. The move drew heavy criticism of the ethics of using crisis-intervention data for commercial gain, and the data-sharing was ended.
- Confidence
- Medium (multi-source)
Epic's sepsis prediction model missed two-thirds of cases with 88% false alarms, a study found
The Epic Sepsis Model, a proprietary sepsis prediction algorithm embedded in Epic's electronic health record platform and deployed at hundreds of US hospitals, was found to miss 67% of sepsis cases, with 88% of its alerts being false alarms, in an independent external validation published in JAMA Internal Medicine in June 2021. The model's discrimination (AUC 0.63) was substantially worse than Epic's claimed performance (AUC 0.76 to 0.83). Epic subsequently overhauled the model in 2022, changing its sepsis definition, reducing reliance on antibiotic orders, and recommending site-specific training before clinical use.
- Confidence
- High (multi-source, primary)
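The two headline percentages in the Epic entry measure different things, and a small sketch may make the distinction concrete. It assumes the 67% figure is the share of sepsis cases with no alert (1 minus sensitivity) and the 88% figure is the share of alerts that fired on patients without sepsis (1 minus positive predictive value). The `AlertCounts` class, both function names, and the counts themselves are hypothetical: the counts were picked only to reproduce the reported percentages and are not the study's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertCounts:
    """Patient-level outcomes for a binary alerting model (hypothetical)."""
    true_positives: int   # sepsis cases that triggered an alert
    false_negatives: int  # sepsis cases with no alert
    false_positives: int  # alerts on patients without sepsis


def miss_rate(c: AlertCounts) -> float:
    """Share of sepsis cases the model failed to flag (1 - sensitivity)."""
    return c.false_negatives / (c.true_positives + c.false_negatives)


def false_alarm_share(c: AlertCounts) -> float:
    """Share of alerts that fired on patients without sepsis (1 - PPV)."""
    return c.false_positives / (c.true_positives + c.false_positives)


# Illustrative counts only, chosen to match the reported percentages.
example = AlertCounts(true_positives=33, false_negatives=67, false_positives=242)

print(f"missed cases: {miss_rate(example):.0%}")
print(f"false alarms among alerts: {false_alarm_share(example):.0%}")
```

The miss rate is computed over patients who had sepsis, while the false-alarm share is computed over alerts, so the two numbers have different denominators and do not need to sum to 100%.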
Medtronic AccuRhythm AI misses abnormal rhythms in LINQ monitors, per FDA and Reuters
Between 2021 and 2025, at least 16 FDA adverse event reports alleged that Medtronic's AccuRhythm AI in LINQ monitors failed to detect abnormal heart rhythms. Medtronic said it reviewed the cases and found only one missed abnormal event, attributing others to data display issues or user confusion; no patient harm was reported.
- Confidence
- High (multi-source, primary)
Babylon Health symptom checker alleged to miss or downplay critical symptoms
Multiple news investigations and clinicians' tests in 2019-2021 documented examples where Babylon Health’s symptom checker produced unsafe or inappropriate triage recommendations for serious symptoms. The UK regulator MHRA told a clinician who raised concerns that it shared those concerns, and Babylon acknowledged some errors in examples highlighted by critics.
- Confidence
- Medium (multi-source)
Google Health diabetic retinopathy AI fails in real-world clinic settings
Google Health's AI for detecting diabetic retinopathy failed to maintain its laboratory accuracy when deployed in real-world Indian clinics. The system was hindered by suboptimal environmental conditions and data quality issues.
- Confidence
- Medium (multi-source)
Study finds Optum risk algorithm understated Black patients' health needs
A 2019 study revealed that Optum's health risk algorithm discriminated against Black patients by substituting health costs for actual health needs. This resulted in a systemic underestimation of risk for Black patients, which limited their access to specialized care management.
- Confidence
- High (multi-source, primary)
IBM Watson for Oncology provided unsafe cancer treatment recommendations
IBM Watson for Oncology provided clinically unsafe and incorrect treatment recommendations to healthcare providers. The system allegedly suggested dangerous treatments, such as a drug that carries a bleeding risk for a patient with severe hemorrhage.
- Confidence
- Medium (multi-source)
DeepMind and Royal Free NHS Trust process patient records unlawfully
The UK Information Commissioner's Office ruled that DeepMind and the Royal Free NHS Foundation Trust failed to comply with data protection laws. The incident involved the processing of 1.6 million patient records for the Streams app without adequate consent.
- Confidence
- Medium (multi-source)
Intuitive Surgical da Vinci Xi software anomaly causes unexpected movement
Intuitive Surgical identified a software anomaly in the da Vinci Xi P5 software that could cause unexpected master and instrument tip movements. This led to a global Class 2 FDA recall affecting 677 devices.
- Confidence
- High (multi-source, primary)