Babylon Health symptom checker alleged to miss or downplay critical symptoms

Multiple news investigations and clinicians' tests in 2019-2021 documented examples where Babylon Health’s symptom checker produced unsafe or inappropriate triage recommendations for serious symptoms. The UK regulator MHRA told a clinician who raised concerns that it shared those concerns, and Babylon acknowledged some errors in examples highlighted by critics.

Babylon Health · Incident Jun 1, 2020 · Indexed Jun 9, 2026 · 2 sources

Key facts

What

Incident date

Jun 1, 2020

Who

Babylon Health

Failure mode

Hallucination

AI surface

Chatbot

Severity

High

What happened

Clinicians and journalists documented cases where Babylon Health’s AI-driven symptom checker failed to identify or appropriately triage potentially serious conditions, including examples reported as possible heart attacks. The Independent reported that an NHS consultant compiled examples the tool misclassified and that the UK regulator, the MHRA, told the clinician it shared his concerns. Independent audits and journalism had previously raised broader accuracy and bias concerns about the symptom checker’s recommendations.

What broke inside the model

The symptom checker’s triage logic and underlying knowledge-graph-based reasoning did not flag some emergency presentations as urgent, producing lower-acuity recommendations for cases clinicians considered potentially serious. Independent reporting and expert critiques pointed to gaps in coverage, validation, and decision thresholds in the system’s clinical rules and training data that likely led to undertriage and biased outputs.

Cite this entry

Permalinkhttps://failureindex.ai/failures/babylon-health-symptom-checker-alleged-miss

Citation

AI Failure Index. "Babylon Health symptom checker alleged to miss or downplay critical symptoms" (FI-0360). Realm Labs. https://failureindex.ai/failures/babylon-health-symptom-checker-alleged-miss (indexed Jun 9, 2026).

Share cardA branded image of this record for posts and slides.

Data fields CC-BY 4.0, prose citation permitted. Incident ID FI-0360. Full dataset at /data.

How Realm fits

Controls for this failure mode

Prism
OmniGuard
AI Detection & Response (AIDR)

This entry sits in the index's predictive wing: a system that scores, ranks, perceives, or steers rather than generates. Realm's runtime layer is built for the generative and agentic systems now moving into these same decision seats, where it watches a model's internal state and holds an unsupported claim or an unchecked action before it commits. The control gap on this record, an automated decision that reached people with no runtime check in front of it, is the same gap. The index keeps predictive failures on the record because the pattern carries straight into the systems shipping today.

Babylon Health symptom checker alleged to miss or downplay critical symptoms

Key facts

What happened

What broke inside the model

What it cost

Sources

Cite this entry

How Realm fits

Key facts

What happened

What broke inside the model

What it cost

Sources

Cite this entry

How Realm fits

Related failures

The 11th Circuit referred Anthony Sabatini for replacing eight hallucinated cases with eight more

A judge struck a Roc Nation filing over AI-fabricated quotes, Tyrone Blackburn's third AI sanction

Coinbase pushed an AI 'breaking news' alert declaring a World Cup result before kickoff