Babylon Health symptom checker alleged to miss or downplay critical symptoms

Multiple news investigations and clinicians' tests in 2019-2021 documented examples where Babylon Health’s symptom checker produced unsafe or inappropriate triage recommendations for serious symptoms. The UK regulator MHRA told a clinician who raised concerns that it shared those concerns, and Babylon acknowledged some errors in examples highlighted by critics.

Babylon Health · Incident Jun 1, 2020 · Indexed Jun 9, 2026 · 2 sources

The symptom checker’s triage logic failed to map some users’ emergency symptoms to urgent diagnoses.
What
Multiple news investigations and clinicians' tests in 2019-2021 documented examples where Babylon Health’s symptom checker produced unsafe or inappropriate triage recommendations for serious symptoms.
Incident date
Jun 1, 2020
Who
Babylon Health
Failure mode
Hallucination
AI surface
Chatbot
Severity
High

What happened

Clinicians and journalists documented cases where Babylon Health’s AI-driven symptom checker failed to identify or appropriately triage potentially serious conditions, including examples reported as possible heart attacks. The Independent reported that an NHS consultant compiled examples the tool misclassified and that the UK regulator, the MHRA, told the clinician it shared his concerns. Independent audits and journalism had previously raised broader accuracy and bias concerns about the symptom checker’s recommendations.

What broke inside the model

Failure path · mode profile · Hallucination
  1. 01 · TriggerA user asks for a fact, a citation, or a figure.
  2. 02 · Model stepThe model writes a fluent, confident answer.
  3. 03 · Control gapNothing ties the claim back to a real source.
  4. 04 · FailureA fabricated fact ships as if it were verified.
  5. 05 · ConsequenceThe false claim reaches a customer, a court, or the public.

Confidence holds, and even spikes, as the claim detaches from any source.

The symptom checker’s triage logic and underlying knowledge-graph-based reasoning did not flag some emergency presentations as urgent, producing lower-acuity recommendations for cases clinicians considered potentially serious. Independent reporting and expert critiques pointed to gaps in coverage, validation, and decision thresholds in the system’s clinical rules and training data that likely led to undertriage and biased outputs.

Public visibilityHigh
Regulatory exposurePossible
Customer impactMany customers
Financial impactUnknown
Time to disclosureMonths
  1. PressRegulator has concerns over symptom checker appindependent.co.uk
  2. PressMedical Advice From a Bot: The Unproven Promise of Babylon Healthundark.org
Permalinkhttps://failureindex.ai/failures/babylon-health-symptom-checker-alleged-miss
CitationAI Failure Index. "Babylon Health symptom checker alleged to miss or downplay critical symptoms" (FI-0360). Realm Labs. https://failureindex.ai/failures/babylon-health-symptom-checker-alleged-miss (indexed Jun 9, 2026).
Share cardA branded image of this record for posts and slides.

Data fields CC-BY 4.0, prose citation permitted. Incident ID FI-0360. Full dataset at /data.

Note from Realm Labs, the Index steward

How Realm fits

Controls for this failure mode
  • Prism
  • OmniGuard
  • AI Detection & Response (AIDR)

This entry sits in the index's predictive wing: a system that scores, ranks, perceives, or steers rather than generates. Realm's runtime layer is built for the generative and agentic systems now moving into these same decision seats, where it watches a model's internal state and holds an unsupported claim or an unchecked action before it commits. The control gap on this record, an automated decision that reached people with no runtime check in front of it, is the same gap. The index keeps predictive failures on the record because the pattern carries straight into the systems shipping today.