Babylon Health symptom checker alleged to miss or downplay critical symptoms
Multiple news investigations and clinicians' tests in 2019-2021 documented examples where Babylon Health’s symptom checker produced unsafe or inappropriate triage recommendations for serious symptoms. The UK regulator MHRA told a clinician who raised concerns that it shared those concerns, and Babylon acknowledged some errors in examples highlighted by critics.
The symptom checker’s triage logic failed to map some users’ emergency symptoms to urgent diagnoses.
Key facts
- What
- Multiple news investigations and clinicians' tests in 2019-2021 documented examples where Babylon Health’s symptom checker produced unsafe or inappropriate triage recommendations for serious symptoms.
- Incident date
- Jun 1, 2020
- Who
- Babylon Health
- Failure mode
- Hallucination
- AI surface
- Chatbot
- Severity
- High
What happened
Clinicians and journalists documented cases where Babylon Health’s AI-driven symptom checker failed to identify or appropriately triage potentially serious conditions, including examples reported as possible heart attacks. The Independent reported that an NHS consultant compiled examples the tool misclassified and that the UK regulator, the MHRA, told the clinician it shared his concerns. Independent audits and journalism had previously raised broader accuracy and bias concerns about the symptom checker’s recommendations.
What broke inside the model
- 01 · TriggerA user asks for a fact, a citation, or a figure.
- 02 · Model stepThe model writes a fluent, confident answer.
- 03 · Control gapNothing ties the claim back to a real source.
- 04 · FailureA fabricated fact ships as if it were verified.
- 05 · ConsequenceThe false claim reaches a customer, a court, or the public.
Confidence holds, and even spikes, as the claim detaches from any source.
The symptom checker’s triage logic and underlying knowledge-graph-based reasoning did not flag some emergency presentations as urgent, producing lower-acuity recommendations for cases clinicians considered potentially serious. Independent reporting and expert critiques pointed to gaps in coverage, validation, and decision thresholds in the system’s clinical rules and training data that likely led to undertriage and biased outputs.
What it cost
Sources
- PressRegulator has concerns over symptom checker appindependent.co.uk
- PressMedical Advice From a Bot: The Unproven Promise of Babylon Healthundark.org
Cite this entry
https://failureindex.ai/failures/babylon-health-symptom-checker-alleged-missAI Failure Index. "Babylon Health symptom checker alleged to miss or downplay critical symptoms" (FI-0360). Realm Labs. https://failureindex.ai/failures/babylon-health-symptom-checker-alleged-miss (indexed Jun 9, 2026).Data fields CC-BY 4.0, prose citation permitted. Incident ID FI-0360. Full dataset at /data.
Note from Realm Labs, the Index steward
How Realm fits
- Prism
- OmniGuard
- AI Detection & Response (AIDR)
This entry sits in the index's predictive wing: a system that scores, ranks, perceives, or steers rather than generates. Realm's runtime layer is built for the generative and agentic systems now moving into these same decision seats, where it watches a model's internal state and holds an unsupported claim or an unchecked action before it commits. The control gap on this record, an automated decision that reached people with no runtime check in front of it, is the same gap. The index keeps predictive failures on the record because the pattern carries straight into the systems shipping today.