UK GOV.UK Chat gave citizens incorrect tax, VAT, and immigration advice in its alpha pilot

The UK Government Digital Service's GOV.UK Chat prototype produced inaccurate or misleading responses during a private pilot with approximately 1,000 users and scored only 76% accuracy at its earliest benchmark. The system gave incorrect advice on tax, VAT registration, the EU Settlement Scheme, and flight refunds before GDS added filters to block certain question categories. The Times later reported that the chatbot gave misleading tax information, drawing criticism from tax professionals.

UK Government Digital Service (GDS) · Incident Jan 18, 2024 · Indexed Jun 4, 2026 · 3 sources

Key facts

What

Incident date: Jan 18, 2024

Who: UK Government Digital Service (GDS)

Failure mode: Hallucination

AI surface: Chatbot

Severity: High

What happened

During a private pilot with approximately 1,000 users in late 2023, the GOV.UK Chat prototype gave citizens inaccurate or misleading responses on topics including tax, VAT registration, the EU Settlement Scheme, and flight refunds. GDS published findings acknowledging that answers were not accurate enough and that the system made outright mistakes; its earliest accuracy benchmark was just 76%. GDS subsequently added filters and rules to stop the chatbot from answering certain question categories before broader pilot deployment. The Times later reported that the chatbot gave misleading tax information, drawing criticism from tax professionals.
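This record does not describe how the GDS filters work. As a rough illustration of the idea of blocking question categories before the model answers, the sketch below assumes a simple keyword pre-filter. The category names, keyword lists, and the function `blocked_category` are hypothetical and are not taken from GDS's implementation.

```python
import re

# Hypothetical blocked categories and trigger keywords (illustrative only,
# not the rules GDS actually deployed).
BLOCKED_CATEGORIES = {
    "tax": ["tax", "vat", "hmrc"],
    "immigration": ["visa", "settlement scheme", "immigration"],
    "refunds": ["flight refund", "flight compensation"],
}


def blocked_category(question):
    """Return the first blocked category the question matches, or None."""
    text = question.lower()
    for category, keywords in BLOCKED_CATEGORIES.items():
        for keyword in keywords:
            # Whole-word match so "tax" does not fire on "taxi".
            if re.search(r"\b" + re.escape(keyword) + r"\b", text):
                return category
    return None


if __name__ == "__main__":
    for q in ["Do I need to register for VAT?", "How do I book a taxi?"]:
        category = blocked_category(q)
        action = "decline and signpost" if category else "answer"
        print(q, "->", action)
```

A keyword filter like this is blunt: it will decline some questions the published guidance could have answered, which is the trade-off for keeping high-risk topics away from the model.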

What broke inside the model

Failure path · mode profile · Hallucination

01 · Trigger: A user asks for a fact, a citation, or a figure.
02 · Model step: The model writes a fluent, confident answer.
03 · Control gap: Nothing ties the claim back to a real source.
04 · Failure: A fabricated fact ships as if it were verified.
05 · Consequence: The false claim reaches a customer, a court, or the public.

Confidence holds, and even spikes, as the claim detaches from any source.

The retrieval-augmented generation system combined semantic search over GOV.UK content with a large language model, but when retrieved context was incomplete or ambiguous, the LLM overgeneralized and produced responses not strictly grounded in source material. The model generated hallucinated hyperlinks and failed to reliably distinguish between topics where published guidance was sufficient and those where it was too limited to support a confident answer. GDS identified core failure areas including groundedness, factual accuracy, factual completeness, and reputational safety.
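The two failure patterns described above, hallucinated hyperlinks and answers that drift from the retrieved text, can be illustrated with a small post-generation check. This is a minimal sketch under stated assumptions, not the evaluation GDS used: the function names, the crude word-overlap score, and the `min_overlap` threshold of 0.6 are all hypothetical, and a production system would need a far stronger groundedness measure.

```python
import re

URL_PATTERN = re.compile(r"https?://[^\s)\]>\"']+")


def unsupported_links(answer, retrieved_urls):
    """Return URLs cited in the answer that are not among the retrieved sources."""
    found = [url.rstrip(".,;:") for url in URL_PATTERN.findall(answer)]
    return [url for url in found if url not in retrieved_urls]


def _content_words(text):
    """Distinct lowercase words of four or more letters."""
    return set(re.findall(r"[a-z]{4,}", text.lower()))


def token_overlap(answer, context):
    """Fraction of the answer's content words that also occur in the context."""
    answer_words = _content_words(answer)
    if not answer_words:
        return 0.0
    return len(answer_words & _content_words(context)) / len(answer_words)


def should_release(answer, context, retrieved_urls, min_overlap=0.6):
    """Release only if every link is a retrieved source and overlap is high enough.

    min_overlap is an arbitrary illustrative threshold, not a tuned value.
    """
    if unsupported_links(answer, retrieved_urls):
        return False
    return token_overlap(answer, context) >= min_overlap


if __name__ == "__main__":
    context = "You must register for VAT if your taxable turnover exceeds the threshold."
    urls = {"https://www.gov.uk/register-for-vat"}
    answer = "Register at https://www.gov.uk/vat-fast-track today."
    print(unsupported_links(answer, urls))
    print(should_release(answer, context, urls))
```

Lexical overlap only catches answers that wander far from the source wording. It cannot tell whether the retrieved guidance was complete enough to support a confident answer, which is the harder problem this entry describes.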

Cite this entry

Permalink: https://failureindex.ai/failures/uk-gov-uk-chat-gave-citizens

Citation

AI Failure Index. "UK GOV.UK Chat gave citizens incorrect tax, VAT, and immigration advice in its alpha pilot" (FI-0108). Realm Labs. https://failureindex.ai/failures/uk-gov-uk-chat-gave-citizens (indexed Jun 4, 2026).


Data fields CC-BY 4.0, prose citation permitted. Incident ID FI-0108. Full dataset at /data.

How Realm would have caught this

Controls for this failure mode

Prism
OmniGuard
AI Detection & Response (AIDR)

A runtime layer that watches the model's internal state can flag the moment a model commits to a claim it has no support for, and hold or reroute the response before it reaches a user. Realm reads those signals in real time rather than grading the transcript after the fact.

Related failures

The 11th Circuit referred Anthony Sabatini for replacing eight hallucinated cases with eight more

A judge struck a Roc Nation filing over AI-fabricated quotes, Tyrone Blackburn's third AI sanction

Coinbase pushed an AI 'breaking news' alert declaring a World Cup result before kickoff