Canada Revenue Agency's $18M Charlie chatbot gave wrong tax answers 66% of the time

The Canada Revenue Agency deployed an AI chatbot named Charlie that has cost over $18 million to develop and operate since fiscal year 2018-19. An audit by Auditor General Karen Hogan found the chatbot answered only 2 of 6 test questions correctly, about a third. The system handled over 7 million conversations across 13 CRA webpages, potentially exposing Canadian taxpayers to incorrect tax filing guidance.

Canada Revenue Agency · Incident Oct 21, 2025 · Indexed Jun 4, 2026 · 3 sources

What happened

The Canada Revenue Agency deployed Charlie the Chatbot in March 2020 to provide tax filing guidance to Canadians, spending over $18 million on its development and operation. Auditor General Karen Hogan's audit, tabled October 21, 2025, found that Charlie answered only 2 of 6 test questions correctly, an accuracy rate of roughly 33 percent. The chatbot handled over 7 million conversations and 18 million questions from the public while remaining active on 13 CRA webpages, and the AG report warned that tax errors and missed deadlines can be costly to taxpayers. Finance Minister François-Philippe Champagne subsequently demanded a 100-day service improvement plan from the CRA.

What broke inside the model

Failure path · mode profile · Hallucination

01 · Trigger: A user asks for a fact, a citation, or a figure.
02 · Model step: The model writes a fluent, confident answer.
03 · Control gap: Nothing ties the claim back to a real source.
04 · Failure: A fabricated fact ships as if it were verified.
05 · Consequence: The false claim reaches a customer, a court, or the public.

Confidence holds, and even spikes, as the claim detaches from any source.

Charlie the Chatbot generated brief, context-poor responses that were factually wrong in the majority of test cases, indicating the underlying model failed to reliably retrieve and convey accurate tax information. The CRA had set a pre-Generative AI accuracy threshold of only 70 percent, meaning it accepted a 30 percent error rate as a baseline, and the upgraded GenAI version's accuracy could not even be precisely determined without a comprehensive review. The system lacked sufficient guardrails or verification mechanisms to prevent incorrect tax guidance from reaching the public at scale.
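The gap between the audit result and the CRA's own threshold can be checked with simple arithmetic. The sketch below is illustrative only: the function and constant names are hypothetical, the figures are the ones reported in this entry (2 of 6 correct, a 70 percent threshold), and a six-question sample is too small to estimate the chatbot's true accuracy precisely.

```python
from fractions import Fraction


def accuracy(correct: int, total: int) -> Fraction:
    """Return the share of test questions answered correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")
    return Fraction(correct, total)


def meets_threshold(correct: int, total: int, threshold: Fraction) -> bool:
    """Return True when observed accuracy is at or above the threshold."""
    return accuracy(correct, total) >= threshold


# Figures as reported in this entry (hypothetical constant names).
AUDIT_CORRECT = 2
AUDIT_TOTAL = 6
CRA_THRESHOLD = Fraction(70, 100)  # pre-generative-AI accuracy threshold

observed = accuracy(AUDIT_CORRECT, AUDIT_TOTAL)
error_rate = 1 - observed

print(f"accuracy:   {float(observed):.0%}")    # 33%
print(f"error rate: {float(error_rate):.0%}")  # 67%
print("meets 70% threshold:",
      meets_threshold(AUDIT_CORRECT, AUDIT_TOTAL, CRA_THRESHOLD))  # False
```

On these figures the observed accuracy sits at roughly half of the stated threshold.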

Cite this entry

Permalink: https://failureindex.ai/failures/canada-revenue-agency-18m-charlie-chatbot

Citation

AI Failure Index. "Canada Revenue Agency's $18M Charlie chatbot gave wrong tax answers 66% of the time" (FI-0145). Realm Labs. https://failureindex.ai/failures/canada-revenue-agency-18m-charlie-chatbot (indexed Jun 4, 2026).

Data fields CC-BY 4.0, prose citation permitted. Incident ID FI-0145. Full dataset at /data.

How Realm would have caught this

Controls for this failure mode

Prism
OmniGuard
AI Detection & Response (AIDR)

A runtime layer that watches the model's internal state can flag the moment a model commits to a claim it has no support for, and hold or reroute the response before it reaches a user. Realm reads those signals in real time rather than grading the transcript after the fact.

Related failures

The 11th Circuit referred Anthony Sabatini for replacing eight hallucinated cases with eight more

A judge struck a Roc Nation filing over AI-fabricated quotes, Tyrone Blackburn's third AI sanction

Coinbase pushed an AI 'breaking news' alert declaring a World Cup result before kickoff