TurboTax's Intuit Assist gave wrong tax advice on over half of test questions, the Post found
Washington Post tech columnist Geoffrey A. Fowler tested TurboTax's Intuit Assist AI chatbot with 16 tax questions and found it gave wrong or irrelevant answers on more than half. Specific failures included recommending incorrect filing statuses and fabricating irrelevant education credit advice when asked about air conditioner tax credits. Even after Intuit updated the software, the chatbot remained unhelpful on a quarter of the questions.
The AI treated a question about air conditioner tax credits as a prompt to hallucinate education credit advice, showing that confident retrieval without relevance checking is dangerous in tax compliance.
Key facts
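- What: Washington Post tech columnist Geoffrey A. Fowler tested TurboTax's Intuit Assist AI chatbot with 16 tax questions and found it gave wrong or irrelevant answers on more than half.
- Incident date: Mar 4, 2024
- Who: Intuit
- Failure mode: Hallucination
- AI surface: Chatbot
- Severity: High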
What happened
Washington Post tech columnist Geoffrey A. Fowler tested TurboTax's Intuit Assist AI chatbot with 16 tax questions during the 2024 tax season and found it gave wrong or irrelevant answers on more than half. When asked about tax credits for a new air conditioner, the chatbot responded with irrelevant information about education credits and 1098-T forms instead of the correct residential energy credit. The chatbot also failed to provide correct filing status guidance and pasted irrelevant content from community forums instead of answering specific questions. H&R Block's competing AI Tax Assist also gave unhelpful answers on over 30 percent of the same test questions.
What broke inside the model
- 01 · Trigger: A user asks for a fact, a citation, or a figure.
- 02 · Model step: The model writes a fluent, confident answer.
- 03 · Control gap: Nothing ties the claim back to a real source.
- 04 · Failure: A fabricated fact ships as if it were verified.
- 05 · Consequence: The false claim reaches a customer, a court, or the public.
Confidence holds, and even spikes, as the claim detaches from any source.
The generative AI chatbot's retrieval and generation pipeline failed to match user questions to the correct tax code provisions, instead surfacing tangentially related content from community forums and unrelated tax topics. When asked about air conditioner tax credits, it returned irrelevant education credit information including 1098-T forms, demonstrating a fundamental failure in relevance matching and contextual understanding. The system lacked sufficient guardrails to prevent confident presentation of fabricated or mismatched tax advice.
What it cost
Sources
- Press: TurboTax and H&R Block's AI chatbots are giving bad tax advice (washingtonpost.com)
- Press: Dangers of AI-Powered Chatbot Tax Advice (bankler.com)
- Press: Overreliance on AI for Tax Advice: A Cautionary Perspective (taxexecutive.org)
Cite this entry
https://failureindex.ai/failures/turbotax-intuit-assist-gave-wrong-tax
AI Failure Index. "TurboTax's Intuit Assist gave wrong tax advice on over half of test questions, the Post found" (FI-0084). Realm Labs. https://failureindex.ai/failures/turbotax-intuit-assist-gave-wrong-tax (indexed Jun 4, 2026).
Data fields CC-BY 4.0, prose citation permitted. Incident ID FI-0084. Full dataset at /data.
Note from Realm Labs, the Index steward
How Realm would have caught this
- Prism
- OmniGuard
- AI Detection & Response (AIDR)
A runtime layer that watches the model's internal state can flag the moment a model commits to a claim it has no support for, and hold or reroute the response before it reaches a user. Realm reads those signals in real time rather than grading the transcript after the fact.