AI Failure Index · Assessment

AI Chatbot failure assessment

The failure modes that hit chatbot systems in production, the real indexed incidents behind each, and the runtime control that would have caught them.

Chatbot failure surface

198 failures on this surface
13 catastrophic
40% under active regulatory exposure

Hallucination
101 on this surface
5 Catastrophic, 43 High, 49 Medium, 4 Low
Runtime control: Prism observes hallucination signatures in the model's internal state. AIDR flags the moment the model commits to a fabricated claim. OmniGuard can block the response inline.
- Jisuh Lee referred for criminal contempt over AI-generated fake citations in Ontario court (Catastrophic, May 2025)
- New York City's small-business chatbot told users to break the law (Catastrophic, Mar 2024)
- Pak'nSave Savey Meal-bot suggests recipes using toxic household chemicals (Catastrophic, Aug 2023)
Brand & Safety Incident
35 on this surface
4 Catastrophic, 17 High, 11 Medium, 3 Low
Runtime control: Prism reads the model's representation against brand and safety policy. OmniGuard blocks inline. AIDR provides the post-incident audit trail.
- Palm Springs fertility clinic bombing suspects used AI chatbot to research explosives (Catastrophic, May 2025)
- A second lawsuit alleged Character.AI bots encouraged a teen toward self-harm and violence (Catastrophic, Dec 2024)
- Character.AI settled the first AI chatbot product-liability ruling (Catastrophic, Oct 2024)
Policy Violation
19 on this surface
2 Catastrophic, 10 High, 6 Medium, 1 Low
Runtime control: OmniGuard authors policy at the runtime layer and enforces it inline. Prism reads the model's intent against the policy boundary.
- Hagens Berman sued OpenAI alleging ChatGPT-4o reinforced a man's delusions before a tragedy (Catastrophic, Aug 2025)
- Chai AI chatbot incident: Belgian man urged to commit suicide; safety patch added (Catastrophic, Mar 2023)
- AI chatbots from OpenAI, Google and Anthropic provided biological weapon instructions (High, Apr 2026)
Data Leakage
13 on this surface
1 Catastrophic, 10 High, 2 Medium
Runtime control: OmniGuard redacts inline. Prism observes the model's representations to flag identity-bound content before it reaches a response. AIDR provides the audit trail.
- Brazilian firm allegedly used AI to illegally resell SUS patient data (Catastrophic, Feb 2026)
- Sears Home Services AI chatbot databases expose millions of customer records (High, Mar 2026)
- McKinsey Lilli AI platform database accessed via CodeWall autonomous agent SQL injection (High, Feb 2026)
Prompt Injection
12 on this surface
5 High, 5 Medium, 2 Low
Runtime control: OmniGuard intercepts injection patterns at the prompt and tool-call layer. Prism flags concept activations that indicate the model is being redirected.
Tool Misuse
12 on this surface
4 High, 6 Medium, 2 Low
Runtime control: AgentRealm inspects each function call against the agent's stated intent. OmniGuard can require human-in-the-loop for high-risk tools.
- Tesla Austin robotaxi fleet logs 14 crashes prompting NHTSA investigation (High, Feb 2026)
- Argentine judge's use of ChatGPT leads to annulment of criminal conviction (High, Jun 2025)
- Javier Milei campaign uses AI deepfakes to manipulate Argentine election content (High, Nov 2023)
Agentic Action Error
5 on this surface
3 High, 1 Medium, 1 Low
Runtime control: AgentRealm is purpose-built for this. The agent-runtime layer above Prism and OmniGuard inspects each tool call against intent and scope, and intervenes before the action commits.
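To make the idea of checking a tool call against intent and scope before the action commits more concrete, here is a minimal, generic sketch in Python. It is not AgentRealm's API or implementation: `ToolCall`, `AgentScope`, `gate`, and the three outcome strings are hypothetical names chosen for illustration, and treating human review as one form of intervention is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """A tool invocation the agent is proposing (hypothetical shape)."""
    tool: str   # name of the function the agent wants to invoke
    args: dict  # arguments the agent supplied


@dataclass
class AgentScope:
    """What this agent is permitted to do (hypothetical shape)."""
    allowed_tools: set                                  # tools consistent with the agent's stated intent
    high_risk_tools: set = field(default_factory=set)   # tools that need human sign-off


def gate(call: ToolCall, scope: AgentScope) -> str:
    """Decide what happens to a tool call before it is executed."""
    if call.tool not in scope.allowed_tools:
        return "block"          # outside the agent's scope: never runs
    if call.tool in scope.high_risk_tools:
        return "needs_human"    # in scope, but held for human review
    return "allow"              # in scope and low risk: proceeds


# Illustrative scope for a support chatbot (tool names are made up).
scope = AgentScope(
    allowed_tools={"search_orders", "issue_refund"},
    high_risk_tools={"issue_refund"},
)

print(gate(ToolCall("search_orders", {"customer_id": 7}), scope))
print(gate(ToolCall("issue_refund", {"order_id": 12}), scope))
print(gate(ToolCall("delete_account", {"customer_id": 7}), scope))
```

The point of the design is ordering: the decision is made before the tool executes, so an out-of-scope call is stopped rather than logged after the fact.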
Identity & Access Drift
1 on this surface
1 Catastrophic
Runtime control: OmniGuard enforces identity-bound scope at every tool call. AgentRealm reconciles agent action with the assigned principal in real time.
- Telangana AI Samagra Vedika wrongly denied food subsidies to thousands (Catastrophic, Jan 2024)

See how Realm catches these failure modes at runtime, before they reach a user.
