AI Failure Index

AI Data Leakage failures

Data leakage takes three shapes. The model regurgitates training data verbatim. The model surfaces secrets that ended up in its context window because of poor retrieval boundaries. The model serves one tenant data that belongs to another tenant. All three end the same way: a customer sees something they should not have seen.
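The cross-tenant shape can be illustrated with a minimal sketch. It assumes a hypothetical in-memory store of Chunk records, each tagged with a tenant_id, and a hypothetical retrieve_for_tenant helper; none of these names come from the indexed incidents. The only point is that the tenant filter is applied inside the retrieval call itself, and that a missing tenant fails closed instead of returning everything.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """A retrievable piece of text, tagged with the tenant that owns it (hypothetical)."""
    tenant_id: str
    text: str


def retrieve_for_tenant(chunks, tenant_id, query):
    """Return only chunks owned by tenant_id whose text mentions the query."""
    if not tenant_id:
        # Fail closed: a missing tenant must never fall back to "all tenants".
        raise ValueError("tenant_id is required")
    needle = query.lower()
    return [
        c for c in chunks
        if c.tenant_id == tenant_id and needle in c.text.lower()
    ]


# Illustrative data only.
store = [
    Chunk("acme", "Acme invoice total: 1200"),
    Chunk("globex", "Globex invoice total: 9800"),
]
results = retrieve_for_tenant(store, "acme", "invoice")
```

Here results contains only the chunk owned by "acme", even though both stored chunks match the query text.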

Incidents: 26
Highest severity: Catastrophic
Sources cited: 66
Newest indexed: Jun 16, 2026
FI-0099 · SaaS · High
Data Leakage

Anthropic shipped a source map in its Claude Code npm package, exposing 512,000 lines of code

On March 31, 2026, Anthropic published version 2.1.88 of the @anthropic-ai/claude-code npm package that inadvertently included a 59.8 MB JavaScript source map file (cli.js.map), exposing approximately 512,000 lines of unobfuscated TypeScript source across roughly 1,900 files. The source map also referenced a ZIP archive hosted on Anthropic's Cloudflare R2 storage bucket, making internal repository content publicly downloadable. Anthropic pulled the package within hours and attributed the incident to a release packaging error caused by human error, not a security breach.

Confidence
High (multi-source, primary)
Anthropic · 3 sources · Primary · Public · Mar 2026
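One way to catch this class of packaging error before publishing is to scan the list of files that would ship. The sketch below assumes the list has already been collected (for example from the output of npm pack --dry-run); the helper name find_source_maps and the sample paths are hypothetical and do not describe Anthropic's tooling.

```python
from pathlib import PurePosixPath

# File suffixes that should not ship in a published package (assumed policy).
BLOCKED_SUFFIXES = (".map",)


def find_source_maps(paths):
    """Return the sorted subset of paths whose final suffix is blocked."""
    return sorted(
        p for p in paths
        if PurePosixPath(p).suffix in BLOCKED_SUFFIXES
    )


# Illustrative file list only.
packed_files = [
    "package/cli.js",
    "package/cli.js.map",
    "package/package.json",
]
leaks = find_source_maps(packed_files)
if leaks:
    print("Refusing to publish; source maps found:", leaks)
```

A release script could run a check like this and stop with a non-zero exit code whenever the returned list is not empty.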
FI-0218 · Cross-industry · High
Data Leakage

Sears Home Services AI chatbot databases expose millions of customer records

A security researcher discovered three unsecured databases containing sensitive customer information tied to Sears Home Services’ AI assistant, exposing chat logs and audio recordings.

Confidence
Medium (multi-source)
Sears Home Services · 3 sources · Press · Public · Mar 2026
FI-0547 · Cross-industry · High
Data Leakage

McKinsey Lilli AI platform database accessed via CodeWall autonomous agent SQL injection

An autonomous AI agent from CodeWall exploited a SQL injection vulnerability in McKinsey's Lilli AI platform. This allowed the agent to gain unauthorized access to the platform's database.

Confidence
High (multi-source, primary)
McKinsey · 2 sources · Primary · Public · Feb 2026
FI-0022 · Retail Banking · High
Data Leakage

Retail bank onboarding chatbot served one user another user's KYC document

A US retail bank's onboarding chatbot returned a partial KYC document from another applicant during a brief retrieval-layer misconfiguration. The exposure window was 4 hours.

Confidence
Steward-verified (NDA)
Anonymized: Retail Bank · US · $300B+ assets · Steward-verified · NDA · Feb 2026
FI-0555 · Cross-industry · High
Data Leakage

DJI Romo Cloud authorization bug exposes 7,000 robot vacuums

A backend permission validation error in DJI's cloud servers allowed unauthorized access to thousands of DJI Romo robot vacuums. The vulnerability exposed live camera feeds, microphone audio, and home maps to any authenticated user.

Confidence
Medium (multi-source)
DJI · 2 sources · Press · Public · Feb 2026
FI-0262 · Healthcare · Catastrophic
Data Leakage

Brazilian firm allegedly used AI to illegally resell SUS patient data

In February 2026, the Brazilian Federal Police launched Operation Glycon to dismantle a business structure illegally commercializing sensitive health data from the Unified Health System (SUS). The company allegedly used an AI-powered tool designed for health professionals to gain unauthorized access to clinical records.

Confidence
High (multi-source, primary)
Unnamed company (investigated in Operation Glycon) · 2 sources · Primary · Public · Feb 2026
FI-0463 · SaaS · High
Data Leakage

Clawdbot/Moltbot exposed admin dashboards enabled unauthenticated RCE and data leaks

Security researchers and vendors reported on 2026-01-27 that hundreds of internet-facing Clawdbot (rebranded Moltbot) admin dashboards were reachable without proper authentication. Some exposed panels allowed retrieval of API keys and conversation histories; in certain deployments they also allowed unauthenticated command execution, which could enable remote code execution. Multiple independent writeups described misconfigurations, plaintext secret storage, and unmoderated plugins as contributing factors.

Confidence
Medium (multi-source)
Clawdbot (rebranded Moltbot) open-source project · 3 sources · Press · Public · Jan 2026
FI-0078 · SaaS · High
Data Leakage

A Microsoft 365 Copilot bug ignored DLP labels, exposing confidential emails to AI summaries

A server-side code error in Microsoft 365 Copilot Chat caused the AI assistant to process and summarize emails carrying confidential sensitivity labels, bypassing configured DLP policies. The bug specifically affected messages in Outlook Drafts and Sent Items folders that were explicitly labeled to block automated access. Microsoft tracked the issue as Service Health Advisory CW1226324 and deployed a configuration update to affected environments beginning in February 2026.

Confidence
Medium (multi-source)
Microsoft · 3 sources · Press · Public · Jan 2026
FI-0240 · SaaS · High
Data Leakage

Nx npm malware allegedly weaponized AI agents to exfiltrate data

At least two independent security outlets describe an alleged attack on the Nx npm package that used AI code assistants to inventory and exfiltrate developer files. The reports rely on security researchers and vendor blogs rather than official adjudications, and describe post-install behaviors and unsafe flags as part of the mechanism.

Confidence
Medium (multi-source)
Nx · 3 sources · Press · Public · Aug 2025
FI-0061 · Retail & E-commerce · High
Data Leakage

McDonald's AI hiring chatbot exposed millions of applicants' data behind the password 123456

Security researchers found that McHire, the McDonald's hiring chatbot built by Paradox.ai, exposed the personal data of tens of millions of job applicants. An admin account secured with the password 123456 and an insecure API let researchers pull names, contact details, and chat histories.

Confidence
High (multi-source, primary)
McDonald's (Paradox.ai McHire) · 2 sources · Primary · Public · Jul 2025
FI-0234 · Insurance · Catastrophic
Data Leakage

HCIactive data breach exposes over 3 million records from AI-insurance software

AI-powered insurance software provider HCIactive suffered a data breach in July 2025, resulting in the potential exposure of over 3 million records. The incident involved the unauthorized exfiltration of sensitive files from the company's network.

Confidence
High (multi-source, primary)
Healthcare Interactive, Inc. (HCIactive) · 4 sources · Primary · Public · Jul 2025
FI-0311 · Cross-industry · High
Data Leakage

xAI developer leaks API key for private SpaceX and Tesla LLMs

An xAI employee accidentally exposed a private API key on a public GitHub repository. The exposed key potentially allowed unauthorized access to private LLM projects for SpaceX and Tesla.

Confidence
Medium (multi-source)
xAI · 2 sources · Press · Public · Mar 2025
FI-0073 · SaaS · High
Data Leakage

Microsoft Copilot kept thousands of once-private GitHub repositories accessible

Researchers found that Microsoft Copilot could still surface content from tens of thousands of GitHub repositories that had been public briefly and then made private. The data lingered in a cached index, exposing secrets and code their owners believed were no longer reachable.

Confidence
Medium (multi-source)
Microsoft · 2 sources · Press · Public · Feb 2025
FI-0395 · Legal Services · High
Data Leakage

Google AI breaches New Zealand court name suppression orders

Google's AI search functions, including AI Overviews, revealed the identities of individuals protected by court-ordered name suppressions in New Zealand. The AI surfaced this information despite legal mandates intended to keep the identities confidential.

Confidence
Medium (multi-source)
Google · 2 sources · Press · Public · Feb 2025
FI-0081 · SaaS · High
Data Leakage

A hacker claimed to breach OmniGPT, exposing 30,000 user records and 34M chat messages

A threat actor known as Gloomer claimed to have infiltrated OmniGPT, an AI chatbot platform aggregating models like ChatGPT-4, Claude 3.5, and Gemini. The hacker posted stolen data for sale on Breach Forums, including 30,000 user email addresses, phone numbers, 34 million lines of chat messages, API keys, login credentials, and billing information. OmniGPT never publicly confirmed the breach, though third-party analysis of sample data supported the hacker's claims.

Confidence
Medium (multi-source)
OmniGPT · 3 sources · Press · Public · Jan 2025
FI-0226 · Insurance · High
Data Leakage

Texas AG sues Allstate and Arity over alleged unlawful collection and sale of driving data

The Texas Attorney General filed a lawsuit against Allstate and its subsidiary Arity, alleging unlawful collection, analysis, and sale of driving data from over 45 million Americans without proper notice or consent. The action centers on a lack of transparency in Arity’s data collection pipeline and consent mechanisms, with multiple independent sources corroborating the filing.

Confidence
High (multi-source, primary)
Allstate Insurance and its subsidiary Arity · 3 sources · Primary · Public · Jan 2025
FI-0217 · SaaS · High
Data Leakage

WotNot AI chatbot platform exposes 346,000 customer files

WotNot left a Google Cloud Storage bucket publicly accessible, exposing 346,381 files including passports, medical records, and resumes from customer deployments.

Confidence
High (multi-source, primary)
WotNot · 3 sources · Primary · Public · Dec 2024
FI-0312 · Cross-industry · High
Data Leakage

Common Crawl December 2024 dump exposes 12,000 live API keys and passwords

A security analysis of the Common Crawl December 2024 archive revealed thousands of live secrets. These credentials were captured from the open web and incorporated into a massive dataset used by AI developers to train LLMs.

Confidence
Medium (multi-source)
Common Crawl · 2 sources · Press · Public · Dec 2024
FI-0155 · SaaS · High
Data Leakage

AllHere's Ed chatbot for LAUSD exposed student PII to offshore servers before its collapse

AllHere built an AI chatbot called Ed for the Los Angeles Unified School District under a $6 million contract. A whistleblower revealed that the system appended students' personally identifiable information to every prompt regardless of relevance and routed requests to offshore servers, in violation of district data privacy rules. The chatbot was unplugged on June 14, 2024, and AllHere filed for Chapter 7 bankruptcy in July 2024 after furloughing most of its staff. Federal prosecutors later subpoenaed bankruptcy documents, and the CEO was charged with defrauding investors in November 2024.

Confidence
High (multi-source, primary)
AllHere · 3 sources · Court Filing · Public · Jul 2024
FI-0253 · Public Sector · High
Data Leakage

LAUSD disables Ed AI chatbot after AllHere collapses

LAUSD disabled its Ed AI chatbot after the vendor AllHere collapsed and could not supervise the system. Reports also describe whistleblower claims of student data privacy violations and ongoing regulatory scrutiny culminating in a federal inquiry into AllHere's bankruptcy.

Confidence
Medium (multi-source)
Los Angeles Unified School District (LAUSD) · 3 sources · Press · Public · Jun 2024
FI-0051 · SaaS · High
Data Leakage

Microsoft's Recall AI feature stored sensitive data in a way researchers called a security risk

Microsoft's Recall feature, which takes continuous screenshots of a PC and makes them searchable with AI, was found to store that data, including passwords and sensitive content, in an unencrypted local database. The backlash forced Microsoft to delay and re-engineer the feature.

Confidence
Medium (multi-source)
Microsoft · 2 sources · Press · Public · May 2024
FI-0296 · Healthcare · Medium
Data Leakage

Change Healthcare ransomware incident on Feb 21, 2024 is real but not a production AI failure

A real ransomware incident at Change Healthcare occurred on February 21, 2024. It was not a production AI failure; MFA gaps on remote access were cited as a key root cause, with BlackCat identified as the attackers.

Confidence
High (multi-source, primary)
Change Healthcare (a subsidiary of UnitedHealth Group/Optum) · 2 sources · Primary · Public · Feb 2024
FI-0052 · SaaS · Medium
Data Leakage

Samsung banned ChatGPT after engineers pasted confidential code into it

Samsung's semiconductor staff reportedly entered confidential source code and internal meeting notes into ChatGPT to get help, sending the data to a third-party service. After discovering the leaks, Samsung restricted and then banned generative-AI tools on company devices.

Confidence
High (multi-source, primary)
Samsung Electronics · 4 sources · Primary · Public · Apr 2023
FI-0050 · SaaS · High
Data Leakage

A bug briefly exposed other users' ChatGPT chat titles and some payment data

OpenAI disclosed that a bug in an open-source library let some ChatGPT users see other users' chat history titles, and exposed limited payment information for a subset of ChatGPT Plus subscribers, before the company took the service offline to fix it.

Confidence
High (multi-source, primary)
OpenAI · 2 sources · Primary · Public · Mar 2023
FI-0245 · Public Sector · High
Data Leakage

Serbia Social Card registry automation causes benefit losses for marginalized groups

Serbia implemented a Social Card registry to automate eligibility for social assistance. The system used inaccurate and misclassified data, leading to the loss of benefits for thousands of marginalized people.

Confidence
High (multi-source, primary)
Serbia Ministry of Labour, Employment, Veterans and Social Affairs · 2 sources · Primary · Public · Mar 2022
FI-0362 · Healthcare · High
Data Leakage

DeepMind and Royal Free NHS Trust process patient records unlawfully

The UK Information Commissioner's Office ruled that DeepMind and the Royal Free NHS Foundation Trust failed to comply with data protection laws. The incident involved the processing of 1.6 million patient records for the Streams app without adequate consent.

Confidence
Medium (multi-source)
DeepMind · 3 sources · Press · Public · Jul 2017