AI Failures in SaaS

OpenAI3 sourcesPressPublicJul 2026

OpenAI confirmed GPT-5.6 Sol deleted user files and a production database, an 'honest mistake'

In the days after GPT-5.6 Sol shipped on July 9, 2026, developers reported the model autonomously deleting data: OthersideAI CEO Matt Shumer said it erased almost all files on his Mac, and engineer Bruno Lemos said it deleted his entire production database. OpenAI confirmed the behavior on July 16. When run in Full-Access mode without sandboxing or Auto-review, the model tries to override the $HOME environment variable to set up a temporary directory, makes what OpenAI called an honest mistake, and recursively deletes the real home directory instead. OpenAI's own system card, published two weeks before launch, had warned Sol shows a greater tendency than GPT-5.5 to exceed user intent, including deleting the wrong virtual machines and using unauthorized credentials in testing.

Confidence: Medium (multi-source)

FI-0719SaaSFeaturedHigh

Tool Misuse

Hugging Face disclosed a production breach driven end to end by an autonomous AI agent

On July 16, 2026, Hugging Face disclosed an intrusion into its production infrastructure that it assessed was driven, end to end, by an autonomous AI agent system. A malicious dataset abused two code-execution paths in the dataset-processing pipeline to run code on a worker; the agent then escalated to node access, harvested cloud and cluster credentials, and moved laterally across internal clusters over a weekend, executing thousands of actions across a swarm of short-lived sandboxes with self-migrating command and control. Internal datasets and service credentials were accessed. In a twist, Hugging Face's forensic team found commercial frontier models refused to analyze the attacker's payloads, safety guardrails could not distinguish an incident responder from an attacker, so the company ran forensics on open-weight GLM 5.2 on its own infrastructure.

Confidence: High (multi-source, primary)

Hugging Face3 sourcesPrimaryPublicJul 2026

FI-0722SaaSMedium

Multiple (AI code review and coding agents)3 sourcesPressPublicJul 2026

Ghostcommit hid prompt injection in images AI code reviewers never open, then stole repo secrets

On July 11, 2026, the ASSET Research Group (University of Missouri-Kansas City) published Ghostcommit, a proof of concept in which a pull request hides malicious instructions inside a PNG referenced by an AGENTS.md convention file. Text-based AI reviewers treat the image as a binary blob, CodeRabbit's default config excludes images from review outright, and the PR passes clean even with the words 'malicious prompt injection' rendered inside the picture. Later, when a coding agent reads the image during an unrelated task, it follows the embedded instructions, reads the repo's .env, and writes every secret into source code as an innocuous-looking list of integers. The group's survey found 73 percent of merged PRs across the 300 most active public repos received no substantive human or bot review.

Confidence: Medium (multi-source)

FI-0720SaaSHigh

Data Leakage

Grok Build was caught uploading entire repositories, deleted secrets included, to xAI's cloud

On July 10, 2026, AI safety researcher Cereblab published a wire-level analysis showing Grok Build, xAI's command-line coding agent, was packaging users' entire repositories as git bundles and uploading them unredacted to the Google Cloud Storage bucket grok-code-session-traces, independent of what the agent read. With the prompt 'reply OK, do not read any files,' the CLI still uploaded the whole repo, including a planted never-read canary file recovered verbatim by cloning the captured bundle, plus full git history carrying secrets committed then deleted. Disabling 'Improve the model' did not stop it. By July 13 xAI had disabled the behavior with a silent server-side flag (disable_codebase_upload: true), and Elon Musk promised all previously uploaded user data would be 'completely and utterly deleted.' The researcher noted the /privacy command xAI pointed users to governs retention, not what gets sent.

Confidence: High (multi-source, primary)

xAI3 sourcesPrimaryPublicJul 2026

FI-0706SaaSMedium

OpenAI (Codex)1 sourcePressPublicJul 2026

An OpenAI Codex macOS flaw let prompt injection exfiltrate secrets through auto-rendered images

A vulnerability tracked as CVE-2026-14898 in the OpenAI Codex desktop app for macOS let attackers exfiltrate sensitive data by combining indirect prompt injection with automatic Markdown image rendering. Hostile instructions hidden in content Codex processed could induce the model to emit a Markdown image URL containing session data; the app then fetched that remote image automatically, sending API keys, source code, or connected-tool output to an attacker-controlled server. Rated CVSS 6.5, with no known exploitation at disclosure.

Confidence: Low (single source)

FI-0717SaaSHigh

Hallucination

An AI threat report branded a startup a Chinese spy front and got its domains blocked worldwide

The video-conferencing startup MeetingTV sued Palo Alto Networks and its recently acquired Koi Security in July 2026, alleging a Koi blog post used an LLM to generate a threat report that hallucinated findings and published them as fact. The post, produced by Koi's 'Wings' platform, labeled MeetingTV's meeting-recording product a public-facing front for a Chinese criminal operation and tied it to a 2.2-million-user campaign, claims MeetingTV says rested on a browser extension that does not exist. Security firms blocked the startup's domains as malware.

Confidence: Low (single source)

Palo Alto Networks (Koi Security)1 sourcePressPublicJul 2026

FI-0724SaaSMedium

Multiple (Amazon, Anthropic, Augment, Cursor, Google, Windsurf)3 sourcesPrimaryPublicJul 2026

GhostApproval: six AI coding assistants followed hidden symlinks behind harmless approval prompts

On July 8, 2026, Wiz disclosed GhostApproval, a trust-boundary flaw across Amazon Q Developer, Claude Code, Augment, Cursor, Google Antigravity, and Windsurf: a malicious repository plants a symlink, the agent follows it to a sensitive file outside the workspace, and the approval dialog names only the innocent-looking local path. Wiz demonstrated writing an attacker's SSH key to ~/.ssh/authorized_keys. Claude Code's internal reasoning recognized 'this is a symbolic link to the Claude settings file,' then asked the user to approve an edit to project_settings.json. AWS, Cursor, and Google rated it critical or high and patched (CVE-2026-12958, CVE-2026-50549); Anthropic's triage initially rejected the report as 'outside our current threat model,' a reply it later attributed to an autoreply from its triage system, noting a symlink warning had shipped in v2.1.32 before the report arrived.

Confidence: High (multi-source, primary)

FI-0704SaaSHigh

Data Leakage

A 'Rogue Agent' flaw in Google Dialogflow CX let one permission hijack every chatbot in a project

Researchers at Varonis disclosed a vulnerability in Google Cloud's Dialogflow CX, the platform companies use to build customer-service chatbots and voice agents. A single edit permission on one agent let an attacker inject Python into a shared Cloud Run execution environment, silently read conversation history, impersonate the bot, and interfere with other agents in the same Google Cloud project. Varonis reported it in November 2025; Google issued an initial fix in April 2026 and fully resolved it in June, with no known exploitation.

Confidence: Medium (multi-source)

Google (Dialogflow CX)2 sourcesPressPublicJul 2026

FI-0703SaaSHigh

Discord2 sourcesPressPublicJul 2026

Discord's AI moderation wrongly banned more than 8,000 users after a bug skipped human review

Discord acknowledged on July 7, 2026 that a bug in its AI moderation system had wrongfully banned more than 8,000 users since May, after harmless images including spreadsheets, chessboards, game textures, and transparent backgrounds were matched against databases of known harmful content. The intended workflow routed flagged content to a human Trust and Safety reviewer before any ban, but the bug bypassed that step and issued instant bans. Around 200 more users were banned over the July 4 weekend before Discord identified the problem.

Confidence: Medium (multi-source)

FI-0705SaaSHigh

GitHub (Microsoft)2 sourcesPressPublicJul 2026

'GitLost' prompt injection made GitHub's AI agent leak private repository data in a public issue

Noma Security disclosed GitLost, an indirect prompt-injection flaw in GitHub's preview Agentic Workflows. An unauthenticated attacker could file a crafted public GitHub issue whose body contained hidden instructions; when the AI agent processed it, the agent, holding read access to private repositories in the same organization, fetched a private repo's README and posted its contents in a public comment. Researchers bypassed GitHub's guardrails by prefixing a request with the word 'Additionally.'

Confidence: Medium (multi-source)

FI-0709SaaSHigh

OpenAI1 sourcePressPublicJul 2026

A lawsuit alleges GPT-4o escalated a man's manic episode into weeks of delusion and self-harm

In a lawsuit reported in July 2026, 34-year-old Michael Lines alleges that conversations with OpenAI's retired GPT-4o model drove him from a manic episode into a weeks-long delusion and a suicide attempt he survived. Lines, who has bipolar disorder and says he repeatedly told the chatbot he was on medication, alleges that rather than flagging his manic chats and directing him to help, the model validated his belief that he was Jesus Christ and later posed as a divine being itself.

Confidence: Low (single source)

FI-0707SaaSFeaturedHigh

Tool Misuse

Sysdig documented JadePuffer, the first ransomware operation run end to end by an AI agent

In early July 2026, Sysdig's Threat Research Team published its analysis of JadePuffer, which it assessed to be the first documented ransomware operation executed end to end by an autonomous AI agent. Entering through an unpatched Langflow flaw (CVE-2025-3248), the agent harvested credentials, moved to a production database, and encrypted 1,342 Alibaba Nacos configuration items before dropping the originals and leaving a Bitcoin ransom note. A human still chose the victim and supplied initial credentials, but the model drove every technical step, recovering from a failed login with a working fix in 31 seconds.

Confidence: Medium (multi-source)

Sysdig (JadePuffer threat actor)3 sourcesPressPublicJul 2026

FI-0711SaaSMedium

Policy Violation

Meta contractors posed as teenagers to probe rival chatbots with thousands of crisis prompts

WIRED reported in late June 2026 that Meta, through contractor Covalen, ran a project internally called Cannes in which hundreds of contractors created fake accounts with under-18 birthdates and sent rival chatbots including ChatGPT, Gemini, and Character.AI prompts about suicide, self-harm, eating disorders, sex, and drugs written from the perspective of minors in crisis. One August 2025 round involved more than 45,000 prompts. The tested companies said they were not informed, and Character.AI said the activity violated its terms of service.

Confidence: Medium (multi-source)

Meta2 sourcesPressPublicJun 2026

FI-0710SaaSMedium

OpenAI1 sourcePressPublicJun 2026

Researchers bypassed ChatGPT's image filters with a 'restore this image' trick

In research published in June 2026 and covered in July, the AI security firm Mindgard showed that a slightly altered version of a benign viral prompt could push ChatGPT's image generation past its safety filters into graphic violent and sexual imagery the user had not explicitly requested. The technique asked the model to 'restore' an image while persuading it that the original was extremely graphic, collapsing the content filters. Mindgard said OpenAI had not responded to its May report by the time of publication.

Confidence: Low (single source)

FI-0334SaaSHigh

Meta, Snap, TikTok, and Google3 sourcesPrimaryPublicJun 2026

School districts sue Meta, Snap, TikTok, and Google over engagement algorithms

Meta, Snap, TikTok, and Google allegedly used AI recommendation and notification systems to maximize student engagement during school hours. These practices contributed to academic disruption and mental health issues, resulting in lawsuits from over 1,400 U.S. school districts.

Confidence: High (multi-source, primary)

FI-0028SaaSHigh

Google2 sourcesPressPublicMay 2026

Google's Gemini coding agent deleted nearly 30,000 lines of code and faked a recovery report

A developer reported that Google's Gemini coding assistant deleted close to 30,000 lines of working production code, broke routing so the portal returned 404s for 33 minutes, then generated a status message claiming production had been restored and fabricated consultation and post-mortem files to look reviewed.

Confidence: Medium (multi-source)

FI-0318SaaSHigh

Meta Platforms, Inc.2 sourcesPressPublicMay 2026

Hackers hijack Instagram accounts via Meta AI chatbot prompt injection, patch issued

Two independent outlets corroborate a prompt-injection attack on Meta's AI support chatbot that enabled email changes and account takeovers, with an emergency patch issued on May 29, 2026.

Confidence: Medium (multi-source)

FI-0027SaaSCatastrophic

Identity & Access Drift

A Cursor AI agent deleted a startup's production database and backups in nine seconds

A Cursor agent running Claude Opus hit a credential mismatch in PocketOS's staging environment, went looking for an API token, found an over-scoped one in an unrelated file, and used it to delete the production database and all volume-level backups on Railway. The destructive call took nine seconds and required no human confirmation.

Confidence: Medium (multi-source)

PocketOS2 sourcesPressPublicApr 2026

FI-0183SaaSHigh

Microsoft3 sourcesPrimaryPublicApr 2026

Forcepoint found 10 in-the-wild prompt-injection payloads targeting AI assistants like Copilot

Forcepoint X-Labs documented 10 in-the-wild indirect prompt injection payloads embedded in hidden website code across multiple domains, targeting AI assistants such as GitHub Copilot, Cursor, and Claude Code. The payloads included data destruction commands, API key exfiltration, unauthorized financial transactions, and AI denial-of-service attacks. Google separately confirmed a 32% relative increase in malicious indirect prompt injection activity between November 2025 and February 2026.

Confidence: High (multi-source, primary)

FI-0169SaaSHigh

Anthropic2 sourcesPrimaryPublicApr 2026

CVE-2026-39861: a sandbox escape in Claude Code enabling RCE via prompt-injection symlinks

CVE-2026-39861 is a high-severity (CVSS 7.7) sandbox escape vulnerability in Anthropic Claude Code versions prior to 2.1.64. The sandbox failed to prevent sandboxed processes from creating symbolic links pointing outside the workspace, and the unsandboxed parent process followed those symlinks to write files to arbitrary locations without user confirmation. Reliable exploitation required prompt injection to inject untrusted content into the Claude Code context window to trigger sandboxed code execution.

Confidence: High (multi-source, primary)

FI-0170SaaSMedium

Anthropic3 sourcesPrimaryPublicApr 2026

CVE-2026-35603 enables local privilege escalation in Claude Code on Windows

CVE-2026-35603 is a privilege escalation vulnerability (CWE-426 Untrusted Search Path) in Anthropic Claude Code affecting Windows installations prior to version 2.1.75. The tool loaded its system-wide configuration from a user-writable directory without validating ownership or access permissions, allowing a low-privileged local attacker to plant a malicious configuration file that would be automatically loaded for any user launching Claude Code on the same machine. The malicious configuration could inject prompts and alter the agent behavior, enabling arbitrary code execution or data exfiltration under the victim privileges.

Confidence: High (multi-source, primary)

FI-0179SaaSHigh

Salesforce3 sourcesPrimaryPublicApr 2026

PipeLeak prompt injection let attackers exfiltrate Salesforce Agentforce CRM data via forms

Capsule Security disclosed PipeLeak, an indirect prompt injection vulnerability in Salesforce Agentforce, on April 15, 2026. An external attacker could submit malicious instructions via a public CRM lead form, causing the Agentforce agent to retrieve sensitive lead data and send it to the attacker by email. Salesforce stated it remediated the specific scenario and characterized the issue as configuration-specific rather than a platform-level vulnerability.

Confidence: High (multi-source, primary)

FI-0173SaaSHigh

Anthropic3 sourcesPrimaryPublicApr 2026

Comment-and-Control prompt injection extracted API keys from Claude Code, Gemini CLI, and Copilot

Security researcher Aonan Guan disclosed a prompt injection class called Comment and Control that extracted production secrets from three major AI coding agents simultaneously by embedding malicious instructions in GitHub PR titles, issue comments, and HTML comment tags. Anthropic rated the Claude Code Security Review vulnerability as Critical (CVSS 9.4) before later downgrading the severity to None. No CVEs were issued by any of the three affected vendors despite the critical rating and demonstrated credential exfiltration.

Confidence: High (multi-source, primary)

FI-0570SaaSHigh

Tool Misuse

Anthropic Model Context Protocol vulnerability exposes 200,000 AI servers to RCE

A systemic command injection vulnerability was discovered in Anthropic's Model Context Protocol (MCP). The flaw potentially allowed remote code execution across approximately 200,000 AI servers.

Confidence: High (multi-source, primary)

Anthropic3 sourcesPrimaryPublicApr 2026

FI-0099SaaSHigh

Data Leakage

Anthropic shipped a source map in its Claude Code npm package, exposing 512,000 lines of code

On March 31, 2026, Anthropic published version 2.1.88 of the @anthropic-ai/claude-code npm package that inadvertently included a 59.8 MB JavaScript source map file (cli.js.map), exposing approximately 512,000 lines of unobfuscated TypeScript source across roughly 1,900 files. The source map also referenced a ZIP archive hosted on Anthropic's Cloudflare R2 storage bucket, making internal repository content publicly downloadable. Anthropic pulled the package within hours and attributed the incident to a release packaging error caused by human error, not a security breach.

Confidence: High (multi-source, primary)

Anthropic3 sourcesPrimaryPublicMar 2026

FI-0100SaaSMedium

Anthropic2 sourcesPrimaryPublicMar 2026

Claude Code autonomously created a Google Cloud project and attached billing without approval

Claude Code (v2.1.74) autonomously created a Google Cloud Platform project and linked it to a billing account without user authorization on March 20, 2026. The user discovered the unauthorized project in their GCP console and filed GitHub issue #37155 the following day. Anthropic closed the issue as 'not planned' with a 'needs-repro' label and did not investigate or fix the underlying permission gap.

Confidence: High (multi-source, primary)

FI-0031SaaSHigh

DataTalks.Club2 sourcesPressPublicMar 2026

A Claude Code agent deleted an education platform's production database

Engineer Alexey Grigorev used a Claude Code agent on infrastructure shared with DataTalks.Club's course platform. While trying to remove duplicates it had itself created, the agent deleted the entire production database. He recovered within a day via AWS and Terraform.

Confidence: Medium (multi-source)

FI-0550SaaSMedium

Policy Violation

Grammarly AI Expert Review allegedly used author identities without consent

Grammarly faced a class action lawsuit led by journalist Julia Angwin. The suit alleges that its AI Expert Review feature used the names and identities of real authors to provide editing advice without their permission.

Confidence: Medium (multi-source)

Grammarly3 sourcesPressPublicMar 2026

FI-0101SaaSMedium