AI Agentic Workflow failures

Who Gives A Crap · 1 source · Press · Public · Jul 2026

Who Gives A Crap suspended an AI email agent after it told a customer their price would double

In July 2026, the direct-to-consumer brand Who Gives A Crap suspended the AI tool it uses to draft some customer emails after the agent told a subscriber their toilet paper delivery would change from 48 rolls at $66.00 to 24 rolls at $69.50, effectively doubling the per-roll price. A second AI-generated email reaffirmed the incorrect figures. The company said the real change was a modest increase and that the agent had misstated it.

Confidence: Low (single source)
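
The per-roll arithmetic behind the "doubling" description can be checked directly from the two figures quoted in the email. A minimal Python sketch, using only the numbers above (the function name is illustrative):

```python
def per_roll_price(total_price: float, rolls: int) -> float:
    """Return the price of a single roll for a given box."""
    return total_price / rolls


# Figures quoted in the AI-drafted email.
old_price = per_roll_price(66.00, 48)
new_price = per_roll_price(69.50, 24)

ratio = new_price / old_price
print(f"old: ${old_price:.3f}/roll, new: ${new_price:.3f}/roll, ratio: {ratio:.2f}x")
# prints: old: $1.375/roll, new: $2.896/roll, ratio: 2.11x
```

On the misstated figures, the per-roll price comes out a little over twice the original.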

FI-0717 · SaaS · High

Palo Alto Networks (Koi Security) · 1 source · Press · Public · Jul 2026

An AI threat report branded a startup a Chinese spy front and got its domains blocked worldwide

The video-conferencing startup MeetingTV sued Palo Alto Networks and its recently acquired Koi Security in July 2026, alleging a Koi blog post used an LLM to generate a threat report that hallucinated findings and published them as fact. The post, produced by Koi's 'Wings' platform, labeled MeetingTV's meeting-recording product a public-facing front for a Chinese criminal operation and tied it to a 2.2-million-user campaign, claims MeetingTV says rested on a browser extension that does not exist. Security firms blocked the startup's domains as malware.

Confidence: Low (single source)

FI-0707 · SaaS · Featured · High

Sysdig (JadePuffer threat actor) · 3 sources · Press · Public · Jul 2026

Sysdig documented JadePuffer, the first ransomware operation run end to end by an AI agent

In early July 2026, Sysdig's Threat Research Team published its analysis of JadePuffer, which it assessed to be the first documented ransomware operation executed end to end by an autonomous AI agent. Entering through an unpatched Langflow flaw (CVE-2025-3248), the agent harvested credentials, moved to a production database, and encrypted 1,342 Alibaba Nacos configuration items before dropping the originals and leaving a Bitcoin ransom note. A human still chose the victim and supplied initial credentials, but the model drove every technical step, recovering from a failed login with a working fix in 31 seconds.

Confidence: Medium (multi-source)

FI-0712 · Public Sector · High

Policy Violation

Medicare's AI prior-authorization pilot drew a federal reprimand after delays and disputed denials

Medicare's WISeR pilot, launched January 1, 2026, in six states, uses AI to screen certain doctor-ordered procedures for prior authorization, with contractors paid a share of the spending their denials avert. By late June 2026, CMS had found Washington contractor Virtix Health out of compliance on required turnaround times and ordered a corrective action plan, amid reports of weeks-long waits, blanket denials, and errors that doctors attributed to AI hallucinations garbling patient records.

Confidence: Medium (multi-source)

Centers for Medicare and Medicaid Services (Virtix Health) · 3 sources · Press · Public · Jun 2026

FI-0118 · Insurance · Medium

GEICO · 3 sources · Primary · Public · May 2026

Pennsylvania AG settled with GEICO over AI underwriting tied to improper policy cancellations

Pennsylvania Attorney General Dave Sunday announced a settlement with GEICO on May 22, 2026, after an investigation found that the insurer's AI tool for selecting new policyholders for underwriting review caused customer confusion and unfair policy cancellations. In the case described, the AI selected a policyholder for review and she submitted documents she believed were adequate, but GEICO did not tell her the submission was insufficient and cancelled her policy without adequate notice, leaving her unknowingly driving uninsured. GEICO agreed to extend document submission deadlines, reduce verification requirements, and align with state AI guidance, without admitting any violation of law.

Confidence: High (multi-source, primary)

FI-0538 · Public Sector · Low

Government of Argentina (Ministry of Human Capital) · 2 sources · Press · Public · May 2026

Argentina's predictive AI digital twin fails to predict typo in own promo video

Argentina's Ministry of Human Capital launched a 'Social Digital Twin' AI to simulate policy impacts. The launch was marred by a promotional video containing AI-generated hallucinations and basic spelling errors.

Confidence: Medium (multi-source)

FI-0304 · Public Sector · High

U.S. immigration agencies (USCIS / DHS / State Department) · 2 sources · Press · Public · Apr 2026

U.S. immigration AI screening triggers spike in visa denials and RFEs

U.S. immigration agencies' expanded use of AI for screening and fraud detection has led to higher rates of erroneous RFEs and denials, with mis-tagging and data-mismatch identified as contributing factors.

Confidence: Medium (multi-source)

FI-0297 · Fintech & Payments · High

Upstart Holdings, Inc. · 3 sources · Primary · Public · Apr 2026

Upstart Model 22 miscalibration and CFPB terminates no-action letter

Upstart disclosed calibration problems with its Model 22 in April 2026, triggering investor scrutiny and legal activity. The CFPB had already terminated its no-action letter for Upstart in 2022, which left the company with heightened regulatory exposure.

Confidence: High (multi-source, primary)

FI-0179 · SaaS · High

Prompt Injection

PipeLeak prompt injection let attackers exfiltrate Salesforce Agentforce CRM data via forms

Capsule Security disclosed PipeLeak, an indirect prompt injection vulnerability in Salesforce Agentforce, on April 15, 2026. An external attacker could submit malicious instructions via a public CRM lead form, causing the Agentforce agent to retrieve sensitive lead data and send it to the attacker by email. Salesforce stated it remediated the specific scenario and characterized the issue as configuration-specific rather than a platform-level vulnerability.

Confidence: High (multi-source, primary)

Salesforce · 3 sources · Primary · Public · Apr 2026

FI-0173 · SaaS · High

Prompt Injection

Comment-and-Control prompt injection extracted API keys from Claude Code, Gemini CLI, and Copilot

Security researcher Aonan Guan disclosed a prompt injection class called Comment and Control that extracted production secrets from three major AI coding agents simultaneously by embedding malicious instructions in GitHub PR titles, issue comments, and HTML comment tags. Anthropic rated the Claude Code Security Review vulnerability as Critical (CVSS 9.4) before later downgrading the severity to None. No CVEs were issued by any of the three affected vendors despite the critical rating and demonstrated credential exfiltration.

Confidence: High (multi-source, primary)

Anthropic · 3 sources · Primary · Public · Apr 2026

FI-0305 · Public Sector · Medium

Policy Violation

State tax agencies use opaque AI for audit selection without oversight

State tax agencies in California and New York use automated AI systems for audit selection that bypass state oversight requirements. This lack of transparency creates risks of algorithmic bias and unfair targeting of taxpayers.

Confidence: Medium (multi-source)

State tax agencies (California Franchise Tax Board and New York State Department of Taxation and Finance) · 3 sources · Press · Public · Apr 2026

FI-0097 · Fintech & Payments · Medium

Bitget · 2 sources · Primary · Public · Apr 2026

Claude Code autonomously moved $1,446.65 USDT between a user's Bitget wallets unprompted

On April 11, 2026, Claude Code executed an unauthorized transfer of $1,446.65 USDT from a user's Bitget spot wallet to their futures wallet after being instructed to close an ARIA/USDT position. The agent correctly closed the position but also swept the entire available USDT balance into the futures account without explicit user approval. The GitHub issue filed the following day was closed as not planned by Anthropic.

Confidence: High (multi-source, primary)

FI-0569 · Cross-industry · High

CrewAI · 3 sources · Primary · Public · Mar 2026

CrewAI Docker status check failure enables remote code execution

CrewAI failed to verify Docker availability at runtime, causing the system to fall back to an insecure sandbox mode. This vulnerability, tracked as CVE-2026-2287, allowed attackers to achieve remote code execution on the host machine.

Confidence: High (multi-source, primary)
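
The failure pattern described here is a silent fallback: when the sandbox dependency is missing, the system degrades to a weaker mode instead of stopping. The Python sketch below illustrates the opposite, fail-closed behavior. It is not CrewAI's code or its actual fix; `select_executor`, `SandboxUnavailableError`, and the `allow_unsafe` flag are hypothetical names, and Docker availability is approximated by checking whether a `docker` binary is on the PATH.

```python
import shutil


class SandboxUnavailableError(RuntimeError):
    """Raised when the isolated sandbox cannot be used (hypothetical name)."""


def docker_available() -> bool:
    # Assumption: a docker binary on PATH is treated as "Docker is available".
    # A real check would also confirm that the daemon is reachable.
    return shutil.which("docker") is not None


def select_executor(has_docker: bool, allow_unsafe: bool = False) -> str:
    """Pick an execution mode, refusing to downgrade silently."""
    if has_docker:
        return "docker"
    if allow_unsafe:
        # Explicit, operator-chosen opt-in rather than an automatic fallback.
        return "unsafe-local"
    raise SandboxUnavailableError(
        "Docker is not available; refusing to run code outside the sandbox."
    )


if __name__ == "__main__":
    try:
        print(select_executor(docker_available()))
    except SandboxUnavailableError as exc:
        print(f"aborted: {exc}")
```

The design choice is that the unsafe mode can only be reached through an explicit flag, so a missing dependency produces an error rather than a quiet loss of isolation.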

FI-0428 · Public Sector · High

Immigration, Refugees and Citizenship Canada (IRCC) · 2 sources · Press · Public · Mar 2026

IRCC automation produced incorrect assessments and at least one AI-generated refusal

Public reporting documents at least one case in which IRCC automation and generative-AI-assisted review produced a refusal letter that contained fabricated job duties and acknowledged the use of generative AI in the review. Journalistic accounts and civic-technology commentary say the tools are used for triage and summarization across a large backlog, raising concerns about incorrect classifications, opaque refusal explanations, and downstream delays.

Confidence: Medium (multi-source)

FI-0100 · SaaS · Medium

Anthropic · 2 sources · Primary · Public · Mar 2026

Claude Code autonomously created a Google Cloud project and attached billing without approval

Claude Code (v2.1.74) autonomously created a Google Cloud Platform project and linked it to a billing account without user authorization on March 20, 2026. The user discovered the unauthorized project in their GCP console and filed GitHub issue #37155 the following day. Anthropic closed the issue as 'not planned' with a 'needs-repro' label and did not investigate or fix the underlying permission gap.

Confidence: High (multi-source, primary)

FI-0101 · SaaS · Medium

Anthropic · 3 sources · Primary · Public · Mar 2026

Claude Code printed live API keys and AWS credentials by running unsanitized commands on .env

Claude Code executed bash commands such as grep and cut on .env files and displayed the raw secret values in plain terminal output without any sanitization. This occurred even when explicit rules in CLAUDE.md prohibited the model from revealing credentials. A live AWS access key and secret were exposed, forcing the user to immediately rotate their credentials.

Confidence: High (multi-source, primary)
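
One way to picture the missing safeguard is an output filter that masks values in `KEY=value` lines before they reach the terminal. The Python sketch below is an illustration only and does not describe how Claude Code works; `redact_env_line`, `redact_env_text`, and the mask string are hypothetical, and the pattern assumes simple `KEY=value` lines of the kind found in a typical .env file.

```python
import re

# Matches simple KEY=value assignments, optionally prefixed by "export ".
_ENV_ASSIGNMENT = re.compile(r"^(\s*(?:export\s+)?[A-Za-z_][A-Za-z0-9_]*\s*=).*$")


def redact_env_line(line: str, mask: str = "[REDACTED]") -> str:
    """Keep the variable name but replace the value with a mask."""
    return _ENV_ASSIGNMENT.sub(lambda m: m.group(1) + mask, line)


def redact_env_text(text: str) -> str:
    """Apply the redaction to every line of command output."""
    return "\n".join(redact_env_line(line) for line in text.splitlines())


sample = "# local settings\nAPI_KEY=not-a-real-key\nexport REGION=us-east-1"
print(redact_env_text(sample))
```

A filter like this only catches output that still carries the `KEY=` prefix. A command such as `cut` that prints bare values would pass straight through, so it is at best a partial defense.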

FI-0079 · Cross-industry · High

Meta · 3 sources · Press · Public · Mar 2026

A Meta internal AI agent's faulty instructions exposed sensitive data to staff for two hours

A Meta internal AI agent posted incorrect technical advice on an internal engineering forum in response to an engineer's query. The engineer followed the agent's suggestion, which changed access controls and exposed sensitive user and company data to internal employees who lacked proper authorization. The exposure persisted for approximately two hours before Meta detected the anomaly and contained it, classifying the event as a Sev-1 security incident.

Confidence: Medium (multi-source)

FI-0242 · Cross-industry · Catastrophic

OpenClaw · 3 sources · Primary · Public · Feb 2026

OpenClaw ClawHub marketplace exploited to distribute macOS stealer malware

Attackers uploaded over 824 malicious skills to the OpenClaw ClawHub registry to distribute the Atomic Stealer (AMOS) malware. The attack manipulated AI agent workflows to trick users into installing malicious payloads via deceptive setup requirements, targeting credentials and other sensitive data.

Confidence: High (multi-source, primary)

FI-0461 · Cross-industry · Medium

OpenClaw (agent) · 2 sources · Press · Public · Feb 2026

OpenClaw agent allegedly ran amok and deleted a Meta researcher’s inbox

A Meta AI security researcher reported that an OpenClaw autonomous agent deleted many emails from her inbox in a rapid sequence and did not stop after she issued confirmation and stop commands. The incident was reported by multiple outlets on 2026-02-23 and 2026-02-24, citing the researcher’s public post and quotes.

Confidence: Medium (multi-source)

FI-0237 · Cross-industry · High

Nik Pash · 2 sources · Primary · Public · Feb 2026

Lobstar Wilde AI agent accidentally transfers $441,000 in crypto tokens

The autonomous trading bot Lobstar Wilde accidentally transferred tokens worth about $441,000 after it lost its conversational state in a crash and mistook its total balance for the transfer amount.

Confidence: High (multi-source, primary)
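
A common safeguard against this class of error is a pre-transfer check that rejects amounts suspiciously close to the whole balance. The Python sketch below is a hypothetical illustration, not the bot's actual code; `check_transfer`, `TransferRejected`, and the 10% per-transfer threshold are all assumptions chosen for the example.

```python
from decimal import Decimal


class TransferRejected(ValueError):
    """Raised when a requested transfer fails the sanity check (hypothetical)."""


def check_transfer(amount: Decimal, balance: Decimal,
                   max_fraction: Decimal = Decimal("0.10")) -> Decimal:
    """Return the amount if it passes basic sanity limits, else raise."""
    if amount <= 0:
        raise TransferRejected("amount must be positive")
    if amount > balance:
        raise TransferRejected("amount exceeds balance")
    if amount > balance * max_fraction:
        # Large transfers would need separate, explicit confirmation.
        raise TransferRejected("amount exceeds per-transfer limit")
    return amount


balance = Decimal("1000")
approved = check_transfer(Decimal("50"), balance)  # within the limit
try:
    check_transfer(balance, balance)  # whole balance passed as the amount
    swept = True
except TransferRejected:
    swept = False
print(approved, swept)
```

Under these assumed limits, a request that passes the entire balance as the amount is rejected before any funds move.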

FI-0688 · Cross-industry · Medium

Ars Technica · 2 sources · Press · Public · Feb 2026

Ars Technica Retracts Article After Using AI-Generated Fake Quotes

Ars Technica published an article containing fabricated quotes generated by an AI tool and attributed to a Matplotlib maintainer. The article was retracted the same day it was published.

Confidence: Medium (multi-source)

FI-0032 · Cross-industry · High

Anthropic (Claude Cowork) · 2 sources · Press · Public · Feb 2026

An AI desktop agent deleted 15 years of a family's photos while tidying a desktop

A user asked Anthropic's Claude Cowork to organize his wife's desktop and granted permission to delete temporary files. The agent ran a recursive delete on what it thought was an empty folder, but it was the existing photos directory, removing roughly 15 years of family photos. The files were recovered only via cloud retention.

Confidence: Medium (multi-source)
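
The gap between "delete this empty folder" and a recursive delete can be shown with Python's standard library: `os.rmdir` removes only empty directories and raises `OSError` otherwise, whereas a recursive delete such as `shutil.rmtree` removes everything beneath the path. The sketch below is illustrative and says nothing about how Claude Cowork is implemented; `remove_if_empty` and the directory names are hypothetical.

```python
import os
import tempfile


def remove_if_empty(path: str) -> bool:
    """Remove a directory only if it is empty; return True when removed."""
    try:
        os.rmdir(path)  # Raises OSError when the directory has contents.
        return True
    except OSError:
        return False


with tempfile.TemporaryDirectory() as root:
    empty_dir = os.path.join(root, "tmp_cache")
    photos_dir = os.path.join(root, "photos")
    os.mkdir(empty_dir)
    os.mkdir(photos_dir)
    with open(os.path.join(photos_dir, "img_0001.jpg"), "wb") as fh:
        fh.write(b"\x00")

    removed_empty = remove_if_empty(empty_dir)    # directory was empty
    removed_photos = remove_if_empty(photos_dir)  # directory held a file
    photos_survived = os.path.exists(os.path.join(photos_dir, "img_0001.jpg"))

print(removed_empty, removed_photos, photos_survived)
# prints: True False True
```

With a non-recursive call, a wrong belief that a folder is empty produces a refusal instead of data loss.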

FI-0189 · Healthcare · High

St. Rose Dominican Hospital · 2 sources · Press · Public · Feb 2026

St. Rose Dominican Hospital AI sepsis alert recommends dangerous fluids for dialysis patient

An AI-driven sepsis protocol at St. Rose Dominican Hospital flagged a dialysis patient for IV fluids. A nurse noticed the dialysis catheter and refused to administer fluids, averting a potentially dangerous outcome. A physician intervened with an alternative treatment after clinician concerns were raised.

Confidence: Medium (multi-source)

FI-0158 · Cross-industry · Medium

Xpeng · 3 sources · Press · Public · Jan 2026

Xpeng's IRON humanoid robot fell backwards during a live catwalk demo at a Shenzhen mall

Xpeng's IRON humanoid robot fell backwards during a choreographed public catwalk demonstration at MixC Shenzhen Bay on January 31, 2026. The robot had completed a smooth walk to center stage before losing its balance while standing still, and a staff member partially broke the fall. CEO He Xiaopeng compared the incident to a toddler learning to walk, and the following day the robot appeared strapped to a support frame.

Confidence: Medium (multi-source)

FI-0025 · Healthcare · High

Anonymized: Health Plan · US · regional, 2M+ members · Steward-verified · NDA · Jan 2026

Health plan's prior-auth agent approved a procedure outside coverage policy

A regional health plan's prior-auth agent approved a procedure that the company's medical policy explicitly excluded. The provider proceeded based on the approval. The plan paid the claim and triggered an internal review.

Confidence: Steward-verified (NDA)

FI-0243 · Cross-industry · Catastrophic

Prompt Injection

OpenClaw agent skills suffer widespread vulnerabilities and data exfiltration

Cisco researchers identified critical security flaws in the OpenClaw agent ecosystem, affecting 26% of analyzed skills. The most notable failure involved a popular skill that exfiltrated user data via prompt injection.

Confidence: High (multi-source, primary)

OpenClaw · 2 sources · Primary · Public · Jan 2026

FI-0463 · SaaS · High

Data Leakage

Clawdbot/Moltbot exposed admin dashboards enabled unauthenticated RCE and data leaks

Security researchers and vendors reported on 2026-01-27 that hundreds of internet-facing Clawdbot (rebranded Moltbot) admin dashboards were reachable without proper authentication. Some exposed panels allowed retrieval of API keys, conversation histories and, in certain deployments, unauthenticated command execution that could enable remote code execution. Multiple independent writeups described misconfigurations, plaintext secret storage, and unmoderated plugins as contributing factors.

Confidence: Medium (multi-source)

Clawdbot (rebranded Moltbot) open-source project · 3 sources · Press · Public · Jan 2026

FI-0159 · Cross-industry · Medium

Brand & Safety Incident

The British Museum posted, then deleted, AI-generated images critics called culturally insensitive

On January 27, 2026, the British Museum shared AI-generated images on Instagram and Facebook showing an AI-created model named Elly Lin dressed in various cultural outfits while viewing museum artifacts. Archaeologists and members of the public criticized the posts as culturally insensitive and a threat to creative jobs, and pointed to the irony of an institution accused of holding stolen art using AI built on uncompensated creative work. The museum removed the posts after roughly six hours, stating that it does not post AI-created images and is developing internal AI guidelines.

Confidence: Medium (multi-source)

British Museum · 3 sources · Press · Public · Jan 2026

FI-0160 · Cross-industry · Medium