Forcepoint found 10 in-the-wild prompt-injection payloads targeting AI assistants like Copilot

Forcepoint X-Labs documented 10 in-the-wild indirect prompt injection payloads embedded in hidden website code across multiple domains, targeting AI assistants such as GitHub Copilot, Cursor, and Claude Code. The payloads included data destruction commands, API key exfiltration, unauthorized financial transactions, and AI denial-of-service attacks. Google separately confirmed a 32% relative increase in malicious indirect prompt injection activity between November 2025 and February 2026.

Microsoft · Incident Apr 22, 2026 · Indexed Jun 4, 2026 · 3 sources

AI assistants blindly execute hidden website instructions because they cannot separate untrusted web content from their own system prompt.
What
Forcepoint X-Labs documented 10 in-the-wild indirect prompt injection payloads embedded in hidden website code across multiple domains, targeting AI assistants such as GitHub Copilot, Cursor, and Claude Code.
Incident date
Apr 22, 2026
Who
Microsoft
Failure mode
Prompt Injection
AI surface
Code Assistant
Severity
High

What happened

Forcepoint X-Labs researchers discovered 10 active websites containing indirect prompt injection payloads hidden in HTML comments, CSS-invisible elements, accessibility attributes, and meta tags. These payloads were designed to hijack AI assistants including GitHub Copilot when they crawled the poisoned pages, with attack types spanning financial fraud via PayPal links, API key exfiltration, data destruction via sudo rm -rf terminal commands, and AI denial-of-service through false copyright claims. Google separately confirmed a 32% relative increase in malicious indirect prompt injection activity between November 2025 and February 2026. Researchers noted shared injection templates across multiple domains suggesting organized tooling.

What broke inside the model

Failure path · mode profile · Prompt Injection
  1. 01 · TriggerThe model reads retrieved or user-supplied text.
  2. 02 · Model stepThat text carries hidden instructions.
  3. 03 · Control gapNothing separates untrusted data from trusted commands.
  4. 04 · FailureThe injected instruction overrides the operator's.
  5. 05 · ConsequenceThe system acts on an outsider's intent.

At the injection point, retrieved text overrides the operator's instruction.

LLM-powered AI assistants cannot distinguish between legitimate web content and adversarial instructions hidden in HTML comments, CSS-invisible elements, accessibility attributes, or meta tags. When an AI agent crawls a poisoned webpage, it ingests the hidden content and treats attacker commands as system-level directives, overriding its original instructions. The AI lacks a trust boundary between untrusted web content and its own system prompt, allowing hidden payloads to trigger real-world actions like file deletion or payment processing.

Public visibilityHigh
Regulatory exposurePossible
Customer impactMany customers
Financial impactUnknown
Time to disclosureMonths
  1. Primary10 Indirect Prompt Injection Payloads Caught in the Wildforcepoint.com
  2. PressResearchers Uncover 10 In-the-Wild Indirect Prompt Injection Attacksinfosecurity-magazine.com
  3. PressIndirect prompt injection is taking hold in the wildhelpnetsecurity.com
Permalinkhttps://failureindex.ai/failures/forcepoint-found-10-wild-prompt-injection
CitationAI Failure Index. "Forcepoint found 10 in-the-wild prompt-injection payloads targeting AI assistants like Copilot" (FI-0183). Realm Labs. https://failureindex.ai/failures/forcepoint-found-10-wild-prompt-injection (indexed Jun 4, 2026).
Share cardA branded image of this record for posts and slides.

Data fields CC-BY 4.0, prose citation permitted. Incident ID FI-0183. Full dataset at /data.

Note from Realm Labs, the Index steward

How Realm would have caught this

Controls for this failure mode
  • Prism
  • OmniGuard

Realm inspects the model's internal state for the signature of instructions arriving through the data channel, so an injected command can be flagged and blocked inline before the model acts on it, instead of trusting a classifier that scores the input as safe.