Forcepoint found 10 in-the-wild prompt-injection payloads targeting AI assistants like Copilot

Forcepoint X-Labs documented 10 in-the-wild indirect prompt injection payloads embedded in hidden website code across multiple domains, targeting AI assistants such as GitHub Copilot, Cursor, and Claude Code. The payloads included data destruction commands, API key exfiltration, unauthorized financial transactions, and AI denial-of-service attacks. Google separately confirmed a 32% relative increase in malicious indirect prompt injection activity between November 2025 and February 2026.

Microsoft · Incident Apr 22, 2026 · Indexed Jun 4, 2026 · 3 sources

Key facts

What

Incident date: Apr 22, 2026
Who: Microsoft
Failure mode: Prompt Injection
AI surface: Code Assistant
Severity: High

What happened

Forcepoint X-Labs researchers discovered 10 active websites containing indirect prompt injection payloads hidden in HTML comments, CSS-invisible elements, accessibility attributes, and meta tags. The payloads were designed to hijack AI assistants, including GitHub Copilot, when they crawled the poisoned pages. Attack types spanned financial fraud via PayPal links, API key exfiltration, data destruction via sudo rm -rf terminal commands, and AI denial-of-service through false copyright claims. Researchers noted shared injection templates across multiple domains, suggesting organized tooling. Google separately confirmed a 32% relative increase in malicious indirect prompt injection activity between November 2025 and February 2026.
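To make the four hiding places concrete, here is a minimal sketch of a scanner that surfaces text a human visitor would not see rendered: HTML comments, meta tag content, accessibility-style attributes, and text inside inline-hidden elements. It uses only Python's standard html.parser. The class name HiddenTextCollector, the helper hidden_text, and the chosen attribute and style lists are assumptions made for illustration. This is not Forcepoint's tooling, and it does not evaluate external stylesheets or scripts.

```python
from html.parser import HTMLParser
import re

# Inline-style patterns that commonly hide text from readers (illustrative, not exhaustive).
HIDDEN_STYLE = re.compile(r"display\s*:\s*none|visibility\s*:\s*hidden|font-size\s*:\s*0", re.I)
# Attributes whose text is normally not rendered on the page (assumed list).
TEXT_ATTRS = ("aria-label", "alt", "title")
# Void elements never get a closing tag, so they must not affect nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class HiddenTextCollector(HTMLParser):
    """Collects (channel, text) pairs for content invisible in a rendered page."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.findings = []
        self._hidden_depth = 0  # greater than 0 while inside a hidden element

    def handle_comment(self, data):
        if data.strip():
            self.findings.append(("comment", data.strip()))

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("content"):
            self.findings.append(("meta", attrs["content"]))
        for name in TEXT_ATTRS:
            if attrs.get(name):
                self.findings.append((name, attrs[name]))
        if tag in VOID:
            return
        hidden = "hidden" in attrs or HIDDEN_STYLE.search(attrs.get("style") or "")
        # Every element nested inside a hidden element is hidden as well.
        if self._hidden_depth or hidden:
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID and self._hidden_depth:
            self._hidden_depth -= 1

    def handle_data(self, data):
        if self._hidden_depth and data.strip():
            self.findings.append(("css-hidden", data.strip()))


def hidden_text(html):
    """Return every hidden-channel string found in the given HTML."""
    parser = HiddenTextCollector()
    parser.feed(html)
    parser.close()
    return parser.findings
```

Calling hidden_text(page_source) returns a list of (channel, text) pairs that could be reviewed, or stripped, before a page is handed to an assistant. The depth counter assumes reasonably well-formed markup, so unclosed tags inside a hidden element can skew the result.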

What broke inside the model

Failure path · mode profile · Prompt Injection

01 · Trigger: The model reads retrieved or user-supplied text.
02 · Model step: That text carries hidden instructions.
03 · Control gap: Nothing separates untrusted data from trusted commands.
04 · Failure: The injected instruction overrides the operator's.
05 · Consequence: The system acts on an outsider's intent.

At the injection point, retrieved text overrides the operator's instruction.

LLM-powered assistants cannot reliably distinguish legitimate web content from adversarial instructions hidden in HTML comments, CSS-invisible elements, accessibility attributes, or meta tags. When an AI agent crawls a poisoned webpage, it ingests the hidden content and can treat attacker commands as system-level directives that override its original instructions. Without a trust boundary between untrusted web content and its own system prompt, hidden payloads can trigger real-world actions such as file deletion or payment processing.
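One way to picture the missing trust boundary is to track where each piece of context came from and to refuse high-risk tool calls once untrusted text is present, unless a human approves. The sketch below is illustrative only. The names Segment, Context, authorize, and the tool names in HIGH_RISK_TOOLS are hypothetical. It does not describe how Copilot, Cursor, Claude Code, or any product named in this entry works.

```python
from dataclasses import dataclass, field

# Hypothetical tool names; a real agent would define its own risk tiers.
HIGH_RISK_TOOLS = {"run_shell", "send_payment", "delete_file"}


@dataclass
class Segment:
    text: str
    trusted: bool  # True only for operator or system text, never for fetched content


@dataclass
class Context:
    segments: list = field(default_factory=list)

    def add_operator(self, text):
        self.segments.append(Segment(text, trusted=True))

    def add_web_content(self, text):
        # Fetched pages are stored as data and never gain command authority.
        self.segments.append(Segment(text, trusted=False))

    @property
    def tainted(self):
        # The context is tainted as soon as any untrusted segment is present.
        return any(not s.trusted for s in self.segments)


def authorize(tool, ctx, human_approved=False):
    """Return True if the tool call may proceed under this sketch's policy."""
    if tool not in HIGH_RISK_TOOLS:
        return True
    # Once untrusted text is in context, the model's request alone is not enough.
    return human_approved or not ctx.tainted
```

Under this policy, a request to run a shell command is allowed while the context holds only operator text, and is held for human approval after a fetched page has been added. The policy is deliberately coarse: it does not try to detect injections, it only limits what a hijacked model can do.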

Cite this entry

Permalink: https://failureindex.ai/failures/forcepoint-found-10-wild-prompt-injection

Citation

AI Failure Index. "Forcepoint found 10 in-the-wild prompt-injection payloads targeting AI assistants like Copilot" (FI-0183). Realm Labs. https://failureindex.ai/failures/forcepoint-found-10-wild-prompt-injection (indexed Jun 4, 2026).

Data fields CC-BY 4.0, prose citation permitted. Incident ID FI-0183. Full dataset at /data.

How Realm would have caught this

Controls for this failure mode

Prism
OmniGuard

Realm inspects the model's internal state for the signature of instructions arriving through the data channel, so an injected command can be flagged and blocked inline before the model acts on it, instead of trusting a classifier that scores the input as safe.

Related failures

PromptFiction: one click made Claude Desktop execute attacker instructions with no review

OpenAI confirmed GPT-5.6 Sol deleted user files and a production database, an 'honest mistake'

Hugging Face disclosed a production breach driven end to end by an autonomous AI agent