Researchers showed Claude could be steered to exfiltrate data via prompt injection

Security researchers demonstrated a prompt-injection technique that could cause Claude to leak data by following instructions hidden in content it processed, using the model's own network access to send information to an attacker before the issue was mitigated.

Anthropic (Claude.ai) · Incident Jan 1, 2025 · Indexed Jun 3, 2026 · 1 source

Instructions hidden in content the model processed could redirect it into exfiltrating data.
What
Security researchers demonstrated a prompt-injection technique that could cause Claude to leak data by following instructions hidden in content it processed, using the model's own network access to send information to an attacker before the issue was mitigated.
Incident date
Jan 1, 2025
Who
Anthropic (Claude.ai)
Failure mode
Prompt Injection
AI surface
Chatbot
Severity
Medium

What happened

Researchers at Oasis Security disclosed a prompt-injection data-exfiltration vulnerability in Claude.ai, where instructions hidden in processed content could steer the model into sending data to an attacker-controlled destination. The disclosure is part of a wider pattern of indirect prompt injection in production assistants and was addressed after reporting.

What broke inside the model

Failure path · mode profile · Prompt Injection
  1. 01 · TriggerThe model reads retrieved or user-supplied text.
  2. 02 · Model stepThat text carries hidden instructions.
  3. 03 · Control gapNothing separates untrusted data from trusted commands.
  4. 04 · FailureThe injected instruction overrides the operator's.
  5. 05 · ConsequenceThe system acts on an outsider's intent.

At the injection point, retrieved text overrides the operator's instruction.

Untrusted content (an email, a document, a retrieved page, a tool result) was read as if it were a trusted instruction. The model has no built-in separation between the operator's instructions and the data it ingests, so attacker text in the data channel became commands the model followed.

Public visibilityMedium
Regulatory exposureNone
Customer impactMany customers
Financial impactUnknown
Time to disclosureDays

Disclosed vulnerability; mitigated after reporting

  1. PressClaude.ai Prompt Injection Data Exfiltration Vulnerability (Oasis Security)oasis.security
Permalinkhttps://failureindex.ai/failures/researchers-showed-claude-steered-exfiltrate
CitationAI Failure Index. "Researchers showed Claude could be steered to exfiltrate data via prompt injection" (FI-0048). Realm Labs. https://failureindex.ai/failures/researchers-showed-claude-steered-exfiltrate (indexed Jun 3, 2026).
Share cardA branded image of this record for posts and slides.

Data fields CC-BY 4.0, prose citation permitted. Incident ID FI-0048. Full dataset at /data.

Note from Realm Labs, the Index steward

How Realm would have caught this

Controls for this failure mode
  • Prism
  • OmniGuard

Realm inspects the model's internal state for the signature of instructions arriving through the data channel, so an injected command can be flagged and blocked inline before the model acts on it, instead of trusting a classifier that scores the input as safe.