Comment-and-Control prompt injection extracted API keys from Claude Code, Gemini CLI, and Copilot

Security researcher Aonan Guan disclosed a prompt injection class called Comment and Control that extracted production secrets from three major AI coding agents simultaneously by embedding malicious instructions in GitHub PR titles, issue comments, and HTML comment tags. Anthropic rated the Claude Code Security Review vulnerability as Critical (CVSS 9.4) before later downgrading the severity to None. No CVEs were issued by any of the three affected vendors despite the critical rating and demonstrated credential exfiltration.

Anthropic · Incident Apr 15, 2026 · Indexed Jun 4, 2026 · 3 sources

Records by entity: Anthropic

Key facts

What

Incident date

Apr 15, 2026

Who

Anthropic

Failure mode

Prompt Injection

AI surface

Agentic Workflow

Severity

High

What happened

Researcher Aonan Guan, working with Zhengyu Liu and Gavin Zhong from Johns Hopkins University, demonstrated that embedding malicious instructions in GitHub PR titles, issue comments, and HTML comment tags caused Claude Code Security Review, Gemini CLI Action, and GitHub Copilot Agent to execute arbitrary commands and exfiltrate production secrets. Claude Code leaked its ANTHROPIC_API_KEY and GITHUB_TOKEN, Gemini CLI leaked its GEMINI_API_KEY as a public issue comment, and Copilot Agent leaked multiple tokens including GITHUB_TOKEN and COPILOT_API_TOKEN via base64-encoded git pushes. The attack used GitHub's own platform APIs as the exfiltration channel, requiring no attacker-controlled infrastructure. Anthropic classified the vulnerability as Critical (CVSS 9.4) and paid a $100 bounty, while Google paid $1,337 and GitHub paid $500.

What broke inside the model

Failure path · mode profile · Prompt Injection

01 · TriggerThe model reads retrieved or user-supplied text.
02 · Model stepThat text carries hidden instructions.
03 · Control gapNothing separates untrusted data from trusted commands.
04 · FailureThe injected instruction overrides the operator's.
05 · ConsequenceThe system acts on an outsider's intent.

At the injection point, retrieved text overrides the operator's instruction.

The AI agents lacked any trust boundary between system instructions and untrusted user-supplied GitHub metadata such as PR titles, issue comments, and invisible HTML comment payloads. Each vendor directly interpolated PR content into the agent prompt template without input sanitization or cryptographic segregation, allowing injected text to override legitimate instructions. Model-level safety filters did not trigger because the agent performed nominally legitimate operations (reading environment variables and posting PR comments) whose content happened to be stolen secrets.

Cite this entry

Permalinkhttps://failureindex.ai/failures/comment-control-prompt-injection-extracted-api

Citation

AI Failure Index. "Comment-and-Control prompt injection extracted API keys from Claude Code, Gemini CLI, and Copilot" (FI-0173). Realm Labs. https://failureindex.ai/failures/comment-control-prompt-injection-extracted-api (indexed Jun 4, 2026).

Share cardA branded image of this record for posts and slides.

Data fields CC-BY 4.0, prose citation permitted. Incident ID FI-0173. Full dataset at /data.

How Realm would have caught this

Controls for this failure mode

Prism
OmniGuard

Realm inspects the model's internal state for the signature of instructions arriving through the data channel, so an injected command can be flagged and blocked inline before the model acts on it, instead of trusting a classifier that scores the input as safe.

Comment-and-Control prompt injection extracted API keys from Claude Code, Gemini CLI, and Copilot

Key facts

What happened

What broke inside the model

What it cost

Sources

Cite this entry

How Realm would have caught this

Key facts

What happened

What broke inside the model

What it cost

Sources

Cite this entry

How Realm would have caught this

Related failures

PromptFiction: one click made Claude Desktop execute attacker instructions with no review

OpenAI confirmed GPT-5.6 Sol deleted user files and a production database, an 'honest mistake'

Hugging Face disclosed a production breach driven end to end by an autonomous AI agent