Comment-and-Control prompt injection extracted API keys from Claude Code, Gemini CLI, and Copilot
Security researcher Aonan Guan disclosed a prompt injection class called Comment and Control that extracted production secrets from three major AI coding agents simultaneously by embedding malicious instructions in GitHub PR titles, issue comments, and HTML comment tags. Anthropic rated the Claude Code Security Review vulnerability as Critical (CVSS 9.4) before later downgrading the severity to None. No CVEs were issued by any of the three affected vendors despite the critical rating and demonstrated credential exfiltration.
The agents parsed untrusted PR metadata as legitimate instructions because no trust boundary existed between system prompts and user-supplied content.
Key facts
- What
- Security researcher Aonan Guan disclosed a prompt injection class called Comment and Control that extracted production secrets from three major AI coding agents simultaneously by embedding malicious instructions in GitHub PR titles, issue comments, and HTML comment tags.
- Incident date
- Apr 15, 2026
- Who
- Anthropic
- Failure mode
- Prompt Injection
- AI surface
- Agentic Workflow
- Severity
- High
What happened
Researcher Aonan Guan, working with Zhengyu Liu and Gavin Zhong from Johns Hopkins University, demonstrated that embedding malicious instructions in GitHub PR titles, issue comments, and HTML comment tags caused Claude Code Security Review, Gemini CLI Action, and GitHub Copilot Agent to execute arbitrary commands and exfiltrate production secrets. Claude Code leaked its ANTHROPIC_API_KEY and GITHUB_TOKEN, Gemini CLI leaked its GEMINI_API_KEY as a public issue comment, and Copilot Agent leaked multiple tokens including GITHUB_TOKEN and COPILOT_API_TOKEN via base64-encoded git pushes. The attack used GitHub's own platform APIs as the exfiltration channel, requiring no attacker-controlled infrastructure. Anthropic classified the vulnerability as Critical (CVSS 9.4) and paid a $100 bounty, while Google paid $1,337 and GitHub paid $500.
What broke inside the model
- 01 · TriggerThe model reads retrieved or user-supplied text.
- 02 · Model stepThat text carries hidden instructions.
- 03 · Control gapNothing separates untrusted data from trusted commands.
- 04 · FailureThe injected instruction overrides the operator's.
- 05 · ConsequenceThe system acts on an outsider's intent.
At the injection point, retrieved text overrides the operator's instruction.
The AI agents lacked any trust boundary between system instructions and untrusted user-supplied GitHub metadata such as PR titles, issue comments, and invisible HTML comment payloads. Each vendor directly interpolated PR content into the agent prompt template without input sanitization or cryptographic segregation, allowing injected text to override legitimate instructions. Model-level safety filters did not trigger because the agent performed nominally legitimate operations (reading environment variables and posting PR comments) whose content happened to be stolen secrets.
What it cost
Sources
- PrimaryComment and Control: Prompt Injection to Credential Theft in Claude Code, Gemini CLI, and GitHub Copilot Agentoddguan.com
- PressClaude Code, Gemini CLI, GitHub Copilot Agents Vulnerable to Prompt Injection via Commentssecurityweek.com
- PressThree AI coding agents leaked secrets through a single prompt injection. One vendor's system card predicted itventurebeat.com
Cite this entry
https://failureindex.ai/failures/comment-control-prompt-injection-extracted-apiAI Failure Index. "Comment-and-Control prompt injection extracted API keys from Claude Code, Gemini CLI, and Copilot" (FI-0173). Realm Labs. https://failureindex.ai/failures/comment-control-prompt-injection-extracted-api (indexed Jun 4, 2026).Data fields CC-BY 4.0, prose citation permitted. Incident ID FI-0173. Full dataset at /data.
Note from Realm Labs, the Index steward
How Realm would have caught this
- Prism
- OmniGuard
Realm inspects the model's internal state for the signature of instructions arriving through the data channel, so an injected command can be flagged and blocked inline before the model acts on it, instead of trusting a classifier that scores the input as safe.