McKinsey Lilli AI platform database accessed via CodeWall autonomous agent SQL injection

What happened

CodeWall's autonomous AI penetration testing agent identified a security flaw in McKinsey's internal AI chatbot, Lilli. CodeWall reported that its agent executed a SQL injection attack to gain access to the underlying database. The incident was publicly disclosed by CodeWall on March 9, 2026.

What broke inside the model

Failure path · mode profile · Data Leakage

01 · Trigger: A request triggers retrieval or context loading.
02 · Model step: The context pulls in another user's content.
03 · Control gap: No boundary enforces isolation at the moment of output.
04 · Failure: Private data crosses into the response.
05 · Consequence: One user sees another's data, and disclosure follows.

One user's content crosses the retrieval boundary into another's response.
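The generic failure path above can be illustrated with a minimal sketch of an isolation check applied before retrieved content reaches the model. This shows the failure mode in general, not how Lilli worked: the `Chunk` type, the `owner_id` field, and the `build_context` function are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """A retrieved piece of content tagged with its owner (hypothetical schema)."""
    owner_id: str
    text: str


def build_context(retrieved: list[Chunk], requesting_user: str) -> list[str]:
    """Keep only chunks owned by the requesting user.

    Dropping foreign chunks here means another user's content never
    enters the prompt, so it cannot cross into the response.
    """
    return [chunk.text for chunk in retrieved if chunk.owner_id == requesting_user]


retrieved = [
    Chunk(owner_id="alice", text="Alice's draft"),
    Chunk(owner_id="bob", text="Bob's client memo"),
]
context = build_context(retrieved, requesting_user="alice")
```

Filtering at retrieval time rather than at output time is a design choice in this sketch; an output-side check would be a complementary control.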

According to CodeWall, the Lilli AI platform did not properly sanitize user inputs, leaving it vulnerable to SQL injection. CodeWall says this allowed its external AI agent to bypass security controls and query the database directly. McKinsey has not publicly confirmed the incident.
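To make the class of flaw concrete, here is a minimal sketch of SQL injection through unsanitized input, alongside the parameterized form that avoids it. The actual Lilli queries, schema, and payload are not described in this entry, so the `documents` table, the function names, and the payload below are all assumptions, using Python's standard `sqlite3` module.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical documents table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents (id INTEGER PRIMARY KEY, owner TEXT, title TEXT)"
    )
    conn.executemany(
        "INSERT INTO documents (owner, title) VALUES (?, ?)",
        [("alice", "Q1 plan"), ("bob", "Client memo")],
    )
    return conn


def search_unsafe(conn: sqlite3.Connection, owner: str, term: str) -> list[str]:
    """Vulnerable: user input is concatenated directly into the SQL text."""
    query = (
        f"SELECT title FROM documents "
        f"WHERE owner = '{owner}' AND title LIKE '%{term}%'"
    )
    return [row[0] for row in conn.execute(query)]


def search_safe(conn: sqlite3.Connection, owner: str, term: str) -> list[str]:
    """Parameterized: the driver passes input as data, never as SQL."""
    query = "SELECT title FROM documents WHERE owner = ? AND title LIKE ?"
    return [row[0] for row in conn.execute(query, (owner, f"%{term}%"))]


conn = build_db()
payload = "' OR 1=1 --"  # illustrative injection string, not the one CodeWall used

leaked = search_unsafe(conn, "alice", payload)   # returns every owner's titles
blocked = search_safe(conn, "alice", payload)    # payload is treated as a literal
```

In the unsafe version the payload closes the string literal, adds an always-true condition, and comments out the rest of the query, so the owner filter no longer applies. In the parameterized version the same string is only ever compared against titles.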

Cite this entry

Permalink: https://failureindex.ai/failures/mckinsey-lilli-platform-database-accessed-via

Citation

AI Failure Index. "McKinsey Lilli AI platform database accessed via CodeWall autonomous agent SQL injection" (FI-0547). Realm Labs. https://failureindex.ai/failures/mckinsey-lilli-platform-database-accessed-via (indexed Jun 16, 2026).

Data fields CC-BY 4.0, prose citation permitted. Incident ID FI-0547. Full dataset at /data.

How Realm would have caught this

Controls for this failure mode

Prism
OmniGuard
AI Detection & Response (AIDR)

Realm can detect when a response is about to emit data that falls outside the bounds of the current user and context, and block or redact it inline, at the moment of generation rather than after the data has left.

Related failures

Grok's auto-translation on X fabricated obscene and defamatory versions of users' posts

Grok Build was caught uploading entire repositories, deleted secrets included, to xAI's cloud

A 'Rogue Agent' flaw in Google Dialogflow CX let one permission hijack every chatbot in a project