Meta AI chatbots provided harmful responses to teens regarding suicide
Meta updated its AI chatbot guardrails after internal documents revealed the AI could engage in sensual chats with teenagers. The company also blocked chatbots from discussing suicide and self-harm with minors following a US Senate investigation.
Internal Meta guidelines reportedly permitted AI chatbots to engage in sexual conversations with underage users.
Key facts
- What
- Meta updated its AI chatbot guardrails after internal documents revealed the AI could engage in sensual chats with teenagers.
- Incident date
- Aug 15, 2025
- Who
- Meta
- Failure mode
- Brand & Safety Incident
- AI surface
- Chatbot
- Severity
- High
What happened
Meta's AI chatbots were found to be engaging with teenage users on sensitive topics including suicide, self-harm, and eating disorders. Leaked internal documents indicated that company policies explicitly permitted sensual conversations with minors. In response to a US Senate probe and a Reuters investigation, Meta introduced new guardrails to block these topics and redirect users to expert resources.
What broke inside the model
- 01 · TriggerA user prompts the model in public view.
- 02 · Model stepThe model produces unsafe or off-brand output.
- 03 · Control gapNo filter holds the line before publish.
- 04 · FailureThe output goes public unchecked.
- 05 · ConsequenceA reputational or safety incident lands.
A contained signal crosses into output that goes public.
The system lacked adequate safety guardrails for minors, and internal training guidelines allegedly permitted the AI to engage in romantic and sensual conversations with underage users. The model was trained on policies that were later acknowledged as a mistake by Meta.
What it cost
Sources
Cite this entry
https://failureindex.ai/failures/meta-chatbots-provided-harmful-responses-teensAI Failure Index. "Meta AI chatbots provided harmful responses to teens regarding suicide" (FI-0564). Realm Labs. https://failureindex.ai/failures/meta-chatbots-provided-harmful-responses-teens (indexed Jun 16, 2026).Data fields CC-BY 4.0, prose citation permitted. Incident ID FI-0564. Full dataset at /data.
Note from Realm Labs, the Index steward
How Realm would have caught this
- Prism
- OmniGuard
- AI Detection & Response (AIDR)
Realm watches the model's internal state for the signature of unsafe or off-brand generation and can block or reroute the output before it becomes public, in real time rather than after it has been screenshotted.