LAION-5B dataset used to train Stability AI models found to contain child sexual abuse material
Researchers at the Stanford Internet Observatory identified more than 3,200 suspected child sexual abuse material (CSAM) images in the LAION-5B dataset used to train Stability AI's models. The finding exposed a critical failure in the safety curation of large-scale training data.
The presence of more than 3,200 suspected CSAM images in LAION-5B demonstrates the critical risk of training AI on uncurated, web-scale scraped data.
Key facts
- What
- Researchers at the Stanford Internet Observatory identified more than 3,200 suspected CSAM images in the LAION-5B dataset used to train Stability AI's models.
- Incident date
- Dec 21, 2023
- Who
- Stability AI
- Failure mode
- Brand & Safety Incident
- AI surface
- Image generator (training data)
- Severity
- High
What happened
The Stanford Internet Observatory identified more than 3,200 images of suspected child sexual abuse material linked from the LAION-5B dataset, which distributes image URLs paired with alt text rather than the images themselves. Stability AI used this dataset to train Stable Diffusion 1.5 and other generative AI models. The discovery exposed significant safety failures in the curation of massive web-scraped datasets.
What broke inside the model
- 01 · Trigger: A web-scale scrape collects image links and alt text from the open web.
- 02 · Model step: Links to suspected CSAM enter the LAION-5B dataset.
- 03 · Control gap: Filtering is not adequate to remove illegal content before training.
- 04 · Failure: Stable Diffusion 1.5 and other models are trained on the contaminated data.
- 05 · Consequence: A safety incident lands when researchers disclose the findings.
Illegal content in the training data crosses into models that are released publicly.
The root cause was ingesting uncurated web data without adequate filtering for illegal content. LAION-5B was built from a massive scrape of the open web, so links to child sexual abuse material entered the training set along with everything else the scrape collected.
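A minimal sketch of the missing pre-training control, assuming the curator holds two blocklists supplied by a child-safety organization: known-bad image URLs and known-bad image fingerprints. `ScrapedRecord`, `sha256_hex`, `filter_records`, the record layout, and the use of SHA-256 are hypothetical choices for illustration, not LAION's or Stability AI's pipeline. Real deployments typically rely on perceptual hashes such as PhotoDNA, which also match resized or re-encoded copies; an exact SHA-256 match only catches byte-identical files.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class ScrapedRecord:
    """Hypothetical scrape entry: an image URL, its alt text, and the fetched bytes."""
    url: str
    alt_text: str
    image_bytes: bytes


def sha256_hex(data: bytes) -> str:
    """Exact-match fingerprint (an assumption; a perceptual hash would also catch near-duplicates)."""
    return hashlib.sha256(data).hexdigest()


def filter_records(
    records: Iterable[ScrapedRecord],
    blocked_hashes: set[str],
    blocked_urls: set[str],
) -> Iterator[ScrapedRecord]:
    """Yield only records whose URL and image fingerprint appear on neither blocklist."""
    for record in records:
        if record.url in blocked_urls:
            continue  # drop links already flagged by the blocklist provider
        if sha256_hex(record.image_bytes) in blocked_hashes:
            continue  # drop images whose bytes match a flagged fingerprint
        yield record


# Usage with benign placeholder data: one record is blocked by hash, one by URL.
records = [
    ScrapedRecord("https://example.com/a.jpg", "a cat", b"cat-bytes"),
    ScrapedRecord("https://example.com/b.jpg", "a dog", b"dog-bytes"),
    ScrapedRecord("https://example.com/c.jpg", "a bird", b"bird-bytes"),
]
blocked_hashes = {sha256_hex(b"dog-bytes")}
blocked_urls = {"https://example.com/c.jpg"}
kept = list(filter_records(records, blocked_hashes, blocked_urls))
print([r.alt_text for r in kept])  # ['a cat']
```

Checking both the URL and the fetched bytes matters for link-based datasets such as LAION-5B, which distribute URLs rather than images: the content behind a URL can change after the scrape, so a URL blocklist alone cannot vouch for what a downloader later retrieves.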
What it cost
Sources
- Primary: Identifying and Eliminating CSAM in Generative ML Training Data (purl.stanford.edu)
- Press: Child sexual abuse material found on popular dataset shows risks to AI research (fedscoop.com)
- Primary: Releasing Re-LAION-5B: transparent iteration on LAION-5B (laion.ai)
Cite this entry
AI Failure Index. "LAION-5B dataset used to train Stability AI models found to contain child sexual abuse material" (FI-0574). Realm Labs. https://failureindex.ai/failures/stability-training-dataset-laion-found-contain (indexed Jun 16, 2026).
Data fields CC-BY 4.0, prose citation permitted. Incident ID FI-0574. Full dataset at /data.
Note from Realm Labs, the Index steward
How Realm would have caught this
- Prism
- OmniGuard
- AI Detection & Response (AIDR)
Realm watches the model's internal state for the signature of unsafe or off-brand generation and can block or reroute the output before it becomes public, in real time rather than after it has been screenshotted.