LAION-5B dataset used to train Stability AI models found to contain child sexual abuse material

Researchers from the Stanford Internet Observatory identified more than 3,200 suspected CSAM images in LAION-5B, a dataset used to train Stability AI's models. The finding exposed a critical failure in the safety curation of large-scale training data.

Stability AI · Incident Dec 21, 2023 · Indexed Jun 16, 2026 · 3 sources

The ingestion of over 3,200 suspected CSAM images into LAION-5B demonstrates how uncurated web-scale data scraping can fail in AI training.
What: Researchers from the Stanford Internet Observatory identified more than 3,200 suspected CSAM images in LAION-5B, a dataset used to train Stability AI's models.
Incident date: Dec 21, 2023
Who: Stability AI
Failure mode: Brand & Safety Incident
AI surface: Chatbot
Severity: High

What happened

The Stanford Internet Observatory discovered over 3,200 images of suspected child sexual abuse material in the LAION-5B dataset. This dataset was used by Stability AI to train Stable Diffusion 1.5 and other generative AI models. The discovery highlighted significant safety failures in the curation of massive web-scraped datasets.

What broke inside the model

Failure path · mode profile · Brand & Safety Incident
  1. Trigger: A user prompts the model in public view.
  2. Model step: The model produces unsafe or off-brand output.
  3. Control gap: No filter holds the line before publish.
  4. Failure: The output goes public unchecked.
  5. Consequence: A reputational or safety incident lands.

A contained signal crosses into output that goes public.

The failure stemmed from ingesting uncurated web data without adequate filtering for illegal content. The dataset was built from a massive scrape of the open web, which allowed child sexual abuse material into the training set.
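
The missing filtering step can be illustrated with a minimal sketch. It assumes a blocklist of content hashes supplied by a trusted child-safety organization and drops any scraped item whose hash matches before it reaches a training set. The names Sample, sha256_hex, and filter_blocked are hypothetical and do not come from any LAION or Stability AI pipeline. Exact SHA-256 matching stands in for the perceptual hashing that real screening typically relies on, so it would only catch byte-identical copies.

```python
import hashlib
from typing import AbstractSet, Iterable, List, NamedTuple, Tuple


class Sample(NamedTuple):
    """Hypothetical record for one scraped item: source URL plus raw bytes."""
    url: str
    content: bytes


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of the raw content."""
    return hashlib.sha256(data).hexdigest()


def filter_blocked(
    samples: Iterable[Sample],
    blocked_hashes: AbstractSet[str],
) -> Tuple[List[Sample], int]:
    """Keep samples whose hash is not on the blocklist.

    Returns the kept samples and the number of dropped items.
    Assumption: blocked_hashes holds hex SHA-256 digests from a vetted source.
    """
    kept: List[Sample] = []
    dropped = 0
    for sample in samples:
        if sha256_hex(sample.content) in blocked_hashes:
            dropped += 1  # never store or forward a matched item
        else:
            kept.append(sample)
    return kept, dropped


# Usage with harmless placeholder bytes standing in for image data.
blocklist = frozenset({sha256_hex(b"blocked-placeholder")})
scraped = [
    Sample("https://example.com/a", b"ok-1"),
    Sample("https://example.com/b", b"blocked-placeholder"),
    Sample("https://example.com/c", b"ok-2"),
]
kept, dropped = filter_blocked(scraped, blocklist)
```

A production filter would need more than this sketch: perceptual hashes to catch re-encoded or resized copies, screening at scrape time rather than after publication, and a reporting path to the relevant authorities for any match.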

Public visibility: High
Regulatory exposure: Active
Customer impact: Class-wide
Financial impact: Unknown
Time to disclosure: Months
  1. Primary: Identifying and Eliminating CSAM in Generative ML Training Data (purl.stanford.edu)
  2. Press: Child sexual abuse material found on popular dataset shows risks to AI research (fedscoop.com)
  3. Primary: Releasing Re-LAION-5B: transparent iteration on LAION-5B (laion.ai)
Permalink: https://failureindex.ai/failures/stability-training-dataset-laion-found-contain
Citation: AI Failure Index. "LAION-5B dataset used to train Stability AI models found to contain child sexual abuse material" (FI-0574). Realm Labs. https://failureindex.ai/failures/stability-training-dataset-laion-found-contain (indexed Jun 16, 2026).

Data fields CC-BY 4.0, prose citation permitted. Incident ID FI-0574. Full dataset at /data.

Note from Realm Labs, the Index steward

How Realm would have caught this

Controls for this failure mode
  • Prism
  • OmniGuard
  • AI Detection & Response (AIDR)

Realm watches the model's internal state for the signature of unsafe or off-brand generation and can block or reroute the output before it becomes public, in real time rather than after it has been screenshotted.