LAION-5B dataset used to train Stability AI models found to contain child sexual abuse material

Researchers at the Stanford Internet Observatory identified more than 3,200 images of suspected child sexual abuse material (CSAM) in LAION-5B, a web-scraped dataset used to train Stability AI's models. The finding exposed a critical failure in the safety screening and curation of large-scale training data.

Stability AI · Incident Dec 21, 2023 · Indexed Jun 16, 2026 · 3 sources

What happened

The Stanford Internet Observatory discovered over 3,200 images of suspected child sexual abuse material in the LAION-5B dataset. Stability AI used this dataset to train Stable Diffusion 1.5 and other generative AI models. The discovery showed that massive web-scraped datasets were being assembled and released without adequate safety checks.

What broke inside the model

Failure path · mode profile · Brand & Safety Incident

01 · Trigger: A user prompts the model in public view.
02 · Model step: The model produces unsafe or off-brand output.
03 · Control gap: No filter holds the line before publish.
04 · Failure: The output goes public unchecked.
05 · Consequence: A reputational or safety incident lands.

A contained signal crosses into output that goes public.

The failure stemmed from ingesting uncurated web data without adequate filtering for illegal content. Because the dataset was assembled from a massive scrape of the open web, child sexual abuse material was swept into the training set.
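
The missing control described above can be illustrated with a minimal sketch of a pre-ingestion filter. It is not the method used by LAION, Stability AI, or the Stanford researchers; it only shows the general idea of checking each scraped item against a blocklist of known-bad content hashes before the item enters a training set. The names ScrapedRecord, sha256_hex, and partition_records are hypothetical, and the blocklist is assumed to come from an authorized child-safety organization.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class ScrapedRecord:
    """Hypothetical record for one scraped item (not a real LAION schema)."""
    url: str
    caption: str
    content: bytes  # raw bytes of the downloaded file


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def partition_records(
    records: Iterable[ScrapedRecord],
    blocked_hashes: Set[str],
) -> Tuple[List[ScrapedRecord], List[str]]:
    """Split records into those safe to keep and the URLs of flagged items.

    Assumption: blocked_hashes holds SHA-256 hex digests of known-bad files.
    Flagged content is never returned, only its URL, so it can be reported
    and excluded without being stored downstream.
    """
    kept: List[ScrapedRecord] = []
    flagged_urls: List[str] = []
    for record in records:
        if sha256_hex(record.content) in blocked_hashes:
            flagged_urls.append(record.url)
        else:
            kept.append(record)
    return kept, flagged_urls
```

An exact-match cryptographic hash only catches byte-identical copies, so a resized or re-encoded file would pass this check. Production screening typically relies on perceptual hashing and classifier review run in cooperation with child-safety organizations, which this sketch does not attempt to reproduce.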

Cite this entry

Permalink: https://failureindex.ai/failures/stability-training-dataset-laion-found-contain

Citation

AI Failure Index. "LAION-5B dataset used to train Stability AI models found to contain child sexual abuse material" (FI-0574). Realm Labs. https://failureindex.ai/failures/stability-training-dataset-laion-found-contain (indexed Jun 16, 2026).

Data fields CC-BY 4.0, prose citation permitted. Incident ID FI-0574. Full dataset at /data.

How Realm would have caught this

Controls for this failure mode

Prism
OmniGuard
AI Detection & Response (AIDR)

Realm watches the model's internal state for the signature of unsafe or off-brand generation and can block or reroute the output before it becomes public, in real time rather than after it has been screenshotted.

Related failures

Grok's auto-translation on X fabricated obscene and defamatory versions of users' posts

Discord's AI moderation wrongly banned more than 8,000 users after a bug skipped human review

A Waymo robotaxi flagged its teen passengers, disabled itself, and summoned police