Google Gemini generated racially incorrect images of historical figures and was pulled

In February 2024, Google paused Gemini's image generation feature after the model produced racially diverse depictions of the Founding Fathers, Nazi soldiers, and the Pope. The team published a post-mortem.

Google · Incident Feb 21, 2024 · Indexed May 13, 2026 · 2 sources

Safety tuning at training time is a knob. The knob can be set too loose or too tight. Either way the failure goes public.
What: In February 2024, Google paused Gemini's image generation feature after the model produced racially diverse depictions of the Founding Fathers, Nazi soldiers, and the Pope.
Incident date: Feb 21, 2024
Who: Google
Failure mode: Brand & Safety Incident
AI surface: Search / RAG
Severity: High

What happened

In February 2024, users posted images generated by Google Gemini that depicted Nazi soldiers, the Founding Fathers, Vikings, and the Pope as racially diverse. The output resulted from Google's diversity guardrails overcorrecting on image-generation prompts. Google paused image generation, published a post-mortem, and faced several weeks of negative press coverage. CEO Sundar Pichai called the output unacceptable.

The case is one of the most cited examples of post-training safety tuning conflicting with user intent. It is also a useful reminder that safety tuning is a knob, and the knob can be wrong in either direction.

What broke inside the model

Failure path · mode profile · Brand & Safety Incident
  1. Trigger: A user prompts the model in public view.
  2. Model step: The model produces unsafe or off-brand output.
  3. Control gap: No filter holds the line before publish.
  4. Failure: The output goes public unchecked.
  5. Consequence: A reputational or safety incident lands.

A contained signal crosses into output that goes public.

Google had applied a post-training adjustment that biased image generation toward racial diversity. The adjustment did not condition on context. When a prompt was historically specific, the adjustment overrode that specificity. In short, one policy was applied uniformly to cases that needed different policies.
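The difference between a uniform adjustment and a context-conditioned one can be sketched in a few lines. This is a toy illustration only: it models the adjustment as a suffix appended to the prompt, which is an assumption made for readability and not a description of how Gemini works. The names `DIVERSITY_SUFFIX`, `HISTORICAL_MARKERS`, `adjust_uniform`, and `adjust_conditioned` are hypothetical, and a keyword list is a deliberately crude stand-in for a real context signal.

```python
# Toy sketch under stated assumptions; not Google's implementation.

# Hypothetical adjustment, modeled here as a prompt suffix.
DIVERSITY_SUFFIX = ", depicting people of diverse ethnicities"

# Hypothetical context signal: a crude keyword list standing in for
# whatever a real system would use to detect historical specificity.
HISTORICAL_MARKERS = ("founding fathers", "viking", "pope", "nazi")


def is_historically_specific(prompt: str) -> bool:
    """Return True if the prompt mentions any of the hypothetical markers."""
    lowered = prompt.lower()
    return any(marker in lowered for marker in HISTORICAL_MARKERS)


def adjust_uniform(prompt: str) -> str:
    """Apply the adjustment to every prompt, regardless of context."""
    return prompt + DIVERSITY_SUFFIX


def adjust_conditioned(prompt: str) -> str:
    """Apply the adjustment only when the prompt is not historically specific."""
    if is_historically_specific(prompt):
        return prompt
    return prompt + DIVERSITY_SUFFIX


if __name__ == "__main__":
    for p in ("a portrait of the Founding Fathers", "a group of software engineers"):
        print("uniform:    ", adjust_uniform(p))
        print("conditioned:", adjust_conditioned(p))
```

In this sketch the uniform version rewrites both prompts, while the conditioned version leaves the historically specific prompt untouched. The point is the shape of the policy, one rule versus a rule gated on context, not the detection method.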

Public visibility: High
Regulatory exposure: None
Customer impact: Many customers
Financial impact: Estimated
Time to disclosure: Hours

Multi-week brand cycle, stock impact, leadership statement

  1. Primary: "Gemini image generation got it wrong. We'll do better." (blog.google)
  2. Press: "Google apologizes after new Gemini AI refuses to show pictures of white people" (cnn.com)
Permalink: https://failureindex.ai/failures/google-gemini-image-generation-historical-figures
Citation: AI Failure Index. "Google Gemini generated racially incorrect images of historical figures and was pulled" (FI-0015). Realm Labs. https://failureindex.ai/failures/google-gemini-image-generation-historical-figures (indexed May 13, 2026).

Data fields CC-BY 4.0, prose citation permitted. Incident ID FI-0015. Full dataset at /data.

Note from Realm Labs, the Index steward

How Realm would have caught this

Controls for this failure mode
  • Prism
  • OmniGuard
  • AI Detection & Response (AIDR)

Prism reads the model's image-generation prompt and the model's representation of intent. When the prompt is historically specific and the model is about to override the specificity with a uniform diversity adjustment, Prism flags the override and surfaces it for review or constrained generation. The case becomes a tuning conversation, not a multi-week brand cycle.