MIT study finds Amazon Rekognition facial analysis least accurate for darker-skinned women

A 2018 study revealed that Amazon Rekognition exhibited significant inaccuracies in identifying gender and skin type. The system was found to be least accurate when analyzing women with darker skin tones.

Amazon · Incident Feb 11, 2018 · Indexed Jun 9, 2026 · 2 sources

The system demonstrated a significant drop in accuracy for women of color, exposing a failure in dataset diversity.
What
A 2018 study revealed that Amazon Rekognition exhibited significant inaccuracies in identifying gender and skin type.
Incident date
Feb 11, 2018
Who
Amazon
Failure mode
Policy Violation
AI surface
Computer Vision
Severity
High

What happened

Research conducted by Joy Buolamwini and Timnit Gebru found that Amazon Rekognition frequently misidentified people of color. The system showed an accuracy rate as low as 68.6% for women of color. Subsequent testing by the ACLU further indicated that nearly 40 percent of the system's false matches were of people of color.

What broke inside the model

Failure path · mode profile · Policy Violation
  1. 01 · TriggerA prompt pushes against a deployment boundary.
  2. 02 · Model stepThe model produces the disallowed output.
  3. 03 · Control gapNo enforcement blocks it at generation time.
  4. 04 · FailureThe output crosses the policy line.
  5. 05 · ConsequenceA limit the business set is breached in public.

The output crosses a policy boundary the deployment had defined.

The failure was caused by algorithmic bias stemming from unrepresentative training datasets. The model failed to generalize effectively across diverse skin tones and gender presentations, leading to skewed error rates.

Public visibilityHigh
Regulatory exposureNone
Customer impactMany customers
Financial impactUnknown
Time to disclosureDays
  1. PressStudy finds gender and skin-type bias in commercial AI systemsnews.mit.edu
  2. PressAmazon's Face Recognition Falsely Matched 28 Members of Congressaclu.org
Permalinkhttps://failureindex.ai/failures/amazon-rekognition-facial-analysis-demonstrates-bias
CitationAI Failure Index. "MIT study finds Amazon Rekognition facial analysis least accurate for darker-skinned women" (FI-0355). Realm Labs. https://failureindex.ai/failures/amazon-rekognition-facial-analysis-demonstrates-bias (indexed Jun 9, 2026).
Share cardA branded image of this record for posts and slides.

Data fields CC-BY 4.0, prose citation permitted. Incident ID FI-0355. Full dataset at /data.

Note from Realm Labs, the Index steward

How Realm fits

Controls for this failure mode
  • Prism
  • OmniGuard

This entry sits in the index's predictive wing: a system that scores, ranks, perceives, or steers rather than generates. Realm's runtime layer is built for the generative and agentic systems now moving into these same decision seats, where it watches a model's internal state and holds an unsupported claim or an unchecked action before it commits. The control gap on this record, an automated decision that reached people with no runtime check in front of it, is the same gap. The index keeps predictive failures on the record because the pattern carries straight into the systems shipping today.