IBM Watson visual recognition exhibits gender and race bias
A study by MIT researcher Joy Buolamwini revealed that IBM Watson's visual recognition software had a high error rate when identifying darker-skinned women. The findings highlighted significant algorithmic bias in the system.
IBM Watson's visual recognition platform had an almost 35 percent error rate when it came to identifying darker-skinned females.
Key facts
- What: A study by MIT researcher Joy Buolamwini revealed that IBM Watson's visual recognition software had a high error rate when identifying darker-skinned women.
- Incident date: Feb 11, 2018
- Who: IBM
- Failure mode: Policy Violation
- AI surface: Computer Vision
- Severity: High
What happened
An MIT Media Lab study published in February 2018 found that IBM Watson's visual recognition platform performed very unevenly across demographic groups. The system misclassified the gender of darker-skinned women at a rate of nearly 35 percent while maintaining high accuracy for lighter-skinned men.
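A disparity like this only shows up when error rates are broken out per subgroup instead of being averaged over the whole test set. The following is a minimal sketch of that kind of disaggregated audit. The function name, the group labels, and the toy records are hypothetical and are not taken from the study or from IBM's API.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Compute the error rate per subgroup.

    `records` is an iterable of (group, true_label, predicted_label) tuples.
    Returns a dict mapping each group to errors / total for that group.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, truth, predicted in records:
        totals[group] += 1
        if truth != predicted:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


# Hypothetical toy data for illustration only, NOT the study's results.
records = [
    ("darker_female", "F", "M"),
    ("darker_female", "F", "F"),
    ("darker_female", "F", "F"),
    ("darker_female", "F", "M"),
    ("lighter_male", "M", "M"),
    ("lighter_male", "M", "M"),
    ("lighter_male", "M", "M"),
    ("lighter_male", "M", "M"),
]

rates = error_rates_by_group(records)

# The aggregate error rate hides the gap between subgroups.
overall = sum(truth != pred for _, truth, pred in records) / len(records)

# Gap between the worst-served and best-served subgroup.
gap = max(rates.values()) - min(rates.values())

print(rates, overall, gap)
```

In this toy data the aggregate error rate looks moderate while one subgroup carries every error, which is the pattern a single headline accuracy number can conceal.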
What broke inside the model
- 01 · Trigger: A prompt pushes against a deployment boundary.
- 02 · Model step: The model produces the disallowed output.
- 03 · Control gap: No enforcement blocks it at generation time.
- 04 · Failure: The output crosses the policy line.
- 05 · Consequence: A limit the business set is breached in public.
The output crosses a policy boundary the deployment had defined.
The system failed due to a lack of diversity in the training datasets. This led the model to develop biased patterns that disproportionately affected darker-skinned females.
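One way to surface a dataset diversity gap before training is to measure how each subgroup is represented in the data. The sketch below assumes every training example carries a subgroup label. The function names, the group labels, and the 15 percent threshold are all hypothetical choices for illustration, not details of IBM's pipeline.

```python
from collections import Counter


def subgroup_shares(group_labels):
    """Return each subgroup's share of the dataset as a fraction of the total."""
    counts = Counter(group_labels)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}


def underrepresented(shares, min_share):
    """Return, in sorted order, the subgroups whose share is below `min_share`."""
    return sorted(group for group, share in shares.items() if share < min_share)


# Hypothetical label distribution for illustration only.
labels = (
    ["lighter_male"] * 6
    + ["lighter_female"] * 2
    + ["darker_male"] * 1
    + ["darker_female"] * 1
)

shares = subgroup_shares(labels)

# The 0.15 threshold is an arbitrary assumption, not an established standard.
flagged = underrepresented(shares, min_share=0.15)

print(shares)
print(flagged)
```

A composition check like this catches only under-representation. Balanced counts do not by themselves guarantee balanced error rates, so it complements a per-subgroup evaluation and does not replace one.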
What it cost
Sources
- Primary: Study finds gender and skin-type bias in commercial AI systems (news.mit.edu)
- Press: IBM releases diverse dataset to fight facial recognition bias (cnbc.com)
- Press: IBM hopes to fight bias in facial recognition with new diverse dataset (theverge.com)
Cite this entry
https://failureindex.ai/failures/ibm-watson-visual-recognition-exhibits-gender
AI Failure Index. "IBM Watson visual recognition exhibits gender and race bias" (FI-0357). Realm Labs. https://failureindex.ai/failures/ibm-watson-visual-recognition-exhibits-gender (indexed Jun 9, 2026).
Data fields CC-BY 4.0, prose citation permitted. Incident ID FI-0357. Full dataset at /data.
Note from Realm Labs, the Index steward
How Realm fits
- Prism
- OmniGuard
This entry sits in the index's predictive wing: a system that scores, ranks, perceives, or steers rather than generates. Realm's runtime layer is built for the generative and agentic systems now moving into these same decision seats, where it watches a model's internal state and holds an unsupported claim or an unchecked action before it commits. The control gap on this record, an automated decision that reached people with no runtime check in front of it, is the same gap. The index keeps predictive failures on the record because the pattern carries straight into the systems shipping today.