UNCW researchers scrape transgender YouTube videos for facial recognition dataset
Researchers at the University of North Carolina Wilmington scraped over one million images from YouTube videos of transgender people to train facial recognition AI without their consent. The project aimed to improve facial recognition systems' ability to identify individuals undergoing hormone replacement therapy.
Key facts
- What: Researchers at the University of North Carolina Wilmington scraped over one million images from YouTube videos of transgender people to train facial recognition AI without their consent.
- Incident date: Sep 13, 2013
- Who: University of North Carolina Wilmington
- Failure mode: Policy Violation
- AI surface: Computer Vision
- Severity: High
What happened
Researchers led by Karl Ricanek at the University of North Carolina Wilmington scraped more than one million frames from YouTube videos of 38 transgender individuals documenting their medical transitions. This data was used to create the HRT Transgender Dataset without the explicit permission of the subjects. The project aimed to enhance the ability of facial recognition systems to track individuals before and after hormone replacement therapy.
What broke inside the model
- 01 · Trigger: A prompt pushes against a deployment boundary.
- 02 · Model step: The model produces the disallowed output.
- 03 · Control gap: No enforcement blocks it at generation time.
- 04 · Failure: The output crosses the policy line.
- 05 · Consequence: A limit the business set is breached in public.
The output crosses a policy boundary the deployment had defined.
The failure occurred during data procurement, where researchers bypassed ethical oversight by claiming the data was public, thus avoiding Institutional Review Board approval and informed consent. This neglect was compounded by the failure to secure the resulting dataset, which remained in an unprotected Dropbox folder until 2021.
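The procurement failure points to where a technical control could sit. As a minimal sketch, assuming a hypothetical ingestion pipeline in which every scraped sample carries consent and ethics-review metadata (the `Sample` fields and the `admit` function are illustrative names, not part of any tool used in this incident), a gate can reject anything lacking documented consent or an approved review instead of treating public availability as permission:

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Sample:
    # Hypothetical metadata attached to each scraped frame or clip.
    source_url: str
    publicly_available: bool          # recorded, but deliberately NOT used as permission
    subject_consent: bool             # documented consent from the person shown
    irb_protocol_id: Optional[str]    # None when no ethics review covers the sample


def admit(samples: Iterable[Sample]) -> Tuple[List[Sample], List[Tuple[Sample, str]]]:
    """Split samples into admitted and rejected, recording a reason for each rejection."""
    admitted: List[Sample] = []
    rejected: List[Tuple[Sample, str]] = []
    for s in samples:
        if not s.subject_consent:
            rejected.append((s, "no documented subject consent"))
        elif s.irb_protocol_id is None:
            rejected.append((s, "no ethics-review approval on record"))
        else:
            admitted.append(s)
    return admitted, rejected


if __name__ == "__main__":
    # Usage: a public video without consent is rejected despite being public.
    batch = [
        Sample("https://example.com/v1", True, False, None),
        Sample("https://example.com/v2", True, True, "IRB-123"),
    ]
    ok, bad = admit(batch)
    print(len(ok), [reason for _, reason in bad])
```

Recording a reason for every rejection keeps an audit trail, so a later review can see why data was excluded rather than only what was kept.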
What it cost
Sources
- Press: "Transgender YouTubers had their videos grabbed to train facial recognition software" (theverge.com)
- Press: "Facial Recognition Researcher Left a Trans Database Exposed for Years After Using Images Without Permission" (vice.com)
- Press: "Face recognition data set of trans people still available online years after it was supposedly taken down" (algorithmwatch.org)
Cite this entry
AI Failure Index. "UNCW researchers scrape transgender YouTube videos for facial recognition dataset" (FI-0663). Realm Labs. https://failureindex.ai/failures/uncw-researchers-scrape-transgender-youtube-videos (indexed Jun 22, 2026).
Data fields CC-BY 4.0; prose citation permitted. Incident ID FI-0663. Full dataset at /data.
Note from Realm Labs, the Index steward
How Realm would have caught this
- Prism
- OmniGuard
Realm compares what the model is about to output or do against the policy governing the deployment, in real time, and can deny or redact the action before it takes effect. That is the gap an after-the-fact review never closes in time.
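The mechanism described above, checking a proposed output against a deployment policy and denying or redacting it before release, can be sketched generically. This is a minimal illustration under stated assumptions, not Realm's Prism or OmniGuard API: the `Policy`, `Decision`, and `enforce` names are hypothetical, and a real policy would rarely reduce to regular expressions.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Policy:
    # Hypothetical policy: outputs matching any `deny` pattern are blocked outright;
    # spans matching any `redact` pattern are masked before release.
    deny: List[re.Pattern[str]] = field(default_factory=list)
    redact: List[re.Pattern[str]] = field(default_factory=list)


@dataclass(frozen=True)
class Decision:
    allowed: bool
    text: str      # what is actually released ("" when denied)
    reason: str


def enforce(policy: Policy, proposed: str) -> Decision:
    """Check a proposed output against the policy before it takes effect."""
    # Deny rules run first, so a blocked output is never partially released.
    for pattern in policy.deny:
        if pattern.search(proposed):
            return Decision(False, "", f"denied: matches {pattern.pattern!r}")
    # Redaction applies only to outputs that passed every deny rule.
    redacted = proposed
    for pattern in policy.redact:
        redacted = pattern.sub("[REDACTED]", redacted)
    reason = "redacted" if redacted != proposed else "allowed"
    return Decision(True, redacted, reason)


if __name__ == "__main__":
    # Usage with an illustrative policy: block identification requests,
    # mask anything shaped like a ###-##-#### identifier.
    policy = Policy(
        deny=[re.compile(r"(?i)identify this person")],
        redact=[re.compile(r"\b\d{3}-\d{2}-\d{4}\b")],
    )
    print(enforce(policy, "Please identify this person"))
    print(enforce(policy, "Record 123-45-6789 attached"))
```

Because the check sits between generation and release, a violating output is stopped before anyone sees it, rather than flagged afterward.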