UNCW researchers scrape transgender YouTube videos for facial recognition dataset

Researchers at the University of North Carolina Wilmington scraped over one million images from YouTube videos of transgender people to train facial recognition AI without their consent. The project aimed to improve facial recognition systems' ability to identify individuals undergoing hormone replacement therapy.

University of North Carolina Wilmington · Incident Sep 13, 2013 · Indexed Jun 22, 2026 · 3 sources

What: Researchers at the University of North Carolina Wilmington scraped over one million images from YouTube videos of transgender people to train facial recognition AI without their consent.
Incident date: Sep 13, 2013
Who: University of North Carolina Wilmington
Failure mode: Policy Violation
AI surface: Computer Vision
Severity: High

What happened

Researchers led by Karl Ricanek at the University of North Carolina Wilmington scraped more than one million frames from YouTube videos in which 38 transgender individuals documented their medical transitions. The frames were used to build the HRT Transgender Dataset without the explicit permission of the people shown. The project aimed to improve the ability of facial recognition systems to match individuals before and after hormone replacement therapy.

What broke inside the model

Failure path · mode profile · Policy Violation
  1. Trigger: A prompt pushes against a deployment boundary.
  2. Model step: The model produces the disallowed output.
  3. Control gap: No enforcement blocks it at generation time.
  4. Failure: The output crosses the policy line.
  5. Consequence: A limit the business set is breached in public.

The output crosses a policy boundary the deployment had defined.

The failure occurred during data procurement: researchers bypassed ethical oversight by claiming the data was public, which let them avoid Institutional Review Board (IRB) approval and informed consent. The problem was compounded by a failure to secure the resulting dataset, which remained in an unprotected Dropbox folder until 2021.

Public visibility: High
Regulatory exposure: Active
Customer impact: Class-wide
Financial impact: Unknown
Time to disclosure: Months
  1. Press: "Transgender YouTubers had their videos grabbed to train facial recognition software" (theverge.com)
  2. Press: "Facial Recognition Researcher Left a Trans Database Exposed for Years After Using Images Without Permission" (vice.com)
  3. Press: "Face recognition data set of trans people still available online years after it was supposedly taken down" (algorithmwatch.org)
Permalink: https://failureindex.ai/failures/uncw-researchers-scrape-transgender-youtube-videos
Citation: AI Failure Index. "UNCW researchers scrape transgender YouTube videos for facial recognition dataset" (FI-0663). Realm Labs. https://failureindex.ai/failures/uncw-researchers-scrape-transgender-youtube-videos (indexed Jun 22, 2026).

Data fields are licensed CC-BY 4.0; prose citation is permitted. Incident ID FI-0663. Full dataset at /data.

Note from Realm Labs, the Index steward

How Realm would have caught this

Controls for this failure mode
  • Prism
  • OmniGuard

Realm compares what the model is about to output or do against the policy that governs the deployment, in real time, and can deny or redact the action before it takes effect. That closes the gap an after-the-fact review cannot close in time.