Gates Foundation algorithmic teacher evaluation program fails to improve student outcomes
A $575 million initiative funded by the Gates Foundation used student test scores and algorithmic value-added models to evaluate teacher effectiveness. A 2018 RAND report concluded the program failed to significantly improve student achievement or graduation rates, particularly for low-income minority students.
The initiative helped schools measure teacher effectiveness but did not show them how to increase it.
Key facts
- What: A $575 million initiative funded by the Gates Foundation used student test scores and algorithmic value-added models to evaluate teacher effectiveness.
- Incident date: Sep 1, 2009
- Who: Bill & Melinda Gates Foundation
- Failure mode: Brand & Safety Incident
- AI surface: Algorithmic Decision
- Severity: Medium
What happened
The Intensive Partnerships for Effective Teaching initiative implemented algorithmic systems to identify and reward effective teachers based on student test scores. Despite substantial funding, the program failed to improve student outcomes or increase access to effective teaching for minority students. Educators criticized the system for being statistically invalid and alleged that the metrics were unfair.
What broke inside the model
- 01 · Trigger: A user prompts the model in public view.
- 02 · Model step: The model produces unsafe or off-brand output.
- 03 · Control gap: No filter holds the line before publish.
- 04 · Failure: The output goes public unchecked.
- 05 · Consequence: A reputational or safety incident lands.
A contained signal crosses into output that goes public.
The failure centered on the use of value-added algorithmic models that lacked statistical validity. The system erroneously evaluated some teachers based on subjects or students they were not responsible for instructing.
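The attribution problem described above can be illustrated with a minimal sketch. It is not the model the initiative used, which the sources here do not specify. It assumes a deliberately simple growth model (a teacher's score is the average amount by which their students' gains exceed the overall average gain) and a hypothetical `roster` of (teacher, student, subject) assignments. All names and numbers are invented for illustration.

```python
from collections import defaultdict
from statistics import mean


def value_added(records, roster):
    """Toy value-added estimate (illustrative assumption, not the real model).

    records: list of dicts with keys teacher, student, subject, prior, current.
    roster:  set of (teacher, student, subject) tuples the teacher actually taught.
    Returns (scores, skipped): per-teacher mean residual gain, and the records
    that were excluded because the teacher was not responsible for them.
    """
    attributed, skipped = [], []
    for r in records:
        key = (r["teacher"], r["student"], r["subject"])
        (attributed if key in roster else skipped).append(r)

    if not attributed:
        return {}, skipped

    # Expected gain under this toy model is the average gain of all attributed records.
    avg_gain = mean(r["current"] - r["prior"] for r in attributed)

    residuals = defaultdict(list)
    for r in attributed:
        residuals[r["teacher"]].append((r["current"] - r["prior"]) - avg_gain)

    return {teacher: mean(vals) for teacher, vals in residuals.items()}, skipped


# Hypothetical data: teacher B is linked to a reading score but only teaches math.
records = [
    {"teacher": "A", "student": "s1", "subject": "math", "prior": 50, "current": 60},
    {"teacher": "A", "student": "s2", "subject": "math", "prior": 60, "current": 66},
    {"teacher": "B", "student": "s3", "subject": "math", "prior": 55, "current": 59},
    {"teacher": "B", "student": "s4", "subject": "reading", "prior": 70, "current": 90},
]
roster = {("A", "s1", "math"), ("A", "s2", "math"), ("B", "s3", "math")}

scores, skipped = value_added(records, roster)
```

Without the roster check, the reading record would count toward teacher B's score even though B did not teach that subject. That is the kind of misattribution the entry describes. The check does not address the separate criticism that the models lacked statistical validity.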
What it cost
Sources
- Primary: Improving Teacher Effectiveness: Final Report (rand.org)
- Press: Bill Gates Spent Hundreds of Millions of Dollars to Improve Teaching. New Report Says It Was a Bust (nepc.colorado.edu)
Cite this entry
https://failureindex.ai/failures/gates-foundation-algorithmic-teacher-evaluation-program
AI Failure Index. "Gates Foundation algorithmic teacher evaluation program fails to improve student outcomes" (FI-0677). Realm Labs. https://failureindex.ai/failures/gates-foundation-algorithmic-teacher-evaluation-program (indexed Jun 22, 2026).
Data fields CC-BY 4.0, prose citation permitted. Incident ID FI-0677. Full dataset at /data.
Note from Realm Labs, the Index steward
How Realm would have caught this
- Prism
- OmniGuard
- AI Detection & Response (AIDR)
Realm watches the model's internal state for the signature of unsafe or off-brand generation and can block or reroute the output before it becomes public, in real time rather than after it has been screenshotted.