Ofqual's grading algorithm downgraded 39% of A-level results before being reversed in days

In August 2020, Ofqual deployed a statistical standardisation algorithm to moderate teacher-predicted A-level grades after COVID-19 cancelled summer exams. The algorithm downgraded approximately 39% of results, with students at historically lower-performing state schools hit hardest while private school students benefited from more favorable adjustments. Following nationwide protests and political pressure, the government reversed the decision on August 17 and replaced algorithm grades with teacher-assessed Centre Assessment Grades.

Ofqual · Incident Aug 13, 2020 · Indexed Jun 4, 2026 · 3 sources

The algorithm assumed a student's potential was bounded by their school's past, baking historical inequality into individual outcomes.
What
In August 2020, Ofqual deployed a statistical standardisation algorithm to moderate teacher-predicted A-level grades after COVID-19 cancelled summer exams.
Incident date
Aug 13, 2020
Who
Ofqual
Failure mode
Policy Violation
AI surface
Agentic Workflow
Severity
High

What happened

After COVID-19 cancelled summer exams, Ofqual deployed a statistical standardisation algorithm (the Direct Centre Performance model) to moderate teacher-predicted grades for approximately 730,000 A-level students. When results were released on August 13, 2020, approximately 39% of grades were downgraded from teacher predictions, with private schools seeing a larger increase in top grades while state schools lost ground. Students from disadvantaged backgrounds were disproportionately affected. Following widespread protests, legal challenges, and political outcry, the government reversed the policy on August 17, replacing algorithm grades with teacher-assessed Centre Assessment Grades, and Ofqual's chief regulator Sally Collier subsequently resigned.

What broke inside the model

Failure path · mode profile · Policy Violation
  1. 01 · TriggerA prompt pushes against a deployment boundary.
  2. 02 · Model stepThe model produces the disallowed output.
  3. 03 · Control gapNo enforcement blocks it at generation time.
  4. 04 · FailureThe output crosses the policy line.
  5. 05 · ConsequenceA limit the business set is breached in public.

The output crosses a policy boundary the deployment had defined.

The Direct Centre Performance model used each school's historical grade distribution to constrain individual student results, meaning a high-achieving student at a historically low-performing school could be capped below their individual merit. The algorithm treated school-level historical performance as a proxy for individual student potential, systematically reproducing existing educational inequalities. Small cohort adjustments further benefited private schools, which often had smaller class sizes subject to different statistical treatment, while penalizing large state school cohorts.

Public visibilityHigh
Regulatory exposureActive
Customer impactClass-wide
Financial impactUnknown
Time to disclosureDays
  1. PrimaryGCSE and A level students to receive centre assessment gradesgov.uk
  2. PressHow a computer algorithm caused a grading crisis in British schoolscnbc.com
  3. Press2020 United Kingdom school exam grading controversyen.wikipedia.org
Permalinkhttps://failureindex.ai/failures/ofqual-grading-algorithm-downgraded-39-level
CitationAI Failure Index. "Ofqual's grading algorithm downgraded 39% of A-level results before being reversed in days" (FI-0147). Realm Labs. https://failureindex.ai/failures/ofqual-grading-algorithm-downgraded-39-level (indexed Jun 4, 2026).
Share cardA branded image of this record for posts and slides.

Data fields CC-BY 4.0, prose citation permitted. Incident ID FI-0147. Full dataset at /data.

Note from Realm Labs, the Index steward

How Realm would have caught this

Controls for this failure mode
  • Prism
  • OmniGuard

Realm compares what the model is about to output or do against the policy that governs the deployment, in real time, and can deny or redact the action before it takes effect, which is the gap an after-the-fact review never closes in time.