Methodology

How the Index works

The AI Failure Index is a public registry of real AI failures in production. It is stewarded by Realm Labs and free to cite under CC-BY 4.0. This page is the editorial standard: how entries are sourced, produced, categorized, scored, merged, and corrected, and why Realm Labs maintains the registry.

Last updated: June 16, 2026 · Taxonomy: v1.0.0 · Entries: 532 verified · Sources: 1,379 cited · License: CC-BY 4.0 · Corrections: active

What this index covers

We catalog failures of production AI systems across twelve surfaces: the generative wing (chatbots, copilots, voice agents, agentic workflows, code assistants, search and retrieval, media generation, machine translation) and the predictive wing (computer vision, recommenders, autonomous systems, algorithmic decision systems). Each entry is a discrete, sourced case with a permanent URL and a system class. Inclusion is not an accusation. Several companies in this index are doing the right things and met a genuinely hard failure mode.
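As an illustration, the twelve surfaces and their system class can be written as a simple lookup. This is a minimal sketch: the dictionary name, the function name, and the lowercase label strings are assumptions made for this example, not the field names used in the published data.

```python
# Hypothetical lookup: surface label -> system class ("generative" or "predictive").
GENERATIVE = "generative"
PREDICTIVE = "predictive"

SURFACES = {
    "chatbots": GENERATIVE,
    "copilots": GENERATIVE,
    "voice agents": GENERATIVE,
    "agentic workflows": GENERATIVE,
    "code assistants": GENERATIVE,
    "search and retrieval": GENERATIVE,
    "media generation": GENERATIVE,
    "machine translation": GENERATIVE,
    "computer vision": PREDICTIVE,
    "recommenders": PREDICTIVE,
    "autonomous systems": PREDICTIVE,
    "algorithmic decision systems": PREDICTIVE,
}


def system_class(surface: str) -> str:
    """Return the system class for a surface, rejecting unknown labels."""
    key = surface.strip().lower()
    if key not in SURFACES:
        raise ValueError(f"unknown surface: {surface!r}")
    return SURFACES[key]
```

Rejecting unknown labels, rather than defaulting to one wing, keeps a mistyped surface from being silently counted in the wrong system class.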

How we source incidents

Named-company entries at catastrophic or high severity require at least two independent sources before publish. Named entries with fewer than two sources are published with a Low confidence mark and queued for corroboration; they are excluded from the catastrophic and high severity headline counts until corroborated (16 entries currently carry that label). Entries based on court filings cite the docket and link the primary document. Primary sources take priority; press coverage is treated as secondary.
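One way to read the headline-count rule above is sketched here. The `Entry` fields and function names are hypothetical, and the sketch assumes the exclusion applies to any named entry with fewer than two independent sources; it is an illustration, not the index's actual pipeline.

```python
from collections import Counter
from dataclasses import dataclass

HEADLINE_SEVERITIES = ("catastrophic", "high")
MIN_INDEPENDENT_SOURCES = 2  # threshold stated in the sourcing rule


@dataclass(frozen=True)
class Entry:
    # Hypothetical fields for illustration only.
    entry_id: str
    severity: str
    names_company: bool
    independent_sources: int


def awaiting_corroboration(entry: Entry) -> bool:
    """True for a named entry that has not yet reached two independent sources."""
    return entry.names_company and entry.independent_sources < MIN_INDEPENDENT_SOURCES


def headline_counts(entries) -> dict:
    """Count catastrophic and high entries, skipping those awaiting corroboration."""
    counts = Counter(
        e.severity
        for e in entries
        if e.severity in HEADLINE_SEVERITIES and not awaiting_corroboration(e)
    )
    return {severity: counts[severity] for severity in HEADLINE_SEVERITIES}
```

In this sketch a single-source named entry still exists in the list (it is published with its Low confidence mark) but adds nothing to either headline total until a second source arrives.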

How entries are produced

Entries enter the index through two paths. Reader submissions are verified by hand. The majority of entries are produced by a machine-assisted harvest: automated collection against public reporting, automated checks for reality, duplication, and source resolution, then human review before publish. Review depth scales with stakes: every catastrophic and high-severity entry naming a company receives a full human pass; medium and low entries are spot-checked against the confidence grade shown on each card. Confidence grades are displayed, not hidden: High (multi-source, primary), Medium (multi-source), Low (single source, queued for corroboration).
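The three confidence grades can be expressed as a small function. This is a sketch under stated assumptions: the parameter names are hypothetical, and "multi-source" is taken to mean two or more sources.

```python
def confidence_grade(source_count: int, has_primary_source: bool) -> str:
    """Map source evidence to the displayed grade: High, Medium, or Low."""
    if source_count < 1:
        raise ValueError("an entry needs at least one source")
    if source_count == 1:
        # Single source: Low, queued for corroboration, even if it is primary.
        return "Low"
    # Multi-source: High only when a primary source is among them.
    return "High" if has_primary_source else "Medium"
```

For example, `confidence_grade(3, True)` returns `"High"`, while `confidence_grade(1, True)` returns `"Low"`, because a single source stays Low regardless of its type.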

What counts, what does not

An entry is a failure of a production AI system that reached real users, customers, courts, or the public. We exclude academic demonstrations without a production victim, model benchmarks, jailbreak screenshots with no operational consequence, attacker-side misuse where no deployed system failed, data practices that are not failures, and rumored incidents nobody will go on record about. This is why our count is smaller than broader harm databases. Smaller and verifiable is the point.

How we categorize failure modes

Every entry is classified into one of eight failure modes: hallucination, prompt injection, data leakage, policy violation, agentic action error, tool misuse, identity and access drift, and brand and safety incident. The taxonomy is versioned. Where the mechanism is not publicly known, the entry says so and classifies by the published behavior.
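A closed, versioned taxonomy like this maps naturally onto an enumeration. The sketch below is illustrative: the class names, the `basis` field, and the label strings are assumptions, not the schema of the published data files.

```python
from dataclasses import dataclass
from enum import Enum

TAXONOMY_VERSION = "1.0.0"


class FailureMode(Enum):
    HALLUCINATION = "hallucination"
    PROMPT_INJECTION = "prompt injection"
    DATA_LEAKAGE = "data leakage"
    POLICY_VIOLATION = "policy violation"
    AGENTIC_ACTION_ERROR = "agentic action error"
    TOOL_MISUSE = "tool misuse"
    IDENTITY_AND_ACCESS_DRIFT = "identity and access drift"
    BRAND_AND_SAFETY_INCIDENT = "brand and safety incident"


@dataclass(frozen=True)
class Classification:
    mode: FailureMode
    basis: str  # "mechanism" or "published behavior" (hypothetical field)
    taxonomy_version: str = TAXONOMY_VERSION


def classify(label: str, mechanism_known: bool) -> Classification:
    """Validate a label against the taxonomy and record what the call rests on."""
    mode = FailureMode(label.strip().lower())  # raises ValueError if not one of the eight
    basis = "mechanism" if mechanism_known else "published behavior"
    return Classification(mode, basis)
```

Recording the basis alongside the mode keeps the "mechanism not publicly known" caveat attached to the classification itself rather than buried in prose.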

How we score severity

Catastrophic means a class-action lawsuit, regulatory enforcement, material financial harm, or fatal or near-fatal user harm. High means a press cycle longer than 72 hours, named-customer harm, or an executive apology or resignation. Medium means a public incident with a brief press cycle and no named regulatory action. Low means a case surfaced on social media with no broad press and no enforcement.
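The four levels can be read as a rubric evaluated from most to least severe. The sketch below makes that reading explicit. The `Signals` fields are hypothetical, and it assumes that a press cycle of zero hours stands for a case that surfaced only on social media; actual scoring involves editorial judgment that a function like this does not capture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signals:
    # Hypothetical inputs for illustration only.
    class_action: bool = False
    regulatory_enforcement: bool = False
    material_financial_harm: bool = False
    fatal_or_near_fatal_harm: bool = False
    press_cycle_hours: float = 0.0  # 0 is assumed to mean no broad press
    named_customer_harm: bool = False
    executive_apology_or_resignation: bool = False


def severity(s: Signals) -> str:
    """Return the highest severity level whose criteria are met."""
    if (s.class_action or s.regulatory_enforcement
            or s.material_financial_harm or s.fatal_or_near_fatal_harm):
        return "catastrophic"
    if (s.press_cycle_hours > 72 or s.named_customer_harm
            or s.executive_apology_or_resignation):
        return "high"
    if s.press_cycle_hours > 0:
        return "medium"  # brief press cycle, no regulatory action
    return "low"  # social media only, no broad press, no enforcement
```

Checking the catastrophic criteria first means an incident with both a short press cycle and a regulatory enforcement action is scored catastrophic, not medium.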

How this differs from the AI Incident Database

The AI Incident Database (AIID) catalogs the broad universe of AI harms in society. The MIT AI Incident Tracker classifies that universe by risk domain. This index is narrower and deeper: production failures at companies, classified by the mechanism that broke inside the model, with the operational cost on the record. Where an entry overlaps with AIID, we cite their incident ID. Use theirs to study harm at large; use ours to avoid repeating a production failure.


Entry merges and removals

Registries never silently delete. When two entries are found to document the same incident, the lower FI number becomes canonical, unique sources and facts are ported over, and the duplicate becomes a tombstone: it keeps its ID, declares what it was merged into, and its URL permanently redirects to the canonical record. Tombstones are excluded from every count and list but stay in the repository and in the version history below. Twelve entries have been merged to date.
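The merge rule can be sketched in a few lines. The `Record` type, its fields, and the helper names are assumptions for illustration; only the `FI-` ID pattern and the rule itself come from this page.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Record:
    # Hypothetical record shape for illustration only.
    entry_id: str  # e.g. "FI-0031"
    sources: List[str] = field(default_factory=list)
    merged_into: Optional[str] = None  # set only on tombstones


def fi_number(entry_id: str) -> int:
    """Extract the numeric part of an ID such as 'FI-0031'."""
    return int(entry_id.split("-", 1)[1])


def merge(a: Record, b: Record) -> Tuple[Record, Record]:
    """Merge two records of one incident; return (canonical, tombstone)."""
    canonical, duplicate = sorted((a, b), key=lambda r: fi_number(r.entry_id))
    for source in duplicate.sources:
        if source not in canonical.sources:  # port only the unique sources
            canonical.sources.append(source)
    duplicate.merged_into = canonical.entry_id  # tombstone keeps its own ID
    return canonical, duplicate


def live_entries(records) -> List[Record]:
    """Tombstones stay in the repository but are excluded from counts and lists."""
    return [r for r in records if r.merged_into is None]
```

Nothing is deleted in this sketch: the duplicate remains in the collection with its original ID, and `live_entries` is the only place it is filtered out.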

Anonymization policy

Some entries are anonymized at the affected company's request. An anonymized entry carries no name, no dates that triangulate, and no quotes that identify the company. It is de-identified by industry and scenario shape and reviewed before publish. Anonymized cases are marked as steward-disclosed under NDA and are held to the same evidentiary standard as named entries. 5 of 532 entries are anonymized. An affected company can claim or correct its entry at any time via corrections@failureindex.ai.

Correction policy

If an entry is wrong, we want to fix it. Email corrections@failureindex.ai with the entry URL and the correction. We review corrections on a rolling basis and run a quarterly correction sweep across the full index. Categorization reflects publicly reported facts at the time of indexing and is updated when the public record changes.

Who runs this

Andrew Wesbecher of Realm Labs is the editor of record. Entries are produced by the pipeline above and reviewed under the editor's standard. A standing slot exists for a named external reviewer of catastrophic-severity entries; the name will appear here when confirmed.

Why Realm Labs maintains this

Realm Labs is the AI Detection and Response company. We maintain this index because we work on these failures every day and believe the public record should be open. The registry voice stays neutral. The only place Realm speaks in the first person is the marked steward note on each entry. Realm detects deviations from normal or intended operations in real time by observing the model's internal state. We do not claim to understand AI; we read its internal signals.

Editorial independence and disclosures

There is an editorial firewall between this catalog and the Realm Labs sales motion. Companies in this index are not Realm Labs customers unless explicitly noted. The data is open under CC-BY 4.0 so that journalists, analysts, regulators, and researchers can use it freely. Realm Labs is the disclosed steward, not the narrator.

Citation guidelines

Cite the index as "AI Failure Index, stewarded by Realm Labs," with a link to the entry URL. The data fields (the JSON and CSV artifacts on the open data page) are licensed CC-BY 4.0: attribution required, reuse welcome. The narrative prose on entry pages carries standard copyright; quoting with citation is permitted and encouraged. Permanent entry URLs follow the pattern failureindex.ai/failures/{slug} and do not change when a headline is edited; merged entries redirect to their canonical record.

Example citation: AI Failure Index. “Entry headline” (FI-0001). Realm Labs. failureindex.ai/failures/{slug} (indexed 2026).
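For anyone generating citations programmatically, the example pattern can be reproduced with a short helper. The function name and parameters are hypothetical, and `example-slug` in the usage note is a placeholder, not a real entry.

```python
def cite(headline: str, entry_id: str, slug: str, indexed_year: int) -> str:
    """Format a citation following the example pattern on this page."""
    return (
        f"AI Failure Index. “{headline}” ({entry_id}). Realm Labs. "
        f"failureindex.ai/failures/{slug} (indexed {indexed_year})."
    )
```

Calling `cite("Entry headline", "FI-0001", "example-slug", 2026)` yields the example citation with the placeholder slug filled in. Because entry URLs are permanent, the slug can be stored once and reused even if the headline is later edited.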

Version history

  • June 2026: Consolidation sweep. A corpus-wide duplicate audit merged 12 duplicate entries into their canonical records (tombstoned, redirected, logged in the dedupe worksheet). A blocking duplicate lint now runs on every batch.
  • June 2026: Taxonomy extended to twelve AI surfaces with a system class (generative or predictive) on every entry, opening the predictive wing: computer vision, recommenders, autonomous systems, and algorithmic decision systems.
  • Taxonomy v1.0.0: Eight failure modes, eleven industries, and four severity levels. The classification standard for every entry.
  • Established 2026: The AI Failure Index opens as a public, openly licensed registry of AI failures in production, stewarded by Realm Labs.