Instacart AI pricing tests showed shoppers different prices for identical grocery items

A December 2025 study by Consumer Reports, Groundwork Collaborative and More Perfect Union found that Instacart ran AI-driven pricing experiments in which different shoppers saw different prices for the same items, with some differences reported as large as 23%. After public reporting and regulatory questions, Instacart announced on December 22, 2025 that it would end item price tests on its platform. The company had acquired Eversight, an AI pricing and promotions platform, in 2022 and said retailers control the prices listed on the app.

Instacart · Incident Dec 9, 2025 · Indexed Jun 9, 2026 · 3 sources

An AI-driven pricing test randomized item prices across users, producing different displayed prices for identical products.
What: A December 2025 study by Consumer Reports, Groundwork Collaborative and More Perfect Union found that Instacart ran AI-driven pricing experiments that resulted in different shoppers seeing different prices for the same items, with some differences reported up to 23%.
Incident date: Dec 9, 2025
Who: Instacart
Failure mode: Policy Violation
AI surface: Algorithmic Decision
Severity: Medium

What happened

In December 2025, a study by Consumer Reports, Groundwork Collaborative and More Perfect Union reported that Instacart ran AI-driven pricing experiments in which some shoppers were shown different prices for identical grocery items; the study documented price variations as large as 23% in some tests. News outlets reported the study’s findings, and the experiments drew public criticism and questions from regulators. On December 22, 2025, Instacart announced it would end item price tests on its platform.

What broke inside the model

Failure path · mode profile · Policy Violation
  1. Trigger: A prompt pushes against a deployment boundary.
  2. Model step: The model produces the disallowed output.
  3. Control gap: No enforcement blocks it at generation time.
  4. Failure: The output crosses the policy line.
  5. Consequence: A limit the business set is breached in public.

The output crosses a policy boundary the deployment had defined.

The experiments used technology from Eversight, an AI-enabled pricing and promotions platform that Instacart acquired in 2022, which allowed retailers to run randomized item price tests on the platform. Those randomized tests showed different users different prices for identical items, and the company’s implementation lacked the transparency and controls that would have prevented consumer-facing price discrepancies.
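To make the mechanism concrete, here is a minimal Python sketch of how a randomized item price test can show two shoppers different prices for the same item, and how a simple spread cap could act as a control. It is an illustration under stated assumptions, not Eversight's or Instacart's implementation: the price arms, the hash-based bucketing, the function names, and the cap are all hypothetical, and the spread is measured relative to the lowest price, which may not match how the study measured its 23% figure.

```python
import hashlib

# Hypothetical price arms for one item. Values are illustrative only,
# not taken from the study or from any real test.
PRICE_ARMS = [3.99, 4.29, 4.59, 4.89]


def assign_price(user_id: str, item_id: str, arms=PRICE_ARMS) -> float:
    """Bucket a user into one price arm for an item.

    Hashing the (user, item) pair keeps the assignment stable for a given
    user while spreading users roughly evenly across arms.
    """
    digest = hashlib.sha256(f"{user_id}:{item_id}".encode()).hexdigest()
    return arms[int(digest, 16) % len(arms)]


def price_spread(prices) -> float:
    """Relative gap between the highest and lowest displayed price."""
    lowest, highest = min(prices), max(prices)
    return (highest - lowest) / lowest


def within_spread_cap(prices, cap: float) -> bool:
    """Hypothetical control: accept a test only if its spread stays under cap."""
    return price_spread(prices) <= cap


if __name__ == "__main__":
    shoppers = ["shopper-a", "shopper-b", "shopper-c", "shopper-d"]
    shown = {s: assign_price(s, "item-123") for s in shoppers}
    print(shown)
    print(f"max spread across arms: {price_spread(PRICE_ARMS):.1%}")
    print("passes 5% cap:", within_spread_cap(PRICE_ARMS, 0.05))
```

With these illustrative arms the largest gap is (4.89 - 3.99) / 3.99, roughly 22.6%, so a 5% cap would reject the configuration before any shopper saw it. The design point is that the check runs on the test configuration itself, not on prices after they have been displayed.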

Public visibility: High
Regulatory exposure: Active
Customer impact: Many customers
Financial impact: Unknown
Time to disclosure: Weeks
  1. Press: Instacart ends AI-driven price experiments after criticism | Reuters (reuters.com)
  2. Press: Study: Instacart's AI pricing tools drive up the cost of groceries | CNBC (cnbc.com)
  3. Press: Instacart to end AI price tests for retailers following investigation - CBS News (cbsnews.com)
Permalink: https://failureindex.ai/failures/instacart-pricing-tests-showed-shoppers-different
Citation: AI Failure Index. "Instacart AI pricing tests showed shoppers different prices for identical grocery items" (FI-0344). Realm Labs. https://failureindex.ai/failures/instacart-pricing-tests-showed-shoppers-different (indexed Jun 9, 2026).

Data fields CC-BY 4.0, prose citation permitted. Incident ID FI-0344. Full dataset at /data.

Note from Realm Labs, the Index steward

How Realm fits

Controls for this failure mode
  • Prism
  • OmniGuard

This entry sits in the index's predictive wing: a system that scores, ranks, perceives, or steers rather than generates. Realm's runtime layer is built for the generative and agentic systems now moving into these same decision seats, where it watches a model's internal state and holds an unsupported claim or an unchecked action before it commits. The control gap on this record, an automated decision that reached people with no runtime check in front of it, is the same gap. The index keeps predictive failures on the record because the pattern carries straight into the systems shipping today.