AI Forecast Ledger
Hit · benchmarks

An AI built before the contest earns a gold medal at the International Mathematical Olympiad by 2025

Hit · resolved by 2025

IMO score vs. the gold cutoff (35/42 in 2025)

Google DeepMind's Gemini Deep Think scored 35/42, the gold-medal standard, at the 2025 IMO

Verified blind: DeepSeek V4 + Grok — agree

How it's graded

Met when, at an IMO through 2025, an AI built beforehand scores at or above the gold-medal cutoff under competition conditions
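
The grading rule above can be read as a simple predicate. The following is a minimal sketch, not the ledger's actual grading code: the names `ImoResult` and `meets_claim` are hypothetical, and the two boolean fields stand in for judgments ("built beforehand", "under competition conditions") that a grader would have to make from the receipts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImoResult:
    """One AI entry at one IMO (hypothetical structure, for illustration only)."""
    year: int
    score: int                    # points out of 42
    gold_cutoff: int              # gold-medal cutoff for that year
    built_before_contest: bool    # grader's judgment, assumed here
    competition_conditions: bool  # grader's judgment, assumed here


def meets_claim(result: ImoResult, deadline_year: int = 2025) -> bool:
    """Return True when the result satisfies every condition of the grading rule."""
    return (
        result.year <= deadline_year
        and result.built_before_contest
        and result.competition_conditions
        and result.score >= result.gold_cutoff
    )


# Figures from the ledger entry: 35/42 against a 2025 gold cutoff of 35.
# The two boolean values are assumptions consistent with the "Hit" verdict.
gemini_2025 = ImoResult(
    year=2025,
    score=35,
    gold_cutoff=35,
    built_before_contest=True,
    competition_conditions=True,
)
print(meets_claim(gemini_2025))
```

The comparison is `>=` because the rule says "at or above" the cutoff, so a score exactly equal to the cutoff counts.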

Receipts · 3
  • An advanced version of Gemini Deep Think solved five out of the six IMO problems perfectly, earning 35 total points, and achieving gold-medal level performance.

    Google DeepMind archived↗  ·  2025-07-21

  • In July 2025, we reached gold medal-level performance on the International Mathematical Olympiad with a general-purpose reasoning model (35/42 points).

    OpenAI archived↗  ·  2026-02-20

  • So I think we have Paul at <8%, Eliezer at >16% for AI made before the IMO is able to get a gold (under time controls etc. of grand challenge) in one of 2022-2025.

    LessWrong: IMO challenge bet with Eliezer (Paul Christiano) archived↗  ·  context · not model-verified  ·  2022-02-25

Ledger history
  • 2026-06-24: seeded
  • 2026-06-25: demoted pending bulletproof re-grade
  • 2026-06-25: The claim targets 2025. Google DeepMind's official blog (source 0) confirms an advanced Gemini Deep Think scored 35/42 at IMO 2025, meeting the gold-medal cutoff. OpenAI's First Proof blog (source 1)
  • 2026-06-26: added the Yudkowsky–Christiano bet-odds source (LessWrong, >16% vs <8%) to the receipt; context only — claim, grade, and the 2-model verification binding are unchanged

Every verdict on the ledger is graded against dated, archived third-party evidence and blind-verified by two independent models.