Hit · benchmarks

An AI built before the contest earns a gold medal at the International Mathematical Olympiad by 2025

Hit · resolved by 2025

IMO score vs. the gold cutoff (35/42 in 2025)

Google DeepMind's Gemini Deep Think scored 35/42, the gold-medal standard, at the 2025 IMO

Verified blind: DeepSeek V4 + Grok — agree

How it's graded

Met when, at an IMO through 2025, an AI built beforehand scores at or above the gold-medal cutoff under competition conditions
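
The grading rule can be read as a simple threshold check. The sketch below is only an illustration: the `ImoResult` type, its field names, and the `meets_claim` function are hypothetical and not part of the ledger, and the two boolean flags stand in for judgments ("built beforehand", "competition conditions") that a grader would have to make from the evidence. It relies on the standard IMO format of six problems marked out of 7 points each, so five perfect solutions give 35.

```python
from dataclasses import dataclass

POINTS_PER_PROBLEM = 7  # each IMO problem is marked out of 7
NUM_PROBLEMS = 6        # so the maximum score is 42


@dataclass(frozen=True)
class ImoResult:
    """Hypothetical record of one AI entry at one IMO."""
    year: int
    score: int
    gold_cutoff: int              # gold-medal cutoff for that year
    built_before_contest: bool    # grader's judgment, assumed here
    competition_conditions: bool  # grader's judgment, assumed here


def meets_claim(result: ImoResult, deadline_year: int = 2025) -> bool:
    """Return True when the result satisfies the grading rule as stated."""
    max_score = POINTS_PER_PROBLEM * NUM_PROBLEMS
    if not 0 <= result.score <= max_score:
        raise ValueError(f"score must be between 0 and {max_score}")
    return (
        result.year <= deadline_year
        and result.built_before_contest
        and result.competition_conditions
        and result.score >= result.gold_cutoff
    )


# Five problems solved perfectly: 5 * 7 = 35, equal to the 2025 cutoff.
gemini_2025 = ImoResult(
    year=2025,
    score=5 * POINTS_PER_PROBLEM,
    gold_cutoff=35,
    built_before_contest=True,    # assumed, following the ledger's verdict
    competition_conditions=True,  # assumed, following the ledger's verdict
)
print(meets_claim(gemini_2025))
```

Because the rule says "at or above", a score exactly equal to the cutoff counts as a hit, which is why the comparison is `>=` rather than `>`.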

Receipts · 3

An advanced version of Gemini Deep Think solved five out of the six IMO problems perfectly, earning 35 total points, and achieving gold-medal level performance.
Google DeepMind archived↗ · 2025-07-21
In July 2025, we reached gold medal-level performance on the International Mathematical Olympiad with a general-purpose reasoning model (35/42 points).
OpenAI archived↗ · 2026-02-20
So I think we have Paul at <8%, Eliezer at >16% for AI made before the IMO is able to get a gold (under time controls etc. of grand challenge) in one of 2022-2025.
LessWrong — IMO challenge bet with Eliezer (Paul Christiano) archived↗ · context · not model-verified · 2022-02-25

Ledger history

2026-06-24: seeded
2026-06-25: demoted pending bulletproof re-grade
2026-06-25: The claim targets 2025. Google DeepMind's official blog (source 0) confirms an advanced Gemini Deep Think scored 35/42 at IMO 2025, meeting the gold-medal cutoff. OpenAI's First Proof blog (source 1)
2026-06-26: added the Yudkowsky–Christiano bet-odds source (LessWrong, >16% vs <8%) to the receipt; context only — claim, grade, and the 2-model verification binding are unchanged

Every verdict on the ledger is graded against dated, archived third-party evidence and blind-verified by two independent models.