OpenAI’s o3 Stuns AI Math Olympiad with Near-Perfect Debut, Open-Source Models Just 5 Points Behind

The second Artificial Intelligence Mathematical Olympiad (AIMO2) has delivered groundbreaking results. While NVIDIA’s NemoSkills topped the leaderboard in the previous edition, this year, OpenAI’s o3 model entered the competition for the first time—and swept the stage with near-perfect scores.

Even the legendary mathematician Terence Tao expressed excitement, noting that the competition had previously been limited to open-source participants with constrained compute budgets. With o3’s entry, the gap between commercial and open-source AI reasoning systems has become clearer than ever.


Key Highlights of AIMO2

With compute fully unleashed, OpenAI’s o3 achieved 47/50 (near full marks). In fact, given two attempts per problem, the model could plausibly score a perfect 50/50.
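As a back-of-the-envelope check on that claim, assume each problem is solved independently with a uniform per-attempt success rate implied by the 47/50 result; both are simplifying assumptions for illustration, not figures from the report.

```python
# Rough sanity check on the "perfect 50/50 with two attempts" claim.
# Assumes independent attempts and a uniform per-problem solve rate
# implied by 47/50 -- simplifications, not numbers from the report.

p_single = 47 / 50                        # implied per-attempt solve rate
p_best_of_two = 1 - (1 - p_single) ** 2   # solved on at least one of two tries
expected_score = 50 * p_best_of_two

print(f"per-problem pass@2: {p_best_of_two:.4f}")    # ~0.9964
print(f"expected score / 50: {expected_score:.1f}")  # ~49.8
```

Under those assumptions the expected score is about 49.8 out of 50, which is why a perfect run with two attempts is plausible.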

Interestingly, under compute-matched conditions, open-source and closed-source models showed only marginal differences, suggesting that much of o3's lead comes from sheer compute scaling rather than from the model alone.

📄 Full report: The Gap Is Shrinking


Olympiad-Level Math, AI-Style

The AIMO benchmark targets Olympiad-grade mathematical reasoning—a notoriously difficult domain for machines.

The open/closed gap seen here aligns with Epoch AI's 2024 forecast, which estimated that open-source AI would lag closed systems by about one year in reasoning capability.

Launched in 2023, the Artificial Intelligence Mathematical Olympiad was designed to accelerate progress in open, reproducible, high-level mathematical reasoning.

🔗 Competition Portal: Kaggle AIMO2


AIMO2: Raising the Difficulty

The second edition, completed in April 2025, raised problem difficulty to the level of national Olympiads such as the UK's British Mathematical Olympiad (BMO) and the USA's USAMO.

AIMO2 Leaderboard (Private Scores)

Kaggle uses two leaderboards: a public one, scored during the competition on a visible subset of problems, and a private one, scored on withheld problems that determine the final rankings. The scores referenced above are from the private leaderboard.

Despite harder problems than AIMO1, the results were remarkably strong.

But a bigger question loomed: What happens when closed-source AI like OpenAI’s o3 is tested head-to-head?


OpenAI o3 vs. AIMO2 Champions

OpenAI partnered with the AIMO organizers to evaluate o3-preview (a pre-release version of o3) on the competition's Olympiad problems, head-to-head against the top open-source entries, including NVIDIA's NemoSkills and a Tsinghua team.

The outcome: o3-preview outscored the open-source champions at every compute level tested; the detailed numbers follow below.

Crucially, when compute cost is factored in, open-source models appear far more competitive.
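One way to make that concrete is to normalize score by inference cost. The sketch below is illustrative only: the dollar figures are hypothetical placeholders, not costs published by the AIMO organizers, and only the qualitative direction mirrors the article's claim.

```python
# Illustrative cost-normalized comparison. The costs below are
# HYPOTHETICAL placeholders, not figures from the AIMO2 report.

def points_per_dollar(score: int, cost_usd: float) -> float:
    """Problems solved per dollar of inference spend."""
    return score / cost_usd

entries = [
    # (label, score out of 50, assumed total inference cost in USD)
    ("closed model, full compute", 47, 2000.0),  # hypothetical cost
    ("open-source champion",       40, 100.0),   # hypothetical score & cost
]

for label, score, cost in entries:
    print(f"{label}: {points_per_dollar(score, cost):.3f} points per dollar")
```

On any numbers of roughly this shape, the open-source entry solves far more problems per dollar, even while trailing on raw score.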


o3 Performance Across Compute Levels

Three compute configurations were tested, from a constrained low-compute setting up to the fully unleashed budget behind the 47/50 headline score, with o3's score rising as compute increased.

Even at low compute, o3 solved 7 more problems than NemoSkills’ champion model, despite NemoSkills being run on stronger hardware.


NVIDIA and Tsinghua Reruns on H100

To test their full potential, the top open-source teams reran their models on 8×H100 GPUs (640GB of total VRAM), compared with the original Kaggle cap of 4×L4 GPUs (96GB).
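In memory terms alone that is roughly a 6.7× jump (each H100 carries 80GB of VRAM, each L4 24GB):

```python
# VRAM headroom: H100 rerun hardware vs. the original Kaggle cap.
h100_rerun = 8 * 80   # 8 x H100 at 80 GB each -> 640 GB
l4_kaggle = 4 * 24    # 4 x L4 at 24 GB each -> 96 GB
print(h100_rerun, l4_kaggle, round(h100_rerun / l4_kaggle, 1))  # 640 96 6.7
```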

Even with vastly more compute, their gains were modest—just 1–2 points—highlighting the scaling edge of o3.


Open vs. Closed: The Shrinking Gap

AIMO organizers caution that the two sets of scores are not directly comparable: Kaggle entries were scored under strict hardware and runtime limits inside the competition harness, while o3-preview was evaluated outside it with a far larger compute budget.
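One concrete way score types diverge is in how attempts are counted: a single-try score (pass@1) and a best-of-two score (pass@2) measure different things. A standard tool for estimating these from repeated samples is the unbiased pass@k estimator of Chen et al. (2021); the sketch below shows the idea and is not the AIMO organizers' own accounting.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021): probability that
    at least one of k attempts, drawn without replacement from n samples
    of which c were correct, solves the problem."""
    if n - c < k:
        return 1.0  # every size-k draw contains a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: a model solves a problem in 5 of 16 sampled attempts.
print(round(pass_at_k(n=16, c=5, k=1), 3))  # 0.312 (pass@1)
print(round(pass_at_k(n=16, c=5, k=2), 3))  # 0.542 (pass@2)
```

The same model looks markedly stronger under pass@2 than pass@1, which is why mixing the two score types overstates or understates a gap.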

Nonetheless, the conclusion is clear: closed-source AI still leads, but open-source AI is catching up faster than expected.


Looking Ahead: AIMO3

The competition’s next edition, AIMO3, launches in Fall 2025, with IMO-level difficulty at the core. Details on schedule, prize pool, and new competition rules will be released soon.

This year’s results already mark a milestone in AI reasoning performance—suggesting that solving Olympiad-level mathematics may soon be within reach of open-source AI systems.

