U-MATH / μ-MATH leaderboard

U-MATH and μ-MATH are datasets designed to test the mathematical reasoning and meta-evaluation capabilities of Large Language Models (LLMs) on university-level problems. U-MATH provides 1,100 university-level mathematical problems, while μ-MATH complements it with a meta-evaluation framework focused on judging solutions, comprising 1,084 LLM-generated solutions.
