U-MATH / μ-MATH leaderboard

These datasets test the mathematical reasoning and meta-evaluation capabilities of Large Language Models (LLMs) on university-level problems. U-MATH provides 1,100 university-level mathematical problems, while μ-MATH complements it with a meta-evaluation framework focused on judging solutions, built from 1,084 LLM-generated solutions.

Rank | T  | S  | Model Name | U-MATH Acc | U-MATH Text Acc | U-MATH Visual Acc
-----+----+----+------------+------------+-----------------+------------------
10   | 🟥 | 🚀 |            | 73.6       | 77.8            | 45.2