U-MATH / μ-MATH leaderboard

These datasets are designed to test the mathematical reasoning and meta-evaluation capabilities of Large Language Models (LLMs) on university-level problems. U-MATH provides a set of 1,100 university-level mathematical problems, while µ-MATH complements it with a meta-evaluation framework focusing on solution judgment with 1084 LLM solutions.

Select Columns to Display:

Type #Params (B) Architecture License Family Judge Model Name U-MATH Text Acc U-MATH Visual Acc U-MATH TextHard Acc Diff Calc Acc Diff Calc Text Acc Diff Calc Visual Acc Integral Calc Acc Integral Calc Text Acc Integral Calc Visual Acc Algebra Acc Algebra Text Acc Algebra Visual Acc Multivar Calc Acc Multivar Calc Text Acc Multivar Calc Visual Acc Precalc Acc Precalc Text Acc Precalc Visual Acc Seq & Series Acc Seq & Series Text Acc Seq & Series Visual Acc

Visible Columns:

Textbox

Filter model types:

Filter model sizes:

Filter model families:


10	🟥	🚀	meta-llama/Llama-3.2-90B-Vision-Instruct	nvidia/Llama-3.1-Nemotron-70B-Instruct	Open-Weights	684.5	684.5	Qwen2VLForConditionalGeneration	tongyi-qianwen	DeepSeek	https://huggingface.co/unsloth/Llama-3.1-Nemotron-70B-Instruct	gpt-4o-2024-08-06	86.8	93.1	58.5	90.5	76.8	80.7	57.1	83.2	90.7	63.8	89.4	97.3	16.7	87.1	85.3	60.7	96.2	99.3	50	92.9	93.3	75


10	🟥	🚀	nvidia/Llama-3.1-Nemotron-70B-Instruct	86.8	93.1	58.5	90.5


1	🟥	❓	o1	86.8	93.1	58.5	90.5
2	🟥	❓	gemini-2.0-flash-thinking-exp-01-21	83.6	89.2	58.5	86.2
3	🟥	❓	o3-mini	82.2	92.8	34.5	89.5
4	💙	🚀	deepseek-ai/DeepSeek-R1	80.7	91.3	33	88.2
5	🟥	❓	o1-mini	76.3	82.9	46.5	75.8
6	💙	🚗	Qwen/QwQ-32B-Preview	73.1	82.7	30	75.8
7	💙	🚚	Qwen/QVQ-72B-Preview	65	69.7	44	57.2
8	💙	🚀	deepseek-ai/DeepSeek-V3	62.6	69.3	32.5	57.5
9	🟥	❓	gemini-1.5-pro	60.1	63.4	45	50.5
10	💙	🚚	Qwen/Qwen2.5-Math-72B-Instruct	59.5	68.7	18	57
11	🟥	❓	gemini-1.5-flash	57.8	61.2	42.5	48.5
12	💙	🚚	Nexusflow/Athene-V2-Chat	54.9	62.9	19	49.8
13	💙	🚗	microsoft/phi-4	54.5	58.3	37	43.2
14	💙	🚗	Qwen/Qwen2.5-32B-Instruct	52.4	60.4	16	46.3
15	💙	🚚	Qwen/Qwen2.5-72B-Instruct	51.2	58.9	16.5	44.7
16	🟥	❓	gpt-4o-2024-08-06	50.2	53.9	33.5	38.3
17	💙	🚀	mistralai/Pixtral-Large-Instruct-2411	47.8	51.4	31.5	38.2
18	💙	🚀	mistralai/Mistral-Large-Instruct-2411	47.6	55.6	12	42.5
19	💙	🚗	Qwen/Qwen2.5-Math-7B-Instruct	45.5	53	11.5	38
20	💙	🚚	meta-llama/Llama-3.3-70B-Instruct	44.7	51.7	13.5	39.5
21	🟥	❓	gpt-4o-mini-2024-07-18	43.4	47.2	26	30
22	💙	🚗	Qwen/Qwen2.5-7B-Instruct	43.3	50.4	11	34.5
23	💙	❓	nvidia/Llama-3.1-Nemotron-70B-Instruct	42.5	47.7	19.5	33.7
24	💙	🚚	Qwen/Qwen2-VL-72B-Instruct	41.8	43.9	32.5	29.3
25	🟥	❓	claude-sonnet-3.5	38.7	40.7	30	26.2
26	💙	🚚	meta-llama/Llama-3.2-90B-Vision-Instruct	37.2	41.8	16.5	24.7
27	💙	🚗	mistralai/Mistral-Small-Instruct-2501	34.8	39.9	12	22
28	💙	🚚	meta-llama/Llama-3.1-70B-Instruct	34.3	39.6	10.5	22.8
29	💙	🚗	meta-llama/Llama-3.1-8B-Instruct	29.5	33.7	11	22.8
30	💙	🚗	Qwen/Qwen2-VL-7B-Instruct	26.3	27.1	22.5	15.3
31	🟥	❓	LFM-7B	25.8	28	16	10.7
32	💙	🚗	mistralai/Ministral-8B-Instruct-2410	23.1	26.9	6	13.5
33	💙	🚗	meta-llama/Llama-3.2-11B-Vision-Instruct	20.4	22.9	9	10.3
34	💙	🚗	mistralai/Pixtral-12B-2409	17.5	17.9	16	8.8

Select Columns to Display:

Type #Params (B) Architecture License Family Extract Model Name μ-MATH TPR μ-MATH TNR μ-MATH PPV μ-MATH NPV GPT-4o Subset F1 GPT-4o Subset TPR GPT-4o Subset TNR GPT-4o Subset PPV GPT-4o Subset NPV Gemini-1.5-Pro Subset F1 Gemini-1.5-Pro Subset TPR Gemini-1.5-Pro Subset TNR Gemini-1.5-Pro Subset PPV Gemini-1.5-Pro Subset NPV Llama-3.1-70B Subset F1 Llama-3.1-70B Subset TPR Llama-3.1-70B Subset TNR Llama-3.1-70B Subset PPV Llama-3.1-70B Subset NPV Qwen2.5-72B Subset F1 Qwen2.5-72B Subset TPR Qwen2.5-72B Subset TNR Qwen2.5-72B Subset PPV Qwen2.5-72B Subset NPV

Visible Columns:

Textbox

Filter model types:

Filter model sizes:

Filter model families:


10	🟥	🚗	mistralai/Mistral-Large-Instruct-2411	mistralai/Mistral-Large-Instruct-2411	Open-Weights	684.5	684.5	DeepseekV3ForCausalLM	qwen	DeepSeek	https://huggingface.co/unsloth/Mistral-Large-Instruct-2411	Qwen/Qwen2.5-72B-Instruct	89.5	90.6	88.4	88.7	90.4	88.4	88.9	87.9	88.9	87.9	90.6	95.3	84.6	91.1	91.7	93.2	90.5	95.8	90.5	95.8	83.8	86.8	80.6	84.6	83.3


10	🟥	🚗	mistralai/Mistral-Large-Instruct-2411	89.5	88.4	90.6	93.2	83.8


1	🟥	❓	o1	89.5	88.4	90.6	93.2	83.8
2	🟥	❓	o1-mini	84.8	81.2	86.2	89.7	79.5
3	💙	🚗	Qwen/QwQ-32B-Preview	83.2	78	80	84	86.7
4	💙	🚀	deepseek-ai/DeepSeek-R1	82.2	79.7	82	88.2	76.8
5	🟥	❓	gemini-2.0-flash-thinking-exp-01-21	81	74.3	85.8	83.3	76
6	🟥	❓	google/gemini-1.5-pro	80.7	78.2	79.5	83.6	77.7
7	🟥	❓	gpt-4o-2024-08-06	77.4	77.5	72.6	81.8	74.2
8	💙	🚀	mistralai/Mistral-Large-Instruct-2411	76.6	76	75	78.6	72.5
9	💙	🚚	Qwen/Qwen2.5-72B-Instruct	75.6	73.7	74.2	79.3	70.5
10	🟥	❓	google/gemini-1.5-flash	74.8	70.1	73.9	80.6	71.2
11	🟥	❓	claude-sonnet-3-5	74.8	72.2	73.8	77.9	70.8
12	💙	🚚	Qwen/Qwen2.5-Math-72B-Instruct	74	68.2	76.8	77.3	69.3
13	🟥	❓	gpt-4o-mini-2024-07-18	72.3	70.4	69.6	76.2	69.3
14	💙	🚗	Qwen/Qwen2.5-7B-Instruct	69.3	68.3	69.1	72.3	62.4
15	💙	🚗	Qwen/Qwen2.5-Math-7B-Instruct	61.9	57.2	63.8	63.8	59.7
16	💙	🚚	meta-llama/Llama-3.1-70B-Instruct	61	69.4	58.8	57	56
17	💙	🚗	mistralai/Ministral-8B-Instruct-2410	60.5	62.9	58.3	63.1	52.8
18	💙	🚗	meta-llama/Llama-3.1-8B-Instruct	52	51.2	55.5	49.2	48.7

Select Columns to Display:

Type #Params (B) Architecture License Family U-MATH Acc U-MATH Text Acc U-MATH Visual Acc Judge Model Name Extract Model Name μ-MATH F1 μ-MATH TPR μ-MATH TNR μ-MATH PPV μ-MATH NPV

Visible Columns:

Textbox

Filter model types:

Filter model sizes:

Filter model families:


10	12	🟥	🚀	mistralai/Mistral-Large-Instruct-2411	mistralai/Mistral-Large-Instruct-2411	Open-Weights	684.5	684.5	DeepseekV3ForCausalLM	qwen	DeepSeek	https://huggingface.co/unsloth/Mistral-Large-Instruct-2411	86.8	93.1	58.5	gpt-4o-2024-08-06	Qwen/Qwen2.5-72B-Instruct	89.5	90.6	88.4	88.7	90.4


10	12	🟥	🚀	mistralai/Mistral-Large-Instruct-2411	86.8	89.5


1	1	🟥	❓	o1	86.8	89.5
2	5	🟥	❓	gemini-2.0-flash-thinking-exp-01-21	83.6	81
4	4	💙	🚀	deepseek-ai/DeepSeek-R1	80.7	82.2
5	2	🟥	❓	o1-mini	76.3	84.8
6	3	💙	🚗	Qwen/QwQ-32B-Preview	73.1	83.2
10	12	💙	🚚	Qwen/Qwen2.5-Math-72B-Instruct	59.5	74
15	9	💙	🚚	Qwen/Qwen2.5-72B-Instruct	51.2	75.6
16	7	🟥	❓	gpt-4o-2024-08-06	50.2	77.4
18	8	💙	🚀	mistralai/Mistral-Large-Instruct-2411	47.6	76.6
19	15	💙	🚗	Qwen/Qwen2.5-Math-7B-Instruct	45.5	61.9
21	13	🟥	❓	gpt-4o-mini-2024-07-18	43.4	72.3
22	14	💙	🚗	Qwen/Qwen2.5-7B-Instruct	43.3	69.3
28	16	💙	🚚	meta-llama/Llama-3.1-70B-Instruct	34.3	61
29	18	💙	🚗	meta-llama/Llama-3.1-8B-Instruct	29.5	52
32	17	💙	🚗	mistralai/Ministral-8B-Instruct-2410	23.1	60.5

This repository contains the official leaderboard code for the U-MATH and $\mu$-MATH benchmarks. These datasets are designed to test the mathematical reasoning and meta-evaluation capabilities of Large Language Models (LLMs) on university-level problems.

Overview

U-MATH provides a set of 1,100 university-level mathematical problems, while µ-MATH complements it with a meta-evaluation framework focusing on solution judgment with 1084 LLM solutions.

📊 U-MATH benchmark at Huggingface
🔎 μ-MATH benchmark at Huggingface
🗞️ Paper
👾 Evaluation Code at GitHub

Licensing Information

The contents of the μ-MATH's machine-generated model_output column are subject to the underlying LLMs' licensing terms.
Contents of all the other dataset U-MATH and μ-MATH fields, as well as the code, are available under the MIT license.

📙 Citation

Built with Gradio logo


1	🟥	❓	o1	o1	Proprietary	null	150	Unknown	Unknown	GPT	null	gpt-4o-2024-08-06	86.8	93.1	58.5	90.5	76.8	86	57.1	83.2	90.7	63.8	89.4	97.3	50	87.1	92	60.7	96.2	99.3	50	92.9	93.3	75
2	🟥	❓	gemini-2.0-flash-thinking-exp-01-21	gemini-2.0-flash-thinking-exp-01-21	Proprietary	null	150	Unknown	Unknown	Gemini	null	gpt-4o-2024-08-06	83.6	89.2	58.5	86.2	70.5	80.7	48.6	82.2	88.7	65.5	89.4	95.3	60	83.7	85.3	75	92.5	95.3	50	88.3	90	25
3	🟥	❓	o3-mini	o3-mini	Proprietary	null	150	Unknown	Unknown	Unknown	null	gpt-4o-2024-08-06	82.2	92.8	34.5	89.5	65.5	88	17.1	82.2	90.7	60.3	84.4	99.3	10	79.8	85.3	50	94.4	99.3	20	93.5	94	75
4	💙	🚀	deepseek-ai/DeepSeek-R1	deepseek-ai/DeepSeek-R1	Open-Weights	684.5	684.5	DeepseekV3ForCausalLM	Unknown	DeepSeek	https://huggingface.co/unsloth/DeepSeek-R1	gpt-4o-2024-08-06	80.7	91.3	33	88.2	65.5	85.3	22.9	76.9	87.3	50	83.3	96.7	16.7	79.8	86.7	42.9	93.1	98.7	10	92.9	93.3	75
5	🟥	❓	o1-mini	o1-mini	Proprietary	null	150	Unknown	Unknown	GPT	null	gpt-4o-2024-08-06	76.3	82.9	46.5	75.8	68.2	75.3	52.9	64.9	72	46.6	87.8	97.3	40	73	78.7	42.9	92.5	96.7	30	76.6	77.3	50
6	💙	🚗	Qwen/QwQ-32B-Preview	Qwen/QwQ-32B-Preview	Open-Weights	32.8	32.8	Qwen2ForCausalLM	None	Qwen	https://huggingface.co/unsloth/QwQ-32B-Preview	gpt-4o-2024-08-06	73.1	82.7	30	75.8	55.5	70	24.3	62.5	67.3	50	80	95.3	3.3	73	80.7	32.1	92.5	97.3	20	84.4	85.3	50
7	💙	🚚	Qwen/QVQ-72B-Preview	Qwen/QVQ-72B-Preview	Open-Weights	73.4	73.4	Qwen2VLForConditionalGeneration	qwen	Qwen	https://huggingface.co/unsloth/QVQ-72B-Preview	gpt-4o-2024-08-06	65	69.7	44	57.2	50	54	41.4	45.2	41.3	55.2	83.9	94	33.3	62.9	65.3	50	91.2	95.3	30	66.2	68	0
8	💙	🚀	deepseek-ai/DeepSeek-V3	deepseek-ai/DeepSeek-V3	Open-Weights	684.5	684.5	DeepseekV3ForCausalLM	Unknown	DeepSeek	https://huggingface.co/unsloth/DeepSeek-V3	gpt-4o-2024-08-06	62.6	69.3	32.5	57.5	43.2	49.3	30	38.9	38.7	39.7	81.7	96	10	65.2	69.3	42.9	86.9	90	40	72.1	72.7	50
9	🟥	❓	gemini-1.5-pro	gemini-1.5-pro	Proprietary	null	150	Unknown	Unknown	Gemini	null	gpt-4o-2024-08-06	60.1	63.4	45	50.5	49.6	50.7	47.1	26.4	27.3	24.1	86.1	91.3	60	60.1	60.7	57.1	86.2	87.3	70	63	63.3	50
10	💙	🚚	Qwen/Qwen2.5-Math-72B-Instruct	Qwen/Qwen2.5-Math-72B-Instruct	Open-Weights	72.7	72.7	Qwen2ForCausalLM	qwen	Qwen	https://huggingface.co/unsloth/Qwen2.5-Math-72B-Instruct	gpt-4o-2024-08-06	59.5	68.7	18	57	35.5	46	12.9	38.9	44	25.9	80	94.7	6.7	61.8	69.3	21.4	84.4	89.3	10	68.8	68.7	75
11	🟥	❓	gemini-1.5-flash	gemini-1.5-flash	Proprietary	null	150	Unknown	Unknown	Gemini	null	gpt-4o-2024-08-06	57.8	61.2	42.5	48.5	47.3	47.3	47.1	30.8	30.7	31	83.3	90.7	46.7	55.1	55.3	53.6	79.4	82.7	30	60.4	60.7	50
12	💙	🚚	Nexusflow/Athene-V2-Chat	Nexusflow/Athene-V2-Chat	Open-Weights	72.7	72.7	Qwen2ForCausalLM	Unknown	Qwen	https://huggingface.co/unsloth/Athene-V2-Chat	gpt-4o-2024-08-06	54.9	62.9	19	49.8	36.8	43.3	22.9	31.2	36.7	17.2	74.4	87.3	10	55.6	62	21.4	85	90.7	0	57.8	57.3	75
13	💙	🚗	microsoft/phi-4	microsoft/phi-4	Open-Weights	14.7	14.7	Phi3ForCausalLM	None	Unknown	https://huggingface.co/unsloth/phi-4	gpt-4o-2024-08-06	54.5	58.3	37	43.2	42.3	42.7	41.4	28.4	25.3	36.2	81.7	89.3	43.3	50.6	54.7	28.6	84.4	88	30	48.7	50	0
14	💙	🚗	Qwen/Qwen2.5-32B-Instruct	Qwen/Qwen2.5-32B-Instruct	Open-Weights	32.8	32.8	Qwen2ForCausalLM	None	Qwen	https://huggingface.co/unsloth/Qwen2.5-32B-Instruct	gpt-4o-2024-08-06	52.4	60.4	16	46.3	32.7	42.7	11.4	32.2	34.7	25.9	78.9	92	13.3	44.9	50	17.9	80	85.3	0	56.5	58	0
15	💙	🚚	Qwen/Qwen2.5-72B-Instruct	Qwen/Qwen2.5-72B-Instruct	Open-Weights	72.7	72.7	Qwen2ForCausalLM	qwen	Qwen	https://huggingface.co/unsloth/Qwen2.5-72B-Instruct	gpt-4o-2024-08-06	51.2	58.9	16.5	44.7	30	36.7	15.7	30.3	35.3	17.2	78.3	90.7	16.7	46.1	52	14.3	79.4	84	10	54.5	54.7	50
16	🟥	❓	gpt-4o-2024-08-06	gpt-4o-2024-08-06	Proprietary	null	150	Unknown	Unknown	GPT	null	gpt-4o-2024-08-06	50.2	53.9	33.5	38.3	32.3	30	37.1	27.4	27.3	27.6	80.6	90	33.3	48.3	49.3	42.9	76.9	80	30	45.5	46.7	0
17	💙	🚀	mistralai/Pixtral-Large-Instruct-2411	mistralai/Pixtral-Large-Instruct-2411	Open-Weights	123.3	123.3	Unknown	mrl	Mistral	https://huggingface.co/unsloth/Pixtral-Large-Instruct-2411	gpt-4o-2024-08-06	47.8	51.4	31.5	38.2	30.9	30	32.9	26.9	24.7	32.8	74.4	82.7	33.3	43.8	46.7	28.6	70.6	73.3	30	50	51.3	0
18	💙	🚀	mistralai/Mistral-Large-Instruct-2411	mistralai/Mistral-Large-Instruct-2411	Open-Weights	122.6	122.6	MistralForCausalLM	mrl	Mistral	https://huggingface.co/unsloth/Mistral-Large-Instruct-2411	gpt-4o-2024-08-06	47.6	55.6	12	42.5	24.5	32	8.6	30.8	36.7	15.5	73.3	85.3	13.3	40.4	45.3	14.3	73.1	78	0	55.2	56	25
19	💙	🚗	Qwen/Qwen2.5-Math-7B-Instruct	Qwen/Qwen2.5-Math-7B-Instruct	Open-Weights	7.6	7.6	Qwen2ForCausalLM	None	Qwen	https://huggingface.co/unsloth/Qwen2.5-Math-7B-Instruct	gpt-4o-2024-08-06	45.5	53	11.5	38	24.5	32	8.6	22.1	24	17.2	71.7	84.7	6.7	38.8	44	10.7	76.2	81.3	0	51.9	52	50
20	💙	🚚	meta-llama/Llama-3.3-70B-Instruct	meta-llama/Llama-3.3-70B-Instruct	Open-Weights	70.6	70.6	LlamaForCausalLM	Unknown	LLaMA	https://huggingface.co/unsloth/Llama-3.3-70B-Instruct	gpt-4o-2024-08-06	44.7	51.7	13.5	39.5	27.7	35.3	11.4	25.5	27.3	20.7	70.6	83.3	6.7	42.7	48.7	10.7	65	68.7	10	46.1	46.7	25
21	🟥	❓	gpt-4o-mini-2024-07-18	gpt-4o-mini-2024-07-18	Proprietary	null	150	Unknown	Unknown	GPT	null	gpt-4o-2024-08-06	43.4	47.2	26	30	28.2	26	32.9	16.8	16.7	17.2	75	87.3	13.3	37.6	37.3	39.3	72.5	76	20	40.3	40	50
22	💙	🚗	Qwen/Qwen2.5-7B-Instruct	Qwen/Qwen2.5-7B-Instruct	Open-Weights	7.6	7.6	Qwen2ForCausalLM	None	Qwen	https://huggingface.co/unsloth/Qwen2.5-7B-Instruct	gpt-4o-2024-08-06	43.3	50.4	11	34.5	22.3	30.7	4.3	28.4	32	19	75	86	20	31.5	36.7	3.6	74.4	78.7	10	37.7	38.7	0
23	💙	❓	nvidia/Llama-3.1-Nemotron-70B-Instruct	nvidia/Llama-3.1-Nemotron-70B-Instruct	Open-Weights	null	150	LlamaForCausalLM	Unknown	Unknown	https://huggingface.co/unsloth/Llama-3.1-Nemotron-70B-Instruct	gpt-4o-2024-08-06	42.5	47.7	19.5	33.7	26.8	29.3	21.4	20.7	21.3	19	73.9	84	23.3	36.5	40.7	14.3	64.4	67.3	20	42.2	43.3	0
24	💙	🚚	Qwen/Qwen2-VL-72B-Instruct	Qwen/Qwen2-VL-72B-Instruct	Open-Weights	73.4	73.4	Qwen2VLForConditionalGeneration	tongyi-qianwen	Qwen	https://huggingface.co/unsloth/Qwen2-VL-72B-Instruct	gpt-4o-2024-08-06	41.8	43.9	32.5	29.3	34.1	29.3	44.3	23.6	22	27.6	71.1	80	26.7	31.5	32	28.6	62.5	66	10	33.8	34	25
25	🟥	❓	claude-sonnet-3.5	claude-sonnet-3.5	Proprietary	null	150	Unknown	Unknown	Claude	null	gpt-4o-2024-08-06	38.7	40.7	30	26.2	27.3	20.7	41.4	13	12	15.5	67.8	75.3	30	34.3	33.3	39.3	61.3	64	20	37.7	38.7	0
26	💙	🚚	meta-llama/Llama-3.2-90B-Vision-Instruct	meta-llama/Llama-3.2-90B-Vision-Instruct	Open-Weights	88.6	88.6	MllamaForConditionalGeneration	Unknown	LLaMA	https://huggingface.co/unsloth/Llama-3.2-90B-Vision-Instruct	gpt-4o-2024-08-06	37.2	41.8	16.5	24.7	23.2	21.3	27.1	9.6	11.3	5.2	72.2	82	23.3	27	30	10.7	65.6	70	0	35.7	36	25
27	💙	🚗	mistralai/Mistral-Small-Instruct-2501	mistralai/Mistral-Small-Instruct-2501	Open-Weights	23.6	23.6	MistralForCausalLM	Unknown	Mistral	https://huggingface.co/unsloth/Mistral-Small-Instruct-2501	gpt-4o-2024-08-06	34.8	39.9	12	22	12.3	13.3	10	13.9	13.3	15.5	69.4	80.7	13.3	23.6	25.3	14.3	66.2	70.7	0	35.1	36	0
28	💙	🚚	meta-llama/Llama-3.1-70B-Instruct	meta-llama/Llama-3.1-70B-Instruct	Open-Weights	70.6	70.6	LlamaForCausalLM	Unknown	LLaMA	https://huggingface.co/unsloth/Llama-3.1-70B-Instruct	gpt-4o-2024-08-06	34.3	39.6	10.5	22.8	15.5	15.3	15.7	14.4	16	10.3	65.6	78.7	0	23	26	7.1	63.1	67.3	0	34.4	34	50
29	💙	🚗	meta-llama/Llama-3.1-8B-Instruct	meta-llama/Llama-3.1-8B-Instruct	Open-Weights	8	8	LlamaForCausalLM	Unknown	LLaMA	https://huggingface.co/unsloth/Llama-3.1-8B-Instruct	gpt-4o-2024-08-06	29.5	33.7	11	22.8	15	17.3	10	21.6	22.7	19	50.6	60	3.3	20.2	23.3	3.6	48.8	50.7	20	27.3	28	0
30	💙	🚗	Qwen/Qwen2-VL-7B-Instruct	Qwen/Qwen2-VL-7B-Instruct	Open-Weights	8.3	8.3	Qwen2VLForConditionalGeneration	Unknown	Qwen	https://huggingface.co/unsloth/Qwen2-VL-7B-Instruct	gpt-4o-2024-08-06	26.3	27.1	22.5	15.3	24.5	18.7	37.1	13	11.3	17.2	50.6	58.7	10	14.6	14	17.9	40.6	42.7	10	16.9	17.3	0
31	🟥	❓	LFM-7B	LFM-7B	Proprietary	null	150	Unknown	Unknown	Unknown	null	gpt-4o-2024-08-06	25.8	28	16	10.7	13.6	8.7	24.3	6.7	4	13.8	56.7	66.7	6.7	12.9	13.3	10.7	56.2	58.7	20	16.2	16.7	0
32	💙	🚗	mistralai/Ministral-8B-Instruct-2410	mistralai/Ministral-8B-Instruct-2410	Open-Weights	8	8	MistralForCausalLM	mrl	Mistral	https://huggingface.co/unsloth/Ministral-8B-Instruct-2410	gpt-4o-2024-08-06	23.1	26.9	6	13.5	11.8	13.3	8.6	8.7	10	5.2	51.1	60	6.7	11.2	12.7	3.6	44.4	47.3	0	17.5	18	0
33	💙	🚗	meta-llama/Llama-3.2-11B-Vision-Instruct	meta-llama/Llama-3.2-11B-Vision-Instruct	Open-Weights	10.7	10.7	MllamaForConditionalGeneration	Unknown	LLaMA	https://huggingface.co/unsloth/Llama-3.2-11B-Vision-Instruct	gpt-4o-2024-08-06	20.4	22.9	9	10.3	11.4	7.3	20	1.9	1.3	3.4	43.9	52	3.3	11.2	13.3	0	41.9	44	10	18.8	19.3	0
34	💙	🚗	mistralai/Pixtral-12B-2409	mistralai/Pixtral-12B-2409	Open-Weights	12	12	Unknown	Unknown	Mistral	https://huggingface.co/unsloth/Pixtral-12B-2409	gpt-4o-2024-08-06	17.5	17.9	16	8.8	16.8	10.7	30	4.3	4.7	3.4	37.2	40	23.3	6.7	6.7	7.1	30	32	0	13	13.3	0