MathGLance

Evaluating MLLMs on Abstract Visual and Symbolic Understanding in Mathematical Diagrams

Email: shan.zhang@adelaide.edu.au; yanpeng_sun@nus.edu.sg; anton.vandenhengel@adelaide.edu.au

Although both natural images and symbolic diagrams can be represented as grids of pixels, they constitute very different forms of information. Images represent samples of the intensity of the real world, while diagrams are human-constructed and convey geometric concepts through structured symbols and their interrelationships. Figure (a) illustrates that diagrams pose unique challenges for current Multimodal Large Language Models (MLLMs), particularly in fine-grained grounding tasks. Figure (b) shows a positive correlation between low-level perception and high-level reasoning performance, emphasizing that clear diagram perception leads to substantial improvements in mathematical reasoning.

Abstract

Diagrams are a form of visual language, representing complex concepts and their interrelationships through structured symbols, shapes, and spatial arrangements. Unlike natural images, they are inherently symbolic and abstract, and thus pose significant challenges for Multimodal Large Language Models (MLLMs). Current benchmarks conflate perceptual and reasoning tasks, making it difficult to assess whether MLLMs genuinely understand mathematical diagrams beyond superficial pattern recognition and textual memorization. To address this gap, we introduce MathGLance, a benchmark specifically designed to isolate and evaluate diagram perception in MLLMs. MathGLance comprises 1.2K diagrams and 1.6K carefully curated questions spanning four perception tasks: shape classification, object counting, relationship identification, and object grounding, covering diverse domains including plane geometry, solid geometry, and graphical representations. Our evaluation of MLLMs reveals that their ability to understand diagrams is limited, particularly on fine-grained grounding tasks. In response, we construct GeoPeP, a perception-oriented dataset of 200K samples that represents geometric diagrams as structured graphs capturing primitives, their spatial relationships, and fine-grained bounding boxes. Training an MLLM on GeoPeP leads to significant gains in perceptual accuracy, which in turn substantially improves mathematical reasoning. Our benchmark and dataset establish critical standards for multimodal mathematical understanding and offer valuable resources to advance MLLM research.
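For concreteness, the sketch below shows what one GeoPeP-style structured-graph record could look like. The field names (image, primitives, relations, bbox) and the relation vocabulary are illustrative assumptions, not the released GeoPeP schema.

```python
# Hypothetical sketch of a GeoPeP-style record; field names and relation
# labels are assumptions for illustration, not the released schema.
# Each diagram becomes a graph: primitives are nodes (with pixel-space
# bounding boxes for grounding), relations are labeled edges.
example_record = {
    "image": "diagrams/plane_00042.png",
    "primitives": [
        {"id": "c1", "type": "circle", "bbox": [40, 35, 210, 205]},
        {"id": "l1", "type": "line",   "bbox": [40, 118, 210, 124]},
        {"id": "p1", "type": "point",  "bbox": [122, 118, 128, 124]},
    ],
    "relations": [
        {"subject": "l1", "predicate": "diameter_of", "object": "c1"},
        {"subject": "p1", "predicate": "center_of",   "object": "c1"},
    ],
}
```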

Leaderboard on MathGLance

Accuracy scores on the Plane Geometry (PG), Solid Geometry (SG), and Graphs (G) subsets of MathGLance.

| # | Model | Source | Avg. | PG_ALL | PG_cls | PG_cnt | PG_grd | PG_rlat | SG_ALL | SG_cls | SG_cnt | SG_grd | SG_rlat | G_ALL | G_cls | G_cnt | G_grd | G_rlat |
|---|-------|--------|------|--------|--------|--------|--------|---------|--------|--------|--------|--------|---------|-------|-------|-------|-------|--------|
| 1 | Qwen2.5-VL+-32B (ours) 🥇 | Link | 74.2 | 77.9 | 70.7 | 79.6 | 84.0 | 79.5 | 73.8 | 98.8 | 86.4 | 15.0 | 85.0 | 71.1 | 98.6 | 98.2 | 2.7 | 99.0 |
| 2 | Qwen2.5-VL+-7B (ours) 🥈 | Link | 72.9 | 78.5 | 70.7 | 79.2 | 82.6 | 85.0 | 71.9 | 97.9 | 86.2 | 12.9 | 70.0 | 68.2 | 94.2 | 96.3 | 4.9 | 89.4 |
| 3 | SVE-Math-DeepSeek+-7B (ours) 🥉 | Link | 68.4 | 84.6 | 75.8 | 88.4 | 82.9 | 97.5 | 54.1 | 85.3 | 65.8 | 20.3 | 45.0 | 60.7 | 85.1 | 78.4 | 1.6 | 75.7 |
| 4 | InternVL2.5-38B | Link | 63.1 | 44.0 | 59.9 | 52.0 | 2.5 | 66.0 | 78.8 | 98.8 | 92.8 | 38.1 | 72.5 | 66.5 | 98.6 | 96.3 | 3.2 | 69.7 |
| 5 | Qwen2.5-VL-32B | Link | 62.2 | 43.3 | 56.9 | 54.8 | 0.0 | 67.0 | 72.5 | 98.8 | 89.7 | 1.6 | 87.5 | 68.8 | 91.3 | 100.0 | 1.6 | 97.0 |
| 6 | Qwen2-VL-72B | Link | 59.9 | 42.4 | 51.2 | 50.8 | 17.4 | 52.0 | 71.2 | 97.7 | 84.5 | 6.4 | 77.5 | 66.1 | 76.8 | 98.2 | 16.1 | 84.9 |
| 7 | Qwen2.5-VL-7B | Link | 59.2 | 44.0 | 56.2 | 51.3 | 18.5 | 52.0 | 68.0 | 98.8 | 88.7 | 0.0 | 65.0 | 65.7 | 89.9 | 100.0 | 3.2 | 78.8 |
| 8 | InternLM-XComposer2-7B | Link | 55.6 | 35.8 | 49.4 | 48.8 | 0.0 | 47.0 | 62.9 | 90.7 | 86.6 | 0.0 | 53.8 | 54.6 | 60.9 | 94.4 | 0.0 | 78.8 |
| 9 | GPT-4o | Link | 53.3 | 42.8 | 58.4 | 53.2 | 1.1 | 62.5 | 60.7 | 72.1 | 84.5 | 1.6 | 66.3 | 56.4 | 92.8 | 72.2 | 1.6 | 57.6 |
| 10 | DeepSeek-VL2-Small (16B) | Link | 51.5 | 37.6 | 47.6 | 43.6 | 12.5 | 48.5 | 63.8 | 98.8 | 70.1 | 11.1 | 60.0 | 53.2 | 76.8 | 53.7 | 11.3 | 81.8 |
| 11 | Qwen2-VL-7B | Link | 51.4 | 37.9 | 47.6 | 41.2 | 12.8 | 53.0 | 64.1 | 93.0 | 78.4 | 14.3 | 55.0 | 52.3 | 84.1 | 88.9 | 3.2 | 18.2 |
| 12 | InternVL2.5-8B | Link | 50.7 | 35.0 | 48.8 | 36.0 | 0.0 | 60.0 | 65.6 | 98.8 | 72.2 | 4.8 | 70.0 | 51.4 | 68.1 | 77.8 | 0.0 | 69.7 |
| 13 | mPLUG-owl3-7B | Link | 50.0 | 36.4 | 46.7 | 41.6 | 3.9 | 58.5 | 65.3 | 95.4 | 83.5 | 0.0 | 62.5 | 48.2 | 59.4 | 77.8 | 0.0 | 66.7 |
| 14 | InternVL2-8B | Link | 48.4 | 31.9 | 44.3 | 38.0 | 0.0 | 48.5 | 62.9 | 98.8 | 62.9 | 4.8 | 70.0 | 50.5 | 68.1 | 75.9 | 0.0 | 66.7 |
| 15 | SVE-Math-DeepSeek-7B | Link | 46.6 | 35.4 | 52.4 | 36.0 | 3.56 | 51.0 | 49.4 | 77.9 | 62.9 | 0.0 | 41.3 | 55.1 | 81.2 | 75.9 | 0.0 | 69.7 |
| 16 | MultiMath-7B | Link | 41.8 | 31.2 | 44.0 | 30.4 | 1.07 | 53.0 | 45.7 | 81.4 | 53.6 | 0.0 | 33.8 | 48.6 | 79.7 | 57.4 | 0.0 | 33.8 |
| 17 | Math-LLaVA-13B | Link | 40.0 | 27.9 | 34.4 | 32.4 | 0.0 | 50.5 | 44.8 | 81.4 | 55.7 | 0.0 | 27.5 | 47.3 | 78.3 | 59.3 | 0.0 | 51.5 |
| 18 | GPT-o1 | Link | 36.5 | 15.8 | 33.2 | 11.6 | 0.0 | 14.0 | 41.4 | 75.6 | 52.6 | 0.0 | 23.8 | 52.3 | 82.6 | 81.5 | 0.0 | 39.4 |
| 19 | LLaVA-v1.5-13B | Link | 35.4 | 32.8 | 29.3 | 40.4 | 23.5 | 42.0 | 35.9 | 60.5 | 38.1 | 0.0 | 35.0 | 37.6 | 63.8 | 42.6 | 0.0 | 45.5 |
| 20 | LLaVA-v1.5-7B | Link | 33.3 | 29.2 | 29.0 | 39.6 | 14.2 | 37.5 | 31.6 | 43.0 | 42.3 | 0.0 | 31.3 | 39.0 | 76.8 | 35.2 | 0.0 | 39.4 |
| 21 | DeepSeek-VL2-Tiny | Link | 32.6 | 29.5 | 45.2 | 34.4 | 4.6 | 32.0 | 39.0 | 76.7 | 32.0 | 0.0 | 37.5 | 29.4 | 39.1 | 57.4 | 0.0 | 18.2 |
| 22 | G-LLaVA-7B | Link | 30.3 | 25.6 | 27.8 | 41.2 | 0.4 | 38.0 | 31.3 | 45.4 | 38.1 | 0.0 | 32.5 | 33.9 | 58.0 | 37.0 | 0.0 | 42.4 |

MathGLance Dataset

Overview

The MathGLance benchmark is a novel evaluation framework designed to assess the mathematical perception abilities of Multimodal Large Language Models (MLLMs). Unlike existing benchmarks, which often conflate perception with high-level reasoning, MathGLance isolates perceptual skills by posing questions about mathematical diagrams that require minimal cognitive load, and it provides both quantitative and qualitative assessments at different levels of granularity. The benchmark covers a diverse range of mathematical contexts, including Plane Geometry (66%), Solid Geometry (20%), and Graphical data representations (14%) such as line plots, bar charts, and pie charts. It comprises 1,609 questions over 1,198 unique images, formulated mainly as multiple-choice or true/false questions to streamline evaluation. MathGLance features four key task categories (an illustrative item sketch follows this list):

1) Shape Classification: identifying object classes from visual attributes (e.g., vertices, material, color, size) across 16 plane geometry categories, 3 CLEVR-defined solid objects, and 5 graphical types.
2) Object Counting: counting either the total number of objects or the instances of a specific geometric shape within an image.
3) Relationship Identification: recognizing spatial and mathematical relationships between geometric primitives, covering 4 spatial and more than 10 mathematical relationships.
4) Object Grounding: fine-grained localization, predicting object coordinates (x1, y1, x2, y2) from a textual description.

Overall, MathGLance challenges MLLMs' mathematical perception while minimizing high-level reasoning demands, offering a comprehensive and fine-grained evaluation of their diagram perception abilities.
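The sketch below illustrates, in hypothetical form, how items from the four task categories could be encoded; the field names, option layouts, and answer formats are assumptions for illustration rather than the released annotation files.

```python
# Illustrative (assumed) MathGLance-style items, one per task category.
# Grounding answers are (x1, y1, x2, y2) boxes, as described above.
sample_items = [
    {"task": "classification", "question": "What shape is outlined in red?",
     "options": ["triangle", "rectangle", "circle", "trapezoid"], "answer": "circle"},
    {"task": "counting", "question": "How many triangles appear in the diagram?",
     "options": ["1", "2", "3", "4"], "answer": "3"},
    {"task": "relationship", "question": "Line AB is tangent to circle O. True or False?",
     "options": ["True", "False"], "answer": "True"},
    {"task": "grounding", "question": "Return the bounding box of point C.",
     "answer": [312, 148, 326, 162]},
]
```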

Key statistics and subject-task distribution of MathGLance.

The synthetic construction process for plane geometry. We synthesize geometric figures by randomly sampling elements from the geometric shape pool and relationship pool, ensuring consistency through a verifier that enforces logical constraints based on manually designed rules, fundamental mathematical principles, and prerequisite points. All visual elements are structured and saved in JSON format. Images are rendered using the Matplotlib package, and corresponding Q&A pairs are generated using a template-based pipeline.
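A minimal sketch of this generation loop, assuming toy shape/relation pools and a single hand-written verifier rule, is given below. It mirrors the described flow (sample, verify, serialize to JSON, render with Matplotlib, derive a template Q&A pair) but is not the released generation code.

```python
import json
import random

import matplotlib
matplotlib.use("Agg")              # headless rendering
import matplotlib.pyplot as plt

# Hypothetical pools; the real generator uses much richer vocabularies
# (16 plane-geometry categories, spatial and mathematical relations).
SHAPE_POOL = ["triangle", "square", "circle"]
RELATION_POOL = ["inscribed_in", "tangent_to", "adjacent_to"]

def verify(shape_a, shape_b, relation):
    """Toy rule-based verifier standing in for the logical-constraint checker."""
    if relation == "tangent_to" and "circle" not in (shape_a, shape_b):
        return False               # tangency requires at least one circle
    return True

def sample_diagram(rng):
    """Rejection-sample a (shape, shape, relation) spec until the verifier accepts."""
    while True:
        a, b = rng.sample(SHAPE_POOL, 2)
        rel = rng.choice(RELATION_POOL)
        if verify(a, b, rel):
            return {"shapes": [a, b], "relation": rel}

def render(spec, path):
    """Placeholder rendering step; the real pipeline draws the actual primitives."""
    fig, ax = plt.subplots(figsize=(3, 3))
    ax.set_axis_off()
    ax.text(0.5, 0.5, f"{spec['shapes'][0]} {spec['relation']} {spec['shapes'][1]}",
            ha="center", va="center")
    fig.savefig(path, dpi=150)
    plt.close(fig)

rng = random.Random(0)
spec = sample_diagram(rng)
with open("diagram_0000.json", "w") as f:
    json.dump(spec, f, indent=2)   # structured spec saved as JSON
render(spec, "diagram_0000.png")   # image rendered with Matplotlib
# Template-based Q&A pair derived from the spec:
qa = {"question": "What is the relationship between the two shapes?",
      "answer": spec["relation"].replace("_", " ")}
```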

Experiment Results

Main Results on MathGLance


Performance comparison of different MLLMs on MathGLance across Plane Geometry, Solid Geometry, and Graphs. cls, cnt, grd, and rlat denote the question categories: shape classification, object counting, object grounding, and relationship identification, respectively. all indicates the overall accuracy within each subject, i.e., the ratio of correctly answered questions to the total number of questions in that subject, while Avg. denotes the average all score across the three subjects.
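The scoring just described can be reproduced in a few lines. The snippet below is a sketch that assumes per-question records tagged with a subject and a correctness flag; it is not the official evaluation script.

```python
from collections import defaultdict

def score(records):
    """records: iterable of dicts like {"subject": "PG", "correct": True}."""
    per_subject = defaultdict(lambda: [0, 0])   # subject -> [num_correct, num_total]
    for r in records:
        per_subject[r["subject"]][0] += int(r["correct"])
        per_subject[r["subject"]][1] += 1
    # "all" per subject: correct / total within that subject, as a percentage
    all_scores = {s: 100.0 * c / t for s, (c, t) in per_subject.items()}
    # "Avg.": unweighted mean of the per-subject "all" scores
    avg = sum(all_scores.values()) / len(all_scores)
    return all_scores, avg

# Toy usage with three dummy records
all_scores, avg = score([
    {"subject": "PG", "correct": True},
    {"subject": "SG", "correct": False},
    {"subject": "G",  "correct": True},
])
```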

More Results


Model Responses

BibTeX

@article{sun2025mathglance,
  author  = {Yanpeng Sun and Shan Zhang and Wei Tang and Aotian Chen and Piotr Koniusz and Kai Zou and Yuan Xue and Anton van den Hengel},
  title   = {MATHGLANCE: Multimodal Large Language Models Do Not Know Where to Look in Mathematical Diagrams},
  journal = {arXiv preprint arXiv:2503.20745},
  year    = {2025}
}
@inproceedings{zhang2025primitive,
  author    = {Shan Zhang and Aotian Chen and Yanpeng Sun and Jindong Gu and Yi-Yu Zheng and Piotr Koniusz and Kai Zou and Anton van den Hengel and Yuan Xue},
  title     = {Primitive Vision: Improving Diagram Understanding in MLLMs},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025}
}