🥇 MMLongBench-Doc Leaderboard

NeurIPS 2024 Datasets and Benchmarks Track (Spotlight)

๐Ÿ  Homepage | ๐Ÿ“„ arXiv Paper | ๐Ÿค— Dataset

📚 MMLongBench-Doc is a long-context, multimodal document understanding benchmark designed to evaluate the performance of large multimodal models on complex document understanding tasks.

📊 This leaderboard tracks the performance of various models on the MMLongBench-Doc benchmark, focusing on their ability to understand and process long documents with both text and visual elements.

🔧 You can use the official GitHub repo or VLMEvalKit to evaluate your model on MMLongBench-Doc. We provide the official evaluation results of GPT-4.1 and GPT-4o.

๐Ÿ“ To add your own model to the leaderboard, please send an Email to yubo001@e.ntu.edu.sg or zangyuhang@pjlab.org.cn then we will help with the evaluation and updating the leaderboard.

📊 Leaderboard Statistics

  • Total Models: 9
  • Best Score: 49.7
  • Lowest Score: 25.1
| Model | Release Date | HF Model | MoE | Parameters | Open Source | ACC Score |
| --- | --- | --- | --- | --- | --- | --- |
|  | 2025-04 | - |  | 45.9B activated (456B total) | ✗ | 49.7 |

📙 Citation

If you use MMLongBench-Doc in your research, please cite our work: