This is the official repository of EgoExoBench: A Benchmark for First- and Third-person View Video Understanding in MLLMs.
EgoExoBench is a large-scale benchmark designed to evaluate cross-view video understanding in multimodal large language models (MLLMs). It contains paired egocentric–exocentric videos and over 7,300 multiple-choice questions across 11 subtasks, covering three key dimensions of ego–exo reasoning:
- Ego-Exo Relation
- Ego-Exo View Transition
- Ego-Exo Temporal Reasoning
To get started with EgoExoBench, follow the steps below to prepare the data.
We provide pre-processed videos, frames, and multiple-choice question (MCQ) files on Hugging Face. You can download them directly without additional preprocessing.
- MCQs: The MCQs are provided in .tsv format, following the VLMEvalKit data structure (see the loading sketch after this list).
- Processed Videos and Frames:
  - The processed videos directory contains video clips corresponding to each MCQ. These files are suitable for models that accept video input (e.g., Qwen2.5-VL).
  - The processed frames directory contains frame sequences extracted from the videos. These are used for models that take image sequences as input (e.g., InternVL).
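As a quick sanity check after downloading, the MCQ file can be inspected with pandas. This is only a minimal sketch: the file path and column names below are assumptions, so check the actual .tsv header.

```python
import pandas as pd

# Minimal sketch: peek at the EgoExoBench MCQ file (VLMEvalKit-style .tsv).
# The path and column names are assumptions -- check the actual file header.
mcq = pd.read_csv("MCQ/EgoExoBench_MCQ.tsv", sep="\t")

print(mcq.shape)             # (number of questions, number of columns)
print(mcq.columns.tolist())  # inspect which fields the file actually provides
print(mcq.iloc[0])           # first multiple-choice question
```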
Alternatively, you can build the benchmark yourself by downloading the original datasets.
EgoExoBench builds upon six publicly available ego–exo datasets: CVMHAT, Ego-Exo4D, EgoExoLearn, EgoMe, LEMMA, and TF2023. Please download the videos from their official sources.
Place all datasets under the data/ directory. The dataset structure is as follows:
```
EgoExoBench/
└── data/
    ├── CVMHAT/
    │   └── data/
    ├── Ego-Exo4D/
    │   └── takes/
    ├── EgoExoLearn/
    ├── EgoMe/
    ├── LEMMA/
    └── TF2023/
        └── data/
```
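Before moving on, it can help to verify that this layout is in place. The snippet below is a minimal sketch that only checks for the directories shown in the tree above; adjust the paths if your setup differs.

```python
from pathlib import Path

# Minimal sketch: confirm the dataset directories from the layout above exist
# under data/ before building the benchmark yourself.
expected = [
    "data/CVMHAT/data",
    "data/Ego-Exo4D/takes",
    "data/EgoExoLearn",
    "data/EgoMe",
    "data/LEMMA",
    "data/TF2023/data",
]

missing = [p for p in expected if not Path(p).is_dir()]
print("All dataset directories found." if not missing else f"Missing: {missing}")
```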
For the CVMHAT and TF2023 datasets, we use the bounding box annotations to augment the original frames by overlaying bounding boxes that indicate the target person. To generate these bounding-box overlays, run the following commands:

```bash
python data/CVMHAT/tools/process_bbox.py
python data/TF2023/tools/process_bbox.py
```
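For intuition, the overlay step amounts to drawing the annotated box of the target person onto each frame. The snippet below is an illustration only, with a hypothetical frame path and box coordinates; the actual annotation parsing and output paths live in the process_bbox.py scripts above.

```python
import cv2

# Illustration only: draw a target-person bounding box onto a frame.
# The frame path and (x, y, w, h) values are hypothetical; the real scripts
# read them from the CVMHAT / TF2023 annotations.
frame = cv2.imread("example_frame.jpg")
x, y, w, h = 100, 80, 60, 160

cv2.rectangle(frame, (x, y), (x + w, y + h), color=(0, 0, 255), thickness=2)
cv2.imwrite("example_frame_bbox.jpg", frame)
```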
Download the EgoExoBench multiple-choice questions (MCQs) file (link) and place it in the MCQ/ directory.

To set up the codebase, clone the repository:

```bash
git clone https://github.com/ayiyayi/EgoExoBench.git
cd EgoExoBench
```

Please note that different VLMs require specific environment configurations (e.g., different versions of transformers). We recommend consulting the official documentation of each VLM (Qwen2.5-VL, InternVL3, LLaVA-OneVision, LLaVA-NeXT-Video) to ensure proper setup and an accurate evaluation.
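A common pitfall is running the evaluation in an environment that was set up for a different VLM. The optional check below prints the versions of packages that most often differ between setups; the package list is only an example.

```python
from importlib.metadata import version, PackageNotFoundError

# Optional sanity check: print versions of packages that commonly differ
# between VLM environments (the list below is only an example).
for pkg in ("transformers", "torch"):
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```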
Evaluation is built upon VLMEvalKit.

```bash
# for VLMs that consume small amounts of GPU memory
torchrun --nproc-per-node=1 run.py --data EgoExoBench_MCQ --model Qwen2.5-VL-7B-Instruct-ForVideo

# for very large VLMs
python run.py --data EgoExoBench_MCQ --model Qwen2.5-VL-72B-Instruct-ForVideo
```
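To evaluate several models in sequence, the same flags can be wrapped in a small driver script. This is a sketch only: use torchrun instead of python where a model calls for it, and adjust the model list to the VLMs you actually have set up.

```python
import subprocess

# Sketch: run the EgoExoBench MCQ evaluation for several models in sequence,
# reusing the flags from the commands above. Swap in torchrun where needed.
models = [
    "Qwen2.5-VL-7B-Instruct-ForVideo",
    "Qwen2.5-VL-72B-Instruct-ForVideo",
]

for model in models:
    subprocess.run(
        ["python", "run.py", "--data", "EgoExoBench_MCQ", "--model", model],
        check=True,
    )
```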
This codebase is based on VLMEvalKit. EgoExoBench builds upon publicly available ego–exo datasets: Ego-Exo4D, LEMMA, EgoExoLearn, TF2023, EgoMe, and CVMHAT. Thanks for open-sourcing!