
Commit 3be2e61

Update Readme & Code for MMScan HVG challenge (#112)
* challenge * challenge * challenge * challenge * challenge
1 parent 728ab36 commit 3be2e61

30 files changed (+425 −490 lines)

README.md

Lines changed: 50 additions & 201 deletions
@@ -8,7 +8,7 @@
 <div id="top" align="center">

 [![arXiv](https://img.shields.io/badge/arXiv-2312.16170-blue)](https://arxiv.org/abs/2312.16170)
-[![](https://img.shields.io/badge/Paper-%F0%9F%93%96-blue)](./assets/2024_NeurIPS_MMScan_Camera_Ready.pdf)
+[![](https://img.shields.io/badge/Paper-%F0%9F%93%96-blue)](./assets/2406.09401v2.pdf)
 [![](https://img.shields.io/badge/Project-%F0%9F%9A%80-blue)](https://tai-wang.github.io/mmscan)

 </div>
@@ -21,14 +21,21 @@

 ## 📋 Contents

-1. [About](#-about)
-2. [Getting Started](#-getting-started)
-3. [MMScan API Tutorial](#-mmscan-api-tutorial)
-4. [MMScan Benchmark](#-mmscan-benchmark)
-5. [TODO List](#-todo-list)
+1. [News](#-news)
+2. [About](#-about)
+3. [Getting Started](#-getting-started)
+4. [MMScan Tutorial](#-mmscan-api-tutorial)
+5. [MMScan Benchmark](#-mmscan-benchmark)
+6. [TODO List](#-todo-list)

-## 🏠 About
+## 🔥 News
+
+- \[2025-06\] We are co-organizing the CVPR 2025 3D Scene Understanding Challenge. You're warmly invited to participate in the MMScan Hierarchical Visual Grounding track!
+The challenge test server is now online [here](https://huggingface.co/spaces/rbler/3d-iou-challenge). We look forward to your strong submissions!

+- \[2025-01\] We are delighted to present the official release of [MMScan-devkit](https://github.com/OpenRobotLab/EmbodiedScan/tree/mmscan), which encompasses a suite of data processing utilities, benchmark evaluation tools, and adaptations of some models for the MMScan benchmarks. We invite you to explore these resources and welcome any feedback or questions you may have!
+
+## 🏠 About

 <!-- ![Teaser](assets/teaser.jpg) -->

@@ -59,7 +66,6 @@ existing benchmarks and in-the-wild evaluation.

 ## 🚀 Getting Started

-
 ### Installation

 1. Clone Github repo.
@@ -100,247 +106,90 @@

 Please refer to the [guide](data_preparation/README.md) here.

-## 👓 MMScan API Tutorial
-
+## 👓 MMScan Tutorial

 The **MMScan Toolkit** provides comprehensive tools for dataset handling and model evaluation in tasks.

-To import the MMScan API, you can use the following commands:
-
-```python
-import mmscan
-
-# (1) The dataset tool
-import mmscan.MMScan as MMScan_dataset
-
-# (2) The evaluator tool ('VisualGroundingEvaluator', 'QuestionAnsweringEvaluator', 'GPTEvaluator')
-import mmscan.VisualGroundingEvaluator as MMScan_VG_evaluator
-
-import mmscan.QuestionAnsweringEvaluator as MMScan_QA_evaluator
-
-import mmscan.GPTEvaluator as MMScan_GPT_evaluator
-```
-
 ### MMScan Dataset

 The dataset tool in MMScan allows seamless access to data required for various tasks within MMScan.

-#### Usage
-
-Initialize the dataset for a specific task with:
-
-```python
-my_dataset = MMScan_dataset(split='train', task="MMScan-QA", ratio=1.0)
-# Access a specific sample
-print(my_dataset[index])
-```
-
-#### Data Access
-
-Each dataset item is a dictionary containing key elements:
-
-(1) 3D Modality
-
-- **"ori_pcds"** (tuple\[tensor\]): Original point cloud data extracted from the .pth file.
-- **"pcds"** (np.ndarray): Point cloud data with dimensions [n_points, 6(xyz+rgb)], representing the coordinates and color of each point.
-- **"instance_labels"** (np.ndarray): Instance ID assigned to each point in the point cloud.
-- **"class_labels"** (np.ndarray): Class IDs assigned to each point in the point cloud.
-- **"bboxes"** (dict): Information about bounding boxes within the scan, structured as { object ID:
-{
-"type": object type (str),
-"bbox": 9 DoF box (np.ndarray)
-}}
-
-(2) Language Modality
-
-- **"sub_class"**: The category of the sample.
-- **"ID"**: The sample's ID.
-- **"scan_id"**: The scan's ID.
-- *For Visual Grounding task*
-- **"target_id"** (list\[int\]): IDs of target objects.
-- **"text"** (str): Text used for grounding.
-- **"target"** (list\[str\]): Text prompt to specify the target grounding object.
-- **"anchors"** (list\[str\]): Types of anchor objects.
-- **"anchor_ids"** (list\[int\]): IDs of anchor objects.
-- **"tokens_positive"** (dict): Indices of positions where mentioned objects appear in the text.
-- *For Question Answering task*
-- **"question"** (str): The text of the question.
-- **"answers"** (list\[str\]): List of possible answers.
-- **"object_ids"** (list\[int\]): Object IDs referenced in the question.
-- **"object_names"** (list\[str\]): Types of referenced objects.
-- **"input_bboxes_id"** (list\[int\]): IDs of input bounding boxes.
-- **"input_bboxes"** (list\[np.ndarray\]): Input 9-DoF bounding boxes.
-
-(3) 2D Modality
+- #### Usage

-- **'img_path'** (str): File path to the RGB image.
-- **'depth_img_path'** (str): File path to the depth image.
-- **'intrinsic'** (np.ndarray): Intrinsic parameters of the camera for RGB images.
-- **'depth_intrinsic'** (np.ndarray): Intrinsic parameters of the camera for depth images.
-- **'extrinsic'** (np.ndarray): Extrinsic parameters of the camera.
-- **'visible_instance_id'** (list): IDs of visible objects in the image.
+Initialize the dataset for a specific task with:

-### MMScan Evaluator
+```python
+from mmscan import MMScan

-Our evaluation tool is designed to streamline the assessment of model outputs for the MMScan task, providing essential metrics to gauge model performance effectively.
+# (1) The dataset tool
+my_dataset = MMScan(split='train'/'test'/'val', task='MMScan-VG'/'MMScan-QA')
+# Access a specific sample
+print(my_dataset[index])
+```
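For orientation, here is a brief, illustrative sketch of the constructor shown above, instantiating both tasks explicitly; treating the dataset as an indexable sequence with `len()` support is an assumption based on the `print(my_dataset[index])` usage, not a documented guarantee.

```python
from mmscan import MMScan

# Illustrative sketch: build one dataset object per task and index into it,
# mirroring the print(my_dataset[index]) usage in the snippet above.
vg_dataset = MMScan(split="train", task="MMScan-VG")
qa_dataset = MMScan(split="train", task="MMScan-QA")

print(len(vg_dataset), len(qa_dataset))  # number of samples in each split (assumed __len__ support)
print(sorted(vg_dataset[0].keys()))      # field names of a single grounding sample
```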

-#### 1. Visual Grounding Evaluator
+*Note:* For the test split, we have only made the VG portion publicly available, while the QA portion has not been released.

-For the visual grounding task, our evaluator computes multiple metrics including AP (Average Precision), AR (Average Recall), AP_C, AR_C, and gTop-k:
+- #### Data Access

-- **AP and AR**: These metrics calculate the precision and recall by considering each sample as an individual category.
-- **AP_C and AR_C**: These versions categorize samples belonging to the same subclass and calculate them together.
-- **gTop-k**: An expanded metric that generalizes the traditional Top-k metric, offering superior flexibility and interpretability compared to traditional ones when oriented towards multi-target grounding.
-
-*Note:* Here, AP corresponds to AP<sub>sample</sub> in the paper, and AP_C corresponds to AP<sub>box</sub> in the paper.
+Each dataset item is a dictionary containing data information from three modalities: language, 2D, and 3D ([details](https://rbler1234.gitbook.io/mmscan-devkit-tutorial#data-access)).
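To make the three-modality layout concrete, a hedged sketch of reading a few fields from one grounding sample; the key names (`pcds`, `bboxes`, `text`, `target_id`, `img_path`, `intrinsic`, `extrinsic`) follow the per-modality listing removed above and may differ in the released devkit.

```python
from mmscan import MMScan

dataset = MMScan(split="train", task="MMScan-VG")
sample = dataset[0]

# 3D modality: colored points and per-scan boxes (key names assumed from the listing above).
points = sample["pcds"]          # (n_points, 6): xyz + rgb
boxes = sample["bboxes"]         # {object_id: {"type": ..., "bbox": 9-DoF box}}
print(points.shape, len(boxes))

# Language modality: grounding text and the ids of the target objects.
print(sample["text"], sample["target_id"])

# 2D modality: image path and camera parameters for the RGB frame.
print(sample["img_path"], sample["intrinsic"].shape, sample["extrinsic"].shape)
```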

-Below is an example of how to utilize the Visual Grounding Evaluator:
+### MMScan Evaluation

-```python
-# Initialize the evaluator with show_results enabled to display results
-my_evaluator = MMScan_VG_evaluator(show_results=True)
+Our evaluation tool is designed to streamline the assessment of model outputs for the MMScan task, providing essential metrics to gauge model performance effectively. We provide three evaluation tools: `VisualGroundingEvaluator`, `QuestionAnsweringEvaluator`, and `GPTEvaluator`. For more details, please refer to the [documentation](https://rbler1234.gitbook.io/mmscan-devkit-tutorial/evaluator).

-# Update the evaluator with the model's output
-my_evaluator.update(model_output)
-
-# Start the evaluation process and retrieve metric results
-metric_dict = my_evaluator.start_evaluation()
-
-# Optional: Retrieve detailed sample-level results
-print(my_evaluator.records)
-
-# Optional: Show the table of results
-print(my_evaluator.print_result())
-
-# Important: Reset the evaluator after use
-my_evaluator.reset()
-```
-
-The evaluator expects input data in a specific format, structured as follows:
-
-```python
-[
-{
-"pred_scores" (tensor/ndarray): Confidence scores for each prediction. Shape: (num_pred, 1)
-
-"pred_bboxes"/"gt_bboxes" (tensor/ndarray): List of 9 DoF bounding boxes.
-Supports two input formats:
-1. 9-dof box format: (num_pred/gt, 9)
-2. center, size and rotation matrix:
-"center": (num_pred/gt, 3),
-"size" : (num_pred/gt, 3),
-"rot" : (num_pred/gt, 3, 3)
-
-"subclass": The subclass of each VG sample.
-"index": Index of the sample.
-}
-...
-]
-```
-
-#### 2. Question Answering Evaluator
-
-The question answering evaluator measures performance using several established metrics:
-
-- **Bleu-X**: Evaluates n-gram overlap between prediction and ground truths.
-- **Meteor**: Focuses on precision, recall, and synonymy.
-- **CIDEr**: Considers consensus-based agreement.
-- **SPICE**: Used for semantic propositional content.
-- **SimCSE/SBERT**: Semantic similarity measures using sentence embeddings.
-- **EM (Exact Match) and Refine EM**: Compare exact matches between predictions and ground truths.
+```python
+from mmscan import MMScan

-```python
-# Initialize evaluator with pre-trained weights for SIMCSE and SBERT
-my_evaluator = MMScan_QA_evaluator(model_config={}, show_results=True)
+# (2) The evaluator tool ('VisualGroundingEvaluator', 'QuestionAnsweringEvaluator', 'GPTEvaluator')
+from mmscan import VisualGroundingEvaluator, QuestionAnsweringEvaluator, GPTEvaluator

-# Update evaluator with model output
+# For VisualGroundingEvaluator and QuestionAnsweringEvaluator: initialize the evaluator as shown below, update it with the model output, then perform the evaluation and save the final results.
+my_evaluator = VisualGroundingEvaluator(show_results=True) / QuestionAnsweringEvaluator(show_results=True)
 my_evaluator.update(model_output)
-
-# Start evaluation and obtain metrics
 metric_dict = my_evaluator.start_evaluation()

-# Optional: View detailed sample-level results
-print(my_evaluator.records)
-
-# Important: Reset evaluator after completion
-my_evaluator.reset()
-```
+# For GPTEvaluator: initialize the evaluator as shown below and evaluate the model's output using multithreading, saving the results to the specified path (tmp_path).
+gpt_evaluator = GPTEvaluator(API_key='XXX')
+metric_dict = gpt_evaluator.load_and_eval(model_output, num_threads=1, tmp_path='XXX')

-The evaluator requires input data structured as follows:
-
-```python
-[
-{
-"question" (str): The question text,
-"pred" (list[str]): The predicted answer, single element list,
-"gt" (list[str]): Ground truth answers, containing multiple elements,
-"ID": Unique ID for each QA sample,
-"index": Index of the sample,
-}
-...
-]
 ```
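For reference, a hedged sketch of how `model_output` might be assembled for the two offline evaluators above; the field names and shapes follow the input formats documented in the removed sections (`pred_scores`, `pred_bboxes`, `gt_bboxes`, `subclass`, `index` for grounding and `question`, `pred`, `gt`, `ID`, `index` for QA), while the dummy values, the subclass label, and the QA constructor arguments are placeholders.

```python
import numpy as np
from mmscan import VisualGroundingEvaluator, QuestionAnsweringEvaluator

# One visual-grounding result: 10 predicted 9-DoF boxes scored against 2 ground-truth boxes.
# The numbers are random placeholders; real entries come from a grounding model.
vg_output = [
    {
        "pred_scores": np.random.rand(10, 1),  # confidence per predicted box
        "pred_bboxes": np.random.rand(10, 9),  # 9-DoF format: (num_pred, 9)
        "gt_bboxes": np.random.rand(2, 9),     # 9-DoF format: (num_gt, 9)
        "subclass": "example-subclass",        # placeholder subclass label
        "index": 0,
    },
]

# One question-answering result in the documented format.
qa_output = [
    {
        "question": "What is on the table?",
        "pred": ["a red mug"],         # single-element list
        "gt": ["a mug", "a red cup"],  # one or more reference answers
        "ID": "qa_example_0",
        "index": 0,
    },
]

vg_evaluator = VisualGroundingEvaluator(show_results=True)
vg_evaluator.update(vg_output)
print(vg_evaluator.start_evaluation())

qa_evaluator = QuestionAnsweringEvaluator(show_results=True)
qa_evaluator.update(qa_output)
print(qa_evaluator.start_evaluation())
```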

-#### 3. GPT Evaluator

-In addition to classical QA metrics, the GPT evaluator offers a more advanced evaluation process.
-
-```python
-# Initialize GPT evaluator with an API key for access
-my_evaluator = MMScan_GPT_Evaluator(API_key='XXX')
-
-# Load, evaluate with multiprocessing, and store results in temporary path
-metric_dict = my_evaluator.load_and_eval(model_output, num_threads=5, tmp_path='XXX')
-
-# Important: Reset evaluator when finished
-my_evaluator.reset()
-```
-
-The input structure remains the same as for the question answering evaluator:
-
-```python
-[
-{
-"question" (str): The question text,
-"pred" (list[str]): The predicted answer, single element list,
-"gt" (list[str]): Ground truth answers, containing multiple elements,
-"ID": Unique ID for each QA sample,
-"index": Index of the sample,
-}
-...
-]
-```
+### MMScan HVG Challenge Submission
+To participate in the MMScan Visual Grounding Challenge and submit your results, please follow the instructions available on our [test server](https://huggingface.co/spaces/rbler/3d-iou-challenge). We welcome your feedback and inquiries; please feel free to contact us at linjingli@166.com.

 ## 🏆 MMScan Benchmark

+<div align=center>
+<img src="assets/mix.png" width=95%>
+</div>

 ### MMScan Visual Grounding Benchmark

 | Methods | gTop-1 | gTop-3 | AP<sub>sample</sub> | AP<sub>box</sub> | AR | Release | Download |
-|---------|--------|--------|---------------------|------------------|----|-------|----|
-| ScanRefer | 4.74 | 9.19 | 9.49 | 2.28 | 47.68 | [code](https://github.com/rbler1234/EmbodiedScan/tree/mmscan-devkit/models/Scanrefer) | [model](https://drive.google.com/file/d/1C0-AJweXEc-cHTe9tLJ3Shgqyd44tXqY/view?usp=drive_link) \| [log](https://drive.google.com/file/d/1ENOS2FE7fkLPWjIf9J76VgiPrn6dGKvi/view?usp=drive_link) |
+|---------|----------------|-----------|---------------------|------------------|----|-------|----|
+| ScanRefer | 4.74 | 9.19 | 9.49 | 2.28 | 47.68 | [code](./models/Scanrefer/README.md) | [model](https://drive.google.com/file/d/1C0-AJweXEc-cHTe9tLJ3Shgqyd44tXqY/view?usp=drive_link) \| [log](https://drive.google.com/file/d/1ENOS2FE7fkLPWjIf9J76VgiPrn6dGKvi/view?usp=drive_link) |
 | MVT | 7.94 | 13.07 | 13.67 | 2.50 | 86.86 | - | - |
 | BUTD-DETR | 15.24 | 20.68 | 18.58 | 9.27 | 66.62 | - | - |
 | ReGround3D | 16.35 | 26.13 | 22.89 | 5.25 | 43.24 | - | - |
-| EmbodiedScan | 19.66 | 34.00 | 29.30 | **15.18** | 59.96 | [code](https://github.com/OpenRobotLab/EmbodiedScan/tree/mmscan/models/EmbodiedScan) | [model](https://drive.google.com/file/d/1F6cHY6-JVzAk6xg5s61aTT-vD-eu_4DD/view?usp=drive_link) \| [log](https://drive.google.com/file/d/1Ua_-Z2G3g0CthbeBkrR1a7_sqg_Spd9s/view?usp=drive_link) |
+| EmbodiedScan | 19.66 | 34.00 | 29.30 | **15.18** | 59.96 | [code](./models/EmbodiedScan/README.md) | [model](https://drive.google.com/file/d/1F6cHY6-JVzAk6xg5s61aTT-vD-eu_4DD/view?usp=drive_link) \| [log](https://drive.google.com/file/d/1Ua_-Z2G3g0CthbeBkrR1a7_sqg_Spd9s/view?usp=drive_link) |
 | 3D-VisTA | 25.38 | 35.41 | 33.47 | 6.67 | 87.52 | - | - |
 | ViL3DRef | **26.34** | **37.58** | **35.09** | 6.65 | 86.86 | - | - |

 ### MMScan Question Answering Benchmark
+
 | Methods | Overall | ST-attr | ST-space | OO-attr | OO-space | OR | Advanced | Release | Download |
 |---|--------|--------|--------|--------|--------|--------|-------|----|----|
-| LL3DA | 45.7 | 39.1 | 58.5 | 43.6 | 55.9 | 37.1 | 24.0| [code](https://github.com/rbler1234/EmbodiedScan/tree/mmscan-devkit/models/LL3DA) | [model](https://drive.google.com/file/d/1mcWNHdfrhdbtySBtmG-QRH1Y1y5U3PDQ/view?usp=drive_link) \| [log](https://drive.google.com/file/d/1VHpcnO0QmAvMa0HuZa83TEjU6AiFrP42/view?usp=drive_link) |
-| LEO |54.6 | 48.9 | 62.7 | 50.8 | 64.7 | 50.4 | 45.9 | [code](https://github.com/rbler1234/EmbodiedScan/tree/mmscan-devkit/models/LEO) | [model](https://drive.google.com/drive/folders/1HZ38LwRe-1Q_VxlWy8vqvImFjtQ_b9iA?usp=drive_link)|
+| LL3DA | 45.7 | 39.1 | 58.5 | 43.6 | 55.9 | 37.1 | 24.0 | [code](./models/LL3DA/README.md) | [model](https://drive.google.com/file/d/1mcWNHdfrhdbtySBtmG-QRH1Y1y5U3PDQ/view?usp=drive_link) \| [log](https://drive.google.com/file/d/1VHpcnO0QmAvMa0HuZa83TEjU6AiFrP42/view?usp=drive_link) |
+| LEO | 54.6 | 48.9 | 62.7 | 50.8 | 64.7 | 50.4 | 45.9 | [code](./models/LEO/README.md) | [model](https://drive.google.com/drive/folders/1HZ38LwRe-1Q_VxlWy8vqvImFjtQ_b9iA?usp=drive_link) |
 | LLaVA-3D | **61.6** | 58.5 | 63.5 | 56.8 | 75.6 | 58.0 | 38.5 | - | - |

 *Note:* These two tables only show the results for main metrics; see the paper for complete results.

-We have released the codes of some models under [./models](./models/README.md).
+We have released the codes of some models under [./models](./models).

 ## 📝 TODO List

-
 - \[ \] MMScan annotation and samples for ARKitScenes.
-- \[ \] Online evaluation platform for the MMScan benchmark.
 - \[ \] Codes of more MMScan Visual Grounding baselines and Question Answering baselines.
 - \[ \] Full release and further updates.
assets/2024_NeurIPS_MMScan_Camera_Ready.pdf (-3.11 MB)
Binary file not shown.

assets/2406.09401v2.pdf (3.13 MB)
Binary file not shown.

assets/LEO.png (398 KB)
assets/LL3DA.png (239 KB)
assets/Scanrefer.png (303 KB)
assets/benchmark.png (435 KB)
assets/circle.png (322 KB)
assets/ex.png (677 KB)
assets/graph.png (684 KB)

0 commit comments