
Conversation

@mushenL (Contributor) commented Dec 23, 2025

add refcoco benchmark

@gemini-code-assist (Contributor) commented

Summary of Changes

Hello @mushenL, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates the RefCOCO benchmark into the evalscope framework, significantly expanding its capabilities for evaluating multi-modal models on referring expression comprehension. It provides a robust set of tools for assessing model performance across various tasks, from generating descriptive captions to precisely localizing objects via bounding box predictions, all within a flexible and configurable evaluation environment.

Highlights

  • New Benchmark Addition: The RefCOCO benchmark has been added to the system, enabling evaluation for referring expression comprehension tasks.
  • Comprehensive Evaluation Metrics: New evaluation functions are introduced to calculate metrics for both natural language generation (Bleu, METEOR, ROUGE_L, CIDEr) and bounding box prediction (IoU, ACC@thresholds, Center_ACC).
  • Flexible Evaluation Modes: The RefCOCO benchmark supports multiple evaluation modes ('bbox', 'seg', 'bbox_rec') configurable via an 'eval_mode' parameter, allowing tailored assessment based on the task; a minimal configuration sketch follows this list.
  • Data Adaptation and Prompt Generation: A dedicated data adapter (RefCOCOAdapter) is implemented to transform raw RefCOCO dataset records into evaluation samples, including specific prompt formatting for bounding box regression and image processing.
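
As referenced above, here is a minimal sketch of how the new benchmark might be configured and run. It assumes evalscope's standard TaskConfig/run_task entry points; the dataset registry name 'refcoco', the dataset_args/extra_params keys, and the model name are assumptions inferred from this PR's description and may differ from the merged implementation.

```python
# Hypothetical usage sketch: the dataset name and config keys below are
# assumptions based on this PR's description, not a confirmed API.
from evalscope import TaskConfig, run_task

task_cfg = TaskConfig(
    model='Qwen/Qwen2.5-VL-7B-Instruct',   # any multi-modal model the framework can serve
    datasets=['refcoco'],                   # assumed registry name for the new benchmark
    dataset_args={
        'refcoco': {
            # 'eval_mode' selects among 'bbox', 'seg', and 'bbox_rec' per the PR highlights
            'extra_params': {'eval_mode': 'bbox_rec'},
        }
    },
    limit=10,  # evaluate only a few samples while smoke-testing
)
run_task(task_cfg=task_cfg)
```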


Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check its output and use code with caution.

@gemini-code-assist bot left a comment


Code Review

This pull request adds a new benchmark for RefCOCO. The changes include an evaluation library for calculating scores and a data adapter for processing the dataset. My review focuses on improving code robustness, correctness, and maintainability. I've suggested handling potential division-by-zero errors, using proper exception handling, replacing string-based exceptions with actual exception objects, and improving code clarity by using _ for unused variables and translating comments to English. I also recommend re-enabling a commented-out try-except block to prevent crashes during metric calculation.
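
As a concrete illustration of the division-by-zero concern in the box metrics, here is a minimal, self-contained sketch of how IoU, ACC@threshold, and Center_ACC are commonly defined for [x1, y1, x2, y2] boxes. The helper names are hypothetical and are not taken from this PR's evaluation library.

```python
# Illustrative sketch only: hypothetical helpers showing the usual definitions
# of IoU, ACC@threshold, and Center_ACC for [x1, y1, x2, y2] boxes. These are
# not the functions added in this PR.
def compute_iou(pred, gt):
    # Intersection rectangle between the two boxes.
    x1, y1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    x2, y2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_pred = max(0.0, pred[2] - pred[0]) * max(0.0, pred[3] - pred[1])
    area_gt = max(0.0, gt[2] - gt[0]) * max(0.0, gt[3] - gt[1])
    union = area_pred + area_gt - inter
    # Guard against division by zero when both boxes are degenerate.
    return inter / union if union > 0 else 0.0


def acc_at(pred, gt, threshold=0.5):
    # A prediction counts as correct when its IoU clears the threshold.
    return 1.0 if compute_iou(pred, gt) >= threshold else 0.0


def center_acc(pred, gt):
    # Correct when the predicted box's center lies inside the ground-truth box.
    cx, cy = (pred[0] + pred[2]) / 2.0, (pred[1] + pred[3]) / 2.0
    return 1.0 if gt[0] <= cx <= gt[2] and gt[1] <= cy <= gt[3] else 0.0


print(compute_iou([0, 0, 10, 10], [5, 5, 15, 15]))  # ~0.1429
```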

    segmentation = record.get('segmentation')
    image_data = refcoco_seg_doc_to_visual(image_data, segmentation)
else:
    raise 'Invalid eval mode parameter'

Severity: high

Raising a string literal like 'Invalid eval mode parameter' will result in a TypeError at runtime, not the exception you might expect. You should raise an actual Exception instance, such as ValueError.

Suggested change
- raise 'Invalid eval mode parameter'
+ raise ValueError('Invalid eval mode parameter')
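
As a standalone illustration of this point (not code from this PR), Python 3 refuses to raise a bare string, so callers would see a TypeError rather than the intended error:

```python
# Standalone illustration; not code from this PR.
try:
    raise 'Invalid eval mode parameter'   # a plain string is not an exception
except TypeError as exc:
    print(exc)  # "exceptions must derive from BaseException"

try:
    raise ValueError('Invalid eval mode parameter')   # the suggested fix
except ValueError as exc:
    print(exc)  # callers can now catch the intended error type
```
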
elif self.eval_mode in ['bbox', 'seg']:
    target = record.get('answer')
else:
    raise 'Invalid eval mode parameter'

Severity: high

Raising a string literal like 'Invalid eval mode parameter' will result in a TypeError at runtime, not the exception you might expect. You should raise an actual Exception instance, such as ValueError.

Suggested change
- raise 'Invalid eval mode parameter'
+ raise ValueError('Invalid eval mode parameter')
Comment on lines 167 to 175
# try:
results = process_results(doc, filtered_prediction)
score.value.update(results)

score.main_score_name = doc['eval_mode']

# except Exception as e:
# logger.error(f'Error calculating ref_coco metrics: {e}')
# score.value = {}

Severity: high

The try...except block is commented out. This is risky as any error during metric calculation in process_results will crash the evaluation for this sample. It's better to enable this error handling to gracefully manage failures and log them, allowing the evaluation to continue with other samples.

Suggested change
- # try:
- results = process_results(doc, filtered_prediction)
- score.value.update(results)
- score.main_score_name = doc['eval_mode']
- # except Exception as e:
- #     logger.error(f'Error calculating ref_coco metrics: {e}')
- #     score.value = {}
+ try:
+     results = process_results(doc, filtered_prediction)
+     score.value.update(results)
+     score.main_score_name = doc['eval_mode']
+ except Exception as e:
+     logger.error(f'Error calculating ref_coco metrics: {e}')
+     score.value = {}

def extract_answer(self, prediction: str, task_state: TaskState):
    if task_state.metadata['eval_mode'] == 'bbox_rec':
        # 匹配[a, b, c, d]格式的答案

Severity: medium

The comment is in Chinese. For consistency and to make the code accessible to a wider audience, please write comments in English.

Suggested change
- # 匹配[a, b, c, d]格式的答案
+ # Match answers in the format [a, b, c, d]
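
For context, extracting an [a, b, c, d] answer from free-form model output is typically done with a regular expression along these lines. This is an illustrative sketch with a hypothetical pattern and helper name, not the extraction code used in this PR.

```python
import re

# Illustrative sketch only; the actual pattern in this PR may differ.
_BOX_PATTERN = re.compile(
    r'\[\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*,'
    r'\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\]'
)

def extract_box(prediction: str):
    """Return the last [x1, y1, x2, y2] found in the prediction, or None."""
    matches = _BOX_PATTERN.findall(prediction)
    if not matches:
        return None
    return [float(v) for v in matches[-1]]

print(extract_box('The object is at [120, 45, 300.5, 210].'))  # [120.0, 45.0, 300.5, 210.0]
```
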
@Yunnglin changed the title from "Yk 20251223" to "[Benchmark] Add RefCOCO" on Dec 24, 2025
@Yunnglin merged commit 2c0e014 into modelscope:main on Dec 24, 2025
3 checks passed
