feat: add base_model eval type and completions support #1069
Conversation
Summary of Changes

Hello @TrinityXI, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly extends the evaluation framework by introducing a new `base_model` evaluation type with support for the `/completions` endpoint.
Code Review
This pull request introduces support for evaluating base models via the /completions endpoint by adding a new EvalType.BASE_MODEL. The implementation is comprehensive, covering prompt construction, streaming support, and response normalization.
My review focuses on a few key areas:
- A potential breaking change in the default configuration for existing service types.
- Missing feature support for `logprobs` in completion responses.
- Minor improvements for robustness and correctness in utility functions.
Overall, this is a great addition that expands the evaluation capabilities of the framework. The suggested changes aim to improve correctness and avoid unintended side effects.
    elif self.eval_type in (EvalType.SERVICE, EvalType.BASE_MODEL):
        self.generation_config = {
            'temperature': 0.0,
            'max_tokens': 512,
        }
This change introduces a default max_tokens: 512 for EvalType.SERVICE in addition to the new EvalType.BASE_MODEL. Previously, EvalType.SERVICE did not have a default max_tokens, so it would use the model's default. This is a behavioral change that could unexpectedly truncate outputs for existing users of EvalType.SERVICE. The pull request description states "Zero breaking changes to existing server/openai_api behavior", but this change seems to contradict that. To avoid breaking changes, you might want to apply this default only to EvalType.BASE_MODEL.
Suggested change:

    elif self.eval_type == EvalType.SERVICE:
        self.generation_config = {
            'temperature': 0.0,
        }
    elif self.eval_type == EvalType.BASE_MODEL:
        self.generation_config = {
            'temperature': 0.0,
            'max_tokens': 512,
        }
    if not hasattr(self, '_valid_completion_params'):
        self._valid_completion_params = get_supported_params(self.client.completions.create)
It's good practice to cache the result of get_supported_params to avoid repeated and potentially expensive reflection calls. However, this caching is implemented on the instance (self). If multiple instances of OpenAIBaseModelAPI are created, this check will be performed for each one. Consider caching this at the class level to optimize further, as the supported parameters for self.client.completions.create will be the same across all instances of this class.
Suggested change:

    if not hasattr(OpenAIBaseModelAPI, '_valid_completion_params'):
        OpenAIBaseModelAPI._valid_completion_params = get_supported_params(self.client.completions.create)
| """Flatten chat messages into a simple text prompt for completions API.""" | ||
| parts: List[str] = [] | ||
| for message in messages: | ||
| role = getattr(message, 'role', 'user') |
Using getattr(message, 'role', 'user') is defensive, but ChatMessage is a Union of types that all define a role attribute. Relying on getattr with a default might mask potential issues where a message object is missing its role, which would cause it to silently default to 'user' and lead to incorrect prompt formatting. It would be more robust and clearer to directly access message.role. If there's a scenario where role can be missing, it might be better to handle that case explicitly or adjust the type hints.
Suggested change:

    role = message.role
    ChatCompletionChoice(
        message=ChatMessageAssistant(content=(choice.text or ''), model=response.model, source='generate'),
        stop_reason=as_stop_reason(choice.finish_reason),
        logprobs=None,
The logprobs field is hardcoded to None. The OpenAI completions API can return log probabilities if they are requested in generation_config. To fully support the features of the completions endpoint, it would be beneficial to parse the logprobs from the choice object and populate the logprobs field in ChatCompletionChoice. This would be consistent with how chat_choices_from_openai handles logprobs for chat completions.
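For illustration, the legacy completions logprobs object exposes parallel lists (`tokens`, `token_logprobs`, `top_logprobs`) that can be mapped into the chat-style per-token structure. The helper below is only a sketch: its name is invented and it returns plain dicts rather than the framework's actual `Logprobs` types.

```python
from typing import Any, Dict, List, Optional


def completion_logprobs_to_chat_style(logprobs: Optional[Any]) -> Optional[Dict[str, List[Dict[str, Any]]]]:
    """Convert legacy /completions logprobs (parallel lists) into a
    chat.completions-style {'content': [{'token', 'logprob', 'top_logprobs'}]}
    shape so downstream code can treat both endpoints uniformly."""
    if logprobs is None or not getattr(logprobs, 'tokens', None):
        return None

    token_logprobs = logprobs.token_logprobs or []
    top_logprobs = logprobs.top_logprobs or []
    content: List[Dict[str, Any]] = []
    for i, token in enumerate(logprobs.tokens):
        # top_logprobs is a list of {token: logprob} dicts aligned with tokens.
        top = top_logprobs[i] if i < len(top_logprobs) and top_logprobs[i] else {}
        content.append({
            'token': token,
            'logprob': token_logprobs[i] if i < len(token_logprobs) else None,
            'top_logprobs': [{'token': t, 'logprob': lp} for t, lp in top.items()],
        })
    return {'content': content}
```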
    CompletionChoice(
        finish_reason=finish_reasons.get(index, 'stop'),
        index=index,
        logprobs=None,
Similar to the non-streaming case, logprobs are hardcoded to None when reconstructing choices from a stream. The streaming chunks can contain logprob information which is currently being discarded. To provide full feature support, please consider collecting and reconstructing the logprobs from the stream as well.
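One possible approach, sketched here with assumed names and the legacy completions streaming chunk shape, is to concatenate the parallel logprob lists per choice index while consuming the stream and then convert the merged result as in the non-streaming case:

```python
from collections import defaultdict
from typing import Any, Dict, List


def accumulate_stream_logprobs(chunks: List[Any]) -> Dict[int, Dict[str, List[Any]]]:
    """Merge per-chunk logprobs from a /completions stream, keyed by choice
    index; each chunk only carries logprobs for the tokens it emitted, so the
    parallel lists are concatenated in arrival order."""
    merged: Dict[int, Dict[str, List[Any]]] = defaultdict(
        lambda: {'tokens': [], 'token_logprobs': [], 'top_logprobs': []}
    )
    for chunk in chunks:
        for choice in getattr(chunk, 'choices', None) or []:
            lp = getattr(choice, 'logprobs', None)
            if lp is None:
                continue
            merged[choice.index]['tokens'].extend(lp.tokens or [])
            merged[choice.index]['token_logprobs'].extend(lp.token_logprobs or [])
            merged[choice.index]['top_logprobs'].extend(lp.top_logprobs or [])
    return dict(merged)
```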
    choices=choices,
    created=last_chunk.created,
    model=last_chunk.model,
    object=getattr(last_chunk, 'object', 'completion'),
The object attribute of the reconstructed Completion object is set using getattr(last_chunk, 'object', 'completion'). According to the OpenAI API documentation, the object type for a completion object is 'text_completion'. Using 'completion' might be incorrect. For consistency with the OpenAI spec and with how collect_stream_response handles chat completions (it sets object='chat.completion'), this should probably be set to 'text_completion'. The object for a streaming chunk is also 'text_completion', so you could just use last_chunk.object.
Suggested change:

    object='text_completion',
Yunnglin left a comment

@TrinityXI Thanks for the PR! A few things to address as commented:

    pip install pre-commit
    pre-commit install
    pre-commit run --all-files
New Feature: Support for `base_model` Evaluation Type

`EvalType.BASE_MODEL` is now introduced to support evaluation of non-chat (base/completion-style) models via the OpenAI-compatible `/completions` endpoint, clearly distinguished from the existing `server`/`openai_api` type that uses `/chat/completions`.

Key Implementation Details
- New eval type: `EvalType.BASE_MODEL`
- New model API class: `OpenAIBaseModelAPI`, using the completions call chain
- Chat messages are flattened into a plain-text prompt (with a trailing `assistant:` suffix preserved to trigger continuation); a sketch of this flattening follows this list
- Requests are sent to the `/completions` endpoint
- Responses are normalized into the standard `ModelOutput` structure, ensuring seamless integration with existing evaluation pipelines
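The prompt flattening can be pictured with the minimal sketch below. It uses plain role/content dicts and an invented function name; the actual implementation works on the framework's ChatMessage objects and may format roles differently.

```python
from typing import Dict, List


def flatten_messages_to_prompt(messages: List[Dict[str, str]]) -> str:
    """Flatten chat-style messages into one text prompt for /completions."""
    parts: List[str] = [f"{m.get('role', 'user')}: {m.get('content', '')}" for m in messages]
    # The trailing 'assistant:' cue asks the base model to continue the
    # dialogue instead of receiving a structured chat request.
    parts.append('assistant:')
    return '\n\n'.join(parts)


if __name__ == '__main__':
    print(flatten_messages_to_prompt([
        {'role': 'system', 'content': 'You are a helpful assistant.'},
        {'role': 'user', 'content': 'What is 2 + 2?'},
    ]))
```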
Default Configuration & Compatibility

- Default `max_tokens=512` (can be overridden via `generation_config`, e.g., `max_tokens=2048`)
- API URL handling covers both `/chat/completions` and `/completions` to prevent duplication
- Chat-only parameters such as `tools`, `tool_choice`, etc., are ignored
- Parameters not supported by `/completions` requests are automatically moved to `extra_body` for compatibility (see the sketch after this list)
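The `extra_body` handling in the last bullet can be sketched roughly as follows; the helper name here is invented, and `get_supported_params` in the PR presumably inspects the client signature in a similar, though not necessarily identical, way.

```python
import inspect
from typing import Any, Callable, Dict, Optional


def split_completion_params(create_fn: Callable[..., Any], params: Dict[str, Any]) -> Dict[str, Any]:
    """Keep kwargs accepted by create_fn's signature and route the rest
    through extra_body so OpenAI-compatible servers still receive them."""
    supported = set(inspect.signature(create_fn).parameters)
    kwargs = {k: v for k, v in params.items() if k in supported}
    extras = {k: v for k, v in params.items() if k not in supported}
    if extras:
        kwargs.setdefault('extra_body', {}).update(extras)
    return kwargs


if __name__ == '__main__':
    def fake_create(model: str, prompt: str, temperature: float = 0.0,
                    extra_body: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        return {'model': model, 'prompt': prompt, 'temperature': temperature, 'extra_body': extra_body}

    # top_k / repetition_penalty are not in the signature, so they move to extra_body.
    print(split_completion_params(fake_create, {'temperature': 0.0, 'top_k': 20, 'repetition_penalty': 1.05}))
```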
Usage Example (CLI)

      --eval-type base_model \
      --api-url https://api.example.com/v1 \
      --api-key sk-xxx \
      --generation-config '{"max_tokens": 2048, "temperature": 0}'
Important Note for Few-Shot Evaluation

When using few-shot prompting with the `/completions` endpoint, it is strongly recommended to explicitly specify `stop_seqs` to prevent the model from continuing to generate subsequent examples:
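For example, a generation config along these lines (key names and stop strings are illustrative only; `stop_seqs` follows the wording above, while the raw OpenAI completions parameter is `stop`):

```python
# Hypothetical few-shot generation config for a base model evaluation.
generation_config = {
    'temperature': 0.0,
    'max_tokens': 2048,
    # Cut generation before the model starts writing the next few-shot
    # example on its own; the exact strings depend on the prompt template.
    'stop_seqs': ['\n\nQuestion:', '\n\n'],
}
```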
Scope of Changes

- Files touched: `constants.py`, `config.py`, `models/model_apis.py`, `models/openai_compatible.py`, `models/utils/openai.py`
- Zero breaking changes to existing `server`/`openai_api` behavior

This addition enables accurate and standardized evaluation of base (non-instruction-tuned) models through OpenAI-compatible inference services, closing a long-standing gap in evaluation coverage.