
Conversation

@TrinityXI
Contributor

New Feature: Support for base_model Evaluation Type

This PR introduces EvalType.BASE_MODEL to support evaluation of non-chat (base/completion-style) models via the OpenAI-compatible /completions endpoint, clearly distinguishing it from the existing server/openai_api type, which uses /chat/completions.

Key Implementation Details

  • Added new evaluation type: EvalType.BASE_MODEL
  • Registered the corresponding API class OpenAIBaseModelAPI, which uses the completions call chain
  • When constructing prompts, chat message lists are flattened into plain text in the following format (see the sketch after this list):
    user: xxx
    assistant: yyy
    user: zzz
    assistant:
    (the final assistant: suffix is preserved to trigger continuation)
  • Fully supports streaming responses from /completions
  • All responses are normalized into the unified ModelOutput structure, ensuring seamless integration with existing evaluation pipelines
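A minimal sketch of this flattening, assuming a message object that exposes role and text attributes (the function and attribute names here are illustrative, not necessarily the PR's exact helpers):

def messages_to_prompt(messages) -> str:
    # Flatten chat messages into the plain-text prompt format described above.
    parts = [f'{message.role}: {message.text}' for message in messages]
    # The trailing 'assistant: ' asks the base model to continue the conversation.
    parts.append('assistant: ')
    return '\n'.join(parts)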

Default Configuration & Compatibility

  • Reuses the default generation settings of the service class
  • Temperature defaults to 0
  • Automatically injects max_tokens=512 (can be overridden via generation_config, e.g., max_tokens=2048)
  • Base URL normalization now strips both trailing /chat/completions and /completions to prevent duplication
  • No tool calling support — tools, tool_choice, etc., are ignored
  • Unsupported parameters in /completions requests are automatically moved to extra_body for compatibility (see the sketch after this list)
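A rough illustration of the two compatibility behaviors above, URL normalization and routing of unsupported parameters into extra_body (the function names here are assumptions for illustration, not necessarily the PR's exact code):

def normalize_base_url(base_url: str) -> str:
    # Strip a trailing /chat/completions or /completions so the client does not
    # end up with a duplicated path segment.
    for suffix in ('/chat/completions', '/completions'):
        if base_url.rstrip('/').endswith(suffix):
            return base_url.rstrip('/')[: -len(suffix)]
    return base_url

def split_completion_params(params: dict, supported: set) -> dict:
    # Keep natively supported keyword arguments; everything else is forwarded
    # via extra_body so OpenAI-compatible backends can still receive it.
    request = {k: v for k, v in params.items() if k in supported}
    extra = {k: v for k, v in params.items() if k not in supported}
    if extra:
        request['extra_body'] = extra
    return request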

Usage Example (CLI)

--eval-type base_model \
  --api-url https://api.example.com/v1 \
  --api-key sk-xxx \
  --generation-config '{"max_tokens": 2048, "temperature": 0}'

Important Note for Few-Shot Evaluation
When using few-shot prompting with the /completions endpoint, it is strongly recommended to explicitly specify stop_seqs to prevent the model from continuing to generate subsequent examples:

"stop_seqs": ["\nquestion:", "\n\nquestion:", "Question:", "question:\n"]

Scope of Changes

  • New/updated files:
    constants.py, config.py, models/model_apis.py, models/openai_compatible.py, models/utils/openai.py
  • Zero breaking changes to existing server/openai_api behavior
  • Fully backward compatible extension

This addition enables accurate and standardized evaluation of base (non-instruction-tuned) models through OpenAI-compatible inference services, closing a long-standing gap in evaluation coverage.

@gemini-code-assist
Contributor

Summary of Changes

Hello @TrinityXI, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly extends the evaluation framework by introducing EvalType.BASE_MODEL, enabling the standardized evaluation of non-chat models through OpenAI-compatible /completions endpoints. It includes a new API class, OpenAIBaseModelAPI, to manage interactions with these models, handling prompt formatting, streaming responses, and normalizing outputs into the existing ModelOutput structure. The update ensures broader model coverage without impacting current functionalities, providing a robust solution for evaluating a wider range of language models.

Highlights

  • New Evaluation Type: Introduced EvalType.BASE_MODEL to support evaluation of non-chat (base/completion-style) models using the OpenAI-compatible /completions endpoint, distinct from the existing /chat/completions type.
  • Dedicated API Class: Registered OpenAIBaseModelAPI to handle the /completions call chain, ensuring proper interaction with base models.
  • Prompt Formatting: Chat message lists are now flattened into a plain text format (e.g., user: xxx\nassistant: yyy) when constructing prompts for the /completions endpoint, with a final assistant: suffix for continuation.
  • Streaming Support: Full support for streaming responses from the /completions endpoint has been implemented.
  • Response Normalization: All responses from the new BASE_MODEL type are normalized into the unified ModelOutput structure for seamless integration with existing evaluation pipelines.
  • Default Configuration: Default generation settings for BASE_MODEL now include temperature=0.0 and max_tokens=512, which can be overridden by generation_config.
  • URL Normalization: Base URL normalization logic has been enhanced to strip both trailing /chat/completions and /completions to prevent duplication issues.
  • Parameter Handling: Tool calling is explicitly ignored for BASE_MODEL, and unsupported parameters in /completions requests are automatically moved to extra_body for compatibility.
  • Backward Compatibility: The changes introduce zero breaking changes to existing server/openai_api behavior and are fully backward compatible.


@gemini-code-assist bot left a comment


Code Review

This pull request introduces support for evaluating base models via the /completions endpoint by adding a new EvalType.BASE_MODEL. The implementation is comprehensive, covering prompt construction, streaming support, and response normalization.

My review focuses on a few key areas:

  • A potential breaking change in the default configuration for existing service types.
  • Missing feature support for logprobs in completion responses.
  • Minor improvements for robustness and correctness in utility functions.

Overall, this is a great addition that expands the evaluation capabilities of the framework. The suggested changes aim to improve correctness and avoid unintended side effects.

Comment on lines +198 to 202
elif self.eval_type in (EvalType.SERVICE, EvalType.BASE_MODEL):
    self.generation_config = {
        'temperature': 0.0,
        'max_tokens': 512,
    }

Severity: high

This change introduces a default max_tokens: 512 for EvalType.SERVICE in addition to the new EvalType.BASE_MODEL. Previously, EvalType.SERVICE did not have a default max_tokens, so it would use the model's default. This is a behavioral change that could unexpectedly truncate outputs for existing users of EvalType.SERVICE. The pull request description states "Zero breaking changes to existing server/openai_api behavior", but this change seems to contradict that. To avoid breaking changes, you might want to apply this default only to EvalType.BASE_MODEL.

Suggested change
elif self.eval_type in (EvalType.SERVICE, EvalType.BASE_MODEL):
    self.generation_config = {
        'temperature': 0.0,
        'max_tokens': 512,
    }
elif self.eval_type == EvalType.SERVICE:
    self.generation_config = {
        'temperature': 0.0,
    }
elif self.eval_type == EvalType.BASE_MODEL:
    self.generation_config = {
        'temperature': 0.0,
        'max_tokens': 512,
    }
Comment on lines +190 to +191
if not hasattr(self, '_valid_completion_params'):
    self._valid_completion_params = get_supported_params(self.client.completions.create)

Severity: medium

It's good practice to cache the result of get_supported_params to avoid repeated and potentially expensive reflection calls. However, this caching is implemented on the instance (self). If multiple instances of OpenAIBaseModelAPI are created, this check will be performed for each one. Consider caching this at the class level to optimize further, as the supported parameters for self.client.completions.create will be the same across all instances of this class.

Suggested change
if not hasattr(self, '_valid_completion_params'):
    self._valid_completion_params = get_supported_params(self.client.completions.create)
if not hasattr(OpenAIBaseModelAPI, '_valid_completion_params'):
    OpenAIBaseModelAPI._valid_completion_params = get_supported_params(self.client.completions.create)
"""Flatten chat messages into a simple text prompt for completions API."""
parts: List[str] = []
for message in messages:
role = getattr(message, 'role', 'user')

Severity: medium

Using getattr(message, 'role', 'user') is defensive, but ChatMessage is a Union of types that all define a role attribute. Relying on getattr with a default might mask potential issues where a message object is missing its role, which would cause it to silently default to 'user' and lead to incorrect prompt formatting. It would be more robust and clearer to directly access message.role. If there's a scenario where role can be missing, it might be better to handle that case explicitly or adjust the type hints.

Suggested change
role = getattr(message, 'role', 'user')
role = message.role
ChatCompletionChoice(
    message=ChatMessageAssistant(content=(choice.text or ''), model=response.model, source='generate'),
    stop_reason=as_stop_reason(choice.finish_reason),
    logprobs=None,

Severity: medium

The logprobs field is hardcoded to None. The OpenAI completions API can return log probabilities if they are requested in generation_config. To fully support the features of the completions endpoint, it would be beneficial to parse the logprobs from the choice object and populate the logprobs field in ChatCompletionChoice. This would be consistent with how chat_choices_from_openai handles logprobs for chat completions.
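One possible way to populate it (a sketch only; Logprob, TopLogprob, and Logprobs are the project's logprob types, with field names assumed to match what chat_choices_from_openai already produces, and choice.logprobs is assumed to follow the legacy completions schema with tokens, token_logprobs, and top_logprobs):

def completion_logprobs(choice):
    # Convert legacy /completions logprobs into the same structure used for chat choices.
    lp = getattr(choice, 'logprobs', None)
    if lp is None or not lp.tokens:
        return None
    content = []
    for i, token in enumerate(lp.tokens):
        top = lp.top_logprobs[i] if lp.top_logprobs else None
        content.append(
            Logprob(
                token=token,
                logprob=lp.token_logprobs[i],
                bytes=None,
                top_logprobs=[TopLogprob(token=t, logprob=v, bytes=None) for t, v in (top or {}).items()],
            )
        )
    return Logprobs(content=content)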

CompletionChoice(
    finish_reason=finish_reasons.get(index, 'stop'),
    index=index,
    logprobs=None,

Severity: medium

Similar to the non-streaming case, logprobs are hardcoded to None when reconstructing choices from a stream. The streaming chunks can contain logprob information which is currently being discarded. To provide full feature support, please consider collecting and reconstructing the logprobs from the stream as well.
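A sketch of collecting those fragments while iterating the stream (the helper name and grouping-by-index approach are illustrative; chunks are assumed to follow the legacy completions streaming schema):

def collect_stream_logprobs(chunks):
    # Group logprobs fragments from streamed completion chunks by choice index,
    # so they can be merged and converted instead of hardcoding logprobs=None.
    fragments = {}
    for chunk in chunks:
        for choice in chunk.choices:
            if getattr(choice, 'logprobs', None) is not None:
                fragments.setdefault(choice.index, []).append(choice.logprobs)
    return fragments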

choices=choices,
created=last_chunk.created,
model=last_chunk.model,
object=getattr(last_chunk, 'object', 'completion'),

Severity: medium

The object attribute of the reconstructed Completion object is set using getattr(last_chunk, 'object', 'completion'). According to the OpenAI API documentation, the object type for a completion object is 'text_completion'. Using 'completion' might be incorrect. For consistency with the OpenAI spec and with how collect_stream_response handles chat completions (it sets object='chat.completion'), this should probably be set to 'text_completion'. The object for a streaming chunk is also 'text_completion', so you could just use last_chunk.object.

Suggested change
object=getattr(last_chunk, 'object', 'completion'),
object='text_completion',
@Yunnglin
Collaborator

Yunnglin commented Dec 12, 2025

@TrinityXI Thanks for the PR! A few things to address:

  1. Registering the OpenAI API completion endpoint as base_model could be ambiguous when we support local base models later. Suggest renaming to openai_api_completion for clarity.

  2. Please review Gemini's comments and address applicable ones.

  3. Run linting checks:

pip install pre-commit
pre-commit install
pre-commit run --all-files

@Yunnglin left a comment


@TrinityXI Thanks for the PR! A few things to address as commented.
