Add Google Model Garden's Anthropic support to Inference Plugin #134080
…ntegration # Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java # x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/googlevertexai/action/GoogleVertexAiActionCreator.java # x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/googlevertexai/request/completion/GoogleVertexAiUnifiedChatCompletionRequest.java # x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/googlevertexai/action/GoogleVertexAiUnifiedChatCompletionActionTests.java # x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/googlevertexai/completion/GoogleVertexAiChatCompletionModelTests.java # x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/googlevertexai/request/completion/GoogleVertexAiUnifiedChatCompletionRequestTests.java
…ntegration # Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java
…al parameters based on transport version
…ntegration # Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java
… to support new content block types and improve parsing logic
… parser and add unit tests for response validation
…ate response parsing and error handling
…ity to validate serialization of user fields
…n model and update related tests
…ntegration # Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java
Hello @jonathan-buttner @dan-rubinstein
public static final TransportVersion ESQL_DOCUMENTS_FOUND_AND_VALUES_LOADED_8_19 = def(8_841_0_61);
public static final TransportVersion ESQL_PROFILE_INCLUDE_PLAN_8_19 = def(8_841_0_62);
public static final TransportVersion INITIAL_ELASTICSEARCH_8_19_4 = def(8_841_0_68);
public static final TransportVersion ML_INFERENCE_GOOGLE_MODEL_GARDEN_ADDED_8_19 = def(8_841_0_69);
Let me know if this needs to be removed. I haven't seen backports in a while, but Google Vertex AI has been there for quite some time, so we'd probably need one.
Let's remove this, we won't be backporting the changes.
Removed.
…ntegration # Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java
…ntegration # Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java # x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/googlevertexai/completion/GoogleVertexAiChatCompletionServiceSettings.java
@jonathan-buttner your comments are addressed. Could you please take a look at the PR once more?
…ntegration # Conflicts: # server/src/main/resources/transport/upper_bounds/9.2.csv
Looking good, couple more changes
}

public GoogleVertexAiChatCompletionTaskSettings(StreamInput in) throws IOException {
    thinkingConfig = new ThinkingConfig(in);
    TransportVersion version = in.getTransportVersion();
    if (GoogleVertexAiUtils.supportsModelGarden(version)) {
        maxTokens = Objects.requireNonNullElse(in.readOptionalInt(), DEFAULT_MAX_TOKENS);
        maxTokens = in.readOptionalInt();
Can we use readOptionalVInt?
Good thinking. Done.
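For context on why a variable-length integer is attractive for a small value such as maxTokens, here is a minimal sketch comparing a fixed four-byte encoding with a seven-bits-per-byte variable-length encoding. The class and method names (`VIntSketch`, `encodeInt`, `encodeVInt`, `decodeVInt`) are made up for illustration and this is not the Elasticsearch stream implementation; the optional read/write variants also need a presence marker, which is left out here.

```java
import java.io.ByteArrayOutputStream;

// Illustrative only: a generic variable-length int encoding, not Elasticsearch code.
public class VIntSketch {

    // Fixed-width encoding: always four bytes, most significant byte first.
    public static byte[] encodeInt(int value) {
        return new byte[] { (byte) (value >>> 24), (byte) (value >>> 16), (byte) (value >>> 8), (byte) value };
    }

    // Variable-length encoding: seven payload bits per byte, lowest group first.
    // A set high bit means that another byte follows.
    public static byte[] encodeVInt(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
        return out.toByteArray();
    }

    // Reverses encodeVInt by accumulating seven-bit groups until a byte without the high bit.
    public static int decodeVInt(byte[] bytes) {
        int result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;
            }
            shift += 7;
        }
        return result;
    }

    public static void main(String[] args) {
        int maxTokens = 1024;
        System.out.println(encodeInt(maxTokens).length);        // 4
        System.out.println(encodeVInt(maxTokens).length);       // 2
        System.out.println(decodeVInt(encodeVInt(maxTokens)));  // 1024
    }
}
```

Small non-negative values take fewer bytes on the wire in the variable-length form, while very large or negative values take five bytes instead of four.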
@@ -124,7 +124,9 @@ public TransportVersion getMinimalSupportedVersion() {
    @Override
    public void writeTo(StreamOutput out) throws IOException {
        thinkingConfig.writeTo(out);
        out.writeOptionalInt(maxTokens);
        if (GoogleVertexAiUtils.supportsModelGarden(out.getTransportVersion())) {
            out.writeOptionalInt(maxTokens);
Let's use writeOptionalVInt
Done.
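The version check in the constructor and in writeTo has to be symmetric: a field that is only written for peers on a new enough version must only be read under the same condition. A minimal, self-contained sketch of that pattern follows. The integer version ids, the `MODEL_GARDEN_ADDED` constant, and the `VersionGatedSketch` class are hypothetical stand-ins, not the real TransportVersion or stream classes.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Illustrative only: plain ints stand in for transport versions.
public class VersionGatedSketch {

    // Hypothetical version id at which the optional maxTokens field was introduced.
    static final int MODEL_GARDEN_ADDED = 2;

    public static boolean supportsModelGarden(int version) {
        return version >= MODEL_GARDEN_ADDED;
    }

    // Writes the optional field only when the receiving side understands it.
    public static byte[] write(Integer maxTokens, int peerVersion) {
        ByteBuffer out = ByteBuffer.allocate(5);
        if (supportsModelGarden(peerVersion)) {
            out.put((byte) (maxTokens != null ? 1 : 0)); // presence marker
            if (maxTokens != null) {
                out.putInt(maxTokens);
            }
        }
        return Arrays.copyOf(out.array(), out.position());
    }

    // Mirrors write(): reads the field only under the same version condition.
    public static Integer read(byte[] data, int peerVersion) {
        if (supportsModelGarden(peerVersion) == false) {
            return null;
        }
        ByteBuffer in = ByteBuffer.wrap(data);
        if (in.get() == 1) {
            return in.getInt();
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(read(write(512, 2), 2)); // 512
        System.out.println(write(512, 1).length);   // 0
        System.out.println(read(write(512, 1), 1)); // null
    }
}
```

If only one side were gated, an older node would either receive bytes it does not expect or try to read bytes that were never written.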
delta = new StreamingUnifiedChatCompletionResults.ChatCompletionChunk.Choice.Delta(
    null,
    null,
    null,
    List.of(
        new StreamingUnifiedChatCompletionResults.ChatCompletionChunk.Choice.Delta.ToolCall(
            0,
            id,
            new StreamingUnifiedChatCompletionResults.ChatCompletionChunk.Choice.Delta.ToolCall.Function(
                input != null ? input.toString() : null,
                name
            ),
            null
        )
    )
);
For readability, this might be better as:
var function = new StreamingUnifiedChatCompletionResults.ChatCompletionChunk.Choice.Delta.ToolCall.Function(
    input != null ? input.toString() : null,
    name
);
var toolCall = new StreamingUnifiedChatCompletionResults.ChatCompletionChunk.Choice.Delta.ToolCall(0, id, function, null);
delta = new StreamingUnifiedChatCompletionResults.ChatCompletionChunk.Choice.Delta(null, null, null, List.of(toolCall));
Similar changes can be made in the parseContentBlockDelta() method.
Thanks. Done!
…ntegration # Conflicts: # server/src/main/resources/transport/upper_bounds/9.2.csv
…CompletionStreamingProcessor readability
…ntegration # Conflicts: # server/src/main/resources/transport/upper_bounds/9.2.csv
Your comments are addressed. Could you please review the fixes?
Thanks for the changes!
Create Completion Endpoint
No Provider No URLs:
Google Provider With URLs:
Google Provider No URLs:
No URLs:
Both URLs:
Only Non-Streaming URL:
Only Streaming URL:
No Task Parameters:
Not Found:
Perform Non-Streaming Completion
Non-Streaming Both URLs
Non-Streaming Only Non-Streaming URL
Non-Streaming Only Streaming URL
Non-Streaming Without Task Settings
Perform Streaming Completion
Streaming Both URLs
Streaming Only Non-Streaming URL
Streaming Only Streaming URL
Streaming Without Task Settings
Create Chat Completion Endpoint
No Provider No URLs:
Google Provider With URLs:
Google Provider No URLs:
No URLs:
Both URLs:
Only Non-Streaming URL:
Only Streaming URL:
No Task Parameters:
Not Found:
Streaming Chat Completion was tested and confirmed to be successful.
Perform Chat Completion
Both URLs
Both URLs With Max Tokens in RQ
Only Non-Streaming URL
Only Non-Streaming URL With Max Tokens in RQ
Only Streaming URL
Only Streaming URL With Max Tokens in RQ
Both URLs No task settings on creation
Both URLs No task settings on creation With Max Tokens in RQ
Regression Tests for Google Vertex AI.
Create Completion endpoint
Success
No model_id
Perform Non-Streaming Completion
Perform Streaming Completion
Create Chat Completion endpoint
Perform Chat Completion
@jonathan-buttner
Updates the existing Google Vertex AI inference provider integration to support completion (both streaming and non-streaming) and chat_completion (streaming only) for Anthropic provider models within Google Model Garden.
Changes were tested locally against the following Anthropic models:
Create Completion Endpoint
Success:
With max_tokens in task settings:
Unknown Provider:
No Provider + No Google Vertex AI parameters:
No URL + No Streaming URL + No Google Vertex AI parameters:
URL + No Streaming URL (URL is default for both streaming/non-streaming):
No URL + Streaming URL (Streaming URL is default for both streaming/non-streaming):
Not Found:
Perform Completion
Success Non Streaming:
Success Streaming:
Success Non Streaming with task_settings max_tokens:
Success Streaming with task_settings max_tokens:
Create Chat Completion Endpoint
Success:
Success with task_settings max_tokens:
Unknown Provider:
No url/streaming_url:
Not found:
No streaming_url (url is default for both streaming/non-streaming):
No url (streaming_url is default for both streaming/non-streaming):
Perform Chat Completion
Basic:
Complex
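The endpoint-creation cases above note that when only one of url and streaming_url is supplied, it serves as the default for both streaming and non-streaming requests. A minimal sketch of that fallback, under stated assumptions: `UrlFallbackSketch` and `resolve` are hypothetical names, the URLs in `main` are placeholders, and treating the no-URL case as an exception is a simplification, since the actual behavior depends on the provider and the Google Vertex AI parameters.

```java
// Illustrative only: not the plugin's actual URL resolution code.
public class UrlFallbackSketch {

    // Picks the URL matching the request type, falling back to the other one if it is absent.
    public static String resolve(String url, String streamingUrl, boolean streaming) {
        String preferred = streaming ? streamingUrl : url;
        String fallback = streaming ? url : streamingUrl;
        if (preferred != null) {
            return preferred;
        }
        if (fallback != null) {
            return fallback;
        }
        // Simplification: the real service may instead build a URL from other settings.
        throw new IllegalArgumentException("at least one of url or streaming_url is required");
    }

    public static void main(String[] args) {
        // Placeholder URLs, not real endpoints.
        String url = "https://example.test/non-streaming";
        String streamingUrl = "https://example.test/streaming";
        System.out.println(resolve(url, streamingUrl, true));  // https://example.test/streaming
        System.out.println(resolve(url, null, true));          // https://example.test/non-streaming
        System.out.println(resolve(null, streamingUrl, false)); // https://example.test/streaming
    }
}
```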