Create a Google Vertex AI inference endpoint

Generally available; Added in 8.15.0

PUT /_inference/{task_type}/{googlevertexai_inference_id}

Create an inference endpoint to perform an inference task with the googlevertexai service.

Required authorization

  • Cluster privileges: manage_inference

Path parameters

  • task_type string

    The type of the inference task that the model will perform.

    Values are rerank, text_embedding, completion, or chat_completion.

  • googlevertexai_inference_id string Required

    The unique identifier of the inference endpoint.

Body (application/json)

  • chunking_settings object

    Chunking configuration object

    • max_chunk_size number

The maximum size of a chunk in words. This value cannot be higher than 300 or lower than 20 (for the sentence strategy) or 10 (for the word strategy).

      Default value is 250.

    • overlap number

The number of overlapping words between chunks. It applies only to the word chunking strategy. This value cannot be higher than half the max_chunk_size value.

      Default value is 100.

    • sentence_overlap number

The number of overlapping sentences between chunks. It applies only to the sentence chunking strategy and can be either 1 or 0.

      Default value is 1.

    • strategy string

      The chunking strategy: sentence or word.

      Default value is sentence.

  • service string Required

    Value is googlevertexai.

  • service_settings object Required
    • location string Required

      The name of the location to use for the inference task. Refer to the Google documentation for the list of supported locations.

    • model_id string Required

      The name of the model to use for the inference task. Refer to the Google documentation for the list of supported models.

    • project_id string Required

      The name of the project to use for the inference task.

    • rate_limit object

      This setting helps to minimize the number of rate limit errors returned from the service.

      • requests_per_minute number

        The number of requests allowed per minute. By default, the number of requests allowed per minute is set by each service as follows:

        • alibabacloud-ai-search service: 1000
        • anthropic service: 50
        • azureaistudio service: 240
        • azureopenai service and task type text_embedding: 1440
        • azureopenai service and task type completion: 120
        • cohere service: 10000
        • elastic service and task type chat_completion: 240
        • googleaistudio service: 360
        • googlevertexai service: 30000
        • hugging_face service: 3000
        • jinaai service: 2000
        • mistral service: 240
        • openai service and task type text_embedding: 3000
        • openai service and task type completion: 500
        • voyageai service: 2000
        • watsonxai service: 120
    • service_account_json string Required

      A valid service account in JSON format for the Google Vertex AI API.

  • task_settings object
    • auto_truncate boolean

      For a text_embedding task, truncate inputs longer than the maximum token length automatically.

    • top_n number

For a rerank task, the number of top documents to return.
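As a hedged sketch, the optional settings documented above can be combined in a single request body. Every identifier below (service account, model, location, project) is a placeholder, and the sanity checks simply mirror the documented constraints:

```python
# Hypothetical request body combining the optional settings documented
# above. All identifiers (project, model, location) are placeholders.
body = {
    "service": "googlevertexai",
    "service_settings": {
        "service_account_json": "service-account-json",
        "model_id": "model-id",
        "location": "location",
        "project_id": "project-id",
        # Optional: lower the default 30000 requests-per-minute limit.
        "rate_limit": {"requests_per_minute": 15000},
    },
    "chunking_settings": {
        "strategy": "word",
        "max_chunk_size": 200,  # 10-300 words for the word strategy
        "overlap": 100,         # must not exceed half of max_chunk_size
    },
    "task_settings": {"auto_truncate": True},
}

# Sanity checks mirroring the documented constraints.
cs = body["chunking_settings"]
assert 10 <= cs["max_chunk_size"] <= 300
assert cs["overlap"] <= cs["max_chunk_size"] // 2
```

This body would be sent as the payload of the PUT request shown at the top of this page.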

Responses

  • 200 application/json
    • chunking_settings object

      Chunking configuration object

      • max_chunk_size number

The maximum size of a chunk in words. This value cannot be higher than 300 or lower than 20 (for the sentence strategy) or 10 (for the word strategy).

        Default value is 250.

      • overlap number

The number of overlapping words between chunks. It applies only to the word chunking strategy. This value cannot be higher than half the max_chunk_size value.

        Default value is 100.

      • sentence_overlap number

The number of overlapping sentences between chunks. It applies only to the sentence chunking strategy and can be either 1 or 0.

        Default value is 1.

      • strategy string

        The chunking strategy: sentence or word.

        Default value is sentence.

    • service string Required

The service type.

    • service_settings object Required
    • task_settings object
    • inference_id string Required

The inference endpoint identifier.

    • task_type string Required

      Values are text_embedding or rerank.

```console
PUT _inference/text_embedding/google_vertex_ai_embeddings
{
  "service": "googlevertexai",
  "service_settings": {
    "service_account_json": "service-account-json",
    "model_id": "model-id",
    "location": "location",
    "project_id": "project-id"
  }
}
```

```python
resp = client.inference.put(
    task_type="text_embedding",
    inference_id="google_vertex_ai_embeddings",
    inference_config={
        "service": "googlevertexai",
        "service_settings": {
            "service_account_json": "service-account-json",
            "model_id": "model-id",
            "location": "location",
            "project_id": "project-id",
        },
    },
)
```

```js
const response = await client.inference.put({
  task_type: "text_embedding",
  inference_id: "google_vertex_ai_embeddings",
  inference_config: {
    service: "googlevertexai",
    service_settings: {
      service_account_json: "service-account-json",
      model_id: "model-id",
      location: "location",
      project_id: "project-id",
    },
  },
});
```

```ruby
response = client.inference.put(
  task_type: "text_embedding",
  inference_id: "google_vertex_ai_embeddings",
  body: {
    "service": "googlevertexai",
    "service_settings": {
      "service_account_json": "service-account-json",
      "model_id": "model-id",
      "location": "location",
      "project_id": "project-id"
    }
  }
)
```

```php
$resp = $client->inference()->put([
    "task_type" => "text_embedding",
    "inference_id" => "google_vertex_ai_embeddings",
    "body" => [
        "service" => "googlevertexai",
        "service_settings" => [
            "service_account_json" => "service-account-json",
            "model_id" => "model-id",
            "location" => "location",
            "project_id" => "project-id",
        ],
    ],
]);
```

```sh
curl -X PUT \
  -H "Authorization: ApiKey $ELASTIC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"service":"googlevertexai","service_settings":{"service_account_json":"service-account-json","model_id":"model-id","location":"location","project_id":"project-id"}}' \
  "$ELASTICSEARCH_URL/_inference/text_embedding/google_vertex_ai_embeddings"
```

```java
client.inference().put(p -> p
    .inferenceId("google_vertex_ai_embeddings")
    .taskType(TaskType.TextEmbedding)
    .inferenceConfig(i -> i
        .service("googlevertexai")
        .serviceSettings(JsonData.fromJson(
            "{\"service_account_json\":\"service-account-json\","
            + "\"model_id\":\"model-id\","
            + "\"location\":\"location\","
            + "\"project_id\":\"project-id\"}"))
    )
);
```
Request examples
Run `PUT _inference/text_embedding/google_vertex_ai_embeddings` to create an inference endpoint to perform a `text_embedding` task type.
```json
{
  "service": "googlevertexai",
  "service_settings": {
    "service_account_json": "service-account-json",
    "model_id": "model-id",
    "location": "location",
    "project_id": "project-id"
  }
}
```
Run `PUT _inference/rerank/google_vertex_ai_rerank` to create an inference endpoint to perform a `rerank` task type.
```json
{
  "service": "googlevertexai",
  "service_settings": {
    "service_account_json": "service-account-json",
    "project_id": "project-id"
  }
}
```
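For a rerank endpoint, the optional `top_n` task setting can be added to the body. The following is an illustrative sketch built on the rerank example above, with placeholder values rather than real credentials:

```python
# Hypothetical rerank request body adding the optional `top_n` task
# setting; the service account and project values are placeholders.
rerank_body = {
    "service": "googlevertexai",
    "service_settings": {
        "service_account_json": "service-account-json",
        "project_id": "project-id",
    },
    # Return only the 10 best-scoring documents from each rerank call.
    "task_settings": {"top_n": 10},
}

assert rerank_body["task_settings"]["top_n"] == 10
```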