Create a Hugging Face inference endpoint
Generally available; Added in 8.12.0
Create an inference endpoint to perform an inference task with the hugging_face
service.
You must first create an inference endpoint on the Hugging Face endpoint page to get an endpoint URL. On the new endpoint creation page, select the model you want to use (for example, intfloat/e5-small-v2), then select the sentence embeddings task under the advanced configuration section. Create the endpoint and copy the URL once the endpoint has finished initializing.
The following models are recommended for the Hugging Face service:
all-MiniLM-L6-v2
all-MiniLM-L12-v2
all-mpnet-base-v2
e5-base-v2
e5-small-v2
multilingual-e5-base
multilingual-e5-small
Required authorization
- Cluster privileges: manage_inference
Path parameters
- `task_type`: The type of the inference task that the model will perform. Value is `text_embedding`.
- `huggingface_inference_id`: The unique identifier of the inference endpoint.
PUT /_inference/{task_type}/{huggingface_inference_id}
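To make the two path parameters concrete, here is a minimal Python sketch that assembles the request path from them. The function name `build_inference_path` and the `SUPPORTED_TASK_TYPES` set are hypothetical helpers introduced for illustration and are not part of any Elasticsearch client. The check against `text_embedding` only reflects the single value documented above.

```python
from urllib.parse import quote

# Assumption: text_embedding is the only value documented for task_type here.
SUPPORTED_TASK_TYPES = {"text_embedding"}


def build_inference_path(task_type: str, inference_id: str) -> str:
    """Return the path for PUT /_inference/{task_type}/{huggingface_inference_id}."""
    if task_type not in SUPPORTED_TASK_TYPES:
        raise ValueError(f"unsupported task_type: {task_type!r}")
    if not inference_id:
        raise ValueError("inference_id must not be empty")
    # Percent-encode each segment so unusual characters cannot alter the path.
    return f"/_inference/{quote(task_type, safe='')}/{quote(inference_id, safe='')}"


print(build_inference_path("text_embedding", "hugging-face-embeddings"))
```

Validating the task type before building the path is a design choice of this sketch: it surfaces a typo locally rather than in an error response from the cluster.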
Console
PUT _inference/text_embedding/hugging-face-embeddings
{
  "service": "hugging_face",
  "service_settings": {
    "api_key": "hugging-face-access-token",
    "url": "url-endpoint"
  }
}

Python
resp = client.inference.put(
    task_type="text_embedding",
    inference_id="hugging-face-embeddings",
    inference_config={
        "service": "hugging_face",
        "service_settings": {
            "api_key": "hugging-face-access-token",
            "url": "url-endpoint"
        }
    },
)

JavaScript
const response = await client.inference.put({
  task_type: "text_embedding",
  inference_id: "hugging-face-embeddings",
  inference_config: {
    service: "hugging_face",
    service_settings: {
      api_key: "hugging-face-access-token",
      url: "url-endpoint",
    },
  },
});

Ruby
response = client.inference.put(
  task_type: "text_embedding",
  inference_id: "hugging-face-embeddings",
  body: {
    "service": "hugging_face",
    "service_settings": {
      "api_key": "hugging-face-access-token",
      "url": "url-endpoint"
    }
  }
)

PHP
$resp = $client->inference()->put([
    "task_type" => "text_embedding",
    "inference_id" => "hugging-face-embeddings",
    "body" => [
        "service" => "hugging_face",
        "service_settings" => [
            "api_key" => "hugging-face-access-token",
            "url" => "url-endpoint",
        ],
    ],
]);

curl
curl -X PUT \
  -H "Authorization: ApiKey $ELASTIC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"service":"hugging_face","service_settings":{"api_key":"hugging-face-access-token","url":"url-endpoint"}}' \
  "$ELASTICSEARCH_URL/_inference/text_embedding/hugging-face-embeddings"

Java
client.inference().put(p -> p
    .inferenceId("hugging-face-embeddings")
    .taskType(TaskType.TextEmbedding)
    .inferenceConfig(i -> i
        .service("hugging_face")
        .serviceSettings(JsonData.fromJson("{\"api_key\":\"hugging-face-access-token\",\"url\":\"url-endpoint\"}"))
    )
);
Request example
Run `PUT _inference/text_embedding/hugging-face-embeddings` to create an inference endpoint that performs a `text_embedding` task.
{
  "service": "hugging_face",
  "service_settings": {
    "api_key": "hugging-face-access-token",
    "url": "url-endpoint"
  }
}
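For environments without an Elasticsearch client library, the same request can be prepared with Python's standard library alone. The sketch below mirrors the curl form of the request: it builds the JSON body and a `PUT` request object but does not send it. The function name `build_create_endpoint_request`, the `http://localhost:9200` address, and the `my-elastic-api-key` value are placeholders assumed for illustration; the token and URL values are the same placeholders used in the request example.

```python
import json
import urllib.request


def build_create_endpoint_request(es_url, es_api_key, inference_id,
                                  hf_token, hf_endpoint_url):
    """Prepare (but do not send) the PUT request that creates the endpoint."""
    body = {
        "service": "hugging_face",
        "service_settings": {
            "api_key": hf_token,      # Hugging Face access token
            "url": hf_endpoint_url,   # URL copied from the Hugging Face endpoint page
        },
    }
    return urllib.request.Request(
        url=f"{es_url.rstrip('/')}/_inference/text_embedding/{inference_id}",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"ApiKey {es_api_key}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values; replace them with real ones before sending.
req = build_create_endpoint_request(
    "http://localhost:9200",
    "my-elastic-api-key",
    "hugging-face-embeddings",
    "hugging-face-access-token",
    "url-endpoint",
)
print(req.get_method(), req.full_url)
# To send the request against a reachable cluster: urllib.request.urlopen(req)
```

Separating request construction from sending keeps the example runnable without a cluster and makes the body easy to inspect before any credentials leave the machine.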