Conversation

@kosabogi (Contributor) commented on Oct 13, 2025

This PR adds the new `long_document_strategy` and `max_chunks_per_doc` parameters to the `service_settings` object in the Create an Elasticsearch inference endpoint documentation.

It also updates the description of the `chunking_settings` object to clarify that this setting only applies to the `sparse_embedding` and `text_embedding` task types.
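
For illustration, an endpoint that opts into chunking might be created as sketched below (the model ID `.rerank-v1` is assumed here to be the Elastic reranker, and the numeric values are illustrative, not documented defaults):

```jsonc
// PUT _inference/rerank/my-rerank-endpoint
// Sketch of the request body; model_id and the numeric values are illustrative.
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".rerank-v1",          // assumed Elastic reranker model ID
    "num_allocations": 1,
    "num_threads": 1,
    "long_document_strategy": "chunk", // new parameter added by this PR
    "max_chunks_per_doc": 10           // new parameter added by this PR
  }
}
```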

Related issue: #5451

github-actions (bot) commented on Oct 13, 2025

Below you can find the validation changes against the target branch for the APIs.

| API | Status | Request | Response |
| --- | --- | --- | --- |
| index | 🟢 | 1445/1445 → 1443/1443 | 1447/1447 → 1445/1445 |
| indices.create | 🔴 | 1378/1402 → 1385/1409 | 1402/1402 → 1409/1409 |
| indices.refresh | 🟢 | 329/329 → 327/327 | 329/329 → 327/327 |
| ml.get_job_stats | 🟢 | 30/30 → 29/29 | 30/30 → 29/29 |
| ml.put_job | 🟢 | 65/65 → 64/64 | 65/65 → 64/64 |

You can validate these APIs yourself by using the `make validate` target.

@pquentin pquentin changed the title [9.2] Adds new parameters to the elasticsearch inference API for the rerank task type Adds new parameters to the elasticsearch inference API for the rerank task type Oct 14, 2025
```ts
   */
  num_threads: integer
  /**
   * Only for the `rerank` task type.
```
A Member commented:

A quick clarification: for 9.2, these two values are only configurable for rerank endpoints that use the Elastic reranker model.

```diff
 body: {
   /**
-   * The chunking configuration object.
+   * The chunking configuration object. For the `rerank` task type, you can enable chunking by setting the `long_document_strategy` parameter to `chunk` in the `service_settings` object.
```
A Member commented:

I'm not sure whether we need to be more specific about this anywhere, but with this new method of chunking the user cannot set `chunking_settings` the way they would for embeddings; we build the chunking settings for them. If we want to clarify somewhere how we build those settings, we can.

```ts
   *
   * Possible values:
   * - `truncate` (default): Processes only the beginning of each document.
   * - `chunk`: Splits long documents into smaller parts (chunks) before inference.
```
A Member commented:

I'm not sure where it's best to clarify this, but with chunking enabled we still return a single score per document (the same as for truncating), with that score corresponding to the highest score of any of the document's chunks. I just want to make it clear that the structure of the response will not change, only the rerank relevance scores.
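
To illustrate that point, a rerank response might look the same under either strategy (a sketch; the index values and scores are made up):

```jsonc
// Sketch of a rerank response; indices and scores are illustrative.
// With long_document_strategy set to chunk, each relevance_score is the
// highest score among that document's chunks, but the shape is unchanged.
{
  "rerank": [
    { "index": 2, "relevance_score": 0.87 },
    { "index": 0, "relevance_score": 0.35 },
    { "index": 1, "relevance_score": 0.12 }
  ]
}
```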

@kosabogi kosabogi requested a review from davidkyle October 16, 2025 13:28
@davidkyle (Member) left a comment:


LGTM
