- Notifications
You must be signed in to change notification settings - Fork 25.5k
ES|QL async queries: Partial result on demand #118122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
de1039b
to f7e1d9c
Compare ...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/RestEsqlStopAsyncAction.java Show resolved Hide resolved
...c/internalClusterTest/java/org/elasticsearch/xpack/esql/action/CrossClusterAsyncQueryIT.java Show resolved Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlExecutionInfo.java Outdated Show resolved Hide resolved
afddc5f
to 124e070
Compare 81f2e81
to 49c768e
Compare There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Next round feedback.
return new EsqlExecutionInfo.Cluster.Builder(v).setTook(tookTime).build(); | ||
EsqlExecutionInfo.Cluster.Status resultStatus = v.getStatus(); | ||
// If we got the result after execution has been marked as partial, we declare it partial. | ||
// This should only happen if we didn't get any results yet from local, otherwise it's not partial. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you clarify this comment? I think it's really important and want to make sure readers (including me) understand it.
For the first sentence, it would help give more context like:
If the ExecutionInfo has already been marked as partial (e.g., when the async-query-stop API is invoked), then all subsequent clusters finishing after that should be marked as PARTIAL since they may have stopped data processing early.
For the second sentence "This should only happen if we didn't get any results yet from local", I don't understand. Why is that true? Can you elaborate in the code comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically it has to do with whether the local cluster is marked as "partial". Currently, it is marked as partial only if the local data processing (i.e nodes) has not been finished at the time stop arrived - otherwise, local will always be partial on partial requests, even if all local data has been actually processed, and that sounds wrong. I am not sure how to describe it better, let's discuss more if you have any suggestions.
FROM employees, *:employees | ||
| SORT emp_id ASC | ||
| LIMIT 10 | ||
| WHERE delay(10ms) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As Nhat said, I think this is definitely going to fail on heavily slow CI servers.
So we should either delete this test or allow the possibility that the is_running will be false
when first queried below and allow that as an acceptable outcome.
…al-result-on-demand
…al-result-on-demand
assertNotNull(executionInfo); | ||
assertThat(executionInfo.isPartial(), is(true)); | ||
} | ||
List<TaskInfo> tasks = getDriverTasks(client(REMOTE_CLUSTER_2)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dnhatn This part doesn't make sense to me - the purpose of the check has been to see if the "stop" operation has started, not that the cluster is finished, as in previous ones. Basically it literally waits until "partial" flag is set, which is supposed to happen when stop command is received, right before finishEarly
is called in asyncListener.markAsPartial();
. We are waiting for it to ensure local doesn't have the chance to finish before stop flag is set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
💔 Backport failed
You can use sqren/backport to manually backport by running |
💚 All backports created successfully
Questions ?Please refer to the Backport tool documentation |
* ES|QL async queries: Partial result on demand (#118122) Add capability to stop async query on demand The theory: - User initiates async search request - User sends the stop request (POST _query/async/<ID>/stop) - If the async is finished by that time, it's like regular async get - If it's not finished, the sinks are closed and the request is forcefully finished (cherry picked from commit f27f746) # Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlQueryResponse.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/action/EsqlQueryResponseTests.java # x-pack/plugin/security/qa/multi-cluster/src/javaRestTest/java/org/elasticsearch/xpack/remotecluster/CrossClusterEsqlRCS1UnavailableRemotesIT.java # x-pack/plugin/security/qa/multi-cluster/src/javaRestTest/java/org/elasticsearch/xpack/remotecluster/CrossClusterEsqlRCS2UnavailableRemotesIT.java * fix tests * [CI] Auto commit changes from spotless --------- Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
This implements a new "async stop" endpoint for ES|QL, which allows the user to interrupt the request and demand the delivery of results immediately.
The theory:
_query/async/<ID>/stop
)Note that this patch does not interrupt page processing (covered in #118188), which means it can take at least one page processing time to react. It also does not shut down all the operators, just the data sources, so right now the time it may take to stop the request is theoretically unbound. This will be addressed in the followup patches. On shapshot builds, this can be changed using
page_size
pragma.