Skip to content

Conversation

smalyshev
Copy link
Contributor

@smalyshev smalyshev commented Dec 6, 2024

This implements a new "async stop" endpoint for ES|QL, which allows the user to interrupt the request and demand the delivery of results immediately.

The theory:

  1. User initiates async search request
  2. User sends the stop request (POST _query/async/<ID>/stop)
  3. If the async is finished by that time, it's like regular async get
  4. If it's not finished, the sinks are closed and the request is forcefully finished

Note that this patch does not interrupt page processing (covered in #118188), which means it can take at least one page processing time to react. It also does not shut down all the operators, just the data sources, so right now the time it may take to stop the request is theoretically unbound. This will be addressed in the followup patches. On shapshot builds, this can be changed using page_size pragma.

@smalyshev smalyshev force-pushed the partial-result-on-demand branch from de1039b to f7e1d9c Compare December 6, 2024 18:26
@smalyshev smalyshev force-pushed the partial-result-on-demand branch from afddc5f to 124e070 Compare December 6, 2024 20:37
@smalyshev smalyshev force-pushed the partial-result-on-demand branch from 81f2e81 to 49c768e Compare December 13, 2024 00:57
@smalyshev smalyshev changed the title Partial result on demand WIP: Partial result on demand Dec 13, 2024
@smalyshev smalyshev requested a review from nik9000 December 13, 2024 19:03
Copy link
Contributor

@quux00 quux00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Next round feedback.

return new EsqlExecutionInfo.Cluster.Builder(v).setTook(tookTime).build();
EsqlExecutionInfo.Cluster.Status resultStatus = v.getStatus();
// If we got the result after execution has been marked as partial, we declare it partial.
// This should only happen if we didn't get any results yet from local, otherwise it's not partial.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify this comment? I think it's really important and want to make sure readers (including me) understand it.

For the first sentence, it would help give more context like:

If the ExecutionInfo has already been marked as partial (e.g., when the async-query-stop API is invoked), then all subsequent clusters finishing after that should be marked as PARTIAL since they may have stopped data processing early.

For the second sentence "This should only happen if we didn't get any results yet from local", I don't understand. Why is that true? Can you elaborate in the code comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically it has to do with whether the local cluster is marked as "partial". Currently, it is marked as partial only if the local data processing (i.e nodes) has not been finished at the time stop arrived - otherwise, local will always be partial on partial requests, even if all local data has been actually processed, and that sounds wrong. I am not sure how to describe it better, let's discuss more if you have any suggestions.

FROM employees, *:employees
| SORT emp_id ASC
| LIMIT 10
| WHERE delay(10ms)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As Nhat said, I think this is definitely going to fail on heavily slow CI servers.

So we should either delete this test or allow the possibility that the is_running will be false when first queried below and allow that as an acceptable outcome.

assertNotNull(executionInfo);
assertThat(executionInfo.isPartial(), is(true));
}
List<TaskInfo> tasks = getDriverTasks(client(REMOTE_CLUSTER_2));
Copy link
Contributor Author

@smalyshev smalyshev Jan 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dnhatn This part doesn't make sense to me - the purpose of the check has been to see if the "stop" operation has started, not that the cluster is finished, as in previous ones. Basically it literally waits until "partial" flag is set, which is supposed to happen when stop command is received, right before finishEarly is called in asyncListener.markAsPartial();. We are waiting for it to ensure local doesn't have the chance to finish before stop flag is set.

@smalyshev smalyshev requested a review from quux00 January 23, 2025 14:50
Copy link
Contributor

@quux00 quux00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@smalyshev smalyshev merged commit f27f746 into elastic:main Jan 23, 2025
16 checks passed
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
8.x Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 118122

@smalyshev
Copy link
Contributor Author

💚 All backports created successfully

Status Branch Result
8.x

Questions ?

Please refer to the Backport tool documentation

elasticsearchmachine pushed a commit that referenced this pull request Jan 23, 2025
* ES|QL async queries: Partial result on demand (#118122) Add capability to stop async query on demand The theory: - User initiates async search request - User sends the stop request (POST _query/async/<ID>/stop) - If the async is finished by that time, it's like regular async get - If it's not finished, the sinks are closed and the request is forcefully finished (cherry picked from commit f27f746) # Conflicts: #	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlQueryResponse.java #	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/action/EsqlQueryResponseTests.java #	x-pack/plugin/security/qa/multi-cluster/src/javaRestTest/java/org/elasticsearch/xpack/remotecluster/CrossClusterEsqlRCS1UnavailableRemotesIT.java #	x-pack/plugin/security/qa/multi-cluster/src/javaRestTest/java/org/elasticsearch/xpack/remotecluster/CrossClusterEsqlRCS2UnavailableRemotesIT.java * fix tests * [CI] Auto commit changes from spotless --------- Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.18.0 v9.0.0

5 participants