Skip to content

Conversation

benchaplin
Copy link
Contributor

@benchaplin benchaplin commented Sep 8, 2025

The 120_batch_reduce_size.yml test was skipped with a TODO when batched query execution was introduced in #121885. A key piece of the test was an assertion on the num_reduce_phases in the search response - the assertion was removed as num_reduce_phases now depends on how shards are laid out in the cluster, which may change across test runs.

To know how many reductions occurred, we need to understand the shard layout:

  • how many shards are on the coordinating node (these won't be batched)?
  • how many shards are on each data node (these will be batched)?

We can do this in an IT that captures the batched transport requests then derives the layout from them. I've added this IT, and removed the skip on the YAML test. I'd hear an argument for removing the YAML test completely, but it still covers some validation on the batched_reduce_size query parameter so I'm leaning towards keeping it.

@benchaplin benchaplin added >test Issues or PRs that are addressing/adding tests Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch :Search Foundations/Search Catch all for Search Foundations labels Sep 8, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)

return searchRequest.indicesOptions();
}

public List<ShardToQuery> shards() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider limiting this to a package-protected scope or exposing only shard.size(), as the test doesn’t need the complete list.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ another way to do this would be to inspect the indices service in the test itself, and extract the info about the how many nodes and how many shards per node from there, as opposed to from the request.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another another way to go about this test is to run it in a more controlled scenario, along the same lines as Dimi suggested above: decide upfront how many shards and nodes, and make the execution more predictable that way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, thanks @drempapis - reduced to package-private and just the size.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And thanks @javanna, I agree this is cleaner without the request intercepting stuff. I've reworked the test a bit to deduce how many shards aren't batched from cluster state.

@@ -1,7 +1,4 @@
setup:
- skip:
awaits_fix: "TODO fix this test, the response with batched execution is not deterministic enough for the available matchers"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we just get away with unskipping this ? I was under the impression that it's going to fail. Or does it run in a controlled scenario where the result is predictable?

Copy link
Contributor Author

@benchaplin benchaplin Sep 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you take a look at #121885 the line that would make this test fail was removed. Now it essentially just tests batched_reduce_size validation. I'm not sure if it's worth keeping, what do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks. I'd keep it. Can we have some simpler check on num_reduce_phases, like greather than some threshold that's easier to predict?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think we can assume anything about num_reduce_phases now - if it's 1, it's left out of the response entirely. And it can be 1 if all shards are batched.

Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice work, thanks!

@benchaplin benchaplin merged commit 81d63bc into elastic:main Sep 17, 2025
34 checks passed
mridula-s109 pushed a commit to mridula-s109/elasticsearch that referenced this pull request Sep 17, 2025
szybia added a commit to szybia/elasticsearch that referenced this pull request Sep 17, 2025
* upstream/main: Add additional logging to make spotting stats issues easier (elastic#133972) [ESQL] Clean up ESQL enrich landing page (elastic#134820) ES|QL: Make kibana docs for Query settings more consistent (elastic#134881) Add file extension metadata to cache miss counter from SharedBlobCacheService (elastic#134374) Add IT for num_reduced_phases with batched query execution (elastic#134312) Remove `SizeValue` (elastic#134871)
gmjehovich pushed a commit to gmjehovich/elasticsearch that referenced this pull request Sep 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Search Foundations/Search Catch all for Search Foundations Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch >test Issues or PRs that are addressing/adding tests v9.2.0

4 participants