Introduce a filtered collector manager #96824

javanna · 2023-06-14T09:55:55Z

In order to add support for inter-segment search concurrency, we need to implement collector managers for all of our custom collectors.

This PR introduces a collector manager that is based on FilteredCollector, used when a post_filter is provided as part of a search request.

Note that the collector manager is not yet integrated in the query phase.
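
As a rough illustration of the pattern described above, here is a minimal, self-contained sketch of a filtering collector manager. It does not use Lucene or Elasticsearch classes: `SimpleCollector`, `SimpleCollectorManager`, `FilteringCollector` and the other names are hypothetical stand-ins invented for this example, and the slices are processed sequentially rather than on an executor. The idea it tries to show is that the manager creates one filtering collector per slice, each wrapping a fresh delegate, and reduces by handing the delegates back to the wrapped manager.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical sketch: none of these types are Lucene or Elasticsearch APIs.
final class FilteredCollectorManagerSketch {

    /** Simplified stand-in for a collector: receives the doc IDs of one slice. */
    interface SimpleCollector {
        void collect(int doc);
    }

    /** Simplified stand-in for a collector manager: one collector per slice, then a reduce step. */
    interface SimpleCollectorManager<C extends SimpleCollector, T> {
        C newCollector();

        T reduce(Collection<C> collectors);
    }

    /** Counts every doc it is given. */
    static final class CountingCollector implements SimpleCollector {
        int count;

        @Override
        public void collect(int doc) {
            count++;
        }
    }

    static final class CountingManager implements SimpleCollectorManager<CountingCollector, Integer> {
        @Override
        public CountingCollector newCollector() {
            return new CountingCollector();
        }

        @Override
        public Integer reduce(Collection<CountingCollector> collectors) {
            int total = 0;
            for (CountingCollector collector : collectors) {
                total += collector.count;
            }
            return total;
        }
    }

    /** Forwards a doc to the delegate only if it passes the filter. */
    static final class FilteringCollector<C extends SimpleCollector> implements SimpleCollector {
        final IntPredicate filter;
        final C delegate;

        FilteringCollector(IntPredicate filter, C delegate) {
            this.filter = filter;
            this.delegate = delegate;
        }

        @Override
        public void collect(int doc) {
            if (filter.test(doc)) {
                delegate.collect(doc);
            }
        }
    }

    /** Wraps another manager so that each per-slice collector applies the same filter. */
    static final class FilteringCollectorManager<C extends SimpleCollector, T>
        implements SimpleCollectorManager<FilteringCollector<C>, T> {

        final IntPredicate filter;
        final SimpleCollectorManager<C, T> inner;

        FilteringCollectorManager(IntPredicate filter, SimpleCollectorManager<C, T> inner) {
            this.filter = filter;
            this.inner = inner;
        }

        @Override
        public FilteringCollector<C> newCollector() {
            return new FilteringCollector<>(filter, inner.newCollector());
        }

        @Override
        public T reduce(Collection<FilteringCollector<C>> collectors) {
            // Unwrap the delegates and let the wrapped manager merge their results.
            List<C> delegates = new ArrayList<>();
            for (FilteringCollector<C> collector : collectors) {
                delegates.add(collector.delegate);
            }
            return inner.reduce(delegates);
        }
    }

    /** Runs one collector per slice (sequentially in this sketch) and reduces the results. */
    static <C extends SimpleCollector, T> T search(List<int[]> slices, SimpleCollectorManager<C, T> manager) {
        List<C> collectors = new ArrayList<>();
        for (int[] slice : slices) {
            C collector = manager.newCollector();
            for (int doc : slice) {
                collector.collect(doc);
            }
            collectors.add(collector);
        }
        return manager.reduce(collectors);
    }

    /** Counts the docs with an even ID across all slices. */
    static int countEvenDocs(List<int[]> slices) {
        SimpleCollectorManager<FilteringCollector<CountingCollector>, Integer> manager =
            new FilteringCollectorManager<>(doc -> doc % 2 == 0, new CountingManager());
        return search(slices, manager);
    }
}
```

Because each slice gets its own collector and results are only combined in `reduce`, no state is shared between slices during collection, which is what would make it safe to run the slices on separate threads.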

elasticsearchmachine · 2023-06-14T09:56:19Z

Pinging @elastic/es-search (Team:Search)

elasticsearchmachine · 2023-06-14T09:56:19Z

Hi @javanna, I've created a changelog YAML for you.

javanna · 2023-06-14T10:15:13Z

server/src/test/java/org/elasticsearch/common/lucene/search/FilteredCollectorTests.java

 directory = newDirectory();
 RandomIndexWriter writer = new RandomIndexWriter(random(), directory, newIndexWriterConfig());
- numDocs = randomIntBetween(10, 100);
+ numDocs = randomIntBetween(900, 1000);


This is to increase the likelihood of leveraging concurrency. It still does not happen very often, in around 10% of the runs. Adding flushes would help, but RandomIndexWriter already does that, and it would also slow down the tests. I have opened 12369 to potentially address this in Lucene.

Thanks. Based on several test runs I just did, this increases the likelihood of running a multi-threaded collection to about 16%. That should be okay for now, but it would be great if we found a way to force this somehow, especially since I'm not sure how reproducible this is even with the random seed. My current understanding is that the number of segments created (which in turn seems to influence how many slices the searcher has) depends partly on timing as well. Maybe I'm wrong on this, though, and it's deterministic due to how RandomIndexWriter works.

The creation of segments and slices should be fully reproducible with the seed. If it's not, it sounds like a bug.

javanna · 2023-06-14T12:18:37Z

run elasticsearch-ci/part-1

cbuescher

Thanks, I think this is a good template for creating other managers as well. LGTM

cbuescher · 2023-06-14T13:42:13Z

Thanks. Based on several test runs I just did, this increases the likelihood of running a multi-threaded collection to about 16%. That should be okay for now, but it would be great if we found a way to force this somehow, especially since I'm not sure how reproducible this is even with the random seed. My current understanding is that the number of segments created (which in turn seems to influence how many slices the searcher has) depends partly on timing as well. Maybe I'm wrong on this, though, and it's deterministic due to how RandomIndexWriter works.

javanna added 4 commits June 14, 2023 10:40

filtered collector manager

37bbb05

Merge branch 'main' into enhancement/filtered_collector_manager

1ec0520

iter

38d9aa4

iter

bf8ecdc

javanna added the >enhancement, :Search/Search, and v8.9.0 labels Jun 14, 2023

javanna requested a review from cbuescher June 14, 2023 09:55

elasticsearchmachine added the Team:Search label Jun 14, 2023

javanna and others added 2 commits June 14, 2023 11:56

Update docs/changelog/96824.yaml

c134af6

spotless

b65c2e7

javanna requested a review from iverase June 14, 2023 13:20

cbuescher approved these changes Jun 14, 2023

javanna merged commit d98b9cb into elastic:main Jun 14, 2023

javanna deleted the enhancement/filtered_collector_manager branch June 14, 2023 13:46
