Skip to content

Conversation

@javanna
Copy link
Member

@javanna javanna commented Jun 14, 2023

In order to add support for inter-segment search concurrency, we need to implement collector managers for all of our custom collectors.

This PR introduces a collector manager that is based on FilteredCollector, used when a post_filter is provided as part of a search request.

Note that the collector manager is not yet integrated in the query phase.

@javanna javanna added >enhancement :Search/Search Search-related issues that do not fall into other categories v8.9.0 labels Jun 14, 2023
@javanna javanna requested a review from cbuescher June 14, 2023 09:55
@elasticsearchmachine elasticsearchmachine added the Team:Search Meta label for search team label Jun 14, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@elasticsearchmachine
Copy link
Collaborator

Hi @javanna, I've created a changelog YAML for you.

directory = newDirectory();
RandomIndexWriter writer = new RandomIndexWriter(random(), directory, newIndexWriterConfig());
numDocs = randomIntBetween(10, 100);
numDocs = randomIntBetween(900, 1000);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to increase the likelihood of leveraging concurrency. It still does not happen super often, around 10% of the runs. Adding flushes would help but random index writer already does it and it also slows down the tests. I have opened 12369 to potentially address this in Lucene,

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, based on several test runs I just did this increases the likelihood of running a multi-threaded collection to about 16%, should be okay for now but it would also be great if we find a way to force this somehow, especially since I'm not sure how reproducible this is even with the random seed. i.e. my current understanding is that some parts of how many segments are created (which later seems to influence how many slices the searcher has) seem to depend on timing issues as well. Maybe I'm wrong on this though and its somehow deterministic due to how RandomIndexWriter works.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The creation of segments and slices should be fully reproducible with the seed, if it's now it sounds like a bug.

@javanna
Copy link
Member Author

javanna commented Jun 14, 2023

run elasticsearch-ci/part-1

@javanna javanna requested a review from iverase June 14, 2023 13:20
Copy link
Member

@cbuescher cbuescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I think this is a good template for creating other managers as well. LGTM

directory = newDirectory();
RandomIndexWriter writer = new RandomIndexWriter(random(), directory, newIndexWriterConfig());
numDocs = randomIntBetween(10, 100);
numDocs = randomIntBetween(900, 1000);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, based on several test runs I just did this increases the likelihood of running a multi-threaded collection to about 16%, should be okay for now but it would also be great if we find a way to force this somehow, especially since I'm not sure how reproducible this is even with the random seed. i.e. my current understanding is that some parts of how many segments are created (which later seems to influence how many slices the searcher has) seem to depend on timing issues as well. Maybe I'm wrong on this though and its somehow deterministic due to how RandomIndexWriter works.

@javanna javanna merged commit d98b9cb into elastic:main Jun 14, 2023
@javanna javanna deleted the enhancement/filtered_collector_manager branch June 14, 2023 13:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>enhancement :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team v8.9.0

3 participants