- Notifications
You must be signed in to change notification settings - Fork 25.6k
Introduce a filtered collector manager #96824
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce a filtered collector manager #96824
Conversation
| Pinging @elastic/es-search (Team:Search) |
| Hi @javanna, I've created a changelog YAML for you. |
| directory = newDirectory(); | ||
| RandomIndexWriter writer = new RandomIndexWriter(random(), directory, newIndexWriterConfig()); | ||
| numDocs = randomIntBetween(10, 100); | ||
| numDocs = randomIntBetween(900, 1000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to increase the likelihood of leveraging concurrency. It still does not happen super often, around 10% of the runs. Adding flushes would help but random index writer already does it and it also slows down the tests. I have opened 12369 to potentially address this in Lucene,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, based on several test runs I just did this increases the likelihood of running a multi-threaded collection to about 16%, should be okay for now but it would also be great if we find a way to force this somehow, especially since I'm not sure how reproducible this is even with the random seed. i.e. my current understanding is that some parts of how many segments are created (which later seems to influence how many slices the searcher has) seem to depend on timing issues as well. Maybe I'm wrong on this though and its somehow deterministic due to how RandomIndexWriter works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The creation of segments and slices should be fully reproducible with the seed, if it's now it sounds like a bug.
| run elasticsearch-ci/part-1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I think this is a good template for creating other managers as well. LGTM
| directory = newDirectory(); | ||
| RandomIndexWriter writer = new RandomIndexWriter(random(), directory, newIndexWriterConfig()); | ||
| numDocs = randomIntBetween(10, 100); | ||
| numDocs = randomIntBetween(900, 1000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, based on several test runs I just did this increases the likelihood of running a multi-threaded collection to about 16%, should be okay for now but it would also be great if we find a way to force this somehow, especially since I'm not sure how reproducible this is even with the random seed. i.e. my current understanding is that some parts of how many segments are created (which later seems to influence how many slices the searcher has) seem to depend on timing issues as well. Maybe I'm wrong on this though and its somehow deterministic due to how RandomIndexWriter works.
In order to add support for inter-segment search concurrency, we need to implement collector managers for all of our custom collectors.
This PR introduces a collector manager that is based on
FilteredCollector, used when a post_filter is provided as part of a search request.Note that the collector manager is not yet integrated in the query phase.