- Notifications
You must be signed in to change notification settings - Fork 25.6k
Add retriever to the query rewrite phase #110482
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| Pinging @elastic/es-search (Team:Search) |
server/src/main/java/org/elasticsearch/search/builder/SearchSourceBuilder.java Show resolved Hide resolved
server/src/internalClusterTest/java/org/elasticsearch/search/retriever/RetrieverRewriteIT.java Outdated Show resolved Hide resolved
server/src/internalClusterTest/java/org/elasticsearch/search/retriever/RetrieverRewriteIT.java Show resolved Hide resolved
server/src/internalClusterTest/java/org/elasticsearch/search/retriever/RetrieverRewriteIT.java Show resolved Hide resolved
server/src/internalClusterTest/java/org/elasticsearch/search/retriever/RetrieverRewriteIT.java Show resolved Hide resolved
| if (request.source() != null && request.source().pointInTimeBuilder() != null) { | ||
| if (request.source() != null | ||
| && request.source().pointInTimeBuilder() != null | ||
| && request.source().pointInTimeBuilder().singleSession() == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it safe to rely on the keepAlive to determine explicit releasing in this case? Could this possible cause any side effects with any user provided requests with -1 as keep_alive ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just about returning the pit id in the response. The logic to release the pit automatically is restricted to the retriever builder case. It might be surprising for existing pit users though since -1 is a valid keep-alive value. We should be able to restrict this behavior to retrievers entirely.
| Thanks so much for this @jimczi ! Added some minor initial comments! |
server/src/main/java/org/elasticsearch/action/search/TransportSearchAction.java Show resolved Hide resolved
| @Override | ||
| public void onResponse(SearchResponse response) { | ||
| String query = response.getHits().getAt(0).field(QUERY_FIELD).getValue(); | ||
| StandardRetrieverBuilder standard = new StandardRetrieverBuilder(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume in the main code we want retrievers to be rewritten into some kind of ScoreDocQuery, but not to another retriever?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Retrievers will always rewrite to a retriever, that's enforced by the design. Using some kind of ScoreDocQuery will be internally done by the rewritten retriever when extractToSourceBuilder is called (when the retriever is guaranteed to be fully rewritten).
mayya-sharipova left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jimczi Thanks Jim, makes sense
javanna left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some nits and suggestions around splitting this change up and making it a bit easier to follow.
server/src/main/java/org/elasticsearch/action/search/SearchRequest.java Outdated Show resolved Hide resolved
server/src/main/java/org/elasticsearch/search/retriever/RetrieverBuilder.java Show resolved Hide resolved
server/src/main/java/org/elasticsearch/action/search/TransportSearchAction.java Show resolved Hide resolved
server/src/main/java/org/elasticsearch/action/search/TransportSearchAction.java Show resolved Hide resolved
This change moves the handling of the retriever to the rewrite phase. It also adds validation of the search source builder after extracting the retriever into the source builder. Relates elastic#110482
This change moves the handling of the retriever to the rewrite phase. It also adds validation of the search source builder after extracting the retriever into the source builder. Relates #110482
| Hey @javanna! I'm picking up this PR, so whenever you have some time I'd appreciate some feedback after the latest changes :) |
server/src/main/java/org/elasticsearch/action/search/TransportSearchAction.java Show resolved Hide resolved
| @pmpailis I haven't forgotten about this. If you think its ready, I will go through a review. I will likely ask for more tests :) |
| Pinging @elastic/es-search-relevance (Team:Search Relevance) |
| Closing as a fix for this has been merged through #111153. |
| Please ignore the above! Wrong PR to close 😅 |
Hey @benwtrent ; added a take on this in 29266a7. The idea now is that if we specify a compound retriever (which would require opening a PIT) and explicitly define We haven't updated PIT's logic yet, so if any shards are missing we'd throw an exception, the message of which has also been slightly updated to not suggest to use |
| @elasticmachine update branch |
| @Override | ||
| protected void doCheckNoMissingShards( | ||
| String phaseName, | ||
| SearchRequest request, | ||
| GroupShardsIterator<SearchShardIterator> shardsIts | ||
| ) { | ||
| final StringBuilder missingShards = new StringBuilder(); | ||
| // Fail-fast verification of all shards being available | ||
| for (int index = 0; index < shardsIts.size(); index++) { | ||
| final SearchShardIterator shardRoutings = shardsIts.get(index); | ||
| if (shardRoutings.size() == 0) { | ||
| if (missingShards.isEmpty() == false) { | ||
| missingShards.append(", "); | ||
| } | ||
| missingShards.append(shardRoutings.shardId()); | ||
| } | ||
| } | ||
| if (missingShards.isEmpty() == false) { | ||
| // Status red - shard is missing all copies and would produce partial results for an index search | ||
| final String msg = "Cannot execute [open_point_in_time] action due to missing shards [" + missingShards + "]."; | ||
| throw new SearchPhaseExecutionException(phaseName, msg, null, ShardSearchFailure.EMPTY_ARRAY); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
woah, I am unsure about this. This is a fairly big change in behavior for PIT isn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tbh I think that the change is not that big since we're overriding the (previously static)SearchPhase#doCheckNoMissingShards with just a couple of changes:
- we don't have the
if (request.allowPartialSearchResults() == false)check anymore as this is explicitly set by the PIT action (inTransportOpenPointInTimeAction#doExecute) - we have a more specific error message, w/o suggesting the
allow_partial_search_resultsas a solution anymore
The only reason for this addition was to update the error message, so if you think that this isn't needed right now (as this is something that we plan to revisit soon-ish) I can omit this from the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pmpailis I would prefer this change be in a separate, focused PR. This way foundations, et. al. can focus their review on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened #111320
| public ActionRequestValidationException validate() { | ||
| ActionRequestValidationException validationException = null; | ||
| boolean scroll = scroll() != null; | ||
| boolean allowPartialSearchResults = allowPartialSearchResults() != null && allowPartialSearchResults(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
benwtrent left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @pmpailis for going through all this.
Let's make sure we can support partial search results as soon as possible in the future :).
| @elasticmachine update branch |
| @elasticmachine update branch |
This change moves the handling of the retriever to the rewrite phase. It also adds validation of the search source builder after extracting the retriever into the source builder. Relates #110482
This change moves the handling of the retriever to the rewrite phase. It also adds validation of the search source builder after extracting the retriever into the source builder. Relates #110482
This change moves the extraction of retrievers into the search source builder during the coordinator rewrite phase.
By delaying the extraction, retrievers can apply asynchronous actions during the rewrite before executing the search request.
Additionally, a compound mode for retrievers is introduced, creating an internal point in time for the duration of the request. This allows retrievers to perform queries during the rewrite on the same reader that will be used for the main search request.
In a subsequent PR, this capability will be used to implement retrievers that combine multiple retrievers.
This PR consists of three commits. The last commit contains the main change to the search action, which involves creating and deleting a PIT before search execution if a compound retriever is present.