Skip to content

Conversation

@jimczi
Copy link
Contributor

@jimczi jimczi commented Jul 4, 2024

This change moves the extraction of retrievers into the search source builder during the coordinator rewrite phase.
By delaying the extraction, retrievers can apply asynchronous actions during the rewrite before executing the search request.

Additionally, a compound mode for retrievers is introduced, creating an internal point in time for the duration of the request. This allows retrievers to perform queries during the rewrite on the same reader that will be used for the main search request.
In a subsequent PR, this capability will be used to implement retrievers that combine multiple retrievers.

This PR consists of three commits. The last commit contains the main change to the search action, which involves creating and deleting a PIT before search execution if a compound retriever is present.

@jimczi jimczi added >enhancement :Search/Search Search-related issues that do not fall into other categories v8.16.0 labels Jul 4, 2024
@jimczi jimczi requested review from jdconrad and pmpailis July 4, 2024 14:18
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@elasticsearchmachine elasticsearchmachine added the Team:Search Meta label for search team label Jul 4, 2024
if (request.source() != null && request.source().pointInTimeBuilder() != null) {
if (request.source() != null
&& request.source().pointInTimeBuilder() != null
&& request.source().pointInTimeBuilder().singleSession() == false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it safe to rely on the keepAlive to determine explicit releasing in this case? Could this possible cause any side effects with any user provided requests with -1 as keep_alive ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just about returning the pit id in the response. The logic to release the pit automatically is restricted to the retriever builder case. It might be surprising for existing pit users though since -1 is a valid keep-alive value. We should be able to restrict this behavior to retrievers entirely.

@pmpailis
Copy link
Contributor

pmpailis commented Jul 8, 2024

Thanks so much for this @jimczi ! Added some minor initial comments!

@Override
public void onResponse(SearchResponse response) {
String query = response.getHits().getAt(0).field(QUERY_FIELD).getValue();
StandardRetrieverBuilder standard = new StandardRetrieverBuilder();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume in the main code we want retrievers to be rewritten into some kind of ScoreDocQuery, but not to another retriever?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Retrievers will always rewrite to a retriever, that's enforced by the design. Using some kind of ScoreDocQuery will be internally done by the rewritten retriever when extractToSourceBuilder is called (when the retriever is guaranteed to be fully rewritten).

Copy link
Contributor

@mayya-sharipova mayya-sharipova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jimczi Thanks Jim, makes sense

Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some nits and suggestions around splitting this change up and making it a bit easier to follow.

jimczi added a commit to jimczi/elasticsearch that referenced this pull request Jul 9, 2024
This change moves the handling of the retriever to the rewrite phase. It also adds validation of the search source builder after extracting the retriever into the source builder. Relates elastic#110482
jimczi added a commit that referenced this pull request Jul 10, 2024
This change moves the handling of the retriever to the rewrite phase. It also adds validation of the search source builder after extracting the retriever into the source builder. Relates #110482
@pmpailis
Copy link
Contributor

Hey @javanna! I'm picking up this PR, so whenever you have some time I'd appreciate some feedback after the latest changes :)

@benwtrent benwtrent self-requested a review July 17, 2024 14:51
@benwtrent
Copy link
Member

@pmpailis I haven't forgotten about this. If you think its ready, I will go through a review. I will likely ask for more tests :)

@javanna javanna added :Search Relevance/Search Catch all for Search Relevance and removed :Search/Search Search-related issues that do not fall into other categories labels Jul 17, 2024
@elasticsearchmachine elasticsearchmachine added Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch and removed Team:Search Meta label for search team labels Jul 17, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@pmpailis
Copy link
Contributor

Closing as a fix for this has been merged through #111153.

@pmpailis pmpailis closed this Jul 23, 2024
@pmpailis pmpailis reopened this Jul 23, 2024
@pmpailis
Copy link
Contributor

pmpailis commented Jul 23, 2024

Please ignore the above! Wrong PR to close 😅

@pmpailis
Copy link
Contributor

Another thing to consider is what if somebody has set "allow_partial_search_results". Doesn't the PIT opening fail on a missing shard, thus rendering that option useless?

Hey @benwtrent ; added a take on this in 29266a7.

The idea now is that if we specify a compound retriever (which would require opening a PIT) and explicitly define allow_partial_search_results , then we'd throw a validation error when parsing the search request. This is a bit harsh now, and we could be more lenient, as all shards might be present, but it makes it clear for the user that this option is not supported yet.

We haven't updated PIT's logic yet, so if any shards are missing we'd throw an exception, the message of which has also been slightly updated to not suggest to use allow_partial_search_results which is not respected yet by PIT (this could be a follow up work as discussed).

@pmpailis
Copy link
Contributor

@elasticmachine update branch

Comment on lines 216 to 238
@Override
protected void doCheckNoMissingShards(
String phaseName,
SearchRequest request,
GroupShardsIterator<SearchShardIterator> shardsIts
) {
final StringBuilder missingShards = new StringBuilder();
// Fail-fast verification of all shards being available
for (int index = 0; index < shardsIts.size(); index++) {
final SearchShardIterator shardRoutings = shardsIts.get(index);
if (shardRoutings.size() == 0) {
if (missingShards.isEmpty() == false) {
missingShards.append(", ");
}
missingShards.append(shardRoutings.shardId());
}
}
if (missingShards.isEmpty() == false) {
// Status red - shard is missing all copies and would produce partial results for an index search
final String msg = "Cannot execute [open_point_in_time] action due to missing shards [" + missingShards + "].";
throw new SearchPhaseExecutionException(phaseName, msg, null, ShardSearchFailure.EMPTY_ARRAY);
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

woah, I am unsure about this. This is a fairly big change in behavior for PIT isn't it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tbh I think that the change is not that big since we're overriding the (previously static)SearchPhase#doCheckNoMissingShards with just a couple of changes:

  • we don't have the if (request.allowPartialSearchResults() == false) check anymore as this is explicitly set by the PIT action (in TransportOpenPointInTimeAction#doExecute)
  • we have a more specific error message, w/o suggesting the allow_partial_search_results as a solution anymore

The only reason for this addition was to update the error message, so if you think that this isn't needed right now (as this is something that we plan to revisit soon-ish) I can omit this from the PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pmpailis I would prefer this change be in a separate, focused PR. This way foundations, et. al. can focus their review on it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened #111320

public ActionRequestValidationException validate() {
ActionRequestValidationException validationException = null;
boolean scroll = scroll() != null;
boolean allowPartialSearchResults = allowPartialSearchResults() != null && allowPartialSearchResults();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

Copy link
Member

@benwtrent benwtrent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @pmpailis for going through all this.

Let's make sure we can support partial search results as soon as possible in the future :).

@pmpailis
Copy link
Contributor

@elasticmachine update branch

@pmpailis
Copy link
Contributor

@elasticmachine update branch

@pmpailis pmpailis merged commit 3420354 into elastic:main Jul 30, 2024
tvernum pushed a commit that referenced this pull request Feb 25, 2025
This change moves the handling of the retriever to the rewrite phase. It also adds validation of the search source builder after extracting the retriever into the source builder. Relates #110482
tvernum pushed a commit that referenced this pull request Feb 25, 2025
This change moves the handling of the retriever to the rewrite phase. It also adds validation of the search source builder after extracting the retriever into the source builder. Relates #110482
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>enhancement :Search Relevance/Search Catch all for Search Relevance Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v8.16.0

7 participants