Skip to content

Conversation

@original-brownbear
Copy link
Contributor

The transport action for freeing contexts is forking to GENERIC nowadays and executed in a fire-and-forget manner in production code. When it was running on the direct executor, no busy wait was needed for a single node since the freeing was fully synchronous inside the searches.
With forking, the waiting on free tasks does not help because the task is created inside the forked execution for a local node execution, so we need to busy wait like we do for multi-node clusters here too. -> dry up the code and use the same method for both.

The transport action for freeing contexts is forking to GENERIC nowadays and executed in a fire-and-forget manner in production code. When it was running on the direct executor, no busy wait was needed for a single node since the freeing was fully synchronous inside the searches. With forking, the waiting on free tasks does not help because the task is created inside the forked execution for a local node execution, so we need to busy wait like we do for multi-node clusters here too. -> dry up the code and use the same method for both.
@original-brownbear original-brownbear added >test Issues or PRs that are addressing/adding tests :Search/Search Search-related issues that do not fall into other categories labels Jul 22, 2024
@elasticsearchmachine elasticsearchmachine added v8.16.0 Team:Search Meta label for search team labels Jul 22, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

SearchService searchService = getInstanceFromNode(SearchService.class);
assertThat(searchService.getActiveContexts(), equalTo(0));
assertThat(searchService.getOpenScrollContexts(), equalTo(0));
ensureAllContextsReleased(getInstanceFromNode(SearchService.class));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know people argued this would increase test runtimes, but not really in my opinion. This is failing at a low rate and we only really add a delay to those tests where the assertions fails, so not a big deal IMO.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I was one of those people, and I still don't like this change. Fire-and-forget actions on the GENERIC pool is a bad code smell IMO, it's going to cause us problems at scale at some point. I'm not going to ask for rework here, just recording that opinion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scale problem started worrying me as well, opened #111156 right after :D

Copy link
Contributor

@pmpailis pmpailis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks @original-brownbear ! Should we also backport to 8.15? (I can handle the unmuting of tests once we verify)

@original-brownbear
Copy link
Contributor Author

Thanks Panos! ++ to back porting, added the label, thanks for remembering that!

@original-brownbear original-brownbear merged commit 86544aa into elastic:main Jul 22, 2024
@original-brownbear original-brownbear deleted the ensure-contexts-released branch July 22, 2024 12:15
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
8.15 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 111153

salvatore-campagna pushed a commit to salvatore-campagna/elasticsearch that referenced this pull request Jul 23, 2024
…astic#111153) The transport action for freeing contexts is forking to GENERIC nowadays and executed in a fire-and-forget manner in production code. When it was running on the direct executor, no busy wait was needed for a single node since the freeing was fully synchronous inside the searches. With forking, the waiting on free tasks does not help because the task is created inside the forked execution for a local node execution, so we need to busy wait like we do for multi-node clusters here too. -> dry up the code and use the same method for both.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team >test Issues or PRs that are addressing/adding tests v8.15.1 v8.16.0

4 participants