[8.19] [ML] Fix double-counting of inference memory in the assignment rebalancer (#133919) #134054

valeriy42 · 2025-09-03T12:46:33Z

Backports the following commits to 8.19:

[ML] Fix double-counting of inference memory in the assignment rebalancer (#133919)

…ncer (elastic#133919)

The static method TrainedModelAssignmentRebalancer.getNodeFreeMemoryExcludingPerNodeOverheadAndNativeInference() subtracted load.getAssignedNativeInferenceMemory() from load.getFreeMemoryExcludingPerNodeOverhead(). However, NodeLoad.getFreeMemoryExcludingPerNodeOverhead() already subtracts native inference memory as part of the getAssignedJobMemoryExcludingPerNodeOverhead() calculation, so native inference memory was counted twice. Fixing this double-counting allows us to remove the private method getNodeFreeMemoryExcludingPerNodeOverheadAndNativeInference() entirely.

…ncer (elastic#133919) (elastic#134054)

The static method TrainedModelAssignmentRebalancer.getNodeFreeMemoryExcludingPerNodeOverheadAndNativeInference() subtracted load.getAssignedNativeInferenceMemory() from load.getFreeMemoryExcludingPerNodeOverhead(). However, NodeLoad.getFreeMemoryExcludingPerNodeOverhead() already subtracts native inference memory as part of the getAssignedJobMemoryExcludingPerNodeOverhead() calculation, so native inference memory was counted twice. Fixing this double-counting allows us to remove the private method getNodeFreeMemoryExcludingPerNodeOverheadAndNativeInference() entirely.
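
For reviewers who want to see the arithmetic, here is a minimal sketch of the double-counting described in the commit message. It is not the Elasticsearch code: `NodeLoadSketch`, its fields (`maxMlMemory`, `otherAssignedJobMemory`), and the helpers `freeMemoryBeforeFix` / `freeMemoryAfterFix` are hypothetical stand-ins, and the formula inside `getFreeMemoryExcludingPerNodeOverhead()` is an assumption based only on the statement that native inference memory is already part of the assigned job memory.

```java
// Simplified, hypothetical model of the accounting. Not the real NodeLoad or
// TrainedModelAssignmentRebalancer classes; all values are in bytes.
public class DoubleCountingSketch {

    /** Stand-in for NodeLoad, reduced to the fields needed for the illustration. */
    static final class NodeLoadSketch {
        private final long maxMlMemory;                   // hypothetical field
        private final long otherAssignedJobMemory;        // hypothetical: all non-inference jobs
        private final long assignedNativeInferenceMemory;

        NodeLoadSketch(long maxMlMemory, long otherAssignedJobMemory, long assignedNativeInferenceMemory) {
            this.maxMlMemory = maxMlMemory;
            this.otherAssignedJobMemory = otherAssignedJobMemory;
            this.assignedNativeInferenceMemory = assignedNativeInferenceMemory;
        }

        long getAssignedNativeInferenceMemory() {
            return assignedNativeInferenceMemory;
        }

        // Assumption: assigned job memory already includes native inference memory.
        long getAssignedJobMemoryExcludingPerNodeOverhead() {
            return otherAssignedJobMemory + assignedNativeInferenceMemory;
        }

        // Assumption: free memory is the maximum minus everything assigned, floored at zero.
        long getFreeMemoryExcludingPerNodeOverhead() {
            return Math.max(maxMlMemory - getAssignedJobMemoryExcludingPerNodeOverhead(), 0L);
        }
    }

    /** Mirrors the removed helper: inference memory is subtracted a second time. */
    static long freeMemoryBeforeFix(NodeLoadSketch load) {
        return load.getFreeMemoryExcludingPerNodeOverhead() - load.getAssignedNativeInferenceMemory();
    }

    /** After the fix: the value from the node load is used directly. */
    static long freeMemoryAfterFix(NodeLoadSketch load) {
        return load.getFreeMemoryExcludingPerNodeOverhead();
    }

    public static void main(String[] args) {
        NodeLoadSketch load = new NodeLoadSketch(1000L, 200L, 300L);
        System.out.println("before fix: " + freeMemoryBeforeFix(load)); // prints "before fix: 200"
        System.out.println("after fix:  " + freeMemoryAfterFix(load));  // prints "after fix:  500"
    }
}
```

With 1000 bytes available, 200 assigned to other jobs, and 300 assigned to native inference, the old calculation reports 200 bytes free instead of 500, which understates free memory by exactly the inference memory already assigned to the node.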

valeriy42 added the :ml (Machine learning), >bug, auto-merge-without-approval (Automatically merge pull request when CI checks pass; NB doesn't wait for reviews!), backport, and Team:ML (Meta label for the ML team) labels Sep 3, 2025

elasticsearchmachine mentioned this pull request Sep 3, 2025

[ML] Fix double-counting of inference memory in the assignment rebalancer #133919

Merged

elasticsearchmachine added the v8.19.4 label Sep 3, 2025

valeriy42 added 2 commits September 8, 2025 11:03

Merge branch '8.19' into backport/8.19/pr-133919

09a3b32

Merge branch '8.19' into backport/8.19/pr-133919

fd299d4

elasticsearchmachine merged commit 8ebc6ae into elastic:8.19 Sep 8, 2025
22 checks passed

valeriy42 deleted the backport/8.19/pr-133919 branch September 8, 2025 14:30
