[9.1] (backport #9992) fix: always clear the coordinator overridden state on err inside upgrade of coordinator #9996




What does this PR do?
This PR fixes a regression introduced by #9634 where the coordinator's
`overrideState` could remain set if an upgrade attempt failed early (e.g. the agent is not upgradeable, the capability check denies the upgrade, or the pre-upgrade callback returns an error). Specifically, this PR:
- Calls `ClearOverrideState()` before returning from all early failure branches inside `Coordinator.Upgrade`.
- Ensures `overrideState` is reset to `nil` after a failing `preUpgradeCallback`, preventing stale state from leaking into subsequent upgrade attempts.
Why is it important?
Without this change, failed upgrades could leave the coordinator in a state that incorrectly reflects an ongoing upgrade. This blocks future upgrade attempts until the Elastic Agent is restarted, which is disruptive and operationally undesirable.
By clearing the override state on failure, we ensure the coordinator always returns to a clean state, enabling subsequent upgrade attempts to proceed without requiring a restart.
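The invariant this fix restores, that a failed `Upgrade` never leaves override state behind, can be sketched in Go. This is a simplified, hypothetical model, not the Elastic Agent code: the `coordinator` type, its fields, the `SetOverrideState`/`OverrideState` helpers and the error values are assumptions made for illustration. The PR calls `ClearOverrideState()` explicitly in each failure branch; the sketch expresses the same guarantee with a deferred cleanup on a named error return, which is one alternative way to keep a future early return from reintroducing the leak.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// overrideState is a hypothetical stand-in for the state the coordinator
// reports while an upgrade is in progress.
type overrideState struct {
	message string
}

// Hypothetical early-failure errors mirroring the branches named in the PR.
var (
	errNotUpgradeable = errors.New("agent is not upgradeable")
	errDeniedByCaps   = errors.New("upgrade denied by capabilities")
)

// coordinator is a simplified, hypothetical model; only the fields needed
// to show the cleanup pattern exist.
type coordinator struct {
	mu                 sync.Mutex
	override           *overrideState
	upgradeable        bool
	capsAllow          bool
	preUpgradeCallback func() error
}

func (c *coordinator) SetOverrideState(msg string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.override = &overrideState{message: msg}
}

func (c *coordinator) ClearOverrideState() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.override = nil
}

func (c *coordinator) OverrideState() *overrideState {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.override
}

// Upgrade sets the override state up front; the deferred function inspects
// the named return value and clears the state whenever Upgrade fails.
func (c *coordinator) Upgrade(version string) (err error) {
	c.SetOverrideState(fmt.Sprintf("Upgrading to version %s", version))
	defer func() {
		if err != nil {
			c.ClearOverrideState()
		}
	}()

	if !c.upgradeable {
		return errNotUpgradeable
	}
	if !c.capsAllow {
		return errDeniedByCaps
	}
	if c.preUpgradeCallback != nil {
		if cbErr := c.preUpgradeCallback(); cbErr != nil {
			return fmt.Errorf("pre-upgrade callback failed: %w", cbErr)
		}
	}
	// In this sketch, success leaves the override state set; a real
	// implementation would hand off to an upgrade manager at this point.
	return nil
}

func main() {
	c := &coordinator{
		upgradeable:        true,
		capsAllow:          true,
		preUpgradeCallback: func() error { return errors.New("boom") },
	}
	err := c.Upgrade("9.1.1") // hypothetical target version
	fmt.Println(err, c.OverrideState() == nil)
}
```

The trade-off of the deferred form is that cleanup is tied to the returned error: a branch that logs a failure but returns `nil` would not clear the state, so it complements rather than replaces careful error handling in each branch.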
Checklist
- `./changelog/fragments` using the changelog tool
Disruptive User Impact
Previously, users would need to manually restart the Elastic Agent after a failed upgrade attempt in order to retry an upgrade.
With this fix, the agent automatically clears the override state, removing the need for manual intervention.
How to test this PR locally
Run `mage unitTest` and confirm that all tests pass, including the updated coordinator tests.
Related issues
This is an automatic backport of pull request #9992 done by [Mergify](https://mergify.com).