@rubenruizdegauna
Member

What does this PR do?

This PR adds a check of each component's unit states to the liveness endpoint. If a component's state is healthy but any of its units is degraded or failed, the liveness endpoint returns a 500.
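
The rule described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `State`, `Unit`, and `Component` types and the `componentsHealthy` and `livenessStatus` functions are hypothetical names chosen for the example, and the real endpoint may apply further conditions that are not modelled here.

```go
package main

import "net/http"

// State is a hypothetical health state shared by components and units.
type State int

const (
	Healthy State = iota
	Degraded
	Failed
)

// Unit and Component are illustrative types, not the agent's real ones.
type Unit struct {
	ID    string
	State State
}

type Component struct {
	ID    string
	State State
	Units []Unit
}

// unhealthy reports whether a state should fail the liveness check.
func unhealthy(s State) bool {
	return s == Degraded || s == Failed
}

// componentsHealthy returns false if any component, or any unit of a
// component, is degraded or failed. Checking the units is the behaviour
// this PR description adds: a healthy component no longer hides a bad unit.
func componentsHealthy(components []Component) bool {
	for _, c := range components {
		if unhealthy(c.State) {
			return false
		}
		for _, u := range c.Units {
			if unhealthy(u.State) {
				return false
			}
		}
	}
	return true
}

// livenessStatus maps the aggregate health to an HTTP status code.
func livenessStatus(components []Component) int {
	if componentsHealthy(components) {
		return http.StatusOK
	}
	return http.StatusInternalServerError
}
```

Under these assumptions, a component reported as `Healthy` whose only unit is `Degraded` makes `livenessStatus` return `http.StatusInternalServerError`, whereas checking the component state alone would have returned `http.StatusOK`.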

Checklist

  • I have read and understood the pull request guidelines of this project.
  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

Disruptive User Impact

Liveness probes will now fail if a component's state is healthy but any of its units is failed or degraded, which will likely cause the container to be restarted (see https://kubernetes.io/docs/concepts/configuration/liveness-readiness-startup-probes/#liveness-probe).

Related issues

@rubenruizdegauna rubenruizdegauna requested a review from a team as a code owner September 19, 2025 13:19
@rubenruizdegauna rubenruizdegauna added the bug (Something isn't working) and backport-active-all (Automated backport with mergify to all the active branches) labels Sep 19, 2025
@cmacknz cmacknz added the Team:Elastic-Agent-Control-Plane (Label for the Agent Control Plane team) label Sep 19, 2025
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@rubenruizdegauna rubenruizdegauna force-pushed the fix/liveness_units branch 3 times, most recently from 35af5b5 to 315fb34 Compare September 22, 2025 08:04
@elasticmachine
Collaborator

💛 Build succeeded, but was flaky

Failed CI Steps

History

cc @rubenruizdegauna

Contributor

@blakerouse blakerouse left a comment

Thanks for cleaning up the loop. Looks good.

@rubenruizdegauna rubenruizdegauna merged commit 4b818a1 into elastic:main Sep 23, 2025
23 checks passed
@github-actions
Contributor

@Mergifyio backport 8.18 8.19 9.0 9.1

@mergify
Contributor

mergify bot commented Sep 23, 2025

mergify bot pushed a commit that referenced this pull request Sep 23, 2025
* fix: take into account units state on liveness * extract check components state to helper * merge conditional assignment into variable declaration (cherry picked from commit 4b818a1)
rubenruizdegauna added a commit that referenced this pull request Sep 23, 2025
* fix: take into account units state on liveness * extract check components state to helper * merge conditional assignment into variable declaration (cherry picked from commit 4b818a1)
rubenruizdegauna added a commit that referenced this pull request Sep 23, 2025
* fix: take into account units state on liveness * extract check components state to helper * merge conditional assignment into variable declaration (cherry picked from commit 4b818a1) Co-authored-by: Ruben Ruiz de Gauna <rubenruizdegauna@proton.me>
intxgo pushed a commit to intxgo/elastic-agent that referenced this pull request Sep 24, 2025
* fix: take into account units state on liveness * extract check components state to helper * merge conditional assignment into variable declaration
@rubenruizdegauna rubenruizdegauna deleted the fix/liveness_units branch October 6, 2025 06:54
Labels

  • backport-active-all (Automated backport with mergify to all the active branches)
  • bug (Something isn't working)
  • Team:Elastic-Agent-Control-Plane (Label for the Agent Control Plane team)

4 participants