Skip to content

Conversation

belimawr
Copy link
Contributor

@belimawr belimawr commented Apr 7, 2025

What does this PR do?

Calling proc.Kill causes the process to exit immediately, which
prevents components from executing some clean up functions.

To gracefully terminate the components, on Linux and MacOS SIGTERM
is sent.

On Windows signals are sent to process groups, child process are created
in the same group as the parent, so sending a signal to a child process
causes the Elastic-Agent to receive the signal as well. Therefore child
process need to be created in their own process group.

To do so, the child process needs to be created setting the creation flag
CREATE_NEW_PROCESS_GROUP, which has the side effect of disbling
the CTRL+C signal.

Then to correctly terminate the child process windows.GenerateConsoleCtrlEvent
is called to send CTRL+BREAK signal that is translated by go Go runtime into a
syscall.SIGINT, which is already correctly handled by Beats.

Why is it important?

It correctly signals components to exit, allowing them to run their cleanup.

Checklist

  • I have read and understood the pull request guidelines of this project.
  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

## Disruptive User Impact

How to test this PR locally

mage build:TestBinaries cd pkg/core/process go test . 

Related issues

Questions to ask yourself

  • How are we going to support this in production?
  • How are we going to measure its adoption?
  • How are we going to debug this?
  • What are the metrics I should take care of?
  • ...
Calling proc.Kill causes the process to exit immediately, which prevents Beats from executing some clean up functions. Now windows.GenerateConsoleCtrlEvent to send a CTRL+C signal that is haled as a SIGINT, thus allowing the Beat to execute all necessary clean up.
@belimawr belimawr added bug Something isn't working skip-changelog skip-ci labels Apr 7, 2025
@belimawr belimawr self-assigned this Apr 7, 2025
Copy link
Contributor

mergify bot commented Apr 7, 2025

This pull request does not have a backport label. Could you fix it @belimawr? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-./d./d is the label that automatically backports to the 8./d branch. /d is the digit
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.
@belimawr belimawr added Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team and removed skip-changelog skip-ci labels Apr 7, 2025
@belimawr belimawr marked this pull request as ready for review April 7, 2025 20:55
@belimawr belimawr requested a review from a team as a code owner April 7, 2025 20:55
@elasticmachine
Copy link
Collaborator

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@belimawr
Copy link
Contributor Author

belimawr commented Apr 7, 2025

buildkite test this

@belimawr belimawr mentioned this pull request Apr 8, 2025
8 tasks
@elasticmachine
Copy link
Collaborator

💚 Build Succeeded

History

cc @belimawr

@belimawr belimawr marked this pull request as ready for review April 8, 2025 18:39
Copy link
Contributor

@blakerouse blakerouse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this. Windows hard-stopping everything is not good, this was really needed! Nice work.

@belimawr
Copy link
Contributor Author

Thanks for working on this. Windows hard-stopping everything is not good, this was really needed! Nice work.

Thank you @blakerouse!

Btw, be aware that this PR does not fully fixes the problem, sub process will still be forcefully killed until #7756 is resolved.

@belimawr belimawr merged commit 995c806 into elastic:main Apr 16, 2025
9 of 12 checks passed
@belimawr belimawr added the backport-active-all Automated backport with mergify to all the active branches label Apr 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport-active-all Automated backport with mergify to all the active branches bug Something isn't working Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team

3 participants