- Notifications
You must be signed in to change notification settings - Fork 198
Retry saving Agent information file (fleet.enc) on Windows #9224
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| This pull request does not have a backport label. Could you fix it @ycombinator? 🙏
|
| @pchila @swiatekm Following the feedback on elastic/elastic-agent-libs#333, I'm replacing that PR with this one. At the moment, I've sketched out two alternate implementations in this PR — one that specifically retries saving the Let me know which of these approaches you prefer and I'll get rid of the other one + add tests. Or if you prefer a totally different approach to solving this bug, that's fine too — just let me know. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would start by touching only the handler_action_policy_change with a retry and see how that goes (ideally we would need a test that reproduces the issue but as I understand we don't have one yet)
Please have a look at my comment below
internal/pkg/agent/application/actions/handlers/handler_helpers_windows.go Outdated Show resolved Hide resolved
fleet.enc file rotation on Windowsfleet.enc) on Windows | Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane) |
d835648 to fd1c7dc Compare There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can start with retrying only at the handler_policy_change level (without the additional retry loop at EncryptedStore.
There are also a couple of comments about the usage of the backoff library
internal/pkg/agent/application/actions/handlers/handler_helpers_windows.go Outdated Show resolved Hide resolved
internal/pkg/agent/application/actions/handlers/handler_helpers_windows.go Outdated Show resolved Hide resolved
Can you clarify what you mean by this? In other words, what would you like to see done differently than the current implementation in this PR? |
I would do without the encrypted store retry (I am referring to this part that duplicate the original implementation of the handler retry). The bug we are trying to fix is relative to not being able to write a new |
0d1c73f to 866d457 Compare
@pchila My bad, I just forgot to remove the previous implementation (the one you linked to) which is why I was unsure as to what I was missing 🤦. Thanks for the clarification. Removed the previous implementation in 866d457. |
| The Moving PR to draft while I investigate this issue. |
e52e8c3 to 7d79815 Compare There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple of nitpicks, nothing blocking. Please have a look.
internal/pkg/agent/application/actions/handlers/handler_helpers_windows.go Outdated Show resolved Hide resolved
internal/pkg/agent/application/actions/handlers/handler_helpers_other.go Outdated Show resolved Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A small change in the debug logs, not blocking
internal/pkg/agent/application/actions/handlers/handler_helpers_windows.go Outdated Show resolved Hide resolved
internal/pkg/agent/application/actions/handlers/handler_helpers_windows.go Outdated Show resolved Hide resolved
internal/pkg/agent/application/actions/handlers/handler_helpers_windows.go Outdated Show resolved Hide resolved
c2e79a5 to 553678b Compare |
| @Mergifyio backport 8.17 8.18 8.19 9.0 9.1 |
💚 Build Succeeded
History
cc @ycombinator |
✅ Backports have been created
|
* Retry saving Agent information file on Windows * Implement retries on saving fleet.enc file * Remove previous implementation * Adding CHANGELOG file * Running mage fmt * Remove unnecessary changes (cherry picked from commit ead9249)
* Retry saving Agent information file on Windows * Implement retries on saving fleet.enc file * Remove previous implementation * Adding CHANGELOG file * Running mage fmt * Remove unnecessary changes (cherry picked from commit ead9249)
* Retry saving Agent information file on Windows * Implement retries on saving fleet.enc file * Remove previous implementation * Adding CHANGELOG file * Running mage fmt * Remove unnecessary changes (cherry picked from commit ead9249)
* Retry saving Agent information file on Windows * Implement retries on saving fleet.enc file * Remove previous implementation * Adding CHANGELOG file * Running mage fmt * Remove unnecessary changes (cherry picked from commit ead9249)
* Retry saving Agent information file on Windows * Implement retries on saving fleet.enc file * Remove previous implementation * Adding CHANGELOG file * Running mage fmt * Remove unnecessary changes (cherry picked from commit ead9249)
…#9224) * Retry saving Agent information file on Windows * Implement retries on saving fleet.enc file * Remove previous implementation * Adding CHANGELOG file * Running mage fmt * Remove unnecessary changes




What does this PR do?
This PR retries saving the Agent information file,
fleet.enc, on Windows. The retries are done every 50ms for up to 2 seconds.Why is it important?
Saving the Agent information file involves renaming/moving a file to its final destination. However, on Windows, it is sometimes not possible to rename/move a file to its destination file because the destination file is locked by another process (e.g. antivirus software). For such situations, we now retry the save operation on Windows.
Checklist
I have made corresponding changes to the documentationI have made corresponding change to the default configuration files./changelog/fragmentsusing the changelog toolI have added an integration test or an E2E testDisruptive User Impact
None.
Related issues