- Notifications
You must be signed in to change notification settings - Fork 49
[Windows] Rotate (rename) files with retries #333
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| @ycombinator could you please take a look at the test failures? |
| I feel like this should either be a new function, or we should change the signature and make the retry timings configurable. A file operation potentially blocking for two seconds sounds like it might be unexpected for users, even if the alternative is to get an error on Windows. |
I'm torn about this. I see the retry as an implementation detail specifically for Windows. Perhaps we could make the retry frequency more aggressive, e.g. every 50ms or a random duration between 25-75ms? I know that still means the operation could potentially block for two seconds, but it at least drops the lower bound to something much lower than the current 500ms? |
I feel like the wait time here depends on the use case and shouldn't be hardcoded. Sometimes you're ok waiting potentially indefinitely, other times you just want to skip the operation if it runs into a conflict. Is the intent here to retry if the file happened to be held temporarily by an antivirus or similar tool? |
Yes, that's exactly right. The retry mechanism is not really meant to be controllable by callers but rather something we need only in the Windows implementation. The same function's implementation for other OSes does not have retries. FWIW, we do something similar during Agent uninstallation: https://github.com/elastic/elastic-agent/blob/8fbdaaa979db0976e33612be647bb69ffaac70f8/internal/pkg/agent/install/uninstall.go#L277-L284 |
Test failure looks unrelated to the changes in this PR: Moreover, the same test passes locally (on Windows): Retrying in CI... |
To be fair, during uninstall we don't care about the final contents of a file, we just want them gone, here we are writing file content and the conflict may arise from some other concurrent writes and the final result may be unexpected. |
I'm not sure I understand how a conflict may arise. If another process is writing to this file, this function would not be able to rename it until the other process closes its handle, no? |
Yes but depending on the use case the caller of this function may want to handle it differently. |
| Alright, I'm not really convinced the retrying shouldn't just be an implementation detail for Windows but I see I'm in the minority here. So I'll concede and make a new function that explicitly gives the caller more control over the rename. How much control do we want to give the caller here? Is it sufficient to let the caller provide the retry frequency and duration (as @swiatekm suggested earlier)? Or are you thinking something more than that — like a callback that is called when the rename fails, allowing the caller to completely decide what the next action should be (e.g. retry immediately, retry after some time, do nothing, do something else first and then retry, etc.)? [EDIT] Come to think of the callback idea — that sounds like overkill as the caller could simply use the function as it is today and implement any additional logic around it on it's end. So I would not go with this idea. Either we just allow the caller to control the retry frequency and duration or we leave this function as it is today (before this PR) and implement the retry logic entirely on the caller's end around the existing function call. WDYT? |
I would be in favor of handling the retries at the point where new content is saved to fleet.enc by calling this function. |
Got it, thanks. I'll close this PR then to leave the implementation of |
| Bringing back the implementation in this PR in favor of elastic/elastic-agent#9224. See elastic/elastic-agent#9224 (comment) for details. |
b35ef1b to 95c2207 Compare There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, with the variadic opts approach I think that all sides of the discussion above are addressed
d36f8df to 04a4d5e Compare There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making Windows the default.
💚 Build Succeeded
History
|

What does this PR do?
This PR enhances the
SafeFileRotateimplementation take a variadicoptsargument. This argument can be used to configure the function to retry it's final rename step with a specific interval and up to a specified duration. On Windows, the default behavior is to retry.This is similar to a fix made for Agent uninstallation.
Why is it important?
Windows does not permit renaming files that are currently being held open by other processes, e.g. antivirus scanners. Retrying the rename increases the likelihood that the rename will succeed on one of the retry attempts when the file is not being help open by another process.
Checklist
How to test this PR locally
On a Windows machine, reproduce the failing scenario.
Access deniederror.On a Windows machine, re-run the same test with the fix.
Access deniederror.Related issues