Use SetEvent rather than QueueUserAPC on Windows #1645
Merged
+36 −13
Add this suggestion to a batch that can be applied as a single commit. This suggestion is invalid because no changes were made to the code. Suggestions cannot be applied while the pull request is closed. Suggestions cannot be applied while viewing a subset of changes. Only one suggestion per line can be applied in a batch. Add this suggestion to a batch that can be applied as a single commit. Applying suggestions on deleted lines is not supported. You must change the existing code in this line in order to create a valid suggestion. Outdated suggestions cannot be applied. This suggestion has been applied or marked resolved. Suggestions cannot be applied from pending reviews. Suggestions cannot be applied on multi-line comments. Suggestions cannot be applied while the pull request is queued to merge. Suggestion cannot be applied right now. Please check back later.
This PR fixes a problem I introduced via #1575, namely that when built on Windows platforms Janet can hang when
ev/deadlineis called with a function that infinitely loops. The fix is to useSetEventrather thanQueueUserAPCwith an increasing back-off if a timer completes in less time than the user had specified.Background
The Windows implementation of
ev/deadlineuses a combination ofQueueUserAPCandSleepExto allow a worker thread to interrupt the main thread running the Janet VM if a timeout occurs. While the implementations used on POSIX systems seem to work reliably, CI runs of the Janet project on Windows have, over the past six months, repeatedly hung during the testing oftest/suite-ev.janet.I am now relatively confident that the problem is a consequence of the resolution of timers on Windows.
SleepEx(andWaitForSingleObjectwhich this PR uses instead) claim to support millisecond precision. However, this precision is dependent on the length of the 'tick' of the system clock. If the amount of time is less than the tick, the function may return before the amount of time has elapsed. This creates a problem when testingtest/suite-ev.janetbecause the second test ofev/deadlineuses a timeout of0.01seconds (i.e. 10 milliseconds). What happens on a non-deterministic basis is thatSleepEx(orWaitForSingleObject) will return after less than 10 milliseconds. This will cause the Janet VM to be interrupted and the event loop will iterate. However, on the next iteration, it is possible that the system clock is not at 10 milliseconds ahead and so the timeout in the Janet VM's queue will not trigger. The Janet VM will restart the fiber running the infinite loop and the system will hang.An earlier version of this PR explored whether the problem related to the use of APC and even though that appears to not be the case, this PR uses event objects as a preferable signalling approach.
Implementation
This PR replaces the APC-based mechanism with an event object-based mechanism. It also introduces an increasing back-off mechanism in the timer. The key changes are in
src/core/ev.cinjanet_timeout_body:and in
cfun_ev_deadline:With these changes, I was able to get 10 runs on my fork of Janet to work without incident. I was not previously able to get more than 5 runs.
Limitations
The memory needs of the
JanetThreadedTimeoutstruct andJanetTimeoutstruct are increased on Windows because an additional pointer needs to be allocated for the semaphore event.Additionally, the implementation does not send a message for other return values that
WaitForSingleObjectcan return. I think that's all right forWAIT_OBJECT_Obut perhaps some kind of error should be raised ifWAIT_FAILEDis returned.