- Notifications
You must be signed in to change notification settings - Fork 32
Put error global variables into thread-local storage #112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| (closed and re-opened to recompute the merge commit now that #115 is merged, so CI should have something useful to say!) |
| Hmm, this is looking tedious... it looks like we're pulling in a runtime library somewhere from the error. I'm guessing that |
Pack together the current error code and message into a single structure Ease the transition to putting the error into thread-local storage
| I updated the PR with an implementation using Reproducing locally the test that failed, I get a pop-up saying |
| Hello @shym: just a heads-up that I am planning to read this PR soon. Thank you for your patience! |
nojb left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this patch! The code is extremely clear and a pleasure to review :)
LGTM (modulo a small question)
| Thanks you for your kind and thorough review! |
| int flexdll_relocate(void *tbl) { | ||
| err_t * err; | ||
| err = get_tls_error(TLS_ERROR_RESET_LAST); | ||
| if(err == NULL) return 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking more about this, shouldn't we reset err->code = 0 here? Otherwise, the check below in line 460 will fail if this function is called after another function that has set err->code.
Going one step further, perhaps when in TLS_ERROR_RESET_LAST mode, we should always set err->code = 0. Or is there a case where we want to reset one of the error codes, but not the other?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good point, thank you very much!
I think I ended up with that code because it was not explicitly reset in the original code. That was arguably correct because flexdll_relocate is called from two places (if I didn't miss any other):
- from
flexdll_dlopenwherecodehas already been reset, - from
flexdll_initwherecodehas been set to its initial value 0.
This made me realize that I had forgotten to initialize the values when they are malloc-ed!
So I’ve rewritten the code so that:
- on
TLS_ERROR_RESET, bothcodeandlast_errorare reset (so no_LAST), - the explicit reset of
codenear the call toget_tls_error(TLS_ERROR_RESET)are removed, - the structure is initialized right after
malloc, just to make sure; the structure should be malloc-ed on a call to one of the initialisation entrypoints, in which case it will be initialised again just a few lines later, but that will ensure that a buggy program callingdlerrorwithout a previous call todlopenwill get a reliable reasonable behaviour.
nojb left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the back-and-forth, but it turns out that there is a function SetLastError https://learn.microsoft.com/en-us/windows/win32/api/errhandlingapi/nf-errhandlingapi-setlasterror
Couldn't we use that instead to restore the result of the call to GetLastError after calling the Tls* functions? It should simplify the code (no need for the last_error field).
| Very good idea indeed! |
nojb left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, thanks!
flexdll.c Outdated
| switch (error) { | ||
| err_t * err; | ||
| err = get_tls_error(TLS_ERROR_NOP); | ||
| if(err == NULL) return "error in accessing thread-local storage"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if(err == NULL) return "error in accessing thread-local storage"; | |
| if(err == NULL) return "error accessing thread-local storage"; |
flexdll.c Outdated
| DWORD msglen; | ||
| err_t * err; | ||
| err = get_tls_error(TLS_ERROR_NOP); | ||
| if(err == NULL) return "error in accessing thread-local storage"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if(err == NULL) return "error in accessing thread-local storage"; | |
| if(err == NULL) return "error accessing thread-local storage"; |
Move the last error into thread-local storage to avoid data races (and thus possible segmentation faults) when the code is used in a multithreaded setting Add a get_tls_error function to access explicitly the thread-local error to bypass limited compiler support for it (`__thread`, etc.) Pass explicitly the current error variable to internal functions to avoid calling get_tls_error when possible Document the mechanism used for TLS errors, to explain its unexpected complexity As a side-effect of that reorganisation of the code, the code of the last error is explicitly reset on all initialisation entry points (flexdll_dlopen, flexdll_wdlopen, flexdll_relocate), even when it was missing before Co-authored-by: Nicolás Ojeda Bär <n.oje.bar@gmail.com>
| Just changed the error message, and added due credit! 😄 |
| Thanks, merged! (I took the liberty of squashing the commits into a single commit; this makes it easier to revert, cherry-pick, etc.) |
Global variable
error_bufferis used to store a string that is returned to callers, so there is a race condition if dynamic linking is invoked from 2 OCaml domains in parallelSince the error message must be returned, a mutex cannot be used to prevent the race condition
GNU libc uses that same solution: keep the last error in thread-local storage; so support for calling dlerror from a different thread than the one calling dlopen is not to be expected
The problem with parallel accesses was discovered investigating segfaults in ocaml-multicore/multicoretests#290.
With this patch, these tests do not segfault.
I see the following 4 dynlink tests from the ocaml test suite failing, but they seem to fail whether or not this patch is applied and the error message seems not related.