6.0.0-beta00: domainworkers fails with Assertion failed or SIGSEGV sometimes #1079

@edwintorok

This doesn't always happen, but running dune runtest on the 6.0.0-beta00 tag sometimes fails like this:

dune runtest --force unixpipe: ✓ Testing library 'retry'... .............. Ok. 14 tests ran, 0 tests skipped in 0.01 seconds Testing library 'lwt_direct'... ............. Ok. 13 tests ran, 0 tests skipped in 0.00 seconds preempting: ✓ Testing library 'core'... .......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................SSSSSSSSSSSSSSSSSSSSSSS.................................................................................................................................................. Ok. 697 tests ran, 23 tests skipped in 0.09 seconds basic: ✓ moving-promises: ✓ File "test/multidomain/dune", line 2, characters 15-28: 2 | (names basic domainworkers movingpromises unixpipe preempting) ^^^^^^^^^^^^^ Fatal error: exception File "src/core/lwt.ml", line 1039, characters 23-29: Assertion failed Testing library 'ppx'... ................ Ok. 16 tests ran, 0 tests skipped in 1.20 seconds Testing library 'react'... ........... Ok. 11 tests ran, 0 tests skipped in 4.50 seconds Testing library 'unix'... ........................................................................................................................... Ok. 123 tests ran, 0 tests skipped in 6.01 seconds 

It doesn't happen when running just that test in a loop.

Running dune runtest --force a few more times causes domainworkers to fail in a different way though:

dune runtest --force unixpipe: ✓ Testing library 'retry'... .............. Ok. 14 tests ran, 0 tests skipped in 0.01 seconds Testing library 'lwt_direct'... ............. Ok. 13 tests ran, 0 tests skipped in 0.00 seconds preempting: ✓ Testing library 'core'... .......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................SSSSSSSSSSSSSSSSSSSSSSS.................................................................................................................................................. Ok. 697 tests ran, 23 tests skipped in 0.09 seconds basic: ✓ moving-promises: ✓ File "test/multidomain/dune", line 2, characters 15-28: 2 | (names basic domainworkers movingpromises unixpipe preempting) ^^^^^^^^^^^^^ Command got signal SEGV. Testing library 'ppx'... ................ Ok. 16 tests ran, 0 tests skipped in 1.20 seconds 

This is with OCaml 5.3.0 on an AMD Ryzen 9 7950X 16-Core Processor running Fedora 42.
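Since the failure is intermittent and only shows up when the whole suite runs, a small retry loop can help catch it. The following is a minimal sketch, not something from the Lwt repository: the function name repeat_until_failure, the LOG_FILE variable, and the default log file name are all assumptions made for illustration. It reruns a command until it exits non-zero and keeps the output of the last attempt.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Lwt repository): rerun a command until
# it fails or the attempt budget runs out, keeping the output of the last run.
# Usage: repeat_until_failure MAX_ATTEMPTS COMMAND [ARGS...]
# Returns 1 as soon as an attempt fails, 0 if every attempt succeeded.
repeat_until_failure() {
    max_attempts=$1
    shift
    # LOG_FILE is an assumed, optional override for where output is stored.
    log_file=${LOG_FILE:-runtest-attempt.log}
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # Overwrite the log on every attempt so only the latest output is kept.
        if ! "$@" > "$log_file" 2>&1; then
            echo "failed on attempt $attempt; output saved in $log_file"
            return 1
        fi
        attempt=$((attempt + 1))
    done
    echo "no failure in $max_attempts attempts"
    return 0
}
```

For this report the call would be something like repeat_until_failure 50 dune runtest --force, where 50 is an arbitrary budget. The loop relies only on the exit status, so it stops on either failure mode (the assertion or the SEGV) without telling them apart. The saved log has to be inspected for that.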

GDB stacktrace
  Id  Target Id  Frame
* 1  Thread 0x7f2eab18f100 (LWP 249173)  camlLwt.run_callbacks_1040 () at src/core/lwt.ml:1304
  2  Thread 0x7f2e99ffe6c0 (LWP 249184) (Exiting)  0x00007f2eab2813cb in __GI_madvise () at ../sysdeps/unix/syscall-template.S:117
  3  Thread 0x7f2e9afff6c0 (LWP 249182) (Exiting)  0x00007f2eab2813cb in __GI_madvise () at ../sysdeps/unix/syscall-template.S:117
  4  Thread 0x7f2e92ffe6c0 (LWP 249188)  __syscall_cancel_arch () at ../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S:56
  5  Thread 0x7f2e93fff6c0 (LWP 249185)  futex_wait (futex_word=0x1cb42e30, expected=2, private=0) at ../sysdeps/nptl/futex-internal.h:146
  6  Thread 0x7f2e91ffd6c0 (LWP 249190)  __syscall_cancel_arch () at ../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S:56

Thread 6 (Thread 0x7f2e91ffd6c0 (LWP 249190)):
#0  __syscall_cancel_arch () at ../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S:56
No locals.
#1  0x00007f2eab1fe75c in __internal_syscall_cancel (a1=<optimized out>, a2=<optimized out>, a3=<optimized out>, a4=<optimized out>, a5=a5@entry=0, a6=a6@entry=4294967295, nr=202) at cancellation.c:49
        result = <optimized out>
        pd = <optimized out>
        ch = <optimized out>
#2  0x00007f2eab1fedcc in __futex_abstimed_wait_common64 (private=0, futex_word=0x1cb43004, expected=<optimized out>, op=<optimized out>, abstime=0x0, cancel=true) at futex-internal.c:57
No locals.
#3  __futex_abstimed_wait_common (futex_word=futex_word@entry=0x1cb43004, expected=<optimized out>, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0, cancel=cancel@entry=true) at futex-internal.c:87
        err = <optimized out>
        clockbit = <optimized out>
        op = <optimized out>
#4  0x00007f2eab1fee2f in __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x1cb43004, expected=<optimized out>, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at futex-internal.c:139
No locals.
#5  0x00007f2eab20149e in __pthread_cond_wait_common (cond=0x1cb42fe0, mutex=0x1cb42fb8, clockid=0, abstime=0x0) at pthread_cond_wait.c:426
        signals = <optimized out>
        g1_start = <optimized out>
        buffer = {__routine = 0x7f2eab2012c0 <__condvar_cleanup_waiting>, __arg = 0x7f2e91ffcdc0, __canceltype = 0, __prev = 0x0}
        cbuffer = {wseq = 3, cond = 0x1cb42fe0, mutex = 0x1cb42fb8, private = 0}
        err = <optimized out>
        result = 0
        wseq = 3
        g = <optimized out>
        seq = 1
        flags = <optimized out>
        private = 0
#6  ___pthread_cond_wait (cond=cond@entry=0x1cb42fe0, mutex=mutex@entry=0x1cb42fb8) at pthread_cond_wait.c:458
No locals.
#7  0x00000000004c8499 in caml_plat_wait (cond=cond@entry=0x1cb42fe0, mut=mut@entry=0x1cb42fb8) at runtime/platform.c:127
No locals.
#8  0x00000000004af936 in backup_thread_func (v=0x1cb42fa0) at runtime/domain.c:1068
        di = 0x1cb42fa0
        msg = <optimized out>
        s = 0x1cb42fb0
#9  0x00007f2eab201f54 in start_thread (arg=<optimized out>) at pthread_create.c:448
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139837994686144, -2486909826438284792, 139837994686144, 139838011464368, 0, 139838011464631, -2486909826480227832, -2486887426650424824}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#10 0x00007f2eab28532c in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
No locals.

Thread 5 (Thread 0x7f2e93fff6c0 (LWP 249185)):
#0  futex_wait (futex_word=0x1cb42e30, expected=2, private=0) at ../sysdeps/nptl/futex-internal.h:146
        __ret = -512
        err = <optimized out>
#1  __GI___lll_lock_wait (futex=futex@entry=0x1cb42e30, private=0) at lowlevellock.c:49
No locals.
#2  0x00007f2eab205501 in lll_mutex_lock_optimized (mutex=0x1cb42e30) at pthread_mutex_lock.c:48
        __futex = 0x1cb42e30
        private = <optimized out>
#3  ___pthread_mutex_lock (mutex=mutex@entry=0x1cb42e30) at pthread_mutex_lock.c:93
        type = <optimized out>
        __PRETTY_FUNCTION__ = "___pthread_mutex_lock"
        id = <optimized out>
#4  0x00000000004af8c4 in caml_plat_lock_blocking (m=0x1cb42e30) at runtime/caml/platform.h:458
No locals.
#5  backup_thread_func (v=0x1cb42d90) at runtime/domain.c:1076
        di = 0x1cb42d90
        msg = <optimized out>
        s = 0x1cb42da0
#6  0x00007f2eab201f54 in start_thread (arg=<optimized out>) at pthread_create.c:448
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139838028248768, -2486905429465515512, 139838028248768, 140735637062656, 0, 140735637062919, -2486905429507458552, -2486887426650424824}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#7  0x00007f2eab28532c in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
No locals.

Thread 4 (Thread 0x7f2e92ffe6c0 (LWP 249188)):
#0  __syscall_cancel_arch () at ../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S:56
No locals.
#1  0x00007f2eab1fe75c in __internal_syscall_cancel (a1=<optimized out>, a2=<optimized out>, a3=<optimized out>, a4=<optimized out>, a5=a5@entry=0, a6=a6@entry=0, nr=232) at cancellation.c:49
        result = <optimized out>
        pd = <optimized out>
        ch = <optimized out>
#2  0x00007f2eab1fe7a4 in __syscall_cancel (a1=<optimized out>, a2=<optimized out>, a3=<optimized out>, a4=<optimized out>, a5=a5@entry=0, a6=a6@entry=0, nr=232) at cancellation.c:75
        r = <optimized out>
#3  0x00007f2eab285615 in epoll_wait (epfd=<optimized out>, events=<optimized out>, maxevents=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
No locals.
#4  0x00007f2eab474c44 in epoll_poll (loop=0x7f2e8c026080, timeout=<optimized out>) at /usr/src/debug/libev-4.33-13.fc42.x86_64/ev_epoll.c:155
        i = <optimized out>
        eventcnt = <optimized out>
#5  0x00007f2eab477807 in ev_run (loop=0x7f2e8c026080, flags=2) at /usr/src/debug/libev-4.33-13.fc42.x86_64/ev.c:4157
        waittime = 0.01999962985428283
        sleeptime = 0
        prev_mn_now = <optimized out>
        to = <optimized out>
        to = <optimized out>
        __PRETTY_FUNCTION__ = "ev_run"
#6  0x000000000049b418 in ev_loop (loop=0x7f2e8c026080, flags=2) at /usr/include/ev.h:841
No locals.
#7  0x000000000049b6d0 in lwt_libev_loop (val_loop=139838110178112, val_block=3) at lwt_libev_stubs.c:123
        loop = 0x7f2e8c026080
#8  <signal handler called>
No symbol table info available.
#9  0x00000000004063a6 in camlLwt_engine.fun_2534 () at src/unix/lwt_engine.ml:187
No locals.
#10 0x00000000004166aa in camlLwt_main.run_loop_696 () at src/unix/lwt_main.ml:45
No locals.
#11 0x000000000041698f in camlLwt_main.run_756 () at src/unix/lwt_main.ml:113
No locals.
#12 0x000000000045ced6 in camlStdlib__Domain.body_741 () at domain.ml:266
No locals.
#13 <signal handler called>
No symbol table info available.
#14 0x00000000004ac8b0 in caml_callback_exn (closure=<optimized out>, closure@entry=139838147682368, arg=<optimized out>, arg@entry=1) at runtime/callback.c:208
        domain_state = 0x7f2e8c002b80
#15 0x00000000004acd79 in caml_callback_res (closure=closure@entry=139838147682368, arg=arg@entry=1) at runtime/callback.c:321
No locals.
#16 0x00000000004af006 in domain_thread_func (v=<optimized out>) at runtime/domain.c:1244
        unrooted_callback = 139838147682368
        res = <optimized out>
        mut = <optimized out>
        p = <optimized out>
        ml_values = 0x1cb9f1f0
        signal_stack = 0x7f2e8c000b70
#17 0x00007f2eab201f54 in start_thread (arg=<optimized out>) at pthread_create.c:448
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139838011467456, -2486903229905389048, 139838011467456, 140735637062944, 0, 140735637063207, -2486903229947332088, -2486887426650424824}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#18 0x00007f2eab28532c in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
No locals.

Thread 3 (Thread 0x7f2e9afff6c0 (LWP 249182) (Exiting)):
#0  0x00007f2eab2813cb in __GI_madvise () at ../sysdeps/unix/syscall-template.S:117
No locals.
#1  0x00007f2eab20210f in advise_stack_range (mem=0x7f2e99fff000, size=16781312, pd=139838145689280, guardsize=<optimized out>) at /usr/src/debug/glibc-2.41-11.fc42.x86_64/nptl/allocatestack.c:196
        sp = 139838145687152
        pagesize_m1 = <optimized out>
        freesize = <optimized out>
        __PRETTY_FUNCTION__ = "advise_stack_range"
#2  start_thread (arg=<optimized out>) at pthread_create.c:558
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139838145689280, -2486920822628304376, 139838145689280, 140735637062944, 0, 140735637063207, -2486920822670247416, -2486887426650424824}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#3  0x00007f2eab28532c in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
No locals.

Thread 2 (Thread 0x7f2e99ffe6c0 (LWP 249184) (Exiting)):
#0  0x00007f2eab2813cb in __GI_madvise () at ../sysdeps/unix/syscall-template.S:117
No locals.
#1  0x00007f2eab20210f in advise_stack_range (mem=0x7f2e98ffe000, size=16781312, pd=139838128907968, guardsize=<optimized out>) at /usr/src/debug/glibc-2.41-11.fc42.x86_64/nptl/allocatestack.c:196
        sp = 139838128905840
        pagesize_m1 = <optimized out>
        freesize = <optimized out>
        __PRETTY_FUNCTION__ = "advise_stack_range"
#2  start_thread (arg=<optimized out>) at pthread_create.c:558
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139838128907968, -2486927419161200120, 139838128907968, 139838145686192, 0, 139838145686455, -2486927419203143160, -2486887426650424824}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#3  0x00007f2eab28532c in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
No locals.

Thread 1 (Thread 0x7f2eab18f100 (LWP 249173)):
#0  camlLwt.run_callbacks_1040 () at src/core/lwt.ml:1304
No locals.
#1  0x000000000041a393 in camlLwt.run_in_resolution_loop_1147 () at src/core/lwt.ml:1339
No locals.
#2  0x000000000041a5f8 in camlLwt.resolve_1165 () at src/core/lwt.ml:1375
No locals.
#3  0x000000000041b367 in camlLwt.callback_1437 () at src/core/lwt.ml:1701
No locals.
#4  0x0000000000417fa0 in camlLwt_sequence.loop_347 () at src/core/lwt_sequence.ml:132
No locals.
#5  0x000000000044f045 in camlStdlib__Array.iter_340 () at array.ml:113
No locals.
#6  <signal handler called>
No symbol table info available.
#7  0x00000000004ac8b0 in caml_callback_exn (closure=<optimized out>, arg=<optimized out>) at runtime/callback.c:208
        domain_state = 0x1cb4b9c0
#8  0x00000000004ace09 in caml_callback (closure=<optimized out>, arg=<optimized out>) at runtime/callback.c:347
No locals.
#9  0x000000000049b779 in handle_io (loop=0x1cba93b0, watcher=0x1cba9bd0, revents=1) at lwt_libev_stubs.c:161
No locals.
#10 0x00007f2eab47423b in ev_invoke_pending (loop=0x1cba93b0) at /usr/src/debug/libev-4.33-13.fc42.x86_64/ev.c:3770
        p = <optimized out>
#11 0x000000000049b6e1 in lwt_libev_loop (val_loop=139838147681440, val_block=3) at lwt_libev_stubs.c:127
        loop = 0x1cba93b0
#12 <signal handler called>
No symbol table info available.
#13 0x00000000004063a6 in camlLwt_engine.fun_2534 () at src/unix/lwt_engine.ml:187
No locals.
#14 0x00000000004166aa in camlLwt_main.run_loop_696 () at src/unix/lwt_main.ml:45
No locals.
#15 0x000000000041698f in camlLwt_main.run_756 () at src/unix/lwt_main.ml:113
No locals.
#16 0x0000000000404e5e in camlDune__exe__Domainworkers.main_850 () at test/multidomain/domainworkers.ml:45
No locals.
#17 0x0000000000405232 in camlDune__exe__Domainworkers.entry () at test/multidomain/domainworkers.ml:74
No locals.
#18 0x0000000000401af7 in caml_program ()
No symbol table info available.
#19 <signal handler called>
No symbol table info available.
#20 0x00000000004d2954 in caml_startup_common (pooling=<optimized out>, argv=0x7fff91a78978) at runtime/startup_nat.c:127
        exe_name = <optimized out>
        proc_self_exe = <optimized out>
        res = <optimized out>
#21 caml_startup_common (argv=0x7fff91a78978, pooling=<optimized out>) at runtime/startup_nat.c:86
        exe_name = <optimized out>
        proc_self_exe = <optimized out>
        res = <optimized out>
#22 0x00000000004d29cb in caml_startup_exn (argv=<optimized out>) at runtime/startup_nat.c:134
No locals.
#23 caml_startup (argv=<optimized out>) at runtime/startup_nat.c:139
        res = <optimized out>
#24 caml_main (argv=<optimized out>) at runtime/startup_nat.c:146
No locals.
#25 0x000000000040166c in main (argc=<optimized out>, argv=<optimized out>) at runtime/main.c:37
No locals.
quit
