Description
Describe the bug
The tests/kernel/smp_abort test fails on ARMv8-R FVP
 (fvp_baser_aemv8r/fvp_aemv8r_aarch64/smp) with threads continuing
 execution after they should have been aborted. The test expects all
 threads to be aborted through a circular dependency mechanism, but
 one or more threads escape abortion and trigger assertion failures.
Target platform: ARM FVP Base RevC AEMv8-R (fvp_baser_aemv8r/fvp_aemv8r_aarch64/smp)
Root cause: The test has a timing assumption that doesn't hold on
 platforms with large emulator/simulator quantum sizes. The test creates a
 circular abort scenario where each thread (in its ISR) aborts the next
 thread in sequence. However, if the quantum is large enough, one thread's
 ISR may complete and return before the "previous" thread in the chain
 even calls k_thread_abort() on it, preventing the circular dependency
 from forming.
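The ordering condition described above can be sketched as a small toy model in plain C. This is not Zephyr code and not the test's implementation: the function names and the timestamp arrays are hypothetical, and the model only encodes the single condition stated in the root cause (a thread escapes if its ISR returns before its predecessor in the chain calls k_thread_abort() on it). It ignores that an aborted thread stops making progress.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model of the race (hypothetical names, arbitrary time units).
 * Thread i enters its ISR, calls k_thread_abort() on thread (i + 1) % n
 * at abort_call_time[i], and its ISR returns at isr_return_time[i].
 */

/* Thread i escapes if its ISR returned before its predecessor aborted it. */
static bool thread_escapes(const int *abort_call_time,
                           const int *isr_return_time,
                           size_t n, size_t i)
{
    size_t prev = (i + n - 1) % n; /* thread that aborts thread i */

    return isr_return_time[i] < abort_call_time[prev];
}

/* Count threads that would resume after their ISR and hit the assertion. */
static size_t count_escaped(const int *abort_call_time,
                            const int *isr_return_time, size_t n)
{
    size_t escaped = 0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        if (thread_escapes(abort_call_time, isr_return_time, n, i)) {
            escaped++;
        }
    }
    return escaped;
}
```

With tightly interleaved, made-up timestamps such as abort calls at {10, 11, 12, 13} and ISR returns at {20, 20, 20, 20}, the model reports no escaped thread. With a large quantum, where each CPU runs far ahead of the next (abort calls at {10, 110, 210, 310}, ISR returns at {20, 120, 220, 320}), thread 0 returns from its ISR at 20, long before thread 3 calls abort at 310, so the model reports one escaped thread.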
What was tried:
- Initially suspected kernel scheduler bugs
 - Investigated memory barrier issues (ARMv8-R weak memory ordering)
 - Examined shared `_thread_dummy` structure race conditions
 - Analyzed `__attribute_const__` optimization on `z_smp_current_get()`
 - All led to discovering this was a test timing issue, not a kernel bug
 
Regression
- [ ] This is a regression.

No - test has always had this latent timing assumption
Steps to reproduce
west build -p -b fvp_baser_aemv8r/fvp_aemv8r_aarch64/smp tests/kernel/smp_abort/
west build -t run
Relevant log output
START - test_smp_thread_abort_deadlock
Thread 0 started
Thread 1 started
Thread 2 started
Thread 3 started
Assertion failed at WEST_TOPDIR/zephyr/tests/kernel/smp_abort/src/main.c:53: thread_entry: (false is false)
Thread 0 did not abort!
Impact
Functional Limitation – Test fails on ARMv8-R FVP, but the kernel scheduler
 code is correct. The test works on QEMU and ARMv9-A FVP due to different
 timing characteristics.
Environment
- OS: Linux
 - Toolchain: Zephyr SDK
 - Board: fvp_baser_aemv8r/fvp_aemv8r_aarch64/smp
 - Commit: a3aa513
 
Additional Context
PR #98682 must be applied to get rid of the following assertion:
ASSERTION FAIL [old_thread->switch_handle == ((void *)0)] @ WEST_TOPDIR/zephyr/kernel/sched.c:865
old thread handle should be null.
That assertion is, however, a separate issue unrelated to this one.
Fix: PR #98817 fixes this issue.