
tests: kernel: smp_abort timing race on ARMv8-R FVP #98819

@npitre


Describe the bug

The tests/kernel/smp_abort test fails on ARMv8-R FVP
(fvp_baser_aemv8r/fvp_aemv8r_aarch64/smp) with threads continuing
execution after they should have been aborted. The test expects all
threads to be aborted through a circular dependency mechanism, but
one or more threads escape abortion and trigger assertion failures.

Target platform: ARM FVP Base RevC AEMv8-R (fvp_baser_aemv8r/fvp_aemv8r_aarch64/smp)

Root cause: The test has a timing assumption that doesn't hold on
platforms with large emulator/simulator quantum sizes. The test creates a
circular abort scenario where each thread (in its ISR) aborts the next
thread in sequence. However, if the quantum is large enough, one thread's
ISR may complete and return before the "previous" thread in the chain
even calls k_thread_abort() on it, preventing the circular dependency
from forming.
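
The race described above can be illustrated with a toy timing model. This is not Zephyr code and not the test's implementation: the struct, the function names, and all parameters are hypothetical, and the model assumes (as a simplification) that the simulator runs core i's ISR starting at i * quantum, that each ISR calls abort on the next thread a fixed number of ticks after it starts, and that an ISR nobody has targeted yet returns after a fixed length.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy timing model, NOT Zephyr code. All names and fields are hypothetical.
 * Assumptions: core i's ISR starts at i * quantum; each ISR calls abort on
 * the next thread abort_offset ticks after it starts; an ISR that has not
 * been targeted yet returns isr_len ticks after it starts.
 */
struct race_model {
	int n_threads;     /* threads (and cores) in the abort ring */
	long quantum;      /* simulator quantum, in arbitrary ticks */
	long abort_offset; /* ticks from ISR start to the abort call */
	long isr_len;      /* ticks from ISR start to ISR return */
};

/* Time at which thread t's ISR would return if left alone. */
static long isr_return_time(const struct race_model *m, int t)
{
	return (long)t * m->quantum + m->isr_len;
}

/* Time at which the previous thread in the ring calls abort on thread t. */
static long abort_arrival_time(const struct race_model *m, int t)
{
	int prev = (t + m->n_threads - 1) % m->n_threads;

	return (long)prev * m->quantum + m->abort_offset;
}

/* A thread escapes when its ISR returns before the abort call reaches it. */
static bool thread_escapes(const struct race_model *m, int t)
{
	assert(m->n_threads > 0 && t >= 0 && t < m->n_threads);
	return isr_return_time(m, t) < abort_arrival_time(m, t);
}

/* Number of threads in the ring that escape abortion. */
static int count_escapes(const struct race_model *m)
{
	int n = 0;

	for (int t = 0; t < m->n_threads; t++) {
		n += thread_escapes(m, t) ? 1 : 0;
	}
	return n;
}
```

Under these assumptions, a quantum of zero lets every abort call arrive while its target is still in its ISR, so the ring closes. Once the quantum exceeds the ISR length, thread 0 returns from its ISR long before the last thread in the ring gets to run, which is consistent with the "Thread 0 did not abort!" failure reported here.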

What was tried:

  • Initially suspected kernel scheduler bugs
  • Investigated memory barrier issues (ARMv8-R weak memory ordering)
  • Examined shared _thread_dummy structure race conditions
  • Analyzed __attribute_const__ optimization on z_smp_current_get()
  • All led to discovering this was a test timing issue, not a kernel bug

Regression

  • This is a regression: No. The test has always had this latent timing assumption.

Steps to reproduce

west build -p -b fvp_baser_aemv8r/fvp_aemv8r_aarch64/smp tests/kernel/smp_abort/
west build -t run

Relevant log output

START - test_smp_thread_abort_deadlock
Thread 0 started
Thread 1 started
Thread 2 started
Thread 3 started
Assertion failed at WEST_TOPDIR/zephyr/tests/kernel/smp_abort/src/main.c:53: thread_entry: (false is false)
Thread 0 did not abort!

Impact

Functional Limitation – Test fails on ARMv8-R FVP, but the kernel scheduler
code is correct. The test works on QEMU and ARMv9-A FVP due to different
timing characteristics.

Environment

  • OS: Linux
  • Toolchain: Zephyr SDK
  • Board: fvp_baser_aemv8r/fvp_aemv8r_aarch64/smp
  • Commit: a3aa513

Additional Context

PR #98682 must be applied to get rid of the following:

ASSERTION FAIL [old_thread->switch_handle == ((void *)0)] @ WEST_TOPDIR/zephyr/kernel/sched.c:865
old thread handle should be null.

This is, however, a separate issue unrelated to this one.

Fix: PR #98817 fixes this issue.
