Skip to content

Conversation

@bjoernQ
Copy link
Contributor

@bjoernQ bjoernQ commented Sep 29, 2025

Thank you for your contribution!

We appreciate the time and effort you've put into this pull request.
To help us review it efficiently, please ensure you've gone through the following checklist:

Submission Checklist 📝

  • I have updated existing examples or added new ones (if applicable).
  • I have used cargo xtask fmt-packages command to ensure that all changed code is formatted correctly.
  • My changes were added to the CHANGELOG.md in the proper section.
  • I have added necessary changes to user code to the latest Migration Guide.
  • My changes are in accordance to the esp-rs developer guidelines

Extra:

Pull Request Details 📖

Description

This gives us the stack-overflow part from the (unstable) stack-protector feature for free.

We could even make the "save zone" a bit larger - up to 64 bytes on Xtensa and "as much as we like" on RISC-V - but it would create constraints on the alignment - it seems this might already help.

e.g. this example on C6 does funny things without the protection (using rustc 1.91.0-nightly (5eda692e7 2025-09-11))
(make sure to set the guard-offset to approx. 4096)

//% CHIPS: esp32 esp32c3 esp32c6 esp32h2 esp32s2 esp32s3 #![no_std] #![no_main] use esp_backtrace as _; use esp_hal::{delay::Delay, main}; esp_bootloader_esp_idf::esp_app_desc!(); #[main] fn main() -> ! { let _peripherals = esp_hal::init(esp_hal::Config::default()); let _delay = Delay::new(); smash_it(); loop {} } const FOO: usize = 1000; fn smash_it() { let f = [0u8; FOO]; smash(0, &f); } #[allow(unconditional_recursion)] #[inline(never)] fn smash(i: usize, _foo: &[u8; FOO]) { esp_println::println!("{i}"); let f = [0u8; FOO]; smash(i + 1, &f); }

If we see this working "well enough" we can consider expanding it to esp-rtos tasks.

skip-changelog for the change in esp-preempt

Testing

Manual testing, running existing examples

- '8'
active: 'chip == "esp32s3"'

# TODO somni doesn't yet support `%` and evalexpr doesn't support `&`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be available with somni-expr 0.2.0


#[cfg(xtensa)]
#[unsafe(no_mangle)]
#[unsafe(link_section = ".rwtext")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this have to be in RAM? We should be able to load this from flash, and the performance isn't critical as this will just die miserably if called.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the same - for now this was just copied, but in this case since we know "the world is about to end" a cache miss is not the worst thing that could happen 👍

@bjoernQ bjoernQ added the skip-changelog No changelog modification needed label Sep 29, 2025
@bjoernQ bjoernQ force-pushed the main-stack-overflow-check branch from 7f1c63a to 7aa0d62 Compare September 30, 2025 08:43
debug_assert!(!entry.is_null());

#[cfg(stack_guard_monitoring)]
crate::soc::enable_main_stack_guard_monitoring();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this will work as is, at least it's not watching the second core's stack.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not intended for now to monitor the second core's stack - just detect the second core trashing the first core's stack - maybe not too useful/likely - t.b.h. I'm not sure if observing the other core's stack is really helpful 🤔

Copy link
Contributor

@bugadani bugadani Sep 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Observing the second core's stack (that is, the second core guarding its own stack) makes a lot more sense to me, especially since it's much more limited than the first one's. Cross-checking like this is probably not very useful, randomly writing to that one address is very unlikely.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thinking about it - trying to do that will probably open a can of worms - I changed my mind - let's not monitor the other cores stack

@bjoernQ bjoernQ force-pushed the main-stack-overflow-check branch 2 times, most recently from 89a7f26 to 71b9f54 Compare September 30, 2025 10:32
@bugadani
Copy link
Contributor

That recursive assert failure doesn't look too pretty 😅

// is copied to the core's stack.
static START_CORE1_FUNCTION: AtomicPtr<()> = AtomicPtr::new(core::ptr::null_mut());
static APP_CORE_STACK_TOP: AtomicPtr<u32> = AtomicPtr::new(core::ptr::null_mut());
static APP_CORE_STACK_GUARD: AtomicPtr<u32> = AtomicPtr::new(core::ptr::null_mut());
Copy link
Contributor

@bugadani bugadani Sep 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be cfg-gated to not generate warnings - or maybe the whole functionality shouldn't 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah it's not unused because of the always present start_app_core_with_stack_guard_offset - but I wasn't honoring stack_guard_monitoring in start_app_core 👍

@bjoernQ bjoernQ force-pushed the main-stack-overflow-check branch from efcb57f to c259809 Compare September 30, 2025 13:05
Copy link
Contributor

@bugadani bugadani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

S3 tests are broken. Please avoid pushing to this PR without fixing it, the HIL runner can't recover by itself.

@bjoernQ
Copy link
Contributor Author

bjoernQ commented Sep 30, 2025

S3 tests are broken. Please avoid pushing to this PR without fixing it, the HIL runner can't recover by itself.

This is an interesting thing - the offending test works perfectly well locally - something to figure out later probably

Ok - not having an env var set to disable it helps a lot

{
let stack_guard = APP_CORE_STACK_GUARD.load(Ordering::Acquire);
// setting 0 effectively disables the functionality
crate::debugger::set_stack_watchpoint(stack_guard as usize);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stack guard value should be written before enabling the watchpoint on the second core

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't write the stack guard yet - but why not might help if someone is manually inspecting the stack

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is currently incidental, esp-rtos may decide to place it's stack canary to the watched address

bjoernQ and others added 2 commits September 30, 2025 18:07
@bugadani bugadani enabled auto-merge September 30, 2025 16:26
@bugadani bugadani added this pull request to the merge queue Sep 30, 2025
Merged via the queue into esp-rs:main with commit c66e4cf Sep 30, 2025
28 checks passed
SergioGasquez added a commit to SergioGasquez/esp-hal that referenced this pull request Oct 2, 2025
* Allow setting a custom idle hook (esp-rs#4209) * Document esp-rtos just a bit (esp-rs#4208) * Slight MG touchup * Document esp-rtos in migration guide * Document things * Apply suggestion from @Copilot Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Improve esp-rtos docs --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * refactor(ledc): introduce freq unset error and mode conversion (esp-rs#4214) * feat: introduce freq unset error and mode conversion * chore: format * refactor * refactor!: rename api * fix * fix docs * Disable unusable pins for various chips (esp-rs#4202) * ESP32: disable integrated SPI-connected pins * remove unavailable/non-existent pins for esp32c2 and esp32h2 * address reviews, drop more user-unusable pins * edit the changelog * fix psram (esp32) oi * RTOS: Use simpler sync object for executor (esp-rs#4215) * Use simpler sync object internally * Update esp-rtos/src/task/mod.rs Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Monitor the main stack's stack guard (esp-rs#4207) * Monitor the main stack's stack guard * CHANGELOG.md * Clippy * Save a bit of RAM * Some some IRAM * Avoid warning * Hint on register names * Protect the main stack on the second core, too * Fix `rt` w/o main stack protection * Use correct cfg gates * Refactor * Improvements * More rt gating * Stack-guard for second core * Adapt examples and tests * Bump somni-expr * CHANGELOG and migration guide * Fix doc-tests * fmt * Use ESP_HAL_CONFIG_STACK_GUARD_OFFSET for the second core * Fix test * Honor stack_guard_monitoring everywhere * Fix rebased migration guide * Fix doc-tests again * Update esp-hal/src/debugger.rs Co-authored-by: Dániel Buga <bugadani@gmail.com> * Write the stack guard value on the second core --------- Co-authored-by: Dániel Buga <bugadani@gmail.com> * Move `multicore` from `esp-storage` to `esp-hal` (esp-rs#4188) * Move multicore from esp-storage to esp-hal * changelog * reviews * address review comments and clean up * remove duplicated park_core() and split is_running() per chip * cleanup * Clean up blocking until deadline (esp-rs#4217) * Hw-based RTOS stack overflow detection (esp-rs#4218) * Respect the stack guard offset in thread stacks * Use the hardware watchpoint to protect rtos thread stacks * Use watchpoints for stack overflow detection * Do not rely on the value of the stack guard (esp-rs#4220) * Make BLE task configurable (esp-rs#4223) * Redo config options (esp-rs#4224) * Remove unused deps (esp-rs#4230) * `esp-phy` the second (esp-rs#4228) * Hopefully fixes it * fmt * Fix documentation link in Cargo.toml --------- Co-authored-by: Dániel Buga <bugadani@gmail.com> * twai: do not abort transmissions on recoverable errors (esp-rs#4227) * twai: do not abort transmissions on recoverable errors The TWAI driver's interrupt handler currently aborts transmissions if certain (error) interrupts were triggered. This causes the transmission of frames to become unreliable, because the `transmit_async` operation does not return any error codes if a transmission was aborted. Additionally, the TWAI controller is designed to be a reliable and, with the exception of the bus-off state, normally would simply try to rentransmit a message on its own. Therefore, this commit removes transmission abortion on errors and, instead, only aborts a transmission if the controller changes into the bus-off state. Fixes esp-rs#4222 * Update esp-hal/CHANGELOG.md Co-authored-by: Juraj Sadel <jurajsadel@gmail.com> --------- Co-authored-by: Juraj Sadel <jurajsadel@gmail.com> * Change default wifi_max_burst_size (esp-rs#4231) * Remove outdated limitation (esp-rs#4232) * feat: Use github actions artifacts to store semver baseline --------- Co-authored-by: Dániel Buga <bugadani@gmail.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Ethan Wu <7086cmd@gmail.com> Co-authored-by: Kirill Mikhailov <62840029+playfulFence@users.noreply.github.com> Co-authored-by: Björn Quentin <bjoernQ@users.noreply.github.com> Co-authored-by: Juraj Sadel <juraj.sadel@espressif.com> Co-authored-by: Simon Neuenhausen <frostie.neuenhausen@gmail.com> Co-authored-by: Harald Böhm <harald@boehm.dev> Co-authored-by: Juraj Sadel <jurajsadel@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

skip-changelog No changelog modification needed

2 participants