Skip to content

Conversation

sayantn
Copy link
Contributor

@sayantn sayantn commented Oct 4, 2025

This PR adds an alignment parameter in simd_masked_load and simd_masked_store, in the form of a const-generic enum core::intrinsics::simd::SimdAlign. This represents the alignment of the ptr argument in these intrinsics as follows

  • SimdAlign::Unaligned - ptr is unaligned/1-byte aligned
  • SimdAlign::Element - ptr is aligned to the element type of the SIMD vector (default behavior in the old signature)
  • SimdAlign::Vector - ptr is aligned to the SIMD vector type

The main motive for this is stdarch - most vector loads are either fully aligned (to the vector size) or unaligned (byte-aligned), so the previous signature doesn't cut it.

Now, stdarch will mostly use SimdAlign::Unaligned and SimdAlign::Vector, whereas portable-simd will use SimdAlign::Element.

  • cg_llvm
  • cg_clif
  • miri/const_eval

Alternatives

Using a const-generic/"const" u32 parameter as alignment (and we error during codegen if this argument is not a power of two). This, although more flexible than this, has a few drawbacks

  • If we use an const-generic argument, then portable-simd somehow needs to pass align_of::<T>() as the alignment, which isn't possible without GCE
  • "const" function parameters are just an ugly hack, and a pain to deal with in non-LLVM backends

We can remedy the problem with the const-generic u32 parameter by adding a special rule for the element alignment case (e.g. 0 can mean "use the alignment of the element type), but I feel like this is not as expressive as the enum approach, although I am open to suggestions

cc @workingjubilee @RalfJung @BoxyUwU

@rustbot
Copy link
Collaborator

rustbot commented Oct 4, 2025

The Miri subtree was changed

cc @rust-lang/miri

Portable SIMD is developed in its own repository. If possible, consider making this change to rust-lang/portable-simd instead.

cc @calebzulawski, @programmerjake

Some changes occurred to the platform-builtins intrinsics. Make sure the
LLVM backend as well as portable-simd gets adapted for the changes.

cc @antoyo, @GuillaumeGomez, @bjorn3, @calebzulawski, @programmerjake

Some changes occurred in compiler/rustc_codegen_ssa

cc @WaffleLapkin

Some changes occurred to the intrinsics. Make sure the CTFE / Miri interpreter
gets adapted for the changes, if necessary.

cc @rust-lang/miri, @RalfJung, @oli-obk, @lcnr

@rustbot rustbot added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Oct 4, 2025
@rustbot
Copy link
Collaborator

rustbot commented Oct 4, 2025

r? @lcnr

rustbot has assigned @lcnr.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rust-log-analyzer

This comment has been minimized.

@RalfJung
Copy link
Member

RalfJung commented Oct 5, 2025

If we only need normally-aligned and unaligned loads, IMO it'd be better to just have a const generic boolean indicating which of them we want for any particular operation. That avoids ad-hoc hacks such as const parameters in intrinsics.

@programmerjake
Copy link
Member

If we only need normally-aligned and unaligned loads, IMO it'd be better to just have a const generic boolean indicating which of them we want for any particular operation. That avoids ad-hoc hacks such as const parameters in intrinsics.

for portable-simd I think we should default to element-level-alignment since I expect that to be more efficient than unaligned ops on some targets (GPUs? maybe RISC-V V?)

@RalfJung
Copy link
Member

RalfJung commented Oct 5, 2025

If we only need normally-aligned and unaligned loads, IMO it'd be better to just have a const generic boolean indicating which of them we want for any particular operation. That avoids ad-hoc hacks such as const parameters in intrinsics.

for portable-simd I think we should default to element-level-alignment since I expect that to be more efficient than unaligned ops on some targets (GPUs? maybe RISC-V V?)

Yes, so...?

IIUC we either want element-level alignment or no alignment, so we can just have a const bool generic controlling that.

@programmerjake
Copy link
Member

If we only need normally-aligned and unaligned loads, IMO it'd be better to just have a const generic boolean indicating which of them we want for any particular operation. That avoids ad-hoc hacks such as const parameters in intrinsics.

for portable-simd I think we should default to element-level-alignment since I expect that to be more efficient than unaligned ops on some targets (GPUs? maybe RISC-V V?)

Yes, so...?

I thought you meant full-simd-type alignment or unaligned, since that's what x86 uses for simd instructions it calls aligned.

@RalfJung
Copy link
Member

RalfJung commented Oct 5, 2025

This is about simd_masked_load/store which are currently documented as

/// Unmasked values in `T` must be readable as if by `<ptr>::read` (e.g. aligned to the element /// type). 
@sayantn
Copy link
Contributor Author

sayantn commented Oct 5, 2025

As a summary, we need 3 types of alignments

  • element alignment (used in portable simd)
  • simd type alignment (x86 aligned)
  • fully unaligned (x86 unaligned)

So a bool flag won't cut it, at best we can use a const generic parameter, with 0 meaning element size aligned (because that is the most used, and can't be specified using const generics (requires gce))

@RalfJung
Copy link
Member

RalfJung commented Oct 5, 2025

As a summary, we need 3 types of alignments

Now you are expanding the scope of the PR. So far the motivation has been, we'd like an unaligned version of the existing intrinsics. If you also want SIMD type aligned variants, the PR description needs to be expanded to argue for this.

IIRC, last time this was looked into, the SIMD type alignment option wasn't necessary -- LLVM was more than able to use surrounding info on reference types to deduce the right alignment for the desired codegen. So please show some concrete undesirable codegen if you want to motivate a form of this intrinsic that requires SIMD type alignment.

@sayantn
Copy link
Contributor Author

sayantn commented Oct 5, 2025

I apologise if I was unclear, but the motivation was always adding these 3 types of loads. LLVM will always generate an unaligned (byte-aligned) load/store if we pass any alignment less that the vector type size (because it is guaranteed to be safe). But for _mm_mask_load_ intrinsics, the pointer needs to be aligned to the vector size, so LLVM won't generate aligned loads unless we pass the vector size as alignment.

@RalfJung
Copy link
Member

RalfJung commented Oct 5, 2025

LLVM will always generate an unaligned (byte-aligned) load/store if we pass any alignment less that the vector type size

I don't think the "always" here is correct. If we are loading from an &__m128 LLVM should be able to use that fact to generate the aligned intrinsics.

However, I guess stdarch uses raw pointers in its API. So yeah this definitely needs to be explained properly in the PR description, currently it is at best confusing.

@RalfJung
Copy link
Member

RalfJung commented Oct 5, 2025

If we need 3 different alignment modes (Cc @Amanieu for the stdarch part here), that can still be done using const generics with a new 3-variant enum (similar to the enum we have for atomic memory access orderings).

@workingjubilee
Copy link
Member

workingjubilee commented Oct 5, 2025

Yes, std::arch tries to reflect the type signatures used by the C vendor functions... it's not exactly just "bindgen for vendor functions", but it kinda is bindgen for vendor functions.

@Amanieu
Copy link
Member

Amanieu commented Oct 5, 2025

I'm happy with the current API that takes a constant (either as an argument or a const generic). An enum doesn't really provide much of an advantage when the desired alignment can just be explicitly provided.

@sayantn
Copy link
Contributor Author

sayantn commented Oct 6, 2025

If everyone agrees, I can substitute the const argument for a const-generic u32 alignment. But then portable-simd would face problems because it has to pass align_of::<T> to the const-generic argument somehow, so I propose to add a special meaning to 0 - if 0 is passed as the alignment it is interpreted as the element type's alignment. This doesn't affect stdarch, all invocations of this intrinsic there will use literal (for x86 at least, I don't know much about other archs)

@RalfJung
Copy link
Member

RalfJung commented Oct 6, 2025

I'm happy with the current API that takes a constant (either as an argument or a const generic). An enum doesn't really provide much of an advantage when the desired alignment can just be explicitly provided.

The enum provides the big advantage that we don't need more ad-hock "constant argument" hacks.

What I was hoping to get from you is confirmation on which forms of the intrinsic are needed for stdarch.

@RalfJung
Copy link
Member

RalfJung commented Oct 6, 2025

If everyone agrees, I can substitute the const argument for a const-generic u32 alignment. But then portable-simd would face problems because it has to pass align_of::<T> to the const-generic argument somehow, so I propose to add a special meaning to 0 - if 0 is passed as the alignment it is interpreted as the element type's alignment. This doesn't affect stdarch, all invocations of this intrinsic there will use literal (for x86 at least, I don't know much about other archs)

I would prefer that over the "constant argument" hack. Not sure if it's better than a 3-value enum but 🤷 .

@lcnr
Copy link
Contributor

lcnr commented Oct 6, 2025

r? @RalfJung though feel free to reassigned

@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 9, 2025
let vals = unsafe {
simd_masked_load::<_, _, _, { SimdAlign::Unaligned }>(
mask,
buf.as_ptr().byte_offset(1), // this is guaranteed to be unaligned
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on AVR, alignment of u32 is 1, so the comment isn't always correct

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine to ignore tier 3 platforms here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also in those cases we can't get unaligned pointers at all, this works if align of u32 is not 1. Anyway this is just to check that Miri handles the alignment argument correctly, and doesn't panic

Copy link
Member

@RalfJung RalfJung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are no Miri tests for SimdAlign::Vector, I think?

View changes since this review

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 10, 2025
@RalfJung
Copy link
Member

@bjorn3 how should we handle cranelift here? Can you help with updating the intrinsics, or is it okay to temporarily regress?

@bjorn3
Copy link
Member

bjorn3 commented Oct 10, 2025

Basically for the case where individual lanes are unaligned, you did have to replace the MemFlags::trusted() argument of load and store with MemFlags::new().with_notrap(). MemFlags::trusted() is equivalent to MemFlags::new().with_notrap().with_aligned().

@sayantn
Copy link
Contributor Author

sayantn commented Oct 10, 2025

@bjorn3 wow, that seems easy! Thanks.

And I guess we don't need to do anything fancy for the SimdAlign::Vector case, as that is basically just an optimization?

@bjorn3
Copy link
Member

bjorn3 commented Oct 10, 2025

Yeah, cg_clif currently loads and stores a lane at a time.

@rustbot
Copy link
Collaborator

rustbot commented Oct 10, 2025

Some changes occurred in compiler/rustc_codegen_cranelift

cc @bjorn3

@sayantn
Copy link
Contributor Author

sayantn commented Oct 10, 2025

@RalfJung I have added tests for SimdAlign::Vector. Thanks for reminding me, the implementation actually had a bug - it was checking the pointer alignment with the alignment of the array obtained from projecting into the SIMD type.

@bjorn3 I tried to implement it in cg_clif, could you just check it once please?

@rustbot ready

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Oct 10, 2025
@rust-log-analyzer

This comment has been minimized.

@sayantn
Copy link
Contributor Author

sayantn commented Oct 10, 2025

That's weird, the tests pass locally!

@RalfJung
Copy link
Member

That's weird, the tests pass locally!

Maybe the --target i686-pc-windows-msvc part is crucial to reproduce the problem?

@sayantn
Copy link
Contributor Author

sayantn commented Oct 10, 2025

Seems like it, what should we do? Seems like the test output is dependent on architecture.

@rust-log-analyzer
Copy link
Collaborator

The job x86_64-gnu-miri failed! Check out the build log: (web) (plain enhanced) (plain)

Click to see the possible cause of the failure (guessed by this bot)
tests/fail/weak_memory/weak_uninit.rs ... ok tests/fail/tree_borrows/reserved/int-protected-write.rs ... ok tests/fail/tree_borrows/reserved/cell-protected-write.rs ... ok FAILED TEST: tests/fail/intrinsics/simd_masked_load_underaligned.rs command: MIRI_ENV_VAR_TEST="0" MIRI_TEMP="/tmp/miri-uitest-YhdPFr" RUST_BACKTRACE="1" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/miri" "--error-format=json" "--sysroot=/checkout/obj/build/x86_64-unknown-linux-gnu/miri-sysroot" "-Dwarnings" "-Dunused" "-Ainternal_features" "-Zui-testing" "--out-dir" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2-tools/x86_64-unknown-linux-gnu/tmp/miri_ui/0/tests/fail/intrinsics" "tests/fail/intrinsics/simd_masked_load_underaligned.rs" "--edition" "2021" "--target" "i686-pc-windows-msvc" error: test got exit status: 0, but expected 1 = note: compilation succeeded, but was expected to fail error: no output was emitted --- full stdout: FAILED TEST: tests/fail/intrinsics/simd_masked_store_underaligned.rs command: MIRI_ENV_VAR_TEST="0" MIRI_TEMP="/tmp/miri-uitest-YhdPFr" RUST_BACKTRACE="1" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/miri" "--error-format=json" "--sysroot=/checkout/obj/build/x86_64-unknown-linux-gnu/miri-sysroot" "-Dwarnings" "-Dunused" "-Ainternal_features" "-Zui-testing" "--out-dir" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2-tools/x86_64-unknown-linux-gnu/tmp/miri_ui/0/tests/fail/intrinsics" "tests/fail/intrinsics/simd_masked_store_underaligned.rs" "--edition" "2021" "--target" "i686-pc-windows-msvc" error: test got exit status: 0, but expected 1 = note: compilation succeeded, but was expected to fail error: no output was emitted --- Location: /cargo/registry/src/index.crates.io-1949cf8c6b5b557f/ui_test-0.30.2/src/lib.rs:365 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ BACKTRACE ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1: <color_eyre[503d3e52f1227b6f]::config::EyreHook>::into_eyre_hook::{closure#0}<unknown> at <unknown source file>:<unknown line> 2: eyre[ee62eca60309ba8]::private::format_err<unknown> at <unknown source file>:<unknown line> 3: ui_test[6614206fa75e23c]::run_tests_generic::<ui_test[6614206fa75e23c]::default_file_filter, ui[dec2bd4cbaba463]::run_tests::{closure#1}, alloc[1dbecc4300bc0743]::boxed::Box<dyn ui_test[6614206fa75e23c]::status_emitter::StatusEmitter>><unknown> at <unknown source file>:<unknown line> 4: ui[dec2bd4cbaba463]::ui<unknown> at <unknown source file>:<unknown line> 5: ui[dec2bd4cbaba463]::main<unknown> at <unknown source file>:<unknown line> 6: std[5c593a824caf5f45]::sys::backtrace::__rust_begin_short_backtrace::<fn() -> core[2e4aa476c25a4c2]::result::Result<(), eyre[ee62eca60309ba8]::Report>, core[2e4aa476c25a4c2]::result::Result<(), eyre[ee62eca60309ba8]::Report>><unknown> at <unknown source file>:<unknown line> 7: std[5c593a824caf5f45]::rt::lang_start::<core[2e4aa476c25a4c2]::result::Result<(), eyre[ee62eca60309ba8]::Report>>::{closure#0}<unknown> at <unknown source file>:<unknown line> 8: std[5c593a824caf5f45]::rt::lang_start_internal<unknown> at <unknown source file>:<unknown line> 9: main<unknown> at <unknown source file>:<unknown line> 10: __libc_start_main<unknown> at <unknown source file>:<unknown line> 11: _start<unknown> at <unknown source file>:<unknown line> Run with COLORBT_SHOW_HIDDEN=1 environment variable to disable frame filtering. Run with RUST_BACKTRACE=full to include source snippets. error: test failed, to rerun pass `--test ui` Caused by: process didn't exit successfully: `/checkout/obj/build/x86_64-unknown-linux-gnu/stage2-tools/x86_64-unknown-linux-gnu/release/deps/ui-eca9d5916b357dc3` (exit status: 1) Bootstrap failed while executing `test --stage 2 src/tools/miri src/tools/miri/cargo-miri --target i686-pc-windows-msvc` Command `/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo test --target x86_64-unknown-linux-gnu -Zbinary-dep-depinfo -j 4 -Zroot-dir=/checkout --locked --color always --release --manifest-path /checkout/src/tools/miri/Cargo.toml -- [workdir=/checkout]` failed with exit code 1 Created at: src/bootstrap/src/core/build_steps/tool.rs:191:21 Executed at: src/bootstrap/src/core/build_steps/test.rs:677:19 Command has failed. Rerun with -v to see more details. Build completed unsuccessfully in 0:02:46 local time: Fri Oct 10 22:11:14 UTC 2025 network time: Fri, 10 Oct 2025 22:11:14 GMT ##[error]Process completed with exit code 1. Post job cleanup. 
@RalfJung
Copy link
Member

RalfJung commented Oct 11, 2025

Well apparently your hypothesis that Simd::<i32, 5> has a larger alignment than i32 does not hold true on all targets. Maybe try Simd::<i32, 8>?
EDIT: Or maybe it's about the alignment of i32x4?

You can easily run tests on other targets yourself, Miri supports cross-interpretation. (You may have to shut up bootstrap's checks by setting BOOTSTRAP_SKIP_TARGET_SANITY=1).

@RalfJung
Copy link
Member

Ah no, you were just lucky in your previous tests. Simd::<i32, 5> is 4-aligned so it can, by pure chance, be the case that adding 4 to that actually ends up 16-aligned. You'll need to make the test reliably produce an unaligned address, e.g. by starting with something 16-aligned and then adding 4 to that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The names of the tests make little sense -- why is it "unaligned" and "underaligned"? This is about the two SimdAlign modes but those are not even mentioned.

Maybe ..._element_misaligned and ..._vector_misaligned?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-test-infra-minicore Area: `minicore` test auxiliary and `//@ add-core-stubs` S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue.

10 participants