Open
Description
Just want to point out that dot4 can be improved with the dpps instruction (which I just discovered). It requires SSE4.1 (99.84% of CPUs in the Steam Hardware Survey, April 2025).
```zig
pub fn dot4(v0: Vec, v1: Vec) Vec {
    return asm (
        \\dpps $0xff, %xmm1, %xmm0
        : [ret] "={xmm0}" (-> Vec), // output
        : [v0] "{xmm0}" (v0), // inputs
          [v1] "{xmm1}" (v1),
    );
}
```

I haven't tested how much faster it is...
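For anyone who wants to sanity-check the asm version before benchmarking it: with the immediate `0xff`, dpps multiplies all four lanes, sums the products, and writes that sum to every lane of the result. Below is a minimal portable sketch of the same behaviour to compare against. It assumes `Vec` is `@Vector(4, f32)`, and `dot4Reference` is a hypothetical name, not something from the library.

```zig
const std = @import("std");

// Assumption: Vec is a 4-lane f32 vector.
const Vec = @Vector(4, f32);

// Portable reference (hypothetical helper): the dot product of all four
// lanes, broadcast to every lane, which is what dpps does with mask 0xff.
fn dot4Reference(v0: Vec, v1: Vec) Vec {
    const sum: f32 = @reduce(.Add, v0 * v1);
    return @splat(sum);
}

test "dot4Reference broadcasts the dot product to all lanes" {
    const a = Vec{ 1.0, 2.0, 3.0, 4.0 };
    const b = Vec{ 5.0, 6.0, 7.0, 8.0 };
    const r = dot4Reference(a, b);
    // 1*5 + 2*6 + 3*7 + 4*8 = 70
    try std.testing.expectEqual(@as(f32, 70.0), r[0]);
    try std.testing.expectEqual(@as(f32, 70.0), r[3]);
}
```

Comparing the asm `dot4` against this on a few inputs should show whether the register constraints and mask are right. Exact equality is only safe for inputs like these, where every product and partial sum is exactly representable, since dpps and `@reduce` may add the products in a different order.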