MDEV-36184 - mhnsw: support powerpc64 SIMD instructions
This patch optimises the dot_product function with hand-written SIMD intrinsics. Vectorising the loop this way lets multiple operations execute in parallel, which improves dot product performance on supported architectures.

The original dot_product function is already auto-vectorised when compiled with -O3. However, performance analysis shows that the new implementation is faster on Power10 and performs comparably on Power9.

Benchmarks were run on both Power9 and Power10 machines, comparing the time taken by the original (auto-vectorised) code and the new vectorised code. Both were built with GCC 11.5.0 at -O3 on RHEL 9.5. The benchmark is a sample test program with a vector size of 4096 and 10⁷ loop iterations.

Average execution times (in seconds) over multiple runs:

Power9:
  Before change: ~16.364 s
  After change:  ~16.180 s
  Performance gain is modest (about 1%) but measurable.

Power10:
  Before change: ~8.989 s
  After change:  ~6.446 s
  Significant improvement, roughly 28–30% faster.

Signed-off-by: Manjul Mohan <manjul.mohan@ibm.com>
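
For illustration only, here is a minimal sketch of the general technique the message describes: a dot product written with PowerPC VSX intrinsics next to a scalar loop that the compiler may auto-vectorise. It is not the code from this patch. The float element type, the names dot_product_scalar and dot_product_simd, and the HAVE_POWER_SIMD guard are all assumptions made for the example. On non-Power targets the SIMD entry point simply falls back to the scalar loop.

```c
#include <stddef.h>

/* Hypothetical guard: take the intrinsics path only on 64-bit Power with VSX. */
#if defined(__powerpc64__) && defined(__VSX__)
#include <altivec.h>
#define HAVE_POWER_SIMD 1
#else
#define HAVE_POWER_SIMD 0
#endif

/* Scalar reference loop; compilers may auto-vectorise it at -O3. */
static float dot_product_scalar(const float *a, const float *b, size_t len)
{
  float sum = 0.0f;
  for (size_t i = 0; i < len; i++)
    sum += a[i] * b[i];
  return sum;
}

/* Explicit SIMD variant (assumed float elements, four per 128-bit vector). */
static float dot_product_simd(const float *a, const float *b, size_t len)
{
#if HAVE_POWER_SIMD
  __vector float acc = vec_splats(0.0f);
  size_t i = 0;

  /* Main loop: load four floats from each input, fused multiply-add into acc. */
  for (; i + 4 <= len; i += 4)
  {
    __vector float va = vec_xl(0, a + i);
    __vector float vb = vec_xl(0, b + i);
    acc = vec_madd(va, vb, acc);
  }

  /* Horizontal sum of the four accumulator lanes. */
  float sum = vec_extract(acc, 0) + vec_extract(acc, 1) +
              vec_extract(acc, 2) + vec_extract(acc, 3);

  /* Scalar tail for lengths that are not a multiple of four. */
  for (; i < len; i++)
    sum += a[i] * b[i];
  return sum;
#else
  /* No VSX available: reuse the scalar loop. */
  return dot_product_scalar(a, b, len);
#endif
}
```

Because the SIMD path accumulates in four separate lanes, its floating-point rounding can differ slightly from the scalar loop on general inputs. Whether a hand-written loop like this beats the auto-vectorised one depends on the compiler and the CPU, which is why figures such as those above have to be measured per machine.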