Top 22 C++ Avx512 Projects
-
simdjson
Parsing gigabytes of JSON per second : used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks
This reminds me of an old bug in simdjson - any usage of it breaks std::unordered_map in unrelated parts of the code due to an unintentional modification of FPU flags: https://github.com/simdjson/simdjson/issues/169
-
xsimd
C++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, AVX512, NEON, SVE, WebAssembly, VSX, RISC-V)
Thanks, that's an important caveat!
> Meanwhile xsimd (https://github.com/xtensor-stack/xsimd) has the feature level as a template parameter on its vector objects
That's pretty cool because you can write function templates and instantiate different versions that you can select at runtime!
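The pattern described in that comment can be sketched without any SIMD library at all. The following is a minimal, self-contained illustration, not xsimd's actual API: the tag types `narrow_arch` and `wide_arch`, the `sum` template, and the `select_sum` helper are all hypothetical names, and the `has_wide` flag stands in for a real CPU feature query. Lanes are emulated with plain loops so the sketch compiles anywhere.

```cpp
#include <cstddef>

// Hypothetical feature-level tags (assumption: they stand in for a real
// library's architecture types; these are NOT xsimd identifiers).
struct narrow_arch { static constexpr std::size_t lanes = 4; };
struct wide_arch   { static constexpr std::size_t lanes = 16; };

// One function template, instantiated once per feature level.
// Lanes are emulated with an inner loop instead of real SIMD registers.
template <class Arch>
float sum(const float* data, std::size_t n) {
    float acc[Arch::lanes] = {};               // one partial sum per lane
    std::size_t i = 0;
    for (; i + Arch::lanes <= n; i += Arch::lanes)
        for (std::size_t l = 0; l < Arch::lanes; ++l)
            acc[l] += data[i + l];
    float total = 0.0f;
    for (std::size_t l = 0; l < Arch::lanes; ++l)
        total += acc[l];                       // horizontal reduction
    for (; i < n; ++i)
        total += data[i];                      // scalar tail
    return total;
}

using sum_fn = float (*)(const float*, std::size_t);

// Runtime selection between the two instantiations.
// `has_wide` is a placeholder for a real CPU feature check.
inline sum_fn select_sum(bool has_wide) {
    return has_wide ? &sum<wide_arch> : &sum<narrow_arch>;
}
```

A caller would run `select_sum(...)` once at startup and keep the returned function pointer, so the feature check is not repeated on every call.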
-
Simd
C++ image processing and machine learning library using SIMD: SSE, AVX, AVX-512, AMX for x86/x64, NEON for ARM. (by ermig1979)
-
less_slow.cpp
Playing around "Less Slow" coding practices in C++ 20, C, CUDA, PTX, & Assembly, from numerics & SIMD to coroutines, ranges, exception handling, networking and user-space IO
Project mention: Processing Strings 109x Faster Than Nvidia on H100 | news.ycombinator.com | 2025-09-19
Yes, at the scale of 128-bit registers NEON is mostly enough, except for a few categories of instructions missing in that ISA subset, like scatter/gather ops, that can yield 30% boost over serial memory accesses: https://github.com/ashvardanian/less_slow.cpp/releases/tag/v...
-
kfr
Fast, modern C++ DSP framework, FFT, Sample Rate Conversion, FIR/IIR/Biquad Filters (SSE, AVX, AVX-512, ARM NEON, RISC-V RVV)
Project mention: Show HN: KFR 7 - major update for C++ DSP library | news.ycombinator.com | 2025-11-17
- Project mention: Copilot implemented a ThreadPool to serve as a replacement for OpenMP | news.ycombinator.com | 2025-05-09
-
ParallelReductionsBenchmark
Thrust, CUB, TBB, AVX2, AVX-512, CUDA, OpenCL, OpenMP, Metal, and Rust - all it takes to sum a lot of numbers fast!
I was asked this a few months back but don't have the measurements fresh anymore. In general, I think TBB is one of the more thorough and feature-rich parallelism libraries out there. That said, I just found a comparable usage example in my benchmarks, and it doesn't look like TBB will have the same low-latency profile as Fork Union: https://github.com/ashvardanian/ParallelReductionsBenchmark/...
-
Jsonifier
A few classes for extremely fast json parsing/serializing in modern C++. Possibly the fastest json parser in C++. Possibly the fastest json serializer in C++. (by RealTimeChris)
Project mention: Achieving a 600% Performance Improvement in String-Literal Comparisons | news.ycombinator.com | 2025-02-18
-
VectorizedKernel
Running GPGPU-like kernels on CPU with auto-vectorization for SSE/AVX/AVX512 SIMD Architectures
C++ Avx512 related posts
- Copilot implemented a ThreadPool to serve as a replacement for OpenMP
- Show HN: Less Slow C++
- Expressive Vector Engine - SIMD in C++
- Intel Releases x86-SIMD-sort 6.0 for 10x faster AVX2/AVX-512 Sorting
- SIMD-accelerated computer vision on a $2 microcontroller
- Measuring energy usage: regular code vs. SIMD code
- SIMD Everywhere Optimization from ARM Neon to RISC-V Vector Extensions
Index
What are some of the best open-source Avx512 projects in C++? This list will help you:
| # | Project | Stars |
|---|---|---|
| 1 | simdjson | 22,987 |
| 2 | highway | 5,205 |
| 3 | oneDNN | 3,937 |
| 4 | xsimd | 2,565 |
| 5 | Simd | 2,223 |
| 6 | less_slow.cpp | 1,886 |
| 7 | kfr | 1,823 |
| 8 | Vc | 1,512 |
| 9 | libsimdpp | 1,292 |
| 10 | primesieve | 1,052 |
| 11 | x86-simd-sort | 987 |
| 12 | std-simd | 631 |
| 13 | toys | 371 |
| 14 | sse-popcount | 348 |
| 15 | primecount | 343 |
| 16 | md5-optimisation | 145 |
| 17 | ParallelReductionsBenchmark | 113 |
| 18 | Jsonifier | 90 |
| 19 | std_find_simd | 21 |
| 20 | modernRX | 16 |
| 21 | VectorizedKernel | 9 |
| 22 | ThinkingInSimd | 5 |