Top 12 C++ gpu-acceleration Projects
-
TensorRT
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
Project mention: Generative AI Interview for Senior Data Scientists: 50 Key Questions and Answers | dev.to | 2025-05-06
What is the purpose of using ONNX or TensorRT for deployment? When deploying a trained deep learning model into a real-world service environment for inference, optimization to increase execution speed and reduce resource consumption is crucial. ONNX and TensorRT are prominent tools and frameworks widely used for this purpose.
-
cccl
Project mention: Delivering the Missing Building Blocks for Nvidia CUDA Kernel Fusion in Python | news.ycombinator.com | 2025-07-16
There’s an extensive change-log supporting the CCCL 3.0 release on GitHub from 3 hours ago: https://github.com/NVIDIA/cccl/releases/tag/v3.0.0
-
OpenCL-Wrapper
OpenCL is the most powerful programming language ever created. Yet the OpenCL C++ bindings are cumbersome and the code overhead prevents many people from getting started. I created this lightweight OpenCL-Wrapper to greatly simplify OpenCL software development with C++ while keeping functionality and performance.
-
stitchEm
Vahana VR & VideoStitch Studio: software to create immersive 360° VR video, live and in post-production
-
ParallelReductionsBenchmark
Thrust, CUB, TBB, AVX2, AVX-512, CUDA, OpenCL, OpenMP, Metal, and Rust - all it takes to sum a lot of numbers fast!
I was asked this a few months back but don’t have the measurements fresh anymore. In general, I think TBB is one of the more thorough and feature-rich parallelism libraries out there. That said, I just found a comparable usage example in my benchmarks, and it doesn’t look like TBB will have the same low-latency profile as Fork Union: https://github.com/ashvardanian/ParallelReductionsBenchmark/...
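As a rough illustration of what "summing a lot of numbers fast" involves, most of the backends named above organize the work in two phases: independent partial sums over chunks, then a final combine. The sketch below is not taken from ParallelReductionsBenchmark; `chunked_sum` and its parameters are hypothetical names, and the chunks are processed serially here to keep the example dependency-free. Because each chunk is independent, the first loop is the part a thread pool, SIMD lanes, or GPU blocks would take over.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (an assumption for illustration, not the benchmark's API):
// sums `data` in two phases, mirroring how parallel reductions are usually laid out.
double chunked_sum(const std::vector<float>& data, std::size_t chunk_count) {
    if (data.empty()) return 0.0;

    // Keep the chunk count within [1, data.size()].
    chunk_count = std::max<std::size_t>(1, std::min(chunk_count, data.size()));
    const std::size_t chunk_size = (data.size() + chunk_count - 1) / chunk_count;

    // Phase 1: one partial sum per chunk. Iterations do not depend on each
    // other, so each could be dispatched to a separate worker.
    std::vector<double> partials;
    partials.reserve(chunk_count);
    for (std::size_t begin = 0; begin < data.size(); begin += chunk_size) {
        const std::size_t end = std::min(begin + chunk_size, data.size());
        const auto first = data.begin() + static_cast<std::ptrdiff_t>(begin);
        const auto last = data.begin() + static_cast<std::ptrdiff_t>(end);
        // Accumulate in double to limit rounding error on long float inputs.
        partials.push_back(std::accumulate(first, last, 0.0));
    }

    // Phase 2: combine the partial sums into the final result.
    return std::accumulate(partials.begin(), partials.end(), 0.0);
}
```

Calling `chunked_sum(values, 8)` on a `std::vector<float>` splits the input into at most eight chunks. Any speedup would come from running phase 1 concurrently, which this sketch deliberately leaves out.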
-
samarium
2D physics simulation and rendering library in modern C++, with GPU acceleration (by jjbel)
C++ gpu-acceleration related posts
- Am I the only one who sometimes edits stills in Nuke?
- Official Firefox add-on bringing offline translation support to Firefox
- Do any GAN's exist to restore photos with reduced colors (not grayscale coloring)?
- C++ Show and Tell - May 2022
- 2D compositor architecture?
- Hacker News top posts: Mar 14, 2022
- Cascade: Open-source Node-based image editor with GPU-acceleration
Index
What are some of the best open-source gpu-acceleration projects in C++? This list will help you:
| # | Project | Stars |
|---|---|---|
| 1 | TensorRT | 12,489 |
| 2 | cccl | 2,086 |
| 3 | Anime4KCPP | 1,957 |
| 4 | stdgpu | 1,238 |
| 5 | TerraForge3D | 1,105 |
| 6 | DREAMPlace | 912 |
| 7 | OpenCL-Wrapper | 454 |
| 8 | vuh | 350 |
| 9 | stitchEm | 320 |
| 10 | marian-dev | 283 |
| 11 | ParallelReductionsBenchmark | 113 |
| 12 | samarium | 9 |