This repository provides open-source resources for learning CUDA C/C++ programming, the C/C++ interface to the CUDA parallel computing platform. In CUDA, the host refers to the CPU and its memory, while the device refers to the GPU and its memory. Code running on the host can manage memory on both the host and the device, and it launches kernels, which are functions executed on the device by many GPU threads in parallel.
NOTE: it is assumed that you have access to a computer with a CUDA-enabled NVIDIA GPU.
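For example, a minimal sketch of this host/device split and a kernel launch (not one of the exercises below, just an illustration) might look like this:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: executed on the device; each thread writes its own index.
__global__ void write_indices(int *out)
{
    out[threadIdx.x] = threadIdx.x;
}

int main()
{
    const int n = 8;
    int h_out[n];                                 // host memory
    int *d_out = NULL;                            // device memory

    cudaMalloc((void **)&d_out, n * sizeof(int)); // allocate device memory from the host
    write_indices<<<1, n>>>(d_out);               // launch the kernel: 1 block of n threads
    cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    for (int i = 0; i < n; ++i)
        printf("%d ", h_out[i]);                  // prints 0 1 2 3 4 5 6 7
    printf("\n");
    return 0;
}
```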
Here you can find solutions to several simple exercises on GPU programming in CUDA C/C++. The source code is well commented and easy to follow, though a basic knowledge of parallel architectures is recommended.
- exercise 0: print device properties
- exercise 1: hello, world!
- exercise 2: addition
- exercise 3: vector addition using parallel blocks
- exercise 4: vector addition using parallel threads
- exercise 5: vector addition combining blocks and threads (see the sketch after this list)
- exercise 6: Single-precision A*X Plus Y
- exercise 7: multiplication of square matrices
- exercise 8: transpose of a square matrix
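For reference, here is a minimal sketch of the blocks-plus-threads indexing pattern that exercises 3–5 build up to. It is not the repository's actual solution, and the array size and launch configuration are arbitrary:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread computes one element; the global index combines
// the block index, the block size, and the thread index.
__global__ void vec_add(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                                  // guard the last, partially filled block
        c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1000;
    size_t bytes = n * sizeof(float);

    // Host arrays with some sample values.
    float *h_a = (float *)malloc(bytes), *h_b = (float *)malloc(bytes), *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { h_a[i] = (float)i; h_b[i] = 2.0f * i; }

    // Device arrays and host-to-device copies.
    float *d_a, *d_b, *d_c;
    cudaMalloc((void **)&d_a, bytes);
    cudaMalloc((void **)&d_b, bytes);
    cudaMalloc((void **)&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    int threads = 256;                          // threads per block (tunable)
    int blocks  = (n + threads - 1) / threads;  // enough blocks to cover n elements
    vec_add<<<blocks, threads>>>(d_a, d_b, d_c, n);

    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[%d] = %f (expected %f)\n", n - 1, h_c[n - 1], 3.0f * (n - 1));

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```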
The CUDA C/C++ compiler, nvcc, is part of the NVIDIA CUDA Toolkit. It separates a .cu source file into host and device components, compiles the device code itself, and forwards the host code to a standard host compiler, so each exercise can be compiled directly with nvcc.
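For example, assuming a source file named hello.cu (the filename here is hypothetical), you could compile and run it with:

```sh
nvcc hello.cu -o hello
./hello
```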
NOTE: to measure how long a kernel takes to run, or to check for memory access errors, you can run nvprof ./<binary> or cuda-memcheck ./<binary> on the command line, respectively.
This project is licensed under the MIT License - see the LICENSE file for details.