GPU ACCELERATED HIGH PERFORMANCE COMPUTING PRIMER
G RAJA SUMANT, ASHOK, ASHWIN
//early draft copy
CONTENTS
1. INTRODUCTION
2. WHY USE GPU?
3. CPU ARCHITECTURE
4. GPU ARCHITECTURE
5. WHY PYTHON FOR GPU
6. HOW GPU ACCELERATION WORKS
7. TECHNOLOGIES AVAILABLE TODAY FOR GPU COMPUTING
8. CUDA + PYTHON
9. PyCUDA sample code
INTRODUCTION
GPU-accelerated computing is the use of a graphics processing unit (GPU) together with a CPU to accelerate scientific, engineering, and enterprise applications. Pioneered in 2007 by NVIDIA, GPUs now power energy-efficient datacenters in government labs, universities, enterprises, and small-and-medium businesses around the world.
WHY USE GPU?
GPU-accelerated computing offers unprecedented application performance by offloading the compute-intensive portions of an application to the GPU, while the remainder of the code still runs on the CPU. From a user's perspective, applications simply run significantly faster.
CPU ARCHITECTURE
Design targets for CPUs:
1) Focus on task parallelism
2) Make a single thread very fast
3) Hide latency through large caches
4) Predict and speculate
GPU ARCHITECTURE
[Figure: the AMD GPU architecture]
[Figure: the NVIDIA GPU architecture]
WHY PYTHON FOR GPU
Go to a terminal, type python, and at the prompt run import this. Read the output.
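A minimal session looks like this; the output is the Zen of Python, Tim Peters' summary of the design philosophy that makes Python a comfortable front end for GPU work:

```python
# Importing the "this" module prints the Zen of Python: 19 aphorisms
# ("Readability counts", "Explicit is better than implicit", ...).
import this
```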
WHY PYTHON FOR GPU
GPUs are everything that scripting languages are not:
> Highly parallel
> Very architecture-sensitive
> Built for maximum FP/memory throughput
The two therefore complement each other: the CPU is largely restricted to control tasks (on the order of 1000/sec), and scripting is fast enough for those.
> Python + CUDA = PyCUDA
> Python + OpenCL = PyOpenCL
Source: http://www.nvidia.com/object/what-is-gpu-computing.html
TECHNOLOGIES AVAILABLE TODAY FOR GPU COMPUTING
Open Computing Language (OpenCL)
> Many vendors: AMD, Nvidia, Apple, Intel, IBM...
> Standard CPUs may report themselves as OpenCL capable
> Works on most devices, but the implemented feature set and extensions may vary
Compute Unified Device Architecture (CUDA)
> One vendor: Nvidia (more mature tools)
> Better coherence across a limited set of devices
CUDA + PYTHON
PyCUDA
> You still have to write your kernel in CUDA C
> ...but it integrates easily with numpy
> Higher level than CUDA C, but not much higher
> Full CUDA support and performance
gnumpy/CUDAMat/cuBLAS
> gnumpy: numpy-like wrapper for CUDAMat
> CUDAMat: pre-written kernels and a partial cuBLAS wrapper
> cuBLAS: (incomplete) CUDA implementation of BLAS
PyCUDA sample code
>> Open helloCUDA.py in an editor, then read through and analyse the code.
CUDAMAT
The aim of the cudamat project is to make it easy to perform basic matrix calculations on CUDA-enabled GPUs from Python. cudamat provides a Python matrix class that performs its calculations on a GPU. At present, the operations the GPU matrix class supports include:
> Easy conversion to and from instances of numpy.ndarray
> Limited slicing support
> Matrix multiplication and transpose
> Elementwise addition, subtraction, multiplication, and division
>> Open the cudamat examples.
GNumpy
Module gnumpy contains the class garray, which behaves much like numpy.ndarray. gnumpy also contains methods such as tile() and rand(), which behave like their numpy counterparts except that they deal with gnumpy.garray instances instead of numpy.ndarray instances. gnumpy builds on cudamat.
References ● http://documen.tician.de/pycuda/tutorial.html ● http://on-demand.gputechconf.com/gtc/2010/presentations/S12041-PyCUDA-Simpler-GPU-Programming-Python.pdf ● http://www.tsc.uc3m.es/~miguel/MLG/adjuntos/slidesCUDA.pdf ● http://femhub.com/docs/cuda_en.pdf ● http://www.ieap.uni-kiel.de/et/people/kruse/tutorials/cuda/tutorial01p/web01p/tutorial01p.pdf ● http://conference.scipy.org/static/wiki/scipy09-pycuda-tut.pdf

Pycon2014 GPU computing
