GPU Computing Motivation
Computing Challenge: Task Computing and Data Computing © NVIDIA Corporation 2007
Extreme Growth in Raw Data [Figure: four charts. YouTube bandwidth growth, in millions (source: Alexa, YouTube 2006); Walmart transaction tracking, in millions (source: Hedburg, CPI, Walmart); BP oil and gas active data (source: Jim Farnsworth, BP, May 2005); NOAA/NASA weather data in petabytes, 2002 to 2017 (source: John Bates, NOAA National Climate Center)]
Computational Horsepower: The GPU is a massively parallel computation engine, with high memory bandwidth (5-10x CPU) and high floating-point performance (5-10x CPU).
Benchmarking: CPU vs. GPU Computing. G80 vs. Core 2 Duo 2.66 GHz, measured against commercial CPU benchmarks when possible.
“Free” Massively Parallel Processors: “It’s not science fiction, it’s just funded by them.” (Asst Master Chief, Harvard)
Success Stories
Success Stories: Data to Design. Acceleware: EM field simulation technology for the GPU. 3D finite-difference time-domain (FDTD) and finite-element modeling of: cell phone irradiation; MRI design/modeling; printed circuit boards; radar cross section (military). [Chart: performance in Mcells/s for a 3.2 GHz CPU (1X), 1 GPU (5X), 2 GPUs (10X) and 4 GPUs (20X); image: pacemaker with transmit antenna]
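FDTD is a stencil computation: each time step updates every cell of the field arrays from its immediate neighbours only, which is what makes it map well onto thousands of GPU threads. The sketch below is a minimal 1D Yee-scheme update in normalized units, assuming a Courant number of 0.5 and a Gaussian soft source at the grid centre (illustrative choices, not Acceleware's code). It also assumes that "Mcells/s" on the chart means millions of cell updates per second.

```python
import math


def fdtd_1d(n_cells: int, n_steps: int, courant: float = 0.5):
    """Minimal 1D FDTD (Yee scheme) in normalized units.

    Ez is sampled at integer grid points and Hy at half-integer points.
    Every cell is updated from its immediate neighbours only, so each
    inner loop could run as one GPU thread per cell.
    """
    ez = [0.0] * n_cells
    hy = [0.0] * n_cells
    src = n_cells // 2  # illustrative soft-source location
    for t in range(n_steps):
        # Magnetic-field update from the spatial difference of Ez.
        for k in range(n_cells - 1):
            hy[k] += courant * (ez[k + 1] - ez[k])
        # Electric-field update from the spatial difference of Hy;
        # ez[0] is never updated, so it acts as a perfectly conducting wall.
        for k in range(1, n_cells):
            ez[k] += courant * (hy[k] - hy[k - 1])
        # Inject a Gaussian pulse (soft source) at the centre of the grid.
        ez[src] += math.exp(-(((t - 30) / 10) ** 2))
    return ez, hy


def mcells_per_second(n_cells: int, n_steps: int, seconds: float) -> float:
    """Throughput in millions of cell updates per second."""
    return n_cells * n_steps / seconds / 1e6


if __name__ == "__main__":
    import time

    start = time.perf_counter()
    fdtd_1d(n_cells=1000, n_steps=300)
    elapsed = time.perf_counter() - start
    print(f"{mcells_per_second(1000, 300, elapsed):.2f} Mcells/s (pure Python, CPU)")
```

A Courant number at or below 1 keeps the 1D scheme stable; the production codes on the slide work in 3D, but the neighbour-only update pattern is the same.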
EvolvedMachines: 130X speedup. Simulate brain circuitry; sensory computing: vision, olfactory.
MATLAB: Language of Science. 10X with MATLAB CPU+GPU: pseudo-spectral simulation of 2D isotropic turbulence. http://developer.nvidia.com/object/matlab_cuda.html http://www.amath.washington.edu/courses/571-winter-2006/matlab/FS_2Dturb.m
MATLAB Example: Advection of an elliptic vortex. 256x256 mesh, 512 RK4 steps, Linux. MATLAB file: http://www.amath.washington.edu/courses/571-winter-2006/matlab/FS_vortex.m. MATLAB: 168 seconds; MATLAB with CUDA (single-precision FFTs): 20 seconds.
MATLAB Example: Pseudo-spectral simulation of 2D isotropic turbulence. 512x512 mesh, 400 RK4 steps, Windows XP. MATLAB file: http://www.amath.washington.edu/courses/571-winter-2006/matlab/FS_2Dturb.m. MATLAB: 992 seconds; MATLAB with CUDA (single-precision FFTs): 93 seconds.
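The speedups behind the "10X" figure follow directly from the two timings quoted above. A trivial check, with the timings copied from the slides (the CUDA runs used single-precision FFTs):

```python
def speedup(baseline_s: float, accelerated_s: float) -> float:
    """Wall-clock speedup: baseline time divided by accelerated time."""
    return baseline_s / accelerated_s


# (MATLAB seconds, MATLAB with CUDA seconds) as quoted on the slides.
RUNS = {
    "Elliptic vortex, 256x256, 512 RK4 steps": (168, 20),
    "2D isotropic turbulence, 512x512, 400 RK4 steps": (992, 93),
}

for name, (cpu_s, gpu_s) in RUNS.items():
    print(f"{name}: {speedup(cpu_s, gpu_s):.1f}x")
```

This prints 8.4x for the vortex case and 10.7x for the turbulence case, consistent with the "10X" headline for the turbulence run.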
NAMD/VMD Molecular Dynamics: 240X speedup. Computational biology. http://www.ks.uiuc.edu/Research/vmd/projects/ece498/lecture/
Molecular Dynamics Example. Case study: molecular dynamics research at U. Illinois Urbana-Champaign. (Scientist-sponsored) course project for CS 498AL: Programming Massively Parallel Multiprocessors (Kirk/Hwu). Next slides stolen from a nice description of the problem, algorithms, and iterative optimization process, available at: http://www.ks.uiuc.edu/Research/vmd/projects/ece498/lecture/
Molecular Modeling: Ion Placement. Biomolecular simulations attempt to replicate in vivo conditions in silico. Model structures are initially constructed in vacuum; solvent (water) and ions are added as necessary for the required biological conditions. Computational requirements scale with the size of the simulated structure.
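One simple way to choose ion sites, sketched here as an assumption rather than as the UIUC method: evaluate the Coulomb potential of the structure on a grid, place an ion at the grid point where its energy is lowest, add that ion to the set of charges, and repeat. The function names, the `min_dist` exclusion radius and the unscaled potential (no Coulomb constant, dielectric or cutoff) are all hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]
Charge = Tuple[Point, float]  # (position, charge)


def potential_at(p: Point, charges: List[Charge]) -> float:
    """Unscaled Coulomb potential sum(q / r) at p; p must not coincide with a charge."""
    return sum(q / math.dist(p, pos) for pos, q in charges)


def place_ions(
    atoms: List[Charge],
    grid: List[Point],
    n_ions: int,
    ion_charge: float,
    min_dist: float = 1.0,
) -> List[Point]:
    """Greedy ion placement (illustrative, hypothetical helper).

    Each ion goes to the grid point where its energy ion_charge * V is
    lowest; it is then added to the charge set so that later ions see it.
    """
    charges = list(atoms)
    placed: List[Point] = []
    for _ in range(n_ions):
        best, best_energy = None, math.inf
        for p in grid:
            # Exclude points too close to any existing charge (also avoids r == 0).
            if any(math.dist(p, pos) < min_dist for pos, _ in charges):
                continue
            energy = ion_charge * potential_at(p, charges)
            if energy < best_energy:
                best, best_energy = p, energy
        if best is None:  # no admissible grid point left
            break
        placed.append(best)
        charges.append((best, ion_charge))
    return placed
```

Each placement costs roughly (grid points) × (charges) potential terms, which is why the computational requirements grow with the size of the simulated structure and why the potential evaluation is the part worth moving to the GPU.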
Evolution of Ion Placement Code: The first implementation was sequential; a virus structure with 10^6 atoms would require 10 CPU days. Tuned for Intel C/C++ vectorization + SSE: ~20x speedup. Parallelized with pthreads: high data parallelism yields linear speedup. Parallelized GPU-accelerated implementation: 3 GeForce 8800GTX cards outrun ~300 Itanium2 CPUs! The virus structure now runs in 25 seconds on 3 GPUs! Further speedups should still be possible…
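The end-to-end gain implied by these figures is easy to work out. Note that it compares the original sequential estimate against the final 3-GPU run, so it folds together every optimization stage and a change of hardware; it is not a per-device comparison. The "per GPU" figure simply divides the quoted ~300 Itanium2 CPUs by 3 cards, so it is a rough lower bound given "outrun".

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

sequential_s = 10 * SECONDS_PER_DAY  # "10 CPU days" for the 10^6-atom virus
gpu_s = 25                           # "25 seconds on 3 GPUs"

end_to_end = sequential_s / gpu_s
itanium_per_gpu = 300 / 3            # "3 cards outrun ~300 Itanium2 CPUs"

print(f"{sequential_s:,} s -> {gpu_s} s: {end_to_end:,.0f}x end to end")
print(f"~{itanium_per_gpu:.0f} Itanium2 CPUs per 8800GTX")
```

This gives 864,000 s versus 25 s, about a 34,560x reduction in wall-clock time, and roughly 100 Itanium2 CPUs' worth of throughput per card.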
Multi-GPU CUDA Coulombic Potential Map Performance. Host: Intel Core 2 Quad, 8 GB RAM, ~$3,000. 3 GPUs: NVIDIA GeForce 8800GTX, ~$550 each. 32-bit RHEL4 Linux (want 64-bit CUDA!!). 235 GFLOPS per GPU for the current version of the Coulombic potential map kernel; 705 GFLOPS total for the multithreaded multi-GPU version. Three GeForce 8800GTX GPUs in a single machine, cost ~$4,650.
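In its simplest form, a Coulombic potential map is a direct summation of q/r from every atom to every grid point, and each grid point can be computed independently. The sketch below shows that computation plus one plausible multi-GPU decomposition, assuming z-slices are dealt round-robin to one host thread per device. The names, the round-robin policy and the unscaled units are assumptions for illustration, not the UIUC kernel.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Atom = Tuple[float, float, float, float]  # (x, y, z, charge)
Slice = List[List[float]]


def potential_slice(k: int, nx: int, ny: int, spacing: float, atoms: List[Atom]) -> Slice:
    """Direct Coulomb summation, V = sum(q / r), for z-slice k of the map.

    Every grid point is independent; a CUDA kernel would typically assign
    one thread per (i, j) point. Atoms must not sit exactly on grid points.
    """
    z = k * spacing
    out = [[0.0] * nx for _ in range(ny)]
    for j in range(ny):
        y = j * spacing
        for i in range(nx):
            x = i * spacing
            out[j][i] = sum(
                q / math.sqrt((x - ax) ** 2 + (y - ay) ** 2 + (z - az) ** 2)
                for ax, ay, az, q in atoms
            )
    return out


def potential_map(nx: int, ny: int, nz: int, spacing: float,
                  atoms: List[Atom], n_devices: int = 3) -> List[Slice]:
    """Hand z-slices round-robin to n_devices workers (one host thread per GPU)."""
    grid = [None] * nz

    def device_worker(d: int) -> None:
        # Worker d handles slices d, d + n_devices, d + 2 * n_devices, ...
        for k in range(d, nz, n_devices):
            grid[k] = potential_slice(k, nx, ny, spacing, atoms)

    with ThreadPoolExecutor(max_workers=n_devices) as pool:
        list(pool.map(device_worker, range(n_devices)))  # re-raises worker errors
    return grid
```

Python threads here illustrate only the decomposition; because of the interpreter lock they will not speed up this pure-Python loop, whereas host threads that each drive a separate GPU do run concurrently. With three GPUs at the quoted 235 GFLOPS each, the aggregate is 3 × 235 = 705 GFLOPS, and the quoted system cost is $3,000 + 3 × $550 = $4,650.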
Professor Partnership
NVIDIA Professor Partnership: Support faculty research & teaching efforts. Easy: small equipment gifts (1-2 GPUs); significant discounts on GPU purchases, especially Quadro and Tesla equipment, useful for cost matching. Competitive: research contracts; small cash grants (typically ~$25K gifts); medium-scale equipment donations (10-30 GPUs). Informal proposals, reviewed quarterly. Focus areas: GPU computing, especially with an educational mission or component. http://www.nvidia.com/page/professor_partnership.html