GPU Computing with Ruby

SpeedGo Computing
Chung Shin Yee
shinyee@speedgocomputing.com

CPU vs GPU Architecture
● Cores: 6 (CPU) vs 1024 (GPU)
● Memory bandwidth: 6 GB/s (CPU) vs 300 GB/s (GPU)
[Figure from the CUDA C Programming Guide]

CUDA Programming Model
[Figure from the CUDA C Programming Guide]
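As a rough illustration of the grid/block/thread hierarchy, the sketch below enumerates, in plain Ruby on the CPU, the global index that each thread of a one-dimensional launch would compute (blockIdx.x * blockDim.x + threadIdx.x in CUDA C). The helper name global_thread_ids is made up for this example and is not part of SGC Ruby CUDA.

```ruby
# CPU-only sketch: list the global index each thread of a 1-D launch would compute.
# global_thread_ids is a hypothetical helper, not an SGC Ruby CUDA method.
def global_thread_ids(grid_dim_x, block_dim_x)
  ids = []
  grid_dim_x.times do |block_idx|
    block_dim_x.times do |thread_idx|
      # Same formula a CUDA kernel uses: blockIdx.x * blockDim.x + threadIdx.x
      ids << block_idx * block_dim_x + thread_idx
    end
  end
  ids
end

p global_thread_ids(2, 4)   # 2 blocks of 4 threads each
```

On a GPU these indices are computed by threads running in parallel; the nested loops here only show how the numbering works.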

Existing Programming Tools
● Cg
● BrookGPU
● GLSL (OpenGL Shading Language)
● Nvidia CUDA C/C++
● OpenCL
● PyCUDA
Where is the Red Ruby?

Bridging Ruby & CUDA C/C++
● Ruby C extension
  – Hard to manipulate Ruby objects in C.
  – Compilation problems.
● Ruby FFI
  – Bridging purely in Ruby.
  – Supports multiple Ruby implementations.

Developing SGC Ruby CUDA
● Object-oriented API.
● Start with crucial operations.
  – Memory allocation.
  – Memory transfer.
  – Kernel launch.
  – Wrapper for structures.
● Documented with YARD.

Driver vs Runtime API
● CUDA Driver API
  – For system developers.
  – Supported by PyCUDA.
● CUDA Runtime API
  – For computation-centric developers.
We are going to support both APIs!

Using SGC Ruby CUDA
● Kernel program in CUDA C.

Using SGC Ruby CUDA
● Compiling kernel into PTX.
    nvcc --ptx vadd.cu

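Using SGC Ruby CUDA
● Setup
    require 'rubycu'
    include SGC::CU

    CUInit.init                        # initialize the CUDA driver API
    d = CUDevice.get(0)                # first CUDA device
    c = CUContext.create(d)            # context on that device
    m = CUModule.new.load("vadd.ptx")  # load the compiled PTX
    f = m.function("vadd")             # look up the kernel by name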

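Using SGC Ruby CUDA
● Memory allocations
    # Device buffers: 10 ints of 4 bytes each.
    da = CUDevice.malloc(10*4)
    db = CUDevice.malloc(10*4)
    dc = CUDevice.malloc(10*4)

    # Host buffers. hd receives the results copied back from the GPU.
    ha = Buffer.new(:int, 10)
    hb = Buffer.new(:int, 10)
    hc = Buffer.new(:int, 10)
    hd = Buffer.new(:int, 10)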
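The 10*4 passed to malloc is the byte size of ten 32-bit ints. A minimal sketch of that arithmetic in plain Ruby follows. It measures the element size with Array#pack rather than hard-coding 4. INT_SIZE and bytes_for are names invented for this example, not part of SGC Ruby CUDA.

```ruby
# Size in bytes of one 32-bit signed int, measured by packing a single value.
# 'l' is the pack directive for a 32-bit signed integer in native byte order.
INT_SIZE = [0].pack('l').bytesize

# bytes_for is a hypothetical helper: bytes needed for `count` elements.
def bytes_for(count, elem_size = INT_SIZE)
  count * elem_size
end

puts bytes_for(10)   # byte count for the ten-int buffers used in the slides
```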

Using SGC Ruby CUDA
● Initialization
    (0...10).each { |i|
        ha[i] = i
        hb[i] = 1
        hc[i] = ha[i] + hb[i]   # expected result, computed on the CPU
        hd[i] = 0               # host buffer for the GPU result
    }

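Using SGC Ruby CUDA
● Transfer inputs to the GPU
    CUMemory.memcpy_htod(da, ha, 4*10)
    CUMemory.memcpy_htod(db, hb, 4*10)
    # Zero the output buffer with hd, not the expected values in hc,
    # so that the later check verifies the kernel's own output.
    CUMemory.memcpy_htod(dc, hd, 4*10)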

Using SGC Ruby CUDA
● Launch kernel on GPU
    # Launch a 1x1x1 grid of blocks,
    # each block with 10x1x1 threads.
    params = [da, db, dc, 10]
    f.launch_kernel(1, 1, 1, 10, 1, 1, 0, 0, params)
[Figures from the CUDA C Programming Guide]
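To make the launch geometry concrete, here is a CPU-only emulation in plain Ruby. The kernel source is not included in this transcript, so the sketch assumes that vadd computes c[i] = a[i] + b[i] for every index i below n, which is what the expected values in the slides imply. emulate_vadd is a hypothetical name and makes no SGC Ruby CUDA calls.

```ruby
# CPU-only emulation of a 1-D launch; no GPU is involved.
# Assumption: the vadd kernel computes c[i] = a[i] + b[i] for i < n.
def emulate_vadd(a, b, n, grid_x: 1, block_x: 10)
  c = Array.new(n, 0)
  grid_x.times do |block_idx|
    block_x.times do |thread_idx|
      i = block_idx * block_x + thread_idx   # global thread index
      next if i >= n                         # surplus threads do nothing
      c[i] = a[i] + b[i]
    end
  end
  c
end

ha = (0...10).to_a       # same inputs as in the slides
hb = Array.new(10, 1)
p emulate_vadd(ha, hb, 10)
```

With one block of ten threads, each thread handles exactly one element. The bounds check only matters when the launch provides more threads than elements.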

Using SGC Ruby CUDA
● Transfer results back to system memory
    CUMemory.memcpy_dtoh(hd, dc, 4*10)
● Verify results
    (0...10).each { |i|
        assert_equal(hc[i], hd[i])
    }

Problematic CUDA Runtime API
● For use in a CUDA C/C++ program.
● Workaround
  – CUDA C/C++ effectively uses C/C++ bindings.
  – Create dynamic library for the kernel programs.
  – Load the library at runtime.

Current Limitations
● Limited data type support.
  – Fixnum → int
  – ?? → long
  – Float → float
  – ?? → double
● No support for CUDA C++ templates.
● No Ruby in a kernel program.
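The Float → float mapping narrows the value, because a Ruby Float is double precision while a C float is single precision. The sketch below illustrates the effect in plain Ruby, with Array#pack standing in for whatever conversion a binding layer performs (an assumption made for this example). to_c_int and to_c_float are hypothetical helpers, not part of SGC Ruby CUDA.

```ruby
# Round-trip a Ruby value through a 32-bit C representation using Array#pack.
# Both helpers are hypothetical and only illustrate the type mapping.
def to_c_int(value)
  [value].pack('l').unpack1('l')   # 'l': 32-bit signed int, native byte order
end

def to_c_float(value)
  [value].pack('f').unpack1('f')   # 'f': single-precision float, native format
end

puts to_c_int(42) == 42       # small integers survive unchanged
puts to_c_float(1.5) == 1.5   # 1.5 is exactly representable in single precision
puts to_c_float(0.1) == 0.1   # 0.1 is not, so the round trip changes the value
```

Results read back from a float buffer are therefore better compared against the expected values with a tolerance than with exact equality.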

To Support
● Texture memory.
● New features in CUDA 4.0
  – Multi-GPU.
  – Unified Virtual Addressing (UVA).
● More C data types.
● Mac platform.

Try It Now!
    git clone git://github.com/xman/sgc-ruby-cuda.git
    cd sgc-ruby-cuda
    gem install ffi yard
    rake test
    rake yard
Thank You ~
