From NumPy to PyTorch Mike Ruberry software engineer @ Facebook
Outline - NumPy and working with tensors - PyTorch and hardware accelerators, autograd, and computational graphs - Adding NumPy operators to PyTorch - When PyTorch is Different from NumPy - Lessons learned and future work
NumPy and working with tensors
Simple NumPy Snippets (tensor creation, addition, and matrix multiplication)

>> import numpy as np
>> a = np.array(((1, 2), (3, 4)))
array([[1, 2],
       [3, 4]])
>> b = np.array(((-1, -2), (-3, -4)))
>> np.add(a, b)
array([[0, 0],
       [0, 0]])
>> np.matmul(a, b)
array([[ -7, -10],
       [-15, -22]])
More Complicated NumPy Snippets

>> np.fft.fft(np.exp(2j * np.pi * np.arange(8) / 8))
array([-3.44509285e-16+1.14423775e-17j,  8.00000000e+00-8.11483250e-16j,
        2.33486982e-16+1.22464680e-16j,  0.00000000e+00+1.22464680e-16j,
        9.95799250e-17+2.33486982e-16j,  0.00000000e+00+7.66951701e-17j,
        1.14423775e-17+1.22464680e-16j,  0.00000000e+00+1.22464680e-16j])
>> A = np.array([[1, -2j], [2j, 5]])
>> np.linalg.cholesky(A)
array([[1.+0.j, 0.+0.j],
       [0.+2.j, 1.+0.j]])
NumPy Operators: Composites and Primitives
Composites:

def sinc(x):
    x = np.asanyarray(x)
    y = pi * where(x == 0, 1.0e-20, x)
    return sin(y)/y

Primitives:

double npy_copysign(double x, double y)
{
    npy_uint32 hx, hy;
    GET_HIGH_WORD(hx, x);
    GET_HIGH_WORD(hy, y);
    SET_HIGH_WORD(x, (hx & 0x7fffffff) | (hy & 0x80000000));
    return x;
}
PyTorch and hardware accelerators, autograd, and computational graphs
Simple NumPy Snippets (Again): the same np.array / np.add / np.matmul snippet shown earlier.

Simple NumPy Snippets to PyTorch Snippets: tensor creation, addition, and matrix multiplication translate one call at a time, with np.array becoming torch.tensor, np.add becoming torch.add, and np.matmul becoming torch.matmul.

Simple PyTorch Snippets

>> import torch
>> a = torch.tensor(((1, 2), (3, 4)))
tensor([[1, 2],
        [3, 4]])
>> b = torch.tensor(((-1, -2), (-3, -4)))
>> torch.add(a, b)
tensor([[0, 0],
        [0, 0]])
>> torch.matmul(a, b)
tensor([[ -7, -10],
        [-15, -22]])
More Complicated NumPy Snippets (Again): the same np.fft.fft and np.linalg.cholesky snippets shown earlier.
More Complicated PyTorch Snippets

>> torch.fft.fft(torch.exp(2j * math.pi * torch.arange(8) / 8))
tensor([ 3.2584e-07+3.1787e-08j,  8.0000e+00+4.8023e-07j,
        -3.2584e-07+3.1787e-08j, -1.6859e-07+3.1787e-08j,
        -3.8941e-07-2.0663e-07j,  1.3691e-07-1.9412e-07j,
         3.8941e-07-2.0663e-07j,  1.6859e-07+3.1787e-08j])
>> A = torch.tensor([[1, -2j], [2j, 5]])
>> torch.linalg.cholesky(A)
tensor([[1.+0.j, 0.+0.j],
        [0.+2.j, 1.+0.j]])
PyTorch and NumPy Interoperability

>> t = torch.tensor((1, 2, 3))
>> a = t.numpy()
array([1, 2, 3])
>> b = np.array((-1, -2, -3))
>> result = a + b
array([0, 0, 0])
>> torch.from_numpy(result)
tensor([0, 0, 0])
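Worth noting, though not spelled out on the slide: for CPU tensors, t.numpy() and torch.from_numpy() share memory with the original tensor or array rather than copying, so in-place updates are visible on both sides. A minimal sketch:

>> t = torch.tensor((1, 2, 3))
>> a = t.numpy()        # shares memory with t (CPU tensors only)
>> t.add_(10)           # in-place update of the tensor...
>> a                    # ...is visible through the NumPy view
array([11, 12, 13])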
Does PyTorch have EVERY NumPy operator? - No! - NumPy has a lot of operators: A LOT - Many of them are rarely used, niche, deprecated, or in need of deprecation - But PyTorch does have hundreds of NumPy operators
Simple PyTorch Snippets on CUDA

>> import torch
>> a = torch.tensor(((1, 2), (3, 4)), device='cuda')
tensor([[1, 2],
        [3, 4]], device='cuda:0')
>> b = torch.tensor(((-1, -2), (-3, -4)), device='cuda')
>> torch.add(a, b)
tensor([[0, 0],
        [0, 0]], device='cuda:0')
>> torch.matmul(a.float(), b.float())
tensor([[ -7., -10.],
        [-15., -22.]], device='cuda:0')
Autograd in PyTorch

>> a = torch.tensor((1., 2.), requires_grad=True)
>> b = torch.tensor((3., 4.))
>> result = (a * b).sum()
>> result.backward()
>> a.grad
tensor([3., 4.])

Since the derivative of (a * b).sum() with respect to a is simply b, a.grad equals b.
Computational Graphs in PyTorch

def sinc(x):
    y = math.pi * torch.where(x == 0, 1.0e-20, x)
    return torch.sin(y) / y

scripted_sinc = torch.jit.script(sinc)

graph(%x.1 : Tensor):
  %1 : float = prim::Constant[value=3.1415926535897931]
  %3 : int = prim::Constant[value=0]
  %5 : float = prim::Constant[value=9.9999999999999995e-21]
  %4 : Tensor = aten::eq(%x.1, %3)
  %7 : Tensor = aten::where(%4, %5, %x.1)
  %y.1 : Tensor = aten::mul(%7, %1)
  %10 : Tensor = aten::sin(%y.1)
  %12 : Tensor = aten::div(%10, %y.1)
  return (%12)
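The IR shown above can be inspected directly: assuming the scripted_sinc from this snippet, print(scripted_sinc.graph) prints the graph of aten primitives that TorchScript recorded.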
Deep Learning in PyTorch

>> t = torch.randn(10)
>> linear_layer = torch.nn.Linear(10, 5)
>> linear_layer(t)
tensor([ 0.0066,  0.2467, -0.0137, -0.4091, -1.1756], grad_fn=<AddBackward0>)
PyTorch as NumPy+ - While PyTorch doesn’t have every NumPy operator, for those it supports we can think of it as NumPy PLUS: - Support for hardware accelerators, like GPUs and TPUs - Support for autograd - Support for computational graphs - Support for deep learning - A C++ API - … and many additional features (visualization, distributed training, …) - PyTorch also has additional operators that NumPy does not
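To make the NumPy PLUS idea concrete, here is a minimal sketch (assuming a CUDA device is available) that runs the earlier snippet on a GPU with autograd enabled:

import torch

a = torch.tensor(((1., 2.), (3., 4.)), device='cuda', requires_grad=True)
b = torch.tensor(((-1., -2.), (-3., -4.)), device='cuda')

result = torch.matmul(a, b).sum()  # runs on the GPU
result.backward()                  # autograd computes d(result)/da
print(a.grad)                      # tensor([[-3., -7.], [-3., -7.]], device='cuda:0')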
PyTorch Behind the Scenes - To recap, NumPy has… - Composite operators (typically implemented in Python) - Primitive operators (implemented in C++) - And PyTorch has... - Composite operators (implemented in C++) - Primitive operators (implemented in C++, CPU intrinsics, and CUDA) - Computational graphs (executed by torchscript or XLA) - Plus autograd formulas for differentiable operations
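A rough illustration of the composite/primitive split from the user's point of view (a sketch, not PyTorch's internal C++ composites): anything written purely in terms of primitive PyTorch operators automatically inherits their device support and autograd coverage.

import torch

def log1p_exp(x):
    # A composite built only from primitives; it works on CPU or CUDA
    # tensors and is differentiable because its primitives are.
    return torch.log1p(torch.exp(x))

x = torch.randn(4, requires_grad=True)
log1p_exp(x).sum().backward()
print(x.grad)  # matches torch.sigmoid(x), the analytic derivative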
Sinc in NumPy (reminder)

def sinc(x):
    x = np.asanyarray(x)
    y = pi * where(x == 0, 1.0e-20, x)
    return sin(y)/y
Sinc in PyTorch, CPU kernel

static void sinc_kernel(TensorIteratorBase& iter) {
  AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES_AND1(
      kBFloat16, iter.common_dtype(), "sinc_cpu", [&]() {
        cpu_kernel(
            iter,
            [=](scalar_t a) -> scalar_t {
              if (a == scalar_t(0)) {
                return scalar_t(1);
              } else {
                scalar_t product = c10::pi<scalar_t> * a;
                return std::sin(product) / product;
              }
            });
      });
}
Sinc in PyTorch, Autograd Formula

name: sinc(Tensor self) -> Tensor
self: grad * ((M_PI * self * (M_PI * self).cos() - (M_PI * self).sin()) / (M_PI * self * self)).conj()
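For reference (a quick check rather than slide content): differentiating sinc(x) = sin(πx) / (πx) with the quotient rule gives

    d/dx sinc(x) = (πx·cos(πx) - sin(πx)) / (πx^2)

which is exactly the bracketed expression in the formula; grad multiplies in the incoming gradient per the chain rule, and .conj() applies PyTorch's convention for gradients with respect to complex inputs.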
Adding NumPy Operators to PyTorch
Porting an operator from NumPy - Need to write a C++ implementation - Possibly a CPU kernel or a CUDA kernel - Need to write an autograd formula (if the op is differentiable) - Need to write comprehensive tests (more on this in a moment) … why do we bother?
Porting an operator from NumPy - Need to write a C++ implementation - Possibly a CPU kernel or a CUDA kernel - Made easier with the C++ “TensorIterator” architecture - Need to write an autograd formula (if the op is differentiable) - Simplified by allowing users to write Pythonic YAML formulas - Need to write comprehensive tests (more on this in a moment) - Significant coverage automated with PyTorch’s OpInfo metadata and test generation framework
PyTorch’s test matrix - Tensor properties: - Datatype (long, float, complexfloat, etc.) - Device (CPU, CUDA, TPU, etc.) - Differentiable operations support autograd - Operations need to work in computational graphs - Operations have “function,” “method” and “inplace” variants
OpInfo for torch.mul

OpInfo('mul',
       aliases=('multiply',),
       dtypes=all_types_and_complex_and(torch.float16, torch.bfloat16, torch.bool),
       sample_inputs_func=sample_inputs_binary_pwise)
OpInfo for torch.sin

UnaryUfuncInfo('sin',
               ref=np.sin,
               dtypes=all_types_and_complex_and(torch.bool, torch.bfloat16),
               dtypesIfCUDA=all_types_and_complex_and(torch.bool, torch.half),
               handles_large_floats=False,
               handles_complex_extremals=False,
               safe_casts_outputs=True,
               decorators=(precisionOverride({torch.bfloat16: 1e-2}),))
OpInfo test template

@ops(unary_ufuncs)
def test_contig_vs_transposed(self, device, dtype, op):
    contig = make_tensor((789, 357), device=device, dtype=dtype,
                         low=op.domain[0], high=op.domain[1])
    non_contig = contig.T
    self.assertTrue(contig.is_contiguous())
    self.assertFalse(non_contig.is_contiguous())
    torch_kwargs, _ = op.sample_kwargs(device, dtype, contig)
    self.assertEqual(op(contig, **torch_kwargs).T,
                     op(non_contig, **torch_kwargs))
Instantiated tests for torch.sin

@ops(unary_ufuncs)
def test_contig_vs_transposed(self, device, dtype, op): ...

test_contig_vs_transposed_sin_cuda_complex64
test_contig_vs_transposed_sin_cuda_float16
test_contig_vs_transposed_sin_cuda_float32
test_contig_vs_transposed_sin_cuda_int64
test_contig_vs_transposed_sin_cuda_uint8
test_contig_vs_transposed_sin_cpu_complex64
test_contig_vs_transposed_sin_cpu_float16
test_contig_vs_transposed_sin_cpu_float32
test_contig_vs_transposed_sin_cpu_int64
test_contig_vs_transposed_sin_cpu_uint8
Example properties validated for every operator - Autograd is implemented correctly - Tested using finite differences - The operation works with torchscript and torch.fx - The operation’s function, method, and inplace variants all compute the same operation - One big caveat: can’t automatically test correctness except for special classes of operators (like unary ufuncs)
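As a concrete example of the finite-difference check (a minimal sketch using the public torch.autograd.gradcheck API rather than the internal test suite):

import torch
from torch.autograd import gradcheck

# gradcheck compares the analytic gradient produced by the autograd
# formula against central finite differences; double precision keeps
# the numerical comparison tight.
x = torch.randn(5, dtype=torch.double, requires_grad=True)
print(gradcheck(torch.sinc, (x,)))  # True if the formula matches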
Features of PyTorch’s test generator - Works with pytest and unittest - Dynamically identifies available device types - Allows for device type-specific logic for setup and teardown - Extensible by other packages adding new device types (like PyTorch/XLA) - Provides a central “source of truth” for each operator’s functionality - Makes it easy to test new features with every PyTorch operator
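For flavor, here is a rough sketch of the pattern a device-generic test follows; the torch.testing._internal modules are internal APIs, and their exact names and behavior may differ between releases:

import torch
from torch.testing._internal.common_utils import TestCase, run_tests
from torch.testing._internal.common_device_type import instantiate_device_type_tests

class TestMyFeature(TestCase):
    # Instantiated once per available device type, producing e.g.
    # test_add_cpu and test_add_cuda variants.
    def test_add(self, device):
        t = torch.ones(3, device=device)
        self.assertEqual(t + t, torch.full((3,), 2., device=device))

instantiate_device_type_tests(TestMyFeature, globals())

if __name__ == '__main__':
    run_tests()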
When PyTorch is Different from NumPy
np.reciprocal vs torch.reciprocal

NumPy:
>> a = np.array((1, 2, 3))
>> np.reciprocal(a)
array([1, 0, 0])

PyTorch:
>> t = torch.tensor((1, 2, 3))
>> torch.reciprocal(t)
tensor([1.0000, 0.5000, 0.3333])

NumPy computes the reciprocal of an integer array in integer arithmetic (truncating toward zero), while PyTorch promotes integer inputs to the default floating point type.
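To get matching results, the integer input can be cast first, e.g. np.reciprocal(a.astype(np.float64)), which returns array([1., 0.5, 0.33333333]) and lines up with PyTorch's promoted output.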
np.linalg.eig vs torch.linalg.eig

NumPy:
>> a = np.diag(np.array((1., 2, 3)))
>> w, v = np.linalg.eig(a)
(array([1., 2., 3.]),
 array([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.]]))

PyTorch:
>> t = torch.diag(torch.tensor((1., 2, 3)))
>> w, v = torch.linalg.eig(t)
torch.return_types.linalg_eig(
eigenvalues=tensor([1.+0.j, 2.+0.j, 3.+0.j]),
eigenvectors=tensor([[1.+0.j, 0.+0.j, 0.+0.j],
                     [0.+0.j, 1.+0.j, 0.+0.j],
                     [0.+0.j, 0.+0.j, 1.+0.j]]))

NumPy casts the results to a real dtype when every eigenvalue has zero imaginary part, while torch.linalg.eig always returns complex tensors.
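When the input is symmetric or Hermitian, as it is here, torch.linalg.eigh is the mechanism for getting real results back: w, v = torch.linalg.eigh(t) returns eigenvalues tensor([1., 2., 3.]) and real eigenvectors.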
Ordering complex numbers in NumPy vs. PyTorch

NumPy:
>> a = np.array((complex(1, 2), complex(2, 1)))
>> np.amax(a)
(2+1j)
>> np.sort(a)
array([1.+2.j, 2.+1.j], dtype=complex64)

PyTorch:
>> t = torch.tensor((complex(1, 2), complex(2, 1)))
>> torch.amax(t)
RUNTIME ERROR
>> torch.sort(t)
RUNTIME ERROR

NumPy orders complex values lexicographically (by real part, then imaginary part), while PyTorch deliberately refuses to order complex tensors.
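One workaround in the spirit of the next slide (a sketch): impose an explicit real-valued key, since .real, .abs(), argsort, and indexing all work on complex tensors.

>> t = torch.tensor((complex(1, 2), complex(2, 1)))
>> t[torch.argsort(t.real)]   # sort by real part, NumPy's primary key
tensor([1.+2.j, 2.+1.j])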
Principled discrepancies - The PyTorch community seems OK with these principled discrepancies - Different behavior must be very similar to NumPy’s behavior - It’s OK to not support some things, as long as there are other mechanisms to do them - PyTorch also has systematic discrepancies with NumPy that pass without comment - Type promotion - Functions vs. method variants - Returning scalars vs tensors
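As a small illustration of the scalars vs tensors point (not from the talk): reductions in NumPy return scalars, while PyTorch returns 0-dimensional tensors that can be converted with .item().

>> np.array((1., 2., 3.)).sum()
6.0
>> torch.tensor((1., 2., 3.)).sum()
tensor(6.)
>> torch.tensor((1., 2., 3.)).sum().item()
6.0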
Lessons Learned and Future Work
Recap - NumPy and PyTorch are popular Python packages with operators that manipulate tensors - PyTorch implements many of NumPy’s operators, and extends them with support for hardware accelerators, autograd, and other systems that support modern scientific computing and deep learning - The PyTorch community wants both the functionality and familiarity these operators provide - But it’s OK with principled differences - To make implementing all these operators tractable, PyTorch has had to develop architecture supporting C++ and CUDA implementations, autograd formulas and testing
Lessons Learned - Do the work to engage your community and listen carefully to their feedback - At first it wasn’t clear whether people just wanted the functionality of NumPy operators, but our community has clarified they also want fidelity - Focus on developer efficiency - Be clear about your own principles when implementing operators from another project
Future Work - Prioritize deprecating and updating the few PyTorch operators with significantly different behavior than their NumPy counterparts - Make success criteria clearer: implementing every NumPy operator is impractical and inadvisable - The new Python Array API may solve this problem - More focus on SciPy functionality, including SciPy’s special module, linear algebra module, and optimizers
Thank you!
