"PyTorch Deep Learning Framework: Status and Directions," a Presentation from Facebook
The document provides an overview of PyTorch, focusing on mobile integration and the evolution of deploying models from research to production. It outlines key principles such as dynamic vs. static frameworks and the benefits of PyTorch's eager execution model, and shows how TorchScript lets Python-authored models be optimized and deployed without a Python dependency. Additionally, it highlights efforts in model quantization and optimization for mobile devices, aiming for efficient inference in constrained environments.
TORCHSCRIPT
Models are TorchScript programs, an optimizable subset of Python
+ Same "models are programs" idea
+ Production deployment
+ No Python dependency
+ Compilation for performance optimization

import torch
import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, W_h, U_h, W_y, b_h, b_y):
        super(RNN, self).__init__()
        self.W_h = nn.Parameter(W_h)
        self.U_h = nn.Parameter(U_h)
        self.W_y = nn.Parameter(W_y)
        self.b_h = nn.Parameter(b_h)
        self.b_y = nn.Parameter(b_y)

    def forward(self, x, h):
        y = []
        for t in range(x.size(0)):
            h = torch.tanh(x[t] @ self.W_h + h @ self.U_h + self.b_h)
            y += [torch.tanh(h @ self.W_y + self.b_y)]
            if t % 10 == 0:
                print("stats: ", h.mean(), h.var())
        return torch.stack(y), h

# one annotation!
script_rnn = torch.jit.script(RNN(W_h, U_h, W_y, b_h, b_y))
TORCHSCRIPT
Models are TorchScript programs, an optimizable subset of Python
+ Same "models are programs" idea
+ Production deployment
+ No Python dependency
+ Optimizable (incl. codegen!)
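A short follow-up sketch (assuming the script_rnn built on the previous slide): a scripted module serializes to a single archive and loads back with no Python class definition available, which is what makes the production-deployment and no-Python-dependency points concrete.

import torch

# Minimal sketch, assuming script_rnn from the slide above.
script_rnn.save("rnn.pt")           # archive bundles code and weights
loaded = torch.jit.load("rnn.pt")   # no Python class needed at load time;
                                    # the same file loads from C++ via torch::jit::load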
~1,230 CONTRIBUTORS
50%+ YOY GROWTH
23K PYTORCH FORUM USERS
GROWTH IN ARXIV MENTIONS IN RESEARCH PAPERS
[Chart: monthly arXiv mentions, Jan 2017 through Jul 2019, rising from near 0 to roughly 500]
FRAMEWORKS: DYNAMIC VS. STATIC
DECLARATIVE TOOLKITS
Declare and compile a model; repeatedly execute the model in a VM.
[Diagram: Python script → toolkit compiles a graph of repeated Conv2d → BatchNorm → ReLU blocks → VM executes it]
DECLARATIVE TOOLKITS
Computation Graph
• Declare a computation
• Placeholder variables
• Compile it
• Run it in a Session

import tensorflow as tf
import numpy as np

trX = np.linspace(-1, 1, 101)
trY = 2 * trX + np.random.randn(*trX.shape) * 0.33

X = tf.placeholder("float")
Y = tf.placeholder("float")

def model(X, w):
    return tf.multiply(X, w)

w = tf.Variable(0.0, name="weights")
y_model = model(X, w)

cost = tf.square(Y - y_model)
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(cost)

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    for i in range(100):
        for (x, y) in zip(trX, trY):
            sess.run(train_op, feed_dict={X: x, Y: y})
    print(sess.run(w))
DECLARATIVE TOOLKITS
(Same program as above, highlighting the placeholder variables, the symbolic inputs of the graph:)

X = tf.placeholder("float")
Y = tf.placeholder("float")
DECLARATIVE TOOLKITS
(Same program, highlighting the model definition:)

def model(X, w):
    return tf.multiply(X, w)

w = tf.Variable(0.0, name="weights")
y_model = model(X, w)

cost = tf.square(Y - y_model)
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(cost)
DECLARATIVE TOOLKITS
(Same program, highlighting the execution loop, which runs inside a separate, Turing-complete virtual machine:)

for i in range(100):
    for (x, y) in zip(trX, trY):
        sess.run(train_op, feed_dict={X: x, Y: y})
print(sess.run(w))
IMPERATIVE TOOLKITS
• Define a model by execution
• No separate compilation stage
• No separate execution engine

import torch
from torch.autograd import Variable

trX = torch.linspace(-1, 1, 101)
trY = 2 * trX + torch.randn(*trX.size()) * 0.33

w = Variable(trX.new([0.0]), requires_grad=True)

for i in range(100):
    for (x, y) in zip(trX, trY):
        X = Variable(x)
        Y = Variable(y)
        print(X)
        print(Y)
        y_model = X * w.expand_as(X)
        cost = (Y - y_model) ** 2
        cost.backward(torch.ones(*cost.size()))
        w.data = w.data - 0.01 * w.grad.data  # gradient descent step
        w.grad.data.zero_()                   # clear accumulated gradients

print(w)
IMPERATIVE TOOLKITS
Model constructed and values computed as we define it.
• Define a model by execution
• No separate compilation stage
• No separate execution engine
PYTORCH FOR EMBEDDED: STATE OF THE STATE
HOW DO I RUN PYTORCH MODELS ON DEVICE?
1. EXPORT ONNX-FORMATTED MODELS
2. PYTORCH MOBILE
ARCHITECTURE AND FLOW
[Diagram of torch.onnx.export(): PyTorch Model + Sample Input → JIT Tracer and TorchScript → Torch IR → Optimizer → Torch IR to ONNX IR Translator → ONNX Graph]
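For the ONNX path, a minimal sketch of the torch.onnx.export() call in the flow above; the model choice and input shape here are illustrative, not from the slides.

import torch
import torchvision

# Trace the model with a sample input and write out an ONNX graph.
model = torchvision.models.resnet18(pretrained=True).eval()
sample_input = torch.randn(1, 3, 224, 224)  # illustrative input shape
torch.onnx.export(model, sample_input, "model.onnx")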
WHAT IS PYTORCH MOBILE?
IT'S PYTORCH, FOR MOBILE, BUT WITH NO PYTHON 😃
WHAT CAN IT RUN? ANY TORCHSCRIPT MODEL.
LOOPS? YES
FUNCTIONS? YES
TUPLES? YES
NAMEDTUPLE? YES
PYTORCH MOBILE (PYTORCH 1.3, EXPERIMENTAL)
End-to-end workflows for mobile on iOS and Android:

AUTHOR A MODEL IN PYTORCH
• No separate runtime to export to

MODEL OPTIMIZATION (OPTIONAL)
qmodel = quantization.convert(my_mobile_model)
torch.jit.script(qmodel).save("my_mobile_model.pt")

ANDROID - MAVEN
implementation 'org.pytorch:pytorch_android:1.3.0'

iOS - COCOAPODS
pod 'LibTorch'

COMING SOON
• Build-level optimization and selective compilation
• Whole-program optimization with link-time optimization
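To make the authoring step concrete, a minimal self-contained sketch with a toy model (the model and file name are illustrative, not from the slides); the saved .pt file is the artifact the Android and iOS runtimes load.

import torch
import torch.nn as nn

class TinyModel(nn.Module):
    def __init__(self):
        super(TinyModel, self).__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return torch.relu(self.fc(x))

# Author in PyTorch, script it, and save the mobile artifact.
model = TinyModel().eval()
scripted = torch.jit.script(model)
scripted.save("my_mobile_model.pt")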
QUANTIZATION (PYTORCH 1.3, EXPERIMENTAL)
• Neural network inference is expensive
• IoT and mobile devices have limited resources
• Quantizing models enables efficient inference at scale

4X LESS MEMORY USAGE
2-4X SPEEDUPS IN COMPUTE

import torch
from torch import quantization

# Assumes ResNet50 and data_loader are defined elsewhere.
model = ResNet50()
model.load_state_dict(torch.load("model.pt"))

qmodel = quantization.prepare(
    model, {"": quantization.default_qconfig})
qmodel.eval()

# Calibrate observers by running representative data
# through the prepared model.
for batch, target in data_loader:
    qmodel(batch)

qmodel = quantization.convert(qmodel)
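As an illustrative check of the memory claim, here is a self-contained sketch using dynamic quantization (a related API, not the static post-training flow shown above) on a toy model:

import os
import torch
import torch.nn as nn

# Quantize the Linear layers to int8 and compare serialized sizes;
# int8 weights are roughly 4x smaller than float32 weights.
model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10))
qmodel = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8)

torch.save(model.state_dict(), "fp32.pt")
torch.save(qmodel.state_dict(), "int8.pt")
print(os.path.getsize("fp32.pt") / os.path.getsize("int8.pt"))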
HOW DO I USE IT?
TorchScript: a static, high-performance subset of Python.
1. Prototype your model with PyTorch
2. Control flow is preserved
3. First-class support for lists, dicts, etc.

import torch
from typing import List
from torch import Tensor

class MyModule(torch.nn.Module):
    def __init__(self, N, M, state: List[Tensor]):
        super(MyModule, self).__init__()
        self.weight = torch.nn.Parameter(torch.rand(N, M))
        self.state = state

    def forward(self, input):
        self.state.append(input)
        if input.sum() > 0:
            output = self.weight.mv(input)
        else:
            output = self.weight + input
        return output

# Compile the model code to a static representation
my_module = MyModule(3, 4, [torch.rand(3, 4)])
my_script_module = torch.jit.script(my_module)

# Save the compiled code and model data
# so it can be loaded elsewhere
my_script_module.save("my_script_module.pt")
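And a short usage sketch: the saved artifact loads back and runs without the original MyModule class being importable.

import torch

# Load the compiled module and call it like a regular function.
loaded = torch.jit.load("my_script_module.pt")
out = loaded(torch.rand(4))  # same forward() semantics as the eager module
print(out)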
HOW DO I USE IT?
ANDROID: implementation 'org.pytorch:pytorch_android:1.3.0'
iOS: pod 'LibTorch'
HOW DOES IT WORK?
ANDROID: https://github.com/pytorch/android-demo-app
iOS: https://github.com/pytorch/ios-demo-app
WHAT'S HERE TODAY?
• Full TorchScript support.
• Pre-built binary releases in JCenter and CocoaPods.
• Java bindings.
• All forward CPU operators.
• Some optimized float operators (based on Caffe2Go).
• Some optimized quantized operators (based on QNNPACK, with XNNPACK WIP).
WHAT'S COMING UP?
• Faster.
• Smaller.
• Customized builds.
• Obj-C/Swift API?
• Kotlin wrapper?
• GPU support??
• Accelerator support??