"PyTorch Deep Learning Framework: Status and Directions," a Presentation from Facebook
The document provides an overview of PyTorch, focusing on mobile integration and the evolution of deploying models from research to production. It outlines key principles such as dynamic vs. static frameworks and the benefits of PyTorch's eager execution model, and shows how TorchScript lets Python-authored models be optimized and deployed without a Python dependency. Additionally, it highlights efforts in model quantization and optimization for mobile devices, aiming for efficient inference in constrained environments.
TORCHSCRIPT
Models are TorchScript programs, an optimizable subset of Python
+ Same "models are programs" idea
+ Production deployment
+ No Python dependency
+ Compilation for performance optimization

import torch
import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, W_h, U_h, W_y, b_h, b_y):
        super(RNN, self).__init__()
        self.W_h = nn.Parameter(W_h)
        self.U_h = nn.Parameter(U_h)
        self.W_y = nn.Parameter(W_y)
        self.b_h = nn.Parameter(b_h)
        self.b_y = nn.Parameter(b_y)

    def forward(self, x, h):
        y = []
        for t in range(x.size(0)):
            h = torch.tanh(x[t] @ self.W_h + h @ self.U_h + self.b_h)
            y += [torch.tanh(h @ self.W_y + self.b_y)]
            if t % 10 == 0:
                print("stats: ", h.mean(), h.var())
        return torch.stack(y), h

# one annotation!
script_rnn = torch.jit.script(RNN(W_h, U_h, W_y, b_h, b_y))
TORCHSCRIPT
Models are TorchScript programs, an optimizable subset of Python
+ Same "models are programs" idea
+ Production deployment
+ No Python dependency
+ Optimizable (incl. codegen!)
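A short follow-up sketch (assuming the script_rnn built on the previous slide): a scripted module serializes to a single archive and loads back with no Python class definition available, which is what makes the production-deployment and no-Python-dependency points concrete.

import torch

# Minimal sketch, assuming script_rnn from the slide above.
script_rnn.save("rnn.pt")           # archive bundles code and weights
loaded = torch.jit.load("rnn.pt")   # no Python class needed at load time;
                                    # the same file loads from C++ via torch::jit::load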
~1,230 CONTRIBUTORS
50%+ YOY GROWTH
23K PYTORCH FORUM USERS
GROWTH IN ARXIV MENTIONS IN RESEARCH PAPERS
[Chart: monthly arXiv mentions, Jan 2017 through Jul 2019, rising from near 0 to roughly 500]
FRAMEWORKS: DYNAMIC VS. STATIC
DECLARATIVE TOOLKITS
Declare and compile a model; repeatedly execute the model in a VM.
[Diagram: Python script → toolkit compiles a graph of repeated Conv2d → BatchNorm → ReLU blocks → VM executes it]
DECLARATIVE TOOLKITS
Computation Graph
• Declare a computation
• Placeholder variables
• Compile it
• Run it in a Session

import tensorflow as tf
import numpy as np

trX = np.linspace(-1, 1, 101)
trY = 2 * trX + np.random.randn(*trX.shape) * 0.33

X = tf.placeholder("float")
Y = tf.placeholder("float")

def model(X, w):
    return tf.multiply(X, w)

w = tf.Variable(0.0, name="weights")
y_model = model(X, w)

cost = tf.square(Y - y_model)
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(cost)

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    for i in range(100):
        for (x, y) in zip(trX, trY):
            sess.run(train_op, feed_dict={X: x, Y: y})
    print(sess.run(w))
DECLARATIVE TOOLKITS
(Same program as above, highlighting the placeholder variables, the symbolic inputs of the graph:)

X = tf.placeholder("float")
Y = tf.placeholder("float")
DECLARATIVE TOOLKITS
(Same program, highlighting the model definition:)

def model(X, w):
    return tf.multiply(X, w)

w = tf.Variable(0.0, name="weights")
y_model = model(X, w)

cost = tf.square(Y - y_model)
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(cost)
DECLARATIVE TOOLKITS
(Same program, highlighting the execution loop, which runs inside a separate, Turing-complete virtual machine:)

for i in range(100):
    for (x, y) in zip(trX, trY):
        sess.run(train_op, feed_dict={X: x, Y: y})
print(sess.run(w))
IMPERATIVE TOOLKITS
• Define a model by execution
• No separate compilation stage
• No separate execution engine

import torch
from torch.autograd import Variable

trX = torch.linspace(-1, 1, 101)
trY = 2 * trX + torch.randn(*trX.size()) * 0.33

w = Variable(trX.new([0.0]), requires_grad=True)

for i in range(100):
    for (x, y) in zip(trX, trY):
        X = Variable(x)
        Y = Variable(y)
        print(X)
        print(Y)
        y_model = X * w.expand_as(X)
        cost = (Y - y_model) ** 2
        cost.backward(torch.ones(*cost.size()))
        w.data = w.data - 0.01 * w.grad.data  # gradient descent step
        w.grad.data.zero_()                   # clear accumulated gradients

print(w)
IMPERATIVE TOOLKITS
Model constructed and values computed as we define it.
• Define a model by execution
• No separate compilation stage
• No separate execution engine
PYTORCH FOR EMBEDDED: STATE OF THE STATE
HOW DO I RUN PYTORCH MODELS ON DEVICE?
1. EXPORT ONNX-FORMATTED MODELS
2. PYTORCH MOBILE
ARCHITECTURE AND FLOW
[Diagram of torch.onnx.export(): PyTorch Model + Sample Input → JIT Tracer and TorchScript → Torch IR → Optimizer → Torch IR to ONNX IR Translator → ONNX Graph]
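For the ONNX path, a minimal sketch of the torch.onnx.export() call in the flow above; the model choice and input shape here are illustrative, not from the slides.

import torch
import torchvision

# Trace the model with a sample input and write out an ONNX graph.
model = torchvision.models.resnet18(pretrained=True).eval()
sample_input = torch.randn(1, 3, 224, 224)  # illustrative input shape
torch.onnx.export(model, sample_input, "model.onnx")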
WHAT IS PYTORCH MOBILE?
IT'S PYTORCH, FOR MOBILE, BUT WITH NO PYTHON 😃
WHAT CAN IT RUN? ANY TORCHSCRIPT MODEL.
LOOPS? YES
FUNCTIONS? YES
TUPLES? YES
NAMEDTUPLE? YES
PYTORCH MOBILE (PYTORCH 1.3, EXPERIMENTAL)
End-to-end workflows for mobile on iOS and Android:

AUTHOR A MODEL IN PYTORCH
• No separate runtime to export to

MODEL OPTIMIZATION (OPTIONAL)
qmodel = quantization.convert(my_mobile_model)
torch.jit.script(qmodel).save("my_mobile_model.pt")

ANDROID - MAVEN
implementation 'org.pytorch:pytorch_android:1.3.0'

iOS - COCOAPODS
pod 'LibTorch'

COMING SOON
• Build-level optimization and selective compilation
• Whole-program optimization with link-time optimization
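To make the authoring step concrete, a minimal self-contained sketch with a toy model (the model and file name are illustrative, not from the slides); the saved .pt file is the artifact the Android and iOS runtimes load.

import torch
import torch.nn as nn

class TinyModel(nn.Module):
    def __init__(self):
        super(TinyModel, self).__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return torch.relu(self.fc(x))

# Author in PyTorch, script it, and save the mobile artifact.
model = TinyModel().eval()
scripted = torch.jit.script(model)
scripted.save("my_mobile_model.pt")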
QUANTIZATION (PYTORCH 1.3, EXPERIMENTAL)
• Neural network inference is expensive
• IoT and mobile devices have limited resources
• Quantizing models enables efficient inference at scale

4X LESS MEMORY USAGE
2-4X SPEEDUPS IN COMPUTE

import torch
from torch import quantization

# Assumes ResNet50 and data_loader are defined elsewhere.
model = ResNet50()
model.load_state_dict(torch.load("model.pt"))

qmodel = quantization.prepare(
    model, {"": quantization.default_qconfig})
qmodel.eval()

# Calibrate observers by running representative data
# through the prepared model.
for batch, target in data_loader:
    qmodel(batch)

qmodel = quantization.convert(qmodel)
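As an illustrative check of the memory claim, here is a self-contained sketch using dynamic quantization (a related API, not the static post-training flow shown above) on a toy model:

import os
import torch
import torch.nn as nn

# Quantize the Linear layers to int8 and compare serialized sizes;
# int8 weights are roughly 4x smaller than float32 weights.
model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10))
qmodel = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8)

torch.save(model.state_dict(), "fp32.pt")
torch.save(qmodel.state_dict(), "int8.pt")
print(os.path.getsize("fp32.pt") / os.path.getsize("int8.pt"))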
HOW DO I USE IT?
TorchScript: a static, high-performance subset of Python.
1. Prototype your model with PyTorch
2. Control flow is preserved
3. First-class support for lists, dicts, etc.

import torch
from typing import List
from torch import Tensor

class MyModule(torch.nn.Module):
    def __init__(self, N, M, state: List[Tensor]):
        super(MyModule, self).__init__()
        self.weight = torch.nn.Parameter(torch.rand(N, M))
        self.state = state

    def forward(self, input):
        self.state.append(input)
        if input.sum() > 0:
            output = self.weight.mv(input)
        else:
            output = self.weight + input
        return output

# Compile the model code to a static representation
my_module = MyModule(3, 4, [torch.rand(3, 4)])
my_script_module = torch.jit.script(my_module)

# Save the compiled code and model data
# so it can be loaded elsewhere
my_script_module.save("my_script_module.pt")
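And a short usage sketch: the saved artifact loads back and runs without the original MyModule class being importable.

import torch

# Load the compiled module and call it like a regular function.
loaded = torch.jit.load("my_script_module.pt")
out = loaded(torch.rand(4))  # same forward() semantics as the eager module
print(out)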
HOW DO I USE IT?
ANDROID: implementation 'org.pytorch:pytorch_android:1.3.0'
iOS: pod 'LibTorch'
HOW DOES IT WORK?
ANDROID: https://github.com/pytorch/android-demo-app
iOS: https://github.com/pytorch/ios-demo-app
WHAT'S HERE TODAY?
• Full TorchScript support.
• Pre-built binary releases in JCenter and CocoaPods.
• Java bindings.
• All forward CPU operators.
• Some optimized float operators (based on Caffe2Go).
• Some optimized quantized operators (based on QNNPACK, with XNNPACK WIP).
WHAT'S COMING UP?
• Faster.
• Smaller.
• Customized builds.
• Obj-C/Swift API?
• Kotlin wrapper?
• GPU support??
• Accelerator support??