PyTorch Tutorial (Updated) - NTU Machine Learning Course - Lyman Lin 林裕訓 - Nov. 03, 2017 - lymanblue[at]gmail.com
What is PyTorch? • Developed by Facebook – Python first – Dynamic Neural Network – This tutorial is for PyTorch 0.2.0 • Endorsed by Director of AI at Tesla
Installation • PyTorch Web: http://pytorch.org/
Packages of PyTorch
Package – Description
torch – a Tensor library like NumPy, with strong GPU support
torch.autograd – a tape-based automatic differentiation library that supports all differentiable Tensor operations in torch
torch.nn – a neural network library deeply integrated with autograd, designed for maximum flexibility
torch.optim – an optimization package to be used with torch.nn, with standard optimization methods such as SGD, RMSProp, LBFGS, Adam, etc.
torch.multiprocessing – Python multiprocessing, but with magical memory sharing of torch Tensors across processes; useful for data loading and Hogwild training
torch.utils – DataLoader, Trainer, and other utility functions for convenience
torch.legacy (.nn/.optim) – legacy code ported over from (Lua)Torch for backward-compatibility reasons
Outline • Neural Network in Brief • Concepts of PyTorch • Multi-GPU Processing • RNN • Transfer Learning • Comparison with TensorFlow
Neural Network in Brief • Supervised Learning – learning a function f such that f(x) = y, from training pairs (x1, y1), (x2, y2), …
Neural Network in Brief
• The dataset is split into batches 1, 2, 3, …, N; one pass over all N batches is 1 epoch, with N = dataset size / batch size
• Forward process: from data to label' (the prediction); a loss compares label' with the true label
• Backward process: the optimizer uses the loss to update the parameters Wi -> Wi+1
• Inside the network, data flows forward through the chain of layers (each with its own weights W), and gradients flow backward through the same chain
• Data in the neural network: Tensors (n-dim arrays) and gradients of functions
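The whole picture in code, as a minimal runnable sketch (the linear model, random batches, and learning rate are placeholders; the PyTorch pieces are introduced in the following slides):

    import torch
    import torch.nn as nn
    import torch.optim as optim
    from torch.autograd import Variable

    # Placeholder network and data, just to make the loop concrete; shapes are arbitrary.
    net = nn.Linear(10, 2)
    loss_fn = nn.MSELoss()
    optimizer = optim.SGD(net.parameters(), lr=0.1)
    batches = [(Variable(torch.randn(4, 10)), Variable(torch.randn(4, 2))) for _ in range(8)]

    for epoch in range(2):                       # one pass over all N batches = 1 epoch
        for data, label in batches:              # N = dataset size / batch size
            out = net(data)                      # forward: data -> label'
            loss = loss_fn(out, label)           # compare label' with label
            optimizer.zero_grad()                # reset gradients from the previous batch
            loss.backward()                      # backward: compute gradients
            optimizer.step()                     # update parameters Wi -> Wi+1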
Concepts of PyTorch
• Modules of PyTorch
– Data: Tensor, Variable (for gradients)
– Function: NN Modules, Optimizer, Loss Function, Multi-Processing
Concepts of PyTorch • Tensor – similar to a NumPy array
Concepts of PyTorch • Tensor Operations
– z = x + y
– torch.add(x, y, out=z)
– y.add_(x) # in-place (the trailing underscore denotes an in-place operation)
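A minimal runnable sketch of the three equivalent forms above (the variable names are illustrative):

    import torch

    x = torch.ones(2, 2)
    y = torch.ones(2, 2)

    z = x + y                     # operator form
    out = torch.Tensor(2, 2)      # pre-allocated output tensor
    torch.add(x, y, out=out)      # functional form, result written into out
    y.add_(x)                     # in-place form: the trailing underscore mutates y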
Concepts of PyTorch • NumPy Bridge
– To NumPy: a = torch.ones(5); b = a.numpy()
– To Tensor: a = numpy.ones(5); b = torch.from_numpy(a)
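A small sketch of the bridge above; note that a CPU Tensor and its NumPy view share the same memory, so changing one changes the other:

    import torch
    import numpy

    a = torch.ones(5)
    b = a.numpy()                 # Tensor -> NumPy array
    a.add_(1)                     # the CPU tensor and the array share memory,
    print(b)                      # so b now reads [2. 2. 2. 2. 2.]

    c = numpy.ones(5)
    d = torch.from_numpy(c)       # NumPy array -> Tensor (memory is shared here too)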
Concepts of PyTorch • CUDA Tensors
– Move tensors to GPU: x = x.cuda(); y = y.cuda(); x + y
– Move a net to GPU: net = Network(); if torch.cuda.is_available(): net.cuda()
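A runnable sketch of the pattern above; a small nn.Linear stands in for the user-defined Network() module:

    import torch
    import torch.nn as nn

    net = nn.Linear(4, 2)              # stand-in for Network(); any nn.Module behaves the same way
    x = torch.ones(2, 4)
    y = torch.ones(2, 4)

    if torch.cuda.is_available():      # only move things when a GPU is present
        x = x.cuda()                   # .cuda() returns a GPU copy of a tensor ...
        y = y.cuda()
        net.cuda()                     # ... but moves a module's parameters in place
    z = x + y                          # runs on the GPU if the tensors were moved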
Concepts of PyTorch • Variable
– Wraps a Tensor and records what is needed for gradients: .data (the Tensor itself), .grad (the gradient for the current backward process), and the graph bookkeeping (creator/grad_fn), which is handled by PyTorch automatically
Concepts of PyTorch • Variable example
• x = Variable(torch.ones(2, 2), requires_grad=True)
• print(x)
• y = x + 2
• z = y * y * 3
• out = z.mean()
• out.backward()
• print(x.grad)
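For the example above, the result can be checked by hand: out = (1/4) * sum_i 3*(x_i + 2)^2, so d(out)/d(x_i) = 1.5 * (x_i + 2) = 4.5 at x_i = 1; print(x.grad) therefore shows a 2x2 tensor filled with 4.5.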
Define the Network
• http://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html#define-the-network
• Define modules (must have, in __init__) and build the network (must have, in forward)
• Layer pipeline: x -> conv1 -> relu -> pooling -> conv2 -> relu -> pooling -> flatten -> fc1 -> relu -> fc2 -> relu -> fc3
• Shape trace [Channel, H, W]: 1x32x32 -> conv1 -> 6x28x28 -> relu -> pooling -> 6x14x14 -> conv2 -> 16x10x10 -> relu -> pooling -> 16x5x5 -> flatten to 16*5*5
• A full Tensor in the network has shape [Batch N, Channel, H, W]
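A sketch of the network described by the shape trace above, following the linked tutorial; the layer sizes are taken from the annotations above:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Net(nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            # Define modules (must have)
            self.conv1 = nn.Conv2d(1, 6, 5)        # 1x32x32 -> 6x28x28
            self.conv2 = nn.Conv2d(6, 16, 5)       # 6x14x14 -> 16x10x10
            self.fc1 = nn.Linear(16 * 5 * 5, 120)
            self.fc2 = nn.Linear(120, 84)
            self.fc3 = nn.Linear(84, 10)

        def forward(self, x):
            # Build network (must have)
            x = F.max_pool2d(F.relu(self.conv1(x)), 2)   # -> 6x14x14
            x = F.max_pool2d(F.relu(self.conv2(x)), 2)   # -> 16x5x5
            x = x.view(-1, 16 * 5 * 5)                   # flatten the tensor
            x = F.relu(self.fc1(x))
            x = F.relu(self.fc2(x))
            return self.fc3(x)

    net = Net()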
Concepts of PyTorch • NN Modules (torch.nn)
– Modules built on Variable
– Gradients handled by PyTorch automatically
• Common modules
– Convolution layers
– Linear layers
– Pooling layers
– Dropout layers
– Etc.
NN Modules • Convolution Layer
– Batch (N), Channel (C)
– torch.nn.Conv1d: input [N, C, W] # kernel moves in 1D
– torch.nn.Conv2d: input [N, C, H, W] # kernel moves in 2D
– torch.nn.Conv3d: input [N, C, D, H, W] # kernel moves in 3D
– Example: torch.nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
NN Modules • Convolution Layer (Conv2d): how it works
– Input per sample: [Cin, Hin, Win]
– Each kernel has shape [Cin, k, k]; convolving it with the input produces one output map of shape [1, Hout, Wout]
– With Cout kernels, the maps are stacked into an output of shape [Cout, Hout, Wout]
– Hyperparameters: kernel size k, stride s (moving step size), padding p, dilation d (e.g. k=3, s=1, p=1, d=1 keeps Hout=Hin, Wout=Win)
– Output size: Hout = floor((Hin + 2p - d*(k-1) - 1)/s) + 1, and similarly for Wout
– Number of parameters: Cout x Cin x k x k weights, plus Cout bias terms
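A small check of the shape and parameter-count rules above, using the Conv2d example from the earlier slide (the 32x32 input size is an assumption for illustration):

    import torch
    import torch.nn as nn
    from torch.autograd import Variable

    conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)

    x = Variable(torch.randn(1, 3, 32, 32))              # [N, Cin, Hin, Win]
    y = conv(x)
    print(y.size())                                       # (1, 16, 32, 32): Hout = (32 + 2*1 - 3)/1 + 1 = 32

    n_params = sum(p.data.numel() for p in conv.parameters())
    print(n_params)                                       # 16*3*3*3 weights + 16 biases = 448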
NN Modules • Linear Layer – torch.nn.Linear(in_features=3, out_features=5) – y=Ax+b
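A minimal sketch of the Linear layer above (the batch size of 4 is arbitrary):

    import torch
    import torch.nn as nn
    from torch.autograd import Variable

    fc = nn.Linear(in_features=3, out_features=5)   # A has shape 5x3, b has 5 entries
    x = Variable(torch.randn(4, 3))                 # a batch of 4 input vectors
    y = fc(x)                                       # y = Ax + b, applied to each row
    print(y.size())                                 # (4, 5)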
NN Modules • Dropout Layer
– torch.nn.Dropout(p)
– Randomly zeroes elements of the input with probability p (during training)
– Outputs are scaled by 1/(1-p) so the expected value is unchanged
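A small sketch showing both effects at once (p = 0.5 is an arbitrary choice):

    import torch
    import torch.nn as nn
    from torch.autograd import Variable

    drop = nn.Dropout(p=0.5)
    x = Variable(torch.ones(10))
    print(drop(x))      # training mode: roughly half the entries are 0, the rest are 2.0 = 1/(1-p)
    drop.eval()         # evaluation mode: dropout becomes a no-op
    print(drop(x))      # all ones again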
NN Modules • Pooling Layer
– torch.nn.AvgPool2d(kernel_size=2, stride=2, padding=0)
– torch.nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
– k=2, s=2 (moving step size), p=0
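A quick shape check for the pooling layer above (the input size matches the earlier network example):

    import torch
    import torch.nn as nn
    from torch.autograd import Variable

    pool = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
    x = Variable(torch.randn(1, 6, 28, 28))    # [N, C, H, W]
    y = pool(x)
    print(y.size())                            # (1, 6, 14, 14): each 2x2 window is reduced to its maximum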
Concepts of PyTorch
• Optimizer (torch.optim): SGD, Adagrad, Adam, RMSprop, … (9 optimizers in PyTorch 0.2)
• Loss (torch.nn): L1Loss, MSELoss, CrossEntropyLoss, … (18 loss functions in PyTorch 0.2)
Optimizer and Loss: Full Example
• http://pytorch.org/tutorials/beginner/pytorch_with_examples.html#pytorch-optim
• What we build: a two-layer fully connected network, D_in=1000 -> H=100 -> D_out=10 -> y_pred
• Construct our model: define modules (must have) and build the network (must have)
• Set up the optimizer and loss function; don't update y (y are labels here)
• Training loop: reset gradients, backward, update step
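A sketch of the training code described above, following the linked pytorch-optim example; the batch size N = 64 and the use of random data are assumptions for illustration:

    import torch
    from torch.autograd import Variable

    N, D_in, H, D_out = 64, 1000, 100, 10

    x = Variable(torch.randn(N, D_in))
    y = Variable(torch.randn(N, D_out), requires_grad=False)   # don't update y: y are labels here

    # Construct our model
    model = torch.nn.Sequential(
        torch.nn.Linear(D_in, H),
        torch.nn.ReLU(),
        torch.nn.Linear(H, D_out),
    )

    # Optimizer and loss function
    loss_fn = torch.nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    for t in range(500):
        y_pred = model(x)            # forward
        loss = loss_fn(y_pred, y)
        optimizer.zero_grad()        # reset gradient
        loss.backward()              # backward
        optimizer.step()             # update step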
Concepts of PyTorch • Multi-Processing
• Basic method: torch.nn.DataParallel (recommended by PyTorch)
• Advanced methods: torch.multiprocessing, Hogwild (asynchronous)
Multi-GPU Processing • torch.nn.DataParallel
– gpu_id = '6,7'
– os.environ['CUDA_VISIBLE_DEVICES'] = gpu_id
– net = torch.nn.DataParallel(model, device_ids=[0, 1])
– output = net(input_var)
• Important notes:
– device_ids must start from 0 (they index the visible devices, not the physical GPU ids)
– batch_size must be divisible by the number of GPUs
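Putting the lines above together into a self-contained sketch (the Linear model and batch size of 64 are placeholders; this needs at least two visible GPUs):

    import os
    import torch
    from torch.autograd import Variable

    os.environ['CUDA_VISIBLE_DEVICES'] = '6,7'    # expose physical GPUs 6 and 7 only
    # Inside this process the visible GPUs are renumbered 0 and 1,
    # which is why device_ids must start from 0.

    model = torch.nn.Linear(1000, 10).cuda()
    net = torch.nn.DataParallel(model, device_ids=[0, 1])

    input_var = Variable(torch.randn(64, 1000).cuda())   # 64/2 = 32 samples go to each GPU
    output = net(input_var)                              # forward pass split across the GPUs
    print(output.size())                                 # (64, 10), gathered back on device 0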
Saving Models
• First approach (recommended by PyTorch)
– # save only the model parameters
– torch.save(the_model.state_dict(), PATH)
– # load only the model parameters
– the_model = TheModelClass(*args, **kwargs)
– the_model.load_state_dict(torch.load(PATH))
• Second approach
– torch.save(the_model, PATH) # save the entire model
– the_model = torch.load(PATH) # load the entire model
http://pytorch.org/docs/master/notes/serialization.html#recommended-approach-for-saving-a-model
Recurrent Neural Network (RNN)
• http://pytorch.org/tutorials/beginner/former_torchies/nn_tutorial.html#example-2-recurrent-net
• self.i2h takes the concatenation of the input and the previous hidden state (input_size = 50 + 20 = 70) and produces the new hidden state, which is then mapped to the output
• The same module (i.e. the same parameters) is reused at every time step
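A sketch of the recurrent module from the linked tutorial; the sizes (data 50, hidden 20, output 10) follow the input_size = 50 + 20 = 70 annotation above, and the batch size and number of time steps are arbitrary:

    import torch
    import torch.nn as nn
    from torch.autograd import Variable

    class RNN(nn.Module):
        def __init__(self, data_size, hidden_size, output_size):
            super(RNN, self).__init__()
            self.hidden_size = hidden_size
            self.i2h = nn.Linear(data_size + hidden_size, hidden_size)   # input_size = 50 + 20 = 70
            self.h2o = nn.Linear(hidden_size, output_size)

        def forward(self, data, last_hidden):
            inp = torch.cat((data, last_hidden), 1)   # concatenate input and previous hidden state
            hidden = self.i2h(inp)
            output = self.h2o(hidden)
            return hidden, output

    rnn = RNN(50, 20, 10)
    batch_size, timesteps = 3, 5
    hidden = Variable(torch.zeros(batch_size, 20))
    for t in range(timesteps):                        # the same module (same parameters) at every step
        data = Variable(torch.randn(batch_size, 50))
        hidden, output = rnn(data, hidden)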
Transfer Learning
• Freeze the parameters of the original model – requires_grad = False
• Then add your own modules
http://pytorch.org/docs/master/notes/autograd.html#excluding-subgraphs-from-backward
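A sketch of the pattern described above, using torchvision's pretrained resnet18 as an illustrative backbone (the 100-class output size is arbitrary, and pretrained=True matches the PyTorch 0.2-era API):

    import torch.nn as nn
    import torch.optim as optim
    import torchvision

    model = torchvision.models.resnet18(pretrained=True)
    for param in model.parameters():
        param.requires_grad = False            # freeze the parameters of the original model

    # Replace the last layer with our own module; its new parameters require gradients by default
    model.fc = nn.Linear(512, 100)

    # Optimize only the new layer's parameters
    optimizer = optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9)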
Comparison with TensorFlow (summarized from https://awni.github.io/pytorch-tensorflow/)
Property | TensorFlow | PyTorch
Graph | Static (dynamic via TensorFlow Fold) | Dynamic
Ramp-up Time | - | Win
Graph Creation and Debugging | - | Win
Feature Coverage | Win | Catching up quickly
Documentation | Tie | Tie
Serialization | Win (supports other languages) | -
Deployment | Win (cloud & mobile) | -
Data Loading | - | Win
Device Management | Win | Needs .cuda()
Custom Extensions | - | Win
Reminder: Platform & Final Project. Thank You~!

[Update] PyTorch Tutorial for NTU Machine Learning Course 2017
