PyTorch Tutorial (Updated) - NTU Machine Learning Course - Lyman Lin 林裕訓 - Nov. 03, 2017 - lymanblue[at]gmail.com
What is PyTorch? • Developed by Facebook – Python first – Dynamic Neural Network – This tutorial is for PyTorch 0.2.0 • Endorsed by Director of AI at Tesla
Installation • PyTorch Web: http://pytorch.org/
Packages of PyTorch
Package – Description
torch – a Tensor library like NumPy, with strong GPU support
torch.autograd – a tape-based automatic differentiation library that supports all differentiable Tensor operations in torch
torch.nn – a neural network library deeply integrated with autograd, designed for maximum flexibility
torch.optim – an optimization package to be used with torch.nn, with standard optimization methods such as SGD, RMSProp, LBFGS, Adam, etc.
torch.multiprocessing – Python multiprocessing, but with magical memory sharing of torch Tensors across processes; useful for data loading and Hogwild training
torch.utils – DataLoader, Trainer, and other utility functions for convenience
torch.legacy (.nn/.optim) – legacy code ported over from (Lua)Torch for backward-compatibility reasons
Outline • Neural Network in Brief • Concepts of PyTorch • Multi-GPU Processing • RNN • Transfer Learning • Comparison with TensorFlow
Neural Network in Brief • Supervised Learning – learning a function f such that f(x) = y, from training pairs (x1, y1), (x2, y2), …
Neural Network in Brief
• The dataset is split into batches 1, 2, 3, …, N; one pass over all N batches is 1 epoch, with N = dataset size / batch size
• Forward process: from data to label' (the prediction); a loss compares label' with the true label
• Backward process: the optimizer uses the loss to update the parameters Wi -> Wi+1
• Inside the network, data flows forward through the chain of layers (each with its own weights W), and gradients flow backward through the same chain
• Data in the neural network: Tensors (n-dim arrays) and gradients of functions
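The whole picture in code, as a minimal runnable sketch (the linear model, random batches, and learning rate are placeholders; the PyTorch pieces are introduced in the following slides):

    import torch
    import torch.nn as nn
    import torch.optim as optim
    from torch.autograd import Variable

    # Placeholder network and data, just to make the loop concrete; shapes are arbitrary.
    net = nn.Linear(10, 2)
    loss_fn = nn.MSELoss()
    optimizer = optim.SGD(net.parameters(), lr=0.1)
    batches = [(Variable(torch.randn(4, 10)), Variable(torch.randn(4, 2))) for _ in range(8)]

    for epoch in range(2):                       # one pass over all N batches = 1 epoch
        for data, label in batches:              # N = dataset size / batch size
            out = net(data)                      # forward: data -> label'
            loss = loss_fn(out, label)           # compare label' with label
            optimizer.zero_grad()                # reset gradients from the previous batch
            loss.backward()                      # backward: compute gradients
            optimizer.step()                     # update parameters Wi -> Wi+1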
Concepts of PyTorch
• Modules of PyTorch
– Data: Tensor, Variable (for gradients)
– Function: NN Modules, Optimizer, Loss Function, Multi-Processing
Concepts of PyTorch • Tensor – similar to a NumPy array
Concepts of PyTorch • Tensor Operations
– z = x + y
– torch.add(x, y, out=z)
– y.add_(x) # in-place (the trailing underscore denotes an in-place operation)
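A minimal runnable sketch of the three equivalent forms above (the variable names are illustrative):

    import torch

    x = torch.ones(2, 2)
    y = torch.ones(2, 2)

    z = x + y                     # operator form
    out = torch.Tensor(2, 2)      # pre-allocated output tensor
    torch.add(x, y, out=out)      # functional form, result written into out
    y.add_(x)                     # in-place form: the trailing underscore mutates y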
Concepts of PyTorch • NumPy Bridge
– To NumPy: a = torch.ones(5); b = a.numpy()
– To Tensor: a = numpy.ones(5); b = torch.from_numpy(a)
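A small sketch of the bridge above; note that a CPU Tensor and its NumPy view share the same memory, so changing one changes the other:

    import torch
    import numpy

    a = torch.ones(5)
    b = a.numpy()                 # Tensor -> NumPy array
    a.add_(1)                     # the CPU tensor and the array share memory,
    print(b)                      # so b now reads [2. 2. 2. 2. 2.]

    c = numpy.ones(5)
    d = torch.from_numpy(c)       # NumPy array -> Tensor (memory is shared here too)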
Concepts of PyTorch • CUDA Tensors
– Move tensors to GPU: x = x.cuda(); y = y.cuda(); x + y
– Move a net to GPU: net = Network(); if torch.cuda.is_available(): net.cuda()
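A runnable sketch of the pattern above; a small nn.Linear stands in for the user-defined Network() module:

    import torch
    import torch.nn as nn

    net = nn.Linear(4, 2)              # stand-in for Network(); any nn.Module behaves the same way
    x = torch.ones(2, 4)
    y = torch.ones(2, 4)

    if torch.cuda.is_available():      # only move things when a GPU is present
        x = x.cuda()                   # .cuda() returns a GPU copy of a tensor ...
        y = y.cuda()
        net.cuda()                     # ... but moves a module's parameters in place
    z = x + y                          # runs on the GPU if the tensors were moved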
Concepts of PyTorch • Variable
– Wraps a Tensor and records what is needed for gradients: .data (the Tensor itself), .grad (the gradient for the current backward process), and the graph bookkeeping (creator/grad_fn), which is handled by PyTorch automatically
Concepts of PyTorch • Variable example
• x = Variable(torch.ones(2, 2), requires_grad=True)
• print(x)
• y = x + 2
• z = y * y * 3
• out = z.mean()
• out.backward()
• print(x.grad)
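For the example above, the result can be checked by hand: out = (1/4) * sum_i 3*(x_i + 2)^2, so d(out)/d(x_i) = 1.5 * (x_i + 2) = 4.5 at x_i = 1; print(x.grad) therefore shows a 2x2 tensor filled with 4.5.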
Define the Network
• http://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html#define-the-network
• Define modules (must have, in __init__) and build the network (must have, in forward)
• Layer pipeline: x -> conv1 -> relu -> pooling -> conv2 -> relu -> pooling -> flatten -> fc1 -> relu -> fc2 -> relu -> fc3
• Shape trace [Channel, H, W]: 1x32x32 -> conv1 -> 6x28x28 -> relu -> pooling -> 6x14x14 -> conv2 -> 16x10x10 -> relu -> pooling -> 16x5x5 -> flatten to 16*5*5
• A full Tensor in the network has shape [Batch N, Channel, H, W]
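A sketch of the network described by the shape trace above, following the linked tutorial; the layer sizes are taken from the annotations above:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Net(nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            # Define modules (must have)
            self.conv1 = nn.Conv2d(1, 6, 5)        # 1x32x32 -> 6x28x28
            self.conv2 = nn.Conv2d(6, 16, 5)       # 6x14x14 -> 16x10x10
            self.fc1 = nn.Linear(16 * 5 * 5, 120)
            self.fc2 = nn.Linear(120, 84)
            self.fc3 = nn.Linear(84, 10)

        def forward(self, x):
            # Build network (must have)
            x = F.max_pool2d(F.relu(self.conv1(x)), 2)   # -> 6x14x14
            x = F.max_pool2d(F.relu(self.conv2(x)), 2)   # -> 16x5x5
            x = x.view(-1, 16 * 5 * 5)                   # flatten the tensor
            x = F.relu(self.fc1(x))
            x = F.relu(self.fc2(x))
            return self.fc3(x)

    net = Net()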
Concepts of PyTorch • NN Modules (torch.nn)
– Modules built on Variable
– Gradients handled by PyTorch automatically
• Common modules
– Convolution layers
– Linear layers
– Pooling layers
– Dropout layers
– Etc.
NN Modules • Convolution Layer
– Batch (N), Channel (C)
– torch.nn.Conv1d: input [N, C, W] # kernel moves in 1D
– torch.nn.Conv2d: input [N, C, H, W] # kernel moves in 2D
– torch.nn.Conv3d: input [N, C, D, H, W] # kernel moves in 3D
– Example: torch.nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
NN Modules • Convolution Layer (Conv2d): how it works
– Input per sample: [Cin, Hin, Win]
– Each kernel has shape [Cin, k, k]; convolving it with the input produces one output map of shape [1, Hout, Wout]
– With Cout kernels, the maps are stacked into an output of shape [Cout, Hout, Wout]
– Hyperparameters: kernel size k, stride s (moving step size), padding p, dilation d (e.g. k=3, s=1, p=1, d=1 keeps Hout=Hin, Wout=Win)
– Output size: Hout = floor((Hin + 2p - d*(k-1) - 1)/s) + 1, and similarly for Wout
– Number of parameters: Cout x Cin x k x k weights, plus Cout bias terms
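A small check of the shape and parameter-count rules above, using the Conv2d example from the earlier slide (the 32x32 input size is an assumption for illustration):

    import torch
    import torch.nn as nn
    from torch.autograd import Variable

    conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)

    x = Variable(torch.randn(1, 3, 32, 32))              # [N, Cin, Hin, Win]
    y = conv(x)
    print(y.size())                                       # (1, 16, 32, 32): Hout = (32 + 2*1 - 3)/1 + 1 = 32

    n_params = sum(p.data.numel() for p in conv.parameters())
    print(n_params)                                       # 16*3*3*3 weights + 16 biases = 448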
NN Modules • Linear Layer – torch.nn.Linear(in_features=3, out_features=5) – y=Ax+b
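A minimal sketch of the Linear layer above (the batch size of 4 is arbitrary):

    import torch
    import torch.nn as nn
    from torch.autograd import Variable

    fc = nn.Linear(in_features=3, out_features=5)   # A has shape 5x3, b has 5 entries
    x = Variable(torch.randn(4, 3))                 # a batch of 4 input vectors
    y = fc(x)                                       # y = Ax + b, applied to each row
    print(y.size())                                 # (4, 5)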
NN Modules • Dropout Layer
– torch.nn.Dropout(p)
– Randomly zeroes elements of the input with probability p (during training)
– Outputs are scaled by 1/(1-p) so the expected value is unchanged
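A small sketch showing both effects at once (p = 0.5 is an arbitrary choice):

    import torch
    import torch.nn as nn
    from torch.autograd import Variable

    drop = nn.Dropout(p=0.5)
    x = Variable(torch.ones(10))
    print(drop(x))      # training mode: roughly half the entries are 0, the rest are 2.0 = 1/(1-p)
    drop.eval()         # evaluation mode: dropout becomes a no-op
    print(drop(x))      # all ones again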
NN Modules • Pooling Layer
– torch.nn.AvgPool2d(kernel_size=2, stride=2, padding=0)
– torch.nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
– k=2, s=2 (moving step size), p=0
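A quick shape check for the pooling layer above (the input size matches the earlier network example):

    import torch
    import torch.nn as nn
    from torch.autograd import Variable

    pool = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
    x = Variable(torch.randn(1, 6, 28, 28))    # [N, C, H, W]
    y = pool(x)
    print(y.size())                            # (1, 6, 14, 14): each 2x2 window is reduced to its maximum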
Concepts of PyTorch
• Optimizer (torch.optim): SGD, Adagrad, Adam, RMSprop, … (9 optimizers in PyTorch 0.2)
• Loss (torch.nn): L1Loss, MSELoss, CrossEntropyLoss, … (18 loss functions in PyTorch 0.2)
Optimizer and Loss: Full Example
• http://pytorch.org/tutorials/beginner/pytorch_with_examples.html#pytorch-optim
• What we build: a two-layer fully connected network, D_in=1000 -> H=100 -> D_out=10 -> y_pred
• Construct our model: define modules (must have) and build the network (must have)
• Set up the optimizer and loss function; don't update y (y are labels here)
• Training loop: reset gradients, backward, update step
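A sketch of the training code described above, following the linked pytorch-optim example; the batch size N = 64 and the use of random data are assumptions for illustration:

    import torch
    from torch.autograd import Variable

    N, D_in, H, D_out = 64, 1000, 100, 10

    x = Variable(torch.randn(N, D_in))
    y = Variable(torch.randn(N, D_out), requires_grad=False)   # don't update y: y are labels here

    # Construct our model
    model = torch.nn.Sequential(
        torch.nn.Linear(D_in, H),
        torch.nn.ReLU(),
        torch.nn.Linear(H, D_out),
    )

    # Optimizer and loss function
    loss_fn = torch.nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    for t in range(500):
        y_pred = model(x)            # forward
        loss = loss_fn(y_pred, y)
        optimizer.zero_grad()        # reset gradient
        loss.backward()              # backward
        optimizer.step()             # update step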
Concepts of PyTorch • Multi-Processing
• Basic method: torch.nn.DataParallel (recommended by PyTorch)
• Advanced methods: torch.multiprocessing, Hogwild (asynchronous)
Multi-GPU Processing • torch.nn.DataParallel
– gpu_id = '6,7'
– os.environ['CUDA_VISIBLE_DEVICES'] = gpu_id
– net = torch.nn.DataParallel(model, device_ids=[0, 1])
– output = net(input_var)
• Important notes:
– device_ids must start from 0 (they index the visible devices, not the physical GPU ids)
– batch_size must be divisible by the number of GPUs
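Putting the lines above together into a self-contained sketch (the Linear model and batch size of 64 are placeholders; this needs at least two visible GPUs):

    import os
    import torch
    from torch.autograd import Variable

    os.environ['CUDA_VISIBLE_DEVICES'] = '6,7'    # expose physical GPUs 6 and 7 only
    # Inside this process the visible GPUs are renumbered 0 and 1,
    # which is why device_ids must start from 0.

    model = torch.nn.Linear(1000, 10).cuda()
    net = torch.nn.DataParallel(model, device_ids=[0, 1])

    input_var = Variable(torch.randn(64, 1000).cuda())   # 64/2 = 32 samples go to each GPU
    output = net(input_var)                              # forward pass split across the GPUs
    print(output.size())                                 # (64, 10), gathered back on device 0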
Saving Models
• First approach (recommended by PyTorch)
– # save only the model parameters
– torch.save(the_model.state_dict(), PATH)
– # load only the model parameters
– the_model = TheModelClass(*args, **kwargs)
– the_model.load_state_dict(torch.load(PATH))
• Second approach
– torch.save(the_model, PATH) # save the entire model
– the_model = torch.load(PATH) # load the entire model
http://pytorch.org/docs/master/notes/serialization.html#recommended-approach-for-saving-a-model
Recurrent Neural Network (RNN)
• http://pytorch.org/tutorials/beginner/former_torchies/nn_tutorial.html#example-2-recurrent-net
• self.i2h takes the concatenation of the input and the previous hidden state (input_size = 50 + 20 = 70) and produces the new hidden state, which is then mapped to the output
• The same module (i.e. the same parameters) is reused at every time step
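A sketch of the recurrent module from the linked tutorial; the sizes (data 50, hidden 20, output 10) follow the input_size = 50 + 20 = 70 annotation above, and the batch size and number of time steps are arbitrary:

    import torch
    import torch.nn as nn
    from torch.autograd import Variable

    class RNN(nn.Module):
        def __init__(self, data_size, hidden_size, output_size):
            super(RNN, self).__init__()
            self.hidden_size = hidden_size
            self.i2h = nn.Linear(data_size + hidden_size, hidden_size)   # input_size = 50 + 20 = 70
            self.h2o = nn.Linear(hidden_size, output_size)

        def forward(self, data, last_hidden):
            inp = torch.cat((data, last_hidden), 1)   # concatenate input and previous hidden state
            hidden = self.i2h(inp)
            output = self.h2o(hidden)
            return hidden, output

    rnn = RNN(50, 20, 10)
    batch_size, timesteps = 3, 5
    hidden = Variable(torch.zeros(batch_size, 20))
    for t in range(timesteps):                        # the same module (same parameters) at every step
        data = Variable(torch.randn(batch_size, 50))
        hidden, output = rnn(data, hidden)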
Transfer Learning
• Freeze the parameters of the original model – requires_grad = False
• Then add your own modules
http://pytorch.org/docs/master/notes/autograd.html#excluding-subgraphs-from-backward
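A sketch of the pattern described above, using torchvision's pretrained resnet18 as an illustrative backbone (the 100-class output size is arbitrary, and pretrained=True matches the PyTorch 0.2-era API):

    import torch.nn as nn
    import torch.optim as optim
    import torchvision

    model = torchvision.models.resnet18(pretrained=True)
    for param in model.parameters():
        param.requires_grad = False            # freeze the parameters of the original model

    # Replace the last layer with our own module; its new parameters require gradients by default
    model.fc = nn.Linear(512, 100)

    # Optimize only the new layer's parameters
    optimizer = optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9)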
Comparison with TensorFlow (summarized from https://awni.github.io/pytorch-tensorflow/)
Property | TensorFlow | PyTorch
Graph | Static (dynamic via TensorFlow Fold) | Dynamic
Ramp-up Time | - | Win
Graph Creation and Debugging | - | Win
Feature Coverage | Win | Catching up quickly
Documentation | Tie | Tie
Serialization | Win (supports other languages) | -
Deployment | Win (cloud & mobile) | -
Data Loading | - | Win
Device Management | Win | Needs .cuda()
Custom Extensions | - | Win
Reminder: Platform & Final Project. Thank You~!

[Update] PyTorch Tutorial for NTU Machine Learning Course 2017
