Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)

Amaia Salvador (amaia.salvador@upc.edu), PhD Candidate, Universitat Politècnica de Catalunya. DEEP LEARNING WORKSHOP, Dublin City University, 27-28 April 2017. Object Segmentation, Day 2 Lecture 7.

Object Segmentation: define the accurate boundaries of all objects in an image.

Semantic Segmentation: label every pixel! Instances are not differentiated (e.g., all the cows in an image share the single label "cow"). A classic computer vision problem. Slide Credit: CS231n

Instance Segmentation: detect instances, give each one a category, and label its pixels ("simultaneous detection and segmentation", SDS). Slide Credit: CS231n

Object Segmentation: Datasets. Pascal Visual Object Classes: 20 classes, ~5,000 images. Pascal Context: 540 classes, ~10,000 images.

Object Segmentation: Datasets. SUN RGB-D: 19 classes, ~10,000 images. Microsoft COCO: 80 classes, ~300,000 images.

Object Segmentation: Datasets. Cityscapes: 30 classes, ~25,000 images. ADE20K: >150 classes, ~22,000 images.

Semantic Segmentation (sliding window): extract a patch, run it through a CNN, classify the center pixel (e.g., COW), and repeat for every pixel. Slide Credit: CS231n
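The sliding-window procedure above can be sketched as follows, mainly to make its cost visible: one classifier call per pixel. This is a minimal sketch, assuming a single-channel image stored as nested lists, zero padding at the borders, and a hypothetical `classify_patch` function standing in for the CNN.

```python
def pad_image(image, pad, fill=0.0):
    """Pad a 2D list-of-lists image with `fill` by `pad` pixels on every side."""
    width = len(image[0]) + 2 * pad
    padded = [[fill] * width for _ in range(pad)]
    for row in image:
        padded.append([fill] * pad + list(row) + [fill] * pad)
    padded.extend([[fill] * width for _ in range(pad)])
    return padded


def sliding_window_segmentation(image, patch_size, classify_patch):
    """Label every pixel by classifying the patch centred on it.

    `classify_patch` is a hypothetical stand-in for a CNN classifier.
    """
    pad = patch_size // 2
    padded = pad_image(image, pad)
    labels = []
    for y in range(len(image)):
        row_labels = []
        for x in range(len(image[0])):
            # Patch centred on pixel (y, x) of the original image
            patch = [r[x:x + patch_size] for r in padded[y:y + patch_size]]
            row_labels.append(classify_patch(patch))  # one CNN pass per pixel
        labels.append(row_labels)
    return labels
```

For an H x W image this makes H * W forward passes over heavily overlapping patches, which is the redundancy a fully convolutional network removes.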

Semantic Segmentation: run a "fully convolutional" network to get predictions for all pixels at once. Slide Credit: CS231n

Semantic Segmentation, Problem 1: the output is smaller than the input because of pooling. Slide Credit: CS231n

Learnable upsampling! Long et al. Fully Convolutional Networks for Semantic Segmentation. CVPR 2015. Slide Credit: CS231n

Reminder: Convolutional Layer. Typical 3 x 3 convolution, stride 1, pad 1. Input: 4 x 4. Output: 4 x 4. Each output value is the dot product between the filter and a 3 x 3 patch of the input. Slide Credit: CS231n

Reminder: Convolutional Layer. 3 x 3 convolution, stride 2, pad 1. Input: 4 x 4. Output: 2 x 2. Slide Credit: CS231n
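The input/output sizes on these two slides follow from the standard output-size formula floor((n + 2p - k) / s) + 1. The sketch below checks both cases with a naive single-channel convolution; it assumes a square input and, as in CNN libraries, implements cross-correlation (no filter flip).

```python
def conv_output_size(n, kernel, stride, pad):
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * pad - kernel) // stride + 1


def conv2d(image, kernel, stride=1, pad=0):
    """Naive single-channel 2D convolution (cross-correlation, as in CNNs).

    Assumes a square input and a square kernel, both as nested lists.
    """
    k = len(kernel)
    n = len(image)
    size = conv_output_size(n, k, stride, pad)
    # Zero-pad the input by `pad` on every side
    padded = [[0.0] * (n + 2 * pad) for _ in range(n + 2 * pad)]
    for i in range(n):
        for j in range(n):
            padded[i + pad][j + pad] = image[i][j]
    out = [[0.0] * size for _ in range(size)]
    for oy in range(size):
        for ox in range(size):
            y0, x0 = oy * stride, ox * stride
            # Dot product between the filter and the input patch
            out[oy][ox] = sum(kernel[a][b] * padded[y0 + a][x0 + b]
                              for a in range(k) for b in range(k))
    return out


ones4 = [[1.0] * 4 for _ in range(4)]
ones3 = [[1.0] * 3 for _ in range(3)]
print(conv_output_size(4, 3, 1, 1), conv_output_size(4, 3, 2, 1))  # 4 2
print(conv2d(ones4, ones3, stride=2, pad=1))  # [[4.0, 6.0], [6.0, 9.0]]
```

With all-ones inputs each output value simply counts how many non-padded pixels the filter covers, which makes the effect of padding easy to see.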

Learnable Upsample: Deconvolutional Layer. 3 x 3 "deconvolution", stride 2, pad 1. Input: 2 x 2. Output: 4 x 4. Each input value gives the weight for a copy of the filter, and the weighted copies are summed where they overlap in the output. Slide Credit: CS231n
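The "weight the filter, then sum where the output overlaps" rule is easiest to see in 1D. This is a minimal sketch without padding: for input [a, b], filter [x, y, z] and stride 2 it produces [ax, ay, az + bx, by, bz].

```python
def conv_transpose_1d(x, w, stride):
    """1D transposed convolution ("deconvolution") without padding.

    Each input value scales a copy of the filter `w`; the copies are placed
    `stride` positions apart and summed where they overlap.
    """
    out = [0.0] * ((len(x) - 1) * stride + len(w))
    for i, xi in enumerate(x):
        for k, wk in enumerate(w):
            out[i * stride + k] += xi * wk
    return out


print(conv_transpose_1d([1.0, 2.0], [1.0, 1.0, 1.0], stride=2))
# [1.0, 1.0, 3.0, 2.0, 2.0]: the middle value is where the two copies overlap
```

Padding in a transposed convolution crops the borders of this full output, so the exact output size depends on the framework's convention. For example, with PyTorch's `ConvTranspose2d`, kernel 3, stride 2 and padding 1 map a 2 x 2 input to 3 x 3, and setting `output_padding=1` as well gives the 4 x 4 shown on the slide.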

Learnable Upsample: Deconvolutional Layer. Warning: checkerboard effect when the kernel size is not divisible by the stride (e.g., stride = 2, kernel_size = 3). Source: distill.pub
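A quick way to see where the checkerboard comes from is to count how many filter copies land on each output position of a 1D transposed convolution. This sketch uses all-ones inputs and filters and ignores the borders, which are under-covered for any kernel size.

```python
def overlap_counts(n_inputs, kernel_size, stride):
    """Number of filter copies landing on each output position of a
    1D transposed convolution (all-ones input and filter)."""
    counts = [0] * ((n_inputs - 1) * stride + kernel_size)
    for i in range(n_inputs):
        for k in range(kernel_size):
            counts[i * stride + k] += 1
    return counts


def interior(counts, kernel_size):
    """Drop the border positions so only the steady-state pattern remains."""
    return counts[kernel_size - 1:len(counts) - (kernel_size - 1)]


print(interior(overlap_counts(4, 3, 2), 3))  # [2, 1, 2, 1, 2] -> uneven
print(interior(overlap_counts(4, 4, 2), 4))  # [2, 2, 2, 2]    -> even
```

With kernel_size = 3 and stride = 2 every other output pixel receives twice as many contributions, which shows up as a checkerboard; with kernel_size = 4 the coverage is uniform.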

Semantic Segmentation: Noh et al. Learning Deconvolution Network for Semantic Segmentation. ICCV 2015. A "regular" VGG encoder is followed by an "upside-down" VGG decoder. Slide Credit: CS231n

Better Upsampling: Subpixel. Rearrange the features of the previous convolutional layer to form a higher-resolution output. Shi et al. Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network. CVPR 2016.
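The rearrangement behind subpixel convolution maps a C·r² x H x W feature volume to C x rH x rW. The sketch below assumes the channel ordering c·r² + i·r + j for output pixel (h·r + i, w·r + j), which we believe matches PyTorch's `nn.PixelShuffle`; other implementations may order channels differently.

```python
def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) nested list into (C, H*r, W*r).

    Assumed ordering: channel c*r*r + i*r + j supplies output pixel
    (h*r + i, w*r + j) of output channel c.
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    assert c_in % (r * r) == 0, "channels must be divisible by r*r"
    c_out = c_in // (r * r)
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):
            for j in range(r):
                src = x[c * r * r + i * r + j]
                for y in range(h):
                    for xx in range(w):
                        out[c][y * r + i][xx * r + j] = src[y][xx]
    return out


print(pixel_shuffle([[[1]], [[2]], [[3]], [[4]]], 2))  # [[[1, 2], [3, 4]]]
```

No values are computed here, only moved: the learnable part is the ordinary convolution that produces the C·r² channels at low resolution, which is why the approach is cheap.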

Semantic Segmentation, Problem 2: blobby segmentations. The input to the segmentation branch is high-level features (e.g., the conv5 layer) from a pretrained classification network, which are spatially coarse.

Skip Connections: recover low-level features from early layers. Skip connections give better results. Long et al. Fully Convolutional Networks for Semantic Segmentation. CVPR 2015. Slide Credit: CS231n
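One simple form of skip connection, in the spirit of FCN, upsamples the coarse score map from a deep layer and adds it to a score map predicted from an earlier, finer layer. This is a minimal single-channel sketch; nearest-neighbour upsampling is a simplification standing in for the learned upsampling used in the paper.

```python
def upsample_nearest(score, factor):
    """Nearest-neighbour upsampling of a 2D score map (simplified stand-in
    for learned upsampling)."""
    return [[v for v in row for _ in range(factor)]
            for row in score for _ in range(factor)]


def fuse_skip(coarse, fine):
    """Upsample the coarse (deep) score map 2x and add the fine (early) one."""
    up = upsample_nearest(coarse, 2)
    assert len(up) == len(fine) and len(up[0]) == len(fine[0])
    return [[a + b for a, b in zip(up_row, fine_row)]
            for up_row, fine_row in zip(up, fine)]


print(fuse_skip([[1.0]], [[0.0, 1.0], [2.0, 3.0]]))  # [[1.0, 2.0], [3.0, 4.0]]
```

The coarse map supplies semantics and the fine map supplies spatial detail; in a real network each class has its own pair of score maps.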

Dilated Convolutions: Yu & Koltun. Multi-Scale Context Aggregation by Dilated Convolutions. ICLR 2016. A structural change in convolutional layers for dense prediction problems (e.g., image segmentation):
● The receptive field grows exponentially as you add more layers (when the dilation doubles at each layer), so deeper layers see more context than with regular convolutions.
● The number of parameters grows only linearly as you add more layers.
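The exponential-versus-linear claim can be checked by stacking stride-1 3 x 3 layers: each layer widens the receptive field by (kernel - 1) · dilation pixels while adding the same 9 weights (per input/output channel pair). This sketch assumes dilations 1, 2, 4, 8, which reproduces the (2^(i+2) - 1) receptive fields described by Yu & Koltun.

```python
def receptive_fields(dilations, kernel=3):
    """Receptive field after each stacked stride-1 conv layer.

    Each layer adds (kernel - 1) * dilation pixels to the receptive field.
    """
    rf, out = 1, []
    for d in dilations:
        rf += (kernel - 1) * d
        out.append(rf)
    return out


layers = 4
dilated = receptive_fields([2 ** i for i in range(layers)])  # dilations 1, 2, 4, 8
regular = receptive_fields([1] * layers)                     # dilation 1 everywhere
params = [9 * (i + 1) for i in range(layers)]                # 3x3 weights, 1 channel

print(dilated)  # [3, 7, 15, 31]
print(regular)  # [3, 5, 7, 9]
print(params)   # [9, 18, 27, 36]
```

Both stacks have exactly the same parameter count; only the spacing of the filter taps differs.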

Instance Segmentation is more challenging than semantic segmentation:
● The number of objects is variable.
● There is no unique match between predicted and ground-truth objects (instance IDs cannot be used as labels).
Several lines of attack:
● Proposal-based methods
● Recurrent Neural Networks

Proposal-based: Hariharan et al. Simultaneous Detection and Segmentation. ECCV 2014. Similar to R-CNN, but with external segment proposals; the background of each proposal is masked out with the mean image. Slide Credit: CS231n
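The "mask out background with mean image" step amounts to replacing every pixel outside the segment proposal with the dataset mean before the crop goes through the CNN. A minimal sketch, assuming the image is an H x W grid of RGB tuples and the mask an H x W grid of 0/1 values; the mean pixel in the usage line is a hypothetical example value.

```python
def mask_background(image, mask, mean_pixel):
    """Replace pixels outside a binary segment mask with the mean pixel.

    image: H x W nested list of (R, G, B) tuples
    mask:  H x W nested list of 0/1 values (1 = inside the segment)
    """
    return [[pix if m else mean_pixel for pix, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]


mean = (124, 117, 104)  # hypothetical dataset mean pixel
print(mask_background([[(255, 0, 0), (0, 255, 0)]], [[1, 0]], mean))
# [[(255, 0, 0), (124, 117, 104)]]
```

Filling with the mean (rather than black) means the masked region becomes zero after the usual mean subtraction, so it contributes nothing to the network's input.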

Proposal-based: Hariharan et al. Hypercolumns for Object Segmentation and Fine-grained Localization. CVPR 2015. Slide Credit: CS231n

Proposal-based Instance Segmentation: MNC. Dai et al. Instance-aware Semantic Segmentation via Multi-task Network Cascades. CVPR 2016. Won the COCO 2015 challenge (with ResNet). Faster R-CNN for pixel-level segmentation in a multi-stage cascade: a region proposal network (RPN) proposes boxes; the boxes are reshaped to a fixed size and a figure/ground logistic regression predicts each mask; the background is masked out and the object class is predicted. The entire model is learned end-to-end!

Proposal-based Instance Segmentation: MNC. Dai et al. Instance-aware Semantic Segmentation via Multi-task Network Cascades. CVPR 2016. [Figure: predictions vs. ground truth]

Proposal-based Instance Segmentation: Mask R-CNN. He et al. Mask R-CNN. arXiv, March 2017. Faster R-CNN for pixel-level segmentation, predicting masks and class labels in parallel.

Mask R-CNN (He et al. Mask R-CNN. arXiv, March 2017):
● The classification and bounding-box regression losses are identical to those in Faster R-CNN.
● A new loss term is added for mask prediction: the network outputs a K x m x m volume, where K is the number of categories and m is the side length of the (square) mask.
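A sketch of how the K x m x m mask output can be turned into a loss for a single RoI: only the mask of the ground-truth class is used, each pixel gets a sigmoid, and the binary cross-entropy is averaged over the m x m pixels. This follows the per-pixel sigmoid formulation described in the Mask R-CNN paper; the nested-list layout and function name are illustrative assumptions.

```python
import math


def mask_loss(mask_logits, gt_class, gt_mask):
    """Mask loss for one RoI.

    mask_logits: K x m x m nested list of raw scores (one mask per class)
    gt_class:    index of the RoI's ground-truth class
    gt_mask:     m x m nested list of 0/1 targets
    Only the ground-truth class's mask contributes, so classes do not compete.
    """
    logits = mask_logits[gt_class]
    total, count = 0.0, 0
    for logit_row, target_row in zip(logits, gt_mask):
        for z, t in zip(logit_row, target_row):
            # Sigmoid + binary cross-entropy in a numerically stable form:
            # -(t*log(sigmoid(z)) + (1-t)*log(1-sigmoid(z)))
            total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
            count += 1
    return total / count  # average over the m*m pixels
```

Because every class has its own mask and a per-pixel sigmoid is used instead of a softmax across classes, mask prediction is decoupled from classification, which the classification branch handles separately.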

Mask R-CNN: RoI Align (He et al. Mask R-CNN. arXiv, March 2017). Reminder: RoI Pool from Fast R-CNN. The high-resolution input image (3 x 800 x 600) with a region proposal goes through convolution and pooling, giving high-resolution conv features (C x H x W). The proposal is projected onto the feature map (coordinates divided by 16 and rounded), and max-pooling within each cell of an h x w grid gives the low-resolution RoI conv features (C x h x w) that the fully-connected layers expect. Dividing by 16 and rounding causes misalignment, and the crop is not differentiable with respect to the box coordinates.
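The size of the misalignment is easy to quantify: project an image coordinate onto a stride-16 feature map, round it as RoI Pool does, and map it back to pixels. A minimal sketch, assuming Python's built-in `round` as the rounding rule:

```python
def quantization_error(coord, stride=16):
    """Pixel offset introduced by projecting an image coordinate onto a
    stride-`stride` feature map and rounding it, as RoI Pool does."""
    cell = round(coord / stride)  # quantized feature-map coordinate
    return coord - cell * stride  # error measured back in image pixels


for x in (96, 100, 120):
    print(x, x / 16, quantization_error(x))
# 96  6.0  0
# 100 6.25 4
# 120 7.5  -8
```

A shift of up to half a feature cell (8 pixels here) barely matters for classification but is large for pixel-accurate masks; keeping the fractional coordinate (6.25 instead of 6) is what RoI Align does.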

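Mask R-CNN: RoI Align. Use bilinear interpolation instead of cropping + max-pooling, as in Jaderberg et al. Spatial Transformer Networks. NIPS 2015. The sampling grid is mapped from the box coordinates by an affine transform with $\theta_{12} = \theta_{21} = 0$, i.e. translation + scale only:
$$\begin{pmatrix} x^s \\ y^s \end{pmatrix} = \begin{bmatrix} \theta_{11} & 0 & \theta_{13} \\ 0 & \theta_{22} & \theta_{23} \end{bmatrix} \begin{pmatrix} x^t \\ y^t \\ 1 \end{pmatrix}$$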
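A simplified RoI Align can be sketched with bilinear interpolation: divide the box coordinates by the stride without rounding, split the box into an out_h x out_w grid, and sample the feature map at the centre of each bin. This is an illustrative sketch, not the paper's exact procedure: it takes one sample per bin (the paper pools several samples per bin), works on a single channel, and assumes the box lies inside the feature map.

```python
import math


def bilinear(feat, y, x):
    """Bilinearly interpolate a 2D feature map at real-valued (y, x)."""
    h, w = len(feat), len(feat[0])
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * feat[y0][x0] + (1 - wy) * wx * feat[y0][x1]
            + wy * (1 - wx) * feat[y1][x0] + wy * wx * feat[y1][x1])


def roi_align(feat, box, out_h, out_w, stride=16):
    """Sample an out_h x out_w grid inside `box` = (x1, y1, x2, y2) given in
    image coordinates. No rounding; one bilinear sample per bin centre."""
    x1, y1, x2, y2 = (c / stride for c in box)  # keep fractional coordinates
    bin_h, bin_w = (y2 - y1) / out_h, (x2 - x1) / out_w
    return [[bilinear(feat, y1 + (i + 0.5) * bin_h, x1 + (j + 0.5) * bin_w)
             for j in range(out_w)] for i in range(out_h)]


ramp = [[float(x) for x in range(4)] for _ in range(4)]  # feat[y][x] = x
print(roi_align(ramp, (16, 16, 48, 48), 2, 2))  # [[1.5, 2.5], [1.5, 2.5]]
```

Every output is a weighted sum of feature values with weights that vary smoothly with the box coordinates, which is what removes the misalignment and the hard rounding.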

Mask R-CNN (He et al. Mask R-CNN. arXiv, March 2017). [Figure: object detection and instance segmentation results]

Recurrent Instance Segmentation: sequential mask generation. Romera-Paredes & Torr. Recurrent Instance Segmentation. ECCV 2016.

Recurrent Instance Segmentation: how should ground-truth and predicted masks be mapped to each other? Romera-Paredes & Torr. Recurrent Instance Segmentation. ECCV 2016.

Recurrent Instance Segmentation: Coverage Loss. Romera-Paredes & Torr. Recurrent Instance Segmentation. ECCV 2016. Slide Credit: M. Baradad, ReadCV@UPC. Step 1: compute the IoU for all pairs of predicted masks Ŷ_t and ground-truth masks Y_t.

Recurrent Instance Segmentation: Coverage Loss. Romera-Paredes & Torr. Recurrent Instance Segmentation. ECCV 2016. Slide Credit: M. Baradad, ReadCV@UPC. Step 1: compute the IoU for all pairs of predicted/GT masks. [Figure: IoU matrix between predicted and ground-truth masks, with example entries 0.9, 0, 0 / 0.1, 0.8, 0.1 / ...]
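Step 1 can be sketched as a pairwise IoU matrix. So that the same function works on a network's real-valued mask probabilities, this sketch assumes a soft IoU, sum(p·q) / sum(p + q - p·q), which reduces to the ordinary IoU for binary masks; masks are flattened to equal-length lists.

```python
def soft_iou(p, q):
    """Soft IoU of two flattened masks with values in [0, 1].

    For binary masks this equals |intersection| / |union|.
    """
    inter = sum(a * b for a, b in zip(p, q))
    union = sum(a + b - a * b for a, b in zip(p, q))
    return inter / union if union else 0.0


def iou_matrix(pred_masks, gt_masks):
    """Entry [t][u] is the IoU between predicted mask t and ground-truth mask u."""
    return [[soft_iou(p, g) for g in gt_masks] for p in pred_masks]


gt = [[1, 1, 0, 0], [0, 0, 1, 1]]
pred = [[1, 1, 0, 0], [0, 1, 1, 0]]
print(iou_matrix(pred, gt))  # [[1.0, 0.0], [0.333..., 0.333...]]
```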

Recurrent Instance Segmentation: Coverage Loss. Romera-Paredes & Torr. Recurrent Instance Segmentation. ECCV 2016. Slide Credit: M. Baradad, ReadCV@UPC. Step 2: find the best matching between predicted and ground-truth masks. The loss is the sum of the IoUs over the best matching, multiplied by -1.
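Step 2 is an assignment problem on the IoU matrix. The sketch below solves it by brute force over permutations, which is only practical for a handful of masks; in practice an assignment solver such as the Hungarian algorithm (e.g. `scipy.optimize.linear_sum_assignment`) would do the same job efficiently. It assumes at least as many predictions as ground-truth masks, and the third row of the example matrix is made up for illustration.

```python
from itertools import permutations


def best_matching(iou):
    """Match predicted masks (rows) to ground-truth masks (columns) so the
    total IoU is maximal. Returns (loss, pairs), with loss = -sum of IoUs.

    Brute force: exponential in the number of masks; assumes
    len(iou) >= len(iou[0]).
    """
    n_pred, n_gt = len(iou), len(iou[0])
    best_total, best_pairs = float("-inf"), []
    # Each candidate assigns a distinct predicted row to every GT column
    for rows in permutations(range(n_pred), n_gt):
        total = sum(iou[r][c] for c, r in enumerate(rows))
        if total > best_total:
            best_total = total
            best_pairs = [(r, c) for c, r in enumerate(rows)]
    return -best_total, best_pairs


iou = [[0.9, 0.0, 0.0],
       [0.1, 0.8, 0.1],
       [0.0, 0.2, 0.7]]  # toy values
print(best_matching(iou))  # loss of about -2.4, pairs [(0, 0), (1, 1), (2, 2)]
```

Because the loss only depends on the best matching, the network is free to output the objects in any order.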

Recurrent Instance Segmentation: Coverage Loss. Romera-Paredes & Torr. Recurrent Instance Segmentation. ECCV 2016. Slide Credit: M. Baradad, ReadCV@UPC. Step 3: also take into account the confidence scores of the predicted masks (e.g., s1 = 0.93, s2 = 0.73, s3 = 0.86, s4 = 0.63, s5 = 0.56), where $f_{BCE}(a, b) = -\left(a \log b + (1 - a) \log(1 - b)\right)$ is the binary cross-entropy between a target $a$ and a score $b$, and $[\cdot]$ is the Iverson bracket, which is 1 if the condition is true and 0 otherwise.
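Step 3 can be sketched as a sum of binary cross-entropies between each predicted score and an Iverson-bracket target. This assumes the target for the t-th predicted mask is [t ≤ n], with n the number of ground-truth objects, so the first n predictions are pushed towards a score of 1 and the rest towards 0; the choice n = 3 in the example is hypothetical.

```python
import math


def score_loss(scores, n_gt, eps=1e-7):
    """Sum over predicted masks of BCE(target = [t <= n_gt], score s_t).

    t is 1-based; the Iverson bracket [t <= n_gt] is 1 for the first n_gt
    predictions and 0 afterwards. Scores are clipped to avoid log(0).
    """
    total = 0.0
    for t, s in enumerate(scores, start=1):
        target = 1.0 if t <= n_gt else 0.0   # Iverson bracket [t <= n]
        s = min(max(s, eps), 1 - eps)        # keep log() finite
        total += -(target * math.log(s) + (1 - target) * math.log(1 - s))
    return total


scores = [0.93, 0.73, 0.86, 0.63, 0.56]  # scores from the slide
print(score_loss(scores, n_gt=3))        # assumes 3 ground-truth objects
```

At test time, such scores let the recurrent model decide when to stop producing masks.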

Recurrent Instance Segmentation: Coverage Loss. Romera-Paredes & Torr. Recurrent Instance Segmentation. ECCV 2016. Slide Credit: M. Baradad, ReadCV@UPC. Step 4: add everything together.
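Putting the steps together, one way to write the full coverage loss is shown below. Treat it as a reconstruction from the steps above rather than a quote of the paper: $\hat{Y}_t$ are the $\hat{n}$ predicted masks with scores $\hat{s}_t$, $Y_{t'}$ are the $n$ ground-truth masks, $\mathrm{sIoU}$ is the (soft) IoU, $\delta$ ranges over the set $\mathcal{S}$ of one-to-one matchings ($\delta_{t,t'} = 1$ if prediction $t$ is matched to ground truth $t'$), $f_{BCE}(a, b) = -(a \log b + (1 - a)\log(1 - b))$, $[\cdot]$ is the Iverson bracket, and $\lambda$ is an assumed weighting hyperparameter.

$$\ell(\hat{Y}, \hat{s}, Y) = \min_{\delta \in \mathcal{S}} \left( -\sum_{t=1}^{\hat{n}} \sum_{t'=1}^{n} \mathrm{sIoU}(\hat{Y}_t, Y_{t'})\, \delta_{t,t'} \right) + \lambda \sum_{t=1}^{\hat{n}} f_{BCE}\big([t \le n],\, \hat{s}_t\big)$$

The first term is Step 2 (negative IoU of the best matching) and the second is Step 3 (score supervision).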

Recurrent Instance Segmentation. Romera-Paredes & Torr. Recurrent Instance Segmentation. ECCV 2016.

Summary
● Segmentation Datasets
● Semantic Segmentation Methods: Deconvolution, Dilated Convolution, Skip Connections
● Instance Segmentation Methods: Proposal-Based, Recurrent
