Object Detection By Usman Qayyum, 4 Dec 2018
Talk Covers Three Papers (Object Detection -> Embedded Computing): SqueezeNet (2016) + SSD (2016) = TinySSD (2018)
Image Classification/Object Detection ● For autonomous vehicles, smart video surveillance, facial detection, and many other applications, fast and robust object detection is the need of the hour ● Detection means not only recognizing and classifying every object in an image, but also localizing each one by drawing the appropriate bounding box around it.
CNN Migration (Image Classification)
Object Detection as Classification CNN deer? cat? background?
Object Detection as Classification with Sliding Window CNN deer? cat? background?
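The sliding-window idea can be sketched in a few lines: enumerate every window, ask the classifier about each one, and keep the windows that are not background. The window size, stride, and `toy_classifier` below are hypothetical stand-ins chosen for illustration; a real detector would crop the image and run a CNN on each window.

```python
from typing import Callable, Iterator, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def sliding_windows(img_w: int, img_h: int, win: int, stride: int) -> Iterator[Box]:
    """Yield every square window of side `win` that fits inside the image."""
    for y in range(0, img_h - win + 1, stride):
        for x in range(0, img_w - win + 1, stride):
            yield (x, y, win, win)


def detect(img_w: int, img_h: int, classify: Callable[[Box], str],
           win: int = 64, stride: int = 32) -> List[Tuple[Box, str]]:
    """Classify each window and keep the ones that are not background."""
    hits = []
    for box in sliding_windows(img_w, img_h, win, stride):
        label = classify(box)
        if label != "background":
            hits.append((box, label))
    return hits


def toy_classifier(box: Box) -> str:
    """Hypothetical stand-in for a CNN: calls the window at (64, 32) a cat."""
    return "cat" if box[:2] == (64, 32) else "background"


windows = list(sliding_windows(256, 128, win=64, stride=32))
detections = detect(256, 128, toy_classifier)
print(len(windows), detections)
```

Even this small 256x128 example needs one classifier call per window at a single scale, which is why the next slides move to box proposals.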
Object Detection as Classification with Box Proposals
Box Proposal Method: Selective Search. Segmentation as Selective Search for Object Recognition. van de Sande et al. ICCV 2011
Idea behind Object Detectors ● Box Proposals ● Classifier Algorithm
RCNN: Rich feature hierarchies for accurate object detection and semantic segmentation. Girshick et al. CVPR 2014. https://people.eecs.berkeley.edu/~rbg/papers/r-cnn-cvpr.pdf
Fast-RCNN: Fast R-CNN. Girshick. ICCV 2015. https://arxiv.org/abs/1504.08083 Idea: Compute the CNN features once per image instead of recomputing them for every box independently, and regress refined bounding box coordinates.
Faster-RCNN: Ren et al. NIPS 2015. https://arxiv.org/abs/1506.01497 Idea: Integrate the bounding box proposals into the CNN predictions.
YOLO: You Only Look Once ● Single Shot Detector. Redmon et al. CVPR 2016. https://arxiv.org/abs/1506.02640 Idea: No bounding box proposals. Predict a class and a box for every location in a grid.
SSD: Single Shot Detector. Liu et al. ECCV 2016. Idea: Similar to YOLO, but with a denser grid map and multiscale grid maps. + Data augmentation + Hard negative mining + Other design choices in the network.
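To make "denser grid map, multiscale grid maps" concrete, here is a small count of predicted boxes. The grid sizes and boxes per cell are assumptions taken from the SSD300 and YOLO (v1) configurations as reported in the respective papers; they are not stated on the slide.

```python
# (grid side, default boxes per cell) for each SSD300 feature map.
# Assumed from the SSD300 configuration in Liu et al.
SSD300_MAPS = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]


def boxes_per_map(side: int, per_cell: int) -> int:
    """Number of default boxes contributed by one square feature map."""
    return side * side * per_cell


counts = [boxes_per_map(side, k) for side, k in SSD300_MAPS]
total_boxes = sum(counts)

# YOLO (v1) predicts on a single 7x7 grid with 2 boxes per cell.
yolo_v1_boxes = 7 * 7 * 2

print(counts, total_boxes, yolo_v1_boxes)
```

The large early maps cover small objects and the 1x1 map covers objects that fill the image, so one forward pass handles several scales.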
SSD: Single Shot Detector. The overall objective loss function is a weighted sum of the confidence loss (conf) and the localization loss (loc): $L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)$, where N is the number of matched default boxes, c the class confidences, l the predicted boxes, g the ground-truth boxes, $\alpha$ the weight on the localization term, and x = 1 indicates that a given default box is matched to a ground-truth box.
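A minimal pure-Python sketch of this objective, assuming (as in the SSD paper) a softmax cross-entropy confidence loss, a smooth L1 localization loss over matched boxes, class index 0 for background, and negatives that have already been selected by hard negative mining. The function names and the input layout are illustrative, not the authors' implementation.

```python
import math


def smooth_l1(d: float) -> float:
    """Smooth L1: quadratic near zero, linear for |d| >= 1."""
    a = abs(d)
    return 0.5 * d * d if a < 1.0 else a - 0.5


def softmax_ce(logits, target: int) -> float:
    """Cross-entropy of softmax(logits) against the target class index."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def ssd_loss(matched, negatives, alpha: float = 1.0) -> float:
    """matched: list of (class_logits, target_class, predicted_box, target_box).
    negatives: class_logits of the mined unmatched boxes (target = background, index 0).
    """
    n = len(matched)
    if n == 0:
        return 0.0  # the paper sets the loss to 0 when no default box is matched
    l_conf = sum(softmax_ce(c, t) for c, t, _, _ in matched)
    l_conf += sum(softmax_ce(c, 0) for c in negatives)
    l_loc = sum(smooth_l1(p - q) for _, _, l, g in matched for p, q in zip(l, g))
    return (l_conf + alpha * l_loc) / n


# One matched box with uniform logits and two box offsets that are off by 0.5 and 2.0.
matched = [([0.0, 0.0], 1, [0.5, 0.0, 0.0, 2.0], [0.0, 0.0, 0.0, 0.0])]
loss_no_neg = ssd_loss(matched, negatives=[])
loss_one_neg = ssd_loss(matched, negatives=[[0.0, 0.0]])
print(loss_no_neg, loss_one_neg)
```

Note that the sum is normalized by the number of matched boxes N, not by the number of terms in the confidence loss.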
Performance
Accuracy Vs Computation
AI Workload Migration ● Embedded (Mobile/Edge): Execution/Inference ● Server/Cloud: Training, Execution/Inference ● Intelligence & Analytics ● Key Use Cases: Vision | Audio | Security ● Benefits: Low Latency | Privacy
AI in Embedded Devices
How? (AI in Embedded Devices) ● Pruning ● Quantization
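The slide only names the two techniques, so the following is a generic sketch rather than the method of any particular paper: magnitude pruning zeroes small weights, and symmetric linear quantization stores the rest as 8-bit integers plus one floating-point scale. The weights and threshold are made-up illustrative values.

```python
def prune(weights, threshold):
    """Magnitude pruning: zero every weight whose absolute value is below threshold."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]


def quantize_int8(weights):
    """Symmetric linear quantization to integers in [-127, 127] plus one float scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(q, scale):
    """Map the stored integers back to approximate float weights."""
    return [v * scale for v in q]


weights = [0.02, -0.75, 0.4, -0.01, 1.27]
pruned = prune(weights, threshold=0.05)
q, scale = quantize_int8(pruned)
restored = dequantize(q, scale)
print(pruned, q, restored)
```

Each restored weight differs from the pruned one by at most half a quantization step, while storage drops from 32 bits to 8 bits per weight.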
SqueezeNet (Parameter Reduction) ● Strategy 1. Replace 3x3 filters with 1x1 filters ○ Parameters per filter: (3x3 filter) = 9 * (1x1 filter) ● Strategy 2. Decrease the number of input channels to 3x3 filters ○ Total # of parameters: (# of input channels) * (# of filters) * (# of parameters per filter) ● Strategy 3. Downsample late in the network so that convolution layers have large activation maps ○ Size of activation maps depends on: the size of the input data, the choice of layers in which to downsample in the CNN architecture. Iandola, Forrest N., et al. "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size."
Strategy #1: Conv 1x1 or Kernel Reduction
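The parameter formula in Strategies 1 and 2 is easy to check numerically. The channel counts below are hypothetical and biases are ignored; the point is only to show the 9x saving from 1x1 filters and the proportional saving from fewer input channels.

```python
def conv_params(in_channels: int, num_filters: int, kernel_size: int) -> int:
    """Weights in a conv layer: (# input channels) * (# filters) * (k * k)."""
    return in_channels * num_filters * kernel_size * kernel_size


full_3x3 = conv_params(64, 64, 3)      # baseline 3x3 layer
as_1x1 = conv_params(64, 64, 1)        # Strategy 1: same layer with 1x1 filters
squeezed_3x3 = conv_params(16, 64, 3)  # Strategy 2: 4x fewer input channels

print(full_3x3, as_1x1, squeezed_3x3)
```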
Microarchitecture – Fire Module ● Squeeze layer: set s1x1 < (e1x1 + e3x3), which limits the # of input channels to the 3x3 filters ○ Strategy 2. Decrease the number of input channels to 3x3 filters ○ Total # of parameters: (# of input channels) * (# of filters) * (# of parameters per filter) ○ How much can we limit s1x1? ● Strategy 1. Replace 3x3 filters with 1x1 filters ○ Parameters per filter: (3x3 filter) = 9 * (1x1 filter) ○ How much can we replace 3x3 with 1x1 (e1x1 vs e3x3)?
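A rough parameter count shows what the squeeze layer buys. The sketch assumes a Fire module is a 1x1 squeeze convolution feeding parallel 1x1 and 3x3 expand convolutions whose outputs are concatenated; the sizes (96 input channels, s1x1 = 16, e1x1 = e3x3 = 64) are assumed here to match the first Fire module of SqueezeNet and should be treated as illustrative. Biases are ignored.

```python
def fire_params(in_channels: int, s1x1: int, e1x1: int, e3x3: int) -> int:
    """Weights in a Fire module: squeeze 1x1, then expand 1x1 and expand 3x3."""
    squeeze = in_channels * s1x1 * 1 * 1
    expand_1x1 = s1x1 * e1x1 * 1 * 1  # expand filters only see s1x1 channels
    expand_3x3 = s1x1 * e3x3 * 3 * 3
    return squeeze + expand_1x1 + expand_3x3


fire = fire_params(in_channels=96, s1x1=16, e1x1=64, e3x3=64)

# A plain 3x3 convolution with the same input (96) and output (64 + 64 = 128) channels.
plain_3x3 = 96 * 128 * 3 * 3

print(fire, plain_3x3, round(plain_3x3 / fire, 2))
```

Under these assumptions the Fire module uses roughly a ninth of the weights of the plain layer while producing the same number of output channels.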
Expand ● In the "expand" modules, what are the tradeoffs when we turn the knob between mostly 1x1 and mostly 3x3 filters? ● Hypothesis: if having more weights leads to higher accuracy, then having all 3x3 filters should give the highest accuracy
Macroarchitecture ● Strategy 3. Downsample late in the network so that convolution layers have large activation maps ○ Size of activation maps depends on: the size of the input data, the choice of layers in which to downsample in the CNN architecture
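The effect of where the downsampling happens can be seen by tracking the activation map size layer by layer. This sketch assumes "same" padding so that only the stride shrinks the map; the 224-pixel input and the six-layer stride patterns are hypothetical.

```python
import math


def activation_sizes(input_size: int, strides):
    """Spatial side length after each layer, assuming only the stride shrinks the map."""
    sizes = []
    size = input_size
    for s in strides:
        size = math.ceil(size / s)
        sizes.append(size)
    return sizes


early = activation_sizes(224, [2, 2, 2, 1, 1, 1])  # downsample in the first layers
late = activation_sizes(224, [1, 1, 1, 2, 2, 2])   # downsample in the last layers

print(early)
print(late)
```

Both networks end at the same resolution, but with late downsampling most layers operate on much larger activation maps, which is the property Strategy 3 aims for.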
Performance
TinySSD (SSD with Microarchitecture)
Thanks for your attention.

Object Detection using Deep Neural Networks
