Object Detection
Computer Vision Tasks
[Figure: Image Classification (what?) vs. Object Detection (what + where?), with detections such as ✓ boat and ✓ person]
Comparing Boxes: Intersection over Union (IoU)
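IoU is the area of overlap between two boxes divided by the area of their union; detection benchmarks typically count a prediction as correct when its IoU with a ground-truth box exceeds a threshold such as 0.5. A minimal sketch, assuming boxes are given as (x1, y1, x2, y2) corner coordinates:

    def iou(box_a, box_b):
        # Corners of the intersection rectangle
        x1 = max(box_a[0], box_b[0])
        y1 = max(box_a[1], box_b[1])
        x2 = min(box_a[2], box_b[2])
        y2 = min(box_a[3], box_b[3])
        # Clamp to zero when the boxes do not overlap at all
        inter = max(0, x2 - x1) * max(0, y2 - y1)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        return inter / (area_a + area_b - inter)

    print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143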
Region-based Convolutional Neural Network (RCNN)
• Instead of working on a massive number of regions, the RCNN algorithm proposes a limited set of boxes in the image and checks whether any of these boxes contains an object.
• RCNN uses selective search to extract these boxes from an image (these boxes are called regions).
Selective Search
• It first takes an image as input.
• Then, it generates initial sub-segmentations so that we have multiple regions from this image.
• Then it combines the similar regions to form a larger region, based on color similarity, texture similarity, size similarity, and shape compatibility (see the sketch after this list).
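As an illustration, OpenCV ships a selective search implementation in its contrib modules; a minimal sketch, assuming the opencv-contrib-python package is installed and a hypothetical input file image.jpg:

    import cv2  # the ximgproc module requires opencv-contrib-python

    img = cv2.imread("image.jpg")  # hypothetical input image
    ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
    ss.setBaseImage(img)
    ss.switchToSelectiveSearchFast()  # faster, coarser mode
    rects = ss.process()              # array of (x, y, w, h) proposals
    proposals = rects[:2000]          # RCNN keeps roughly 2,000 proposals per image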
RCNN: Problems
• Extracting 2,000 regions for each image via selective search is expensive.
• CNN features are extracted for every region of every image: with N images, that is N * 2,000 feature extractions.
• The entire process of object detection using RCNN involves three models:
• a CNN for feature extraction,
• a linear SVM classifier for identifying objects,
• a regression model for tightening the bounding boxes.
• All these steps combine to make RCNN very slow: it takes around 40-50 seconds to make predictions for each new image.
RCNN: Problem of Fixed-Size Input
• The CNN is followed by fully connected layers, which can only accept input of a fixed size.
• This makes the CNN incapable of accepting variable-size inputs, so images are first reshaped to a specific dimension before being fed into the CNN.
• This creates another issue: image warping and reduced resolution. Spatial Pyramid Pooling (SPP) was introduced as a counter to this problem (sketched below).
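As a sketch of the idea, using PyTorch's adaptive pooling (the pyramid levels 4, 2, 1 here are an assumption, not the paper's exact configuration): each level pools the feature map to a fixed grid, so the concatenated output length is the same no matter the input size.

    import torch
    import torch.nn.functional as F

    def spp(feature_map, levels=(4, 2, 1)):
        # feature_map: (batch, channels, H, W) with arbitrary H and W
        pooled = []
        for n in levels:
            p = F.adaptive_max_pool2d(feature_map, output_size=(n, n))
            pooled.append(p.flatten(start_dim=1))  # (batch, channels * n * n)
        return torch.cat(pooled, dim=1)            # fixed length: channels * (16 + 4 + 1)

    x = torch.randn(1, 512, 13, 17)  # a variable-size conv feature map
    print(spp(x).shape)              # torch.Size([1, 10752]) regardless of H, W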
What’s wrong with SPP-net?
• Training is still slow (though better than RCNN).
• It introduces a new problem: the parameters below the SPP layer cannot be updated during training.
Fast RCNN
• Instead of running a CNN 2,000 times per image, we can run it just once per image and get all the regions of interest (regions containing some object).
• In Fast RCNN, we feed the input image to the CNN, which in turn generates the convolutional feature maps.
• Using these maps, the region proposals are extracted.
• We then use an RoI pooling layer to reshape all the proposed regions to a fixed size, so that they can be fed into a fully connected network.
• A softmax layer on top of the fully connected network outputs the classes. In parallel with the softmax layer, a linear regression layer outputs bounding box coordinates for the predicted classes (see the sketch below).
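A minimal PyTorch sketch of such a two-headed network (layer sizes here are illustrative assumptions, not the paper's exact architecture):

    import torch.nn as nn

    class FastRCNNHead(nn.Module):
        def __init__(self, in_features=3 * 3 * 512, num_classes=21):
            super().__init__()
            self.fc = nn.Sequential(
                nn.Linear(in_features, 4096), nn.ReLU(),
                nn.Linear(4096, 4096), nn.ReLU(),
            )
            self.cls_score = nn.Linear(4096, num_classes)      # softmax applied in the loss
            self.bbox_pred = nn.Linear(4096, num_classes * 4)  # box coordinates per class

        def forward(self, pooled_rois):
            x = self.fc(pooled_rois.flatten(start_dim=1))  # RoI-pooled features in
            return self.cls_score(x), self.bbox_pred(x)    # two parallel outputs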
Cropping Features: RoI Pool
The model takes an image input of size 512x512x3 (width x height x RGB), and VGG16 maps it into a 16x16x512 feature map.
Note that the output's width and height are exactly 32 times smaller than the input image (512/32 = 16). That's important because all RoIs must be scaled down by this factor.
• Consider one RoI: its original size is 200x145 (width x height) and its top-left corner is at (x=296, y=192). As you can probably tell, most of those numbers are not divisible by 32:
• width: 200/32 = 6.25
• height: 145/32 ≈ 4.53
• x: 296/32 = 9.25
• y: 192/32 = 6
After the RoI pooling layer there is a fully connected layer with a fixed input size. Because our RoIs have different sizes, we have to pool them into the same size (3x3x512 in our example). At this point our mapped RoI has a size of 4x6x512, and, as you can imagine, we cannot divide 4 evenly by 3, so the bin edges have to be quantized yet again.
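The slide's numbers can be reproduced with torchvision's RoI Pool operator; a minimal sketch (random feature values, since only the shapes matter here):

    import torch
    from torchvision.ops import roi_pool

    feature_map = torch.randn(1, 512, 16, 16)  # VGG16 output for a 512x512 image
    # One RoI in input-image coordinates: (batch_index, x1, y1, x2, y2);
    # top-left corner (296, 192), size 200x145 as in the example above.
    rois = torch.tensor([[0, 296.0, 192.0, 296.0 + 200.0, 192.0 + 145.0]])
    # spatial_scale = 1/32 projects image coordinates onto the 16x16 map;
    # the op quantizes the RoI (to about 6x4 cells) and pools it to 3x3.
    out = roi_pool(feature_map, rois, output_size=(3, 3), spatial_scale=1.0 / 32)
    print(out.shape)  # torch.Size([1, 512, 3, 3]) -- fixed size for the FC layer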
Problems with Fast RCNN
• Fast RCNN still has certain problem areas.
• It still uses selective search as the proposal method to find the Regions of Interest, which is a slow and time-consuming process.
• It takes around 2 seconds per image to detect objects, which is much better than RCNN.
• But when we consider large real-life datasets, even Fast RCNN doesn't look so fast anymore.
Faster RCNN
• Faster RCNN is the modified version of Fast RCNN. The major difference between them is that Fast RCNN uses selective search to generate Regions of Interest, while Faster RCNN replaces it with a learned Region Proposal Network (RPN).
• The RPN slides over the convolutional feature map: we extract a descriptor per location and predict object proposals from it.
YOLO (You Only Look Once!)
• YOLO is a real-time object detection algorithm. It was developed by Joseph
Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi at the University of
Washington (2015).
• YOLO is extremely fast because it passes the entire image at once through a CNN, rather than making predictions on many individual regions of the image.
• The key idea behind YOLO is to use a single neural network to predict the bounding boxes and class probabilities for objects in an image.
YOLO (You Only Look Once!)
• YOLO divides the input image into a grid of cells and predicts the presence of objects in
each cell.
• If an object is detected in a cell, the algorithm also predicts the bounding box and the
class for the object.
• The bounding box coordinates and class probabilities are then used to localize and
classify the objects.
Each object in a training image is assigned to the grid cell that contains that object's midpoint.
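A minimal sketch of this assignment (assuming an S x S grid and midpoints normalized to [0, 1]; the helper name is illustrative):

    def assign_cell(x_mid, y_mid, S=7):
        # The cell is found by scaling the midpoint to grid units and flooring;
        # min() keeps a midpoint of exactly 1.0 inside the last cell.
        col = min(int(x_mid * S), S - 1)
        row = min(int(y_mid * S), S - 1)
        return row, col

    print(assign_cell(0.55, 0.30))  # (2, 3) on a 7x7 grid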
YOLO: Anchor Boxes
• One of the caveats of YOLO is that it can't detect multiple objects in the same grid cell.
• Solution: anchor boxes. An anchor box is a predefined bounding box used in object detection algorithms.
• Anchor boxes define the size and aspect ratio of the detection window, and they are defined prior to training the object detection model. The model is then trained to predict the bounding box coordinates and class probabilities for objects relative to the anchor boxes.
With anchor boxes, each object in a training image is assigned to the grid cell that contains the object's midpoint, and within that cell to the anchor box with the highest IoU with the object. [Figure: per-grid target label]
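A minimal sketch of picking the anchor, reusing the iou() helper from earlier (anchors are treated here as width/height shapes centered on the object's midpoint):

    def best_anchor(gt_w, gt_h, anchor_shapes):
        # Center all boxes at the origin so only the shapes are compared.
        gt = (-gt_w / 2, -gt_h / 2, gt_w / 2, gt_h / 2)
        ious = [iou(gt, (-w / 2, -h / 2, w / 2, h / 2)) for (w, h) in anchor_shapes]
        return max(range(len(anchor_shapes)), key=lambda i: ious[i])

    # A tall 40x100 object matches the tall anchor (index 1), not the wide one:
    print(best_anchor(40, 100, [(100, 40), (40, 100)]))  # 1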
Putting it together: the YOLO algorithm
• Two anchor boxes are used per grid cell.
• The final detections are the non-max-suppressed outputs (a minimal sketch follows).
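A minimal sketch of non-max suppression, reusing the iou() helper from earlier (in practice YOLO first discards low-confidence boxes and runs this per class):

    def nms(boxes, scores, iou_threshold=0.5):
        # Process boxes from highest to lowest score.
        order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
        keep = []
        while order:
            best = order.pop(0)
            keep.append(best)
            # Drop every remaining box that overlaps the kept box too much.
            order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
        return keep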
Detection evaluation
The mAP formula is based on the following sub-metrics:
• Confusion Matrix,
• Intersection over Union (IoU),
• Recall,
• Precision.
Confusion Matrix
• To create a confusion matrix, we need four attributes:
• True Positives (TP): the model predicted a label that correctly matches the ground truth.
• True Negatives (TN): the model does not predict a label, and none is part of the ground truth.
• False Positives (FP): the model predicted a label, but it is not part of the ground truth.
• False Negatives (FN): the model does not predict a label, but it is part of the ground truth.
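A minimal sketch of counting these for one class of detections, reusing the iou() helper (greedy matching at an assumed IoU threshold of 0.5):

    def count_tp_fp_fn(pred_boxes, gt_boxes, iou_threshold=0.5):
        # pred_boxes ideally sorted by descending confidence
        matched = set()
        tp = 0
        for p in pred_boxes:
            # Find the best still-unmatched ground-truth box for this prediction.
            best, best_iou = None, 0.0
            for j, g in enumerate(gt_boxes):
                overlap = iou(p, g)
                if j not in matched and overlap > best_iou:
                    best, best_iou = j, overlap
            if best is not None and best_iou >= iou_threshold:
                matched.add(best)  # true positive: matches an unclaimed object
                tp += 1
        fp = len(pred_boxes) - tp          # predictions with no matching object
        fn = len(gt_boxes) - len(matched)  # objects the model missed
        return tp, fp, fn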
Detection evaluation
• Precision measures how many of the "positive" predictions made by the model were correct.
• Recall measures how many of the positive class samples present in the dataset were correctly identified by the model.
• Precision and recall involve a trade-off: improving one typically comes at the cost of the other.
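In terms of the confusion-matrix counts above:

    def precision(tp, fp):
        return tp / (tp + fp)  # fraction of the model's detections that were correct

    def recall(tp, fn):
        return tp / (tp + fn)  # fraction of the ground-truth objects that were found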
mAP
• The mAP is calculated by finding the Average Precision (AP) for each class and then averaging over the number of classes.
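A minimal sketch of AP as the area under the precision-recall curve (a simple rectangular approximation; benchmarks such as PASCAL VOC and COCO apply their own interpolation rules):

    def average_precision(recalls, precisions):
        # recalls sorted ascending, paired with their precision values
        ap, prev_r = 0.0, 0.0
        for r, p in zip(recalls, precisions):
            ap += (r - prev_r) * p
            prev_r = r
        return ap

    def mean_average_precision(per_class_ap):
        return sum(per_class_ap) / len(per_class_ap)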