Depth Images Prediction from a Single RGB Image Using Deep Learning
Deep Learning, May 2017
Soubhi Hadri
Table of Contents:
1. Introduction
2. Existing Solutions
3. Dataset and Model
4. Project Code and Results
Introduction
• In 3D computer graphics, a depth map is an image or image channel that contains information about the distance of the surfaces of scene objects from a viewpoint.
• RGB-D image: an RGB image and its corresponding depth image.
• A depth image is an image channel in which each pixel encodes the distance between the image plane and the corresponding object in the RGB image.
To approximate the depth of objects:
• Stereo camera: a camera with two or more lenses to simulate human vision.
• RealSense or Kinect to get RGB-D images.
• Deep learning.
Existing Solutions
Deep Learning for depth estimation:
Many recent works estimate the depth map from a single RGB image, for example:
• Learning Fine-Scaled Depth Maps from Single RGB Images, 7 Feb 2017.
Dataset & Model
Dataset: NYU Depth V2
The NYU-Depth V2 dataset comprises video sequences from a variety of indoor scenes, recorded by both the RGB and depth cameras of the Microsoft Kinect.
The dataset consists of:
• 1449 labeled pairs of aligned RGB and depth images (2.8 GB).
• 407,024 new unlabeled frames: raw RGB and depth (428 GB).
• Toolbox: useful functions for manipulating the data and labels.
Different parts of the dataset can be downloaded individually.
Authors: Nathan Silberman, Derek Hoiem, Pushmeet Kohli and Rob Fergus, 2012.
For this project:
• Office 1-2 dataset (a part of the whole dataset).
• 15 GB after processing the raw data.
• 3522 RGB-D images.
Split the data:
• The 3522 images are split 80% / 20%: 2817 for training and validation, 705 for test.
• The 2817 images are split again: 2414 for training, 403 for validation.
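The split sizes above can be reproduced with a short NumPy sketch. The helper name `split_indices`, the fixed seed, and shuffling before the split are assumptions made for illustration; the slides only give the resulting counts, not how the project's `split_data` selects the images.

```python
import numpy as np

def split_indices(n_samples, n_test, n_val, seed=0):
    """Shuffle sample indices and cut them into train/val/test parts.

    Hypothetical helper: the shuffle and the seed are assumptions; only the
    resulting set sizes come from the slides.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    test = order[:n_test]                  # held-out test images
    val = order[n_test:n_test + n_val]     # validation images
    train = order[n_test + n_val:]         # everything else is training data
    return train, val, test

train_idx, val_idx, test_idx = split_indices(3522, n_test=705, n_val=403)
print(len(train_idx), len(val_idx), len(test_idx))  # 2414 403 705
```

Splitting indices rather than the image arrays themselves keeps the sketch cheap and makes it easy to check that the three sets do not overlap.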
Samples of the data:
[Figure: samples of the data]
The Model for Depth Estimation:
The model was proposed by Jan Ivanecky in his 2016 master's thesis. He derived it from Eigen and Fergus, Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture, 17 Dec 2015.
The model combines three networks:
• Global context network: estimates the rough depth map of the whole scene from the input RGB image.
• Gradient network: estimates the horizontal and vertical gradients of the depth map globally, for the whole RGB image.
• Refining network: improves the rough estimate from the global context network, using the gradients estimated by the gradient network and the input RGB image.
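To make "horizontal and vertical gradients of the depth map" concrete, here is a minimal NumPy sketch that computes them from a depth image with forward differences. The finite-difference scheme and the name `depth_gradients` are assumptions for illustration; the slides do not say how the gradient targets were computed.

```python
import numpy as np

def depth_gradients(depth):
    """Forward-difference gradients of a 2D depth map.

    Hypothetical helper: returns (gx, gy), where gx is the change along
    columns (horizontal) and gy the change along rows (vertical). The last
    column of gx and the last row of gy are left at zero.
    """
    depth = np.asarray(depth, dtype=np.float64)
    gx = np.zeros_like(depth)
    gy = np.zeros_like(depth)
    gx[:, :-1] = depth[:, 1:] - depth[:, :-1]
    gy[:-1, :] = depth[1:, :] - depth[:-1, :]
    return gx, gy

# A depth map that increases by 1 per pixel from left to right.
ramp = np.tile(np.arange(4.0), (3, 1))
gx, gy = depth_gradients(ramp)
print(gx[0])  # horizontal gradient of the first row
print(gy[0])  # vertical gradient of the first row
```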
Global context network:
[Figure: architecture of the global context network]
The model is derived from AlexNet.
Loss Function:
Root mean squared error in log space (RMS-log).
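As a rough illustration, the usual definition of RMS-log is the square root of the mean squared difference between the logarithms of the predicted and ground-truth depths. The NumPy sketch below follows that definition. The name `rms_log_loss` and the `eps` clamp (added so that a depth of exactly 0 does not produce `log(0)`) are assumptions, and the project's own `loss` function may differ in detail.

```python
import numpy as np

def rms_log_loss(pred, target, eps=1e-6):
    """Root mean squared error between log-depths.

    Hypothetical sketch: eps is an assumed safeguard against log(0),
    not something stated in the slides.
    """
    pred = np.maximum(np.asarray(pred, dtype=np.float64), eps)
    target = np.maximum(np.asarray(target, dtype=np.float64), eps)
    diff = np.log(pred) - np.log(target)
    return float(np.sqrt(np.mean(diff ** 2)))

# Identical maps give zero loss; a constant factor of e gives a loss of 1.
print(rms_log_loss([0.2, 0.5], [0.2, 0.5]))
print(rms_log_loss([np.e, np.e], [1.0, 1.0]))
```

Because the error is measured on log-depths, a prediction that is off by a constant factor is penalized equally at near and far range.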
Training the Network:
1- Scale the label (depth) images to [0, 1].
2- Subtract 127 from the input images to center the data (a simple form of normalization).
3- Initialize the convolution layers from a pre-trained AlexNet CNN (transfer learning).
4- Train the network in batches (batch size = 32) for 35 epochs.
5- Save the session and model at the end of each epoch.
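Steps 1 and 2 can be sketched in a few lines of NumPy. The function name `preprocess` and the min-max scaling of the labels are assumptions; the slides state only the target range [0, 1] and the subtraction of 127, not the exact scaling rule.

```python
import numpy as np

def preprocess(rgb, depth):
    """Center the RGB input and scale the depth label to [0, 1].

    Hypothetical sketch: min-max scaling of the label is an assumption.
    """
    rgb = np.asarray(rgb, dtype=np.float32) - 127.0   # step 2: center inputs
    depth = np.asarray(depth, dtype=np.float32)
    d_min, d_max = depth.min(), depth.max()
    span = d_max - d_min
    if span > 0:
        depth = (depth - d_min) / span                # step 1: labels in [0, 1]
    else:
        depth = np.zeros_like(depth)                  # constant map: no range to scale
    return rgb, depth

rgb_in = np.array([0, 127, 255], dtype=np.uint8)
depth_in = np.array([1.0, 2.0, 3.0])
rgb_out, depth_out = preprocess(rgb_in, depth_in)
print(rgb_out)    # [-127.    0.  128.]
print(depth_out)  # [0.  0.5 1. ]
```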
Project Functions:
1- split_data: split the data and save it into training/testing/val .npy files.
2- load_data: load data from the .npy files.
3- plot_imgs: plot a pair of images.
4- get_next_batch: get the next batch from the training data.
5- loss: calculate the loss function.
6- model: create the model (network structure).
7- train: start training.
8- evaluate: evaluate new data after restoring the model.
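As an illustration of the batching helper, a function like `get_next_batch` could be as simple as slicing the training arrays. The signature below is hypothetical; the slides name the function but do not show its arguments or implementation.

```python
import numpy as np

def get_next_batch(images, labels, batch_index, batch_size=32):
    """Return the batch_index-th slice of the training data.

    Hypothetical signature; the last batch may hold fewer than
    batch_size samples.
    """
    start = batch_index * batch_size
    end = start + batch_size
    return images[start:end], labels[start:end]

# Toy data: 70 samples, so batches of 32, 32 and 6.
images = np.zeros((70, 8, 8, 3), dtype=np.float32)
labels = np.zeros((70, 8, 8), dtype=np.float32)
first_x, first_y = get_next_batch(images, labels, 0)
last_x, last_y = get_next_batch(images, labels, 2)
print(first_x.shape[0], last_x.shape[0])  # 32 6
```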
Project Tools and Libraries:
1- TensorFlow.
2- Slim: a lightweight library for defining, training and evaluating complex models in TensorFlow.
3- TensorBoard.
4- NumPy.
5- Matplotlib.
Project Results:
Training loss:
[Figure: training loss]
Samples of new data:
[Figure: samples of new data]
Explanation:
• Training data is not sufficient.
In Jan's experiment:
• Full NYU dataset and 3 datasets generated from the original one.
• Network was trained for 100,000 iterations.
This experiment:
• It took ~26 hours for 30 epochs.
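A quick back-of-the-envelope calculation puts the two training budgets side by side. It assumes that one iteration means one batch of 32 images and that all 2414 training images are seen once per epoch; the batch size used in Jan's experiment is not given in the slides, so the comparison is only indicative.

```python
import math

train_images = 2414   # training set size from the data split
batch_size = 32       # batch size from the training setup
epochs = 30           # epochs completed in ~26 hours

steps_per_epoch = math.ceil(train_images / batch_size)
total_steps = steps_per_epoch * epochs

print(steps_per_epoch)                 # 76
print(total_steps)                     # 2280
print(round(100_000 / total_steps))    # how many times more iterations in Jan's run
print(round(26 * 60 / epochs))         # approximate minutes per epoch
```

Under these assumptions this experiment ran a few thousand iterations, well short of the 100,000 used in the reference experiment.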
Project:
The project code and data will be available on GitHub: https://github.com/SubhiH/Depth-Estimation-Deep-Learning
Resources:
- https://arxiv.org/pdf/1607.00730.pdf
- http://janivanecky.com/
- http://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html
Thank You
