|
11 | 11 | "Our goal is to train a deep learning model that can make steering angle predictions based on an input that comprises of camera images and the vehicle's last known state. In this notebook, we will prepare the data for our end-to-end deep learning model. Along the way, we will also make some useful observations about the dataset that will aid us when it comes time to train the model. \n", |
12 | 12 | "\n", |
13 | 13 | "\n", |
14 | | - "## What is End-to-End Deep Learning?\n", |
| 14 | + "## What is end-to-end deep learning?\n", |
15 | 15 | "\n", |
16 | | - "End-to-end deep learning is a modeling strategy that is a response to the success of deep neural networks. Unlike traditional methods, this strategy is not built on feature engineering. Instead, it leverages the power of deep neural networks, along with recent hardware advances (GPUs, FPGAs etc.) to harness the incredible potential of large amounts of data. It is closer to a human-like learning approach than traditional ML as it lets a neural network map raw input to direct outputs. A big downside to this approach is that it requires a very large amount of training data which makes it unsuitable for many common applications. Since simulators can (potentially) generate infinite amounts of data, they are a perfect data source for end-to-end deep learning algorithms. If you wish to learn more, [this video](https://www.coursera.org/learn/machine-learning-projects/lecture/k0Klk/what-is-end-to-end-deep-learning) by Andrew Ng provides a nice overview of the topic.\n", |
| 16 | + "End-to-end deep learning is a modeling strategy that is a response to the success of deep neural networks. Unlike traditional methods, this strategy is not built on feature engineering. Instead, it leverages the power of deep neural networks, along with recent hardware advances (GPUs, FPGAs etc.) to harness the incredible potential of large amounts of data. It is closer to a human-like learning approach than traditional ML as it lets a neural network map raw input to direct output. A big downside to this approach is that it requires a very large amount of training data which makes it unsuitable for many common applications. Since simulators can (potentially) generate data in infinite amounts, they are a perfect data source for end-to-end deep learning algorithms. If you wish to learn more, [this video](https://www.coursera.org/learn/machine-learning-projects/lecture/k0Klk/what-is-end-to-end-deep-learning) by Andrew Ng provides a nice overview of the topic.\n", |
17 | 17 | "\n", |
18 | | - "Autonomous driving is a field that can highly benefit from the power of end-to-end deep learning. In order to achieve SAE Level 4 Autonomy, cars need to be trained on copious amounts of data (it is not uncommon for car manufacturers to collect hundreds of petabytes of data every week), something that is virtually impossible without a simulator. \n", |
| 18 | + "Autonomous driving is a field that can highly benefit from the power of end-to-end deep learning. In order to achieve SAE Level 4 or 5 Autonomy, cars need to be trained on copious amounts of data (it is not uncommon for car manufacturers to collect hundreds of petabytes of data every week), something that is virtually impossible without a simulator. \n", |
19 | 19 | "\n", |
20 | | - "With photo-realistic simulators like [AirSim](https://github.com/Microsoft/AirSim), it is now possible to collect a large amount of data to train your autonomous driving models without having to use an actual car. These models can then be fine tuned using a comparably lesser amount of real-world data and used on actual cars. This technique is called Behavioral Cloning. In this tutorial, you will train a model to learn how to steer a car through a portion of the Landscape map in AirSim using only one of the front facing webcams as visual input. Our strategy will be to perform some basic data analysis to get a feel for the dataset, and then train an end-to-end deep learning model to predict the correct steering control signals (a.k.a. \"steering angle\") given a frame from the webcam, and the car's current state parameters (speed, steering angle, throttle etc.).\n", |
| 20 | + "With photo-realistic simulators like [AirSim](https://github.com/Microsoft/AirSim), it is now possible to collect a large amount of data to train your autonomous driving models without having to use an actual car. These models can then be fine tuned using a comparably lesser amount of real-world data and used on actual cars. This technique is called Behavioral Cloning. In this tutorial, you will train a model to learn how to steer a car through a portion of the Landscape map in AirSim using only one of the front facing webcams on the car as visual input. Our strategy will be to perform some basic data analysis to get a feel for the dataset, and then train an end-to-end deep learning model to predict the correct driving control signal (in this case the steering angle) given a frame from the webcam, and the car's current state parameters (speed, steering angle, throttle etc.).\n", |
21 | 21 | "\n", |
22 | 22 | "Before you begin, please make sure you have the dataset for the tutorial downloaded. If you missed the instructions in the readme file, [you can download the dataset from here](https://aka.ms/AirSimTutorialDataset).\n", |
23 | 23 | "\n", |
|
62 | 62 | "cell_type": "markdown", |
63 | 63 | "metadata": {}, |
64 | 64 | "source": [ |
65 | | - "First, let's take a look at the raw data. There are two parts to the dataset - the images and the .tsv file. First, let us read one of the .tsv files." |
| 65 | + "Let's take a look at the raw data. There are two parts to the dataset - the images and the .tsv file. First, let us read one of the .tsv files." |
66 | 66 | ] |
67 | 67 | }, |
68 | 68 | { |
69 | 69 | "cell_type": "code", |
70 | 70 | "execution_count": 2, |
71 | 71 | "metadata": { |
| 72 | + "collapsed": false, |
72 | 73 | "scrolled": true |
73 | 74 | }, |
74 | 75 | "outputs": [ |
|
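For readers following along outside the notebook, here is a minimal sketch of this reading step. The folder layout and the log file name (`airsim_rec.txt`) are assumptions about the downloaded dataset; adjust the paths to your local copy.

```python
import pandas as pd

# Assumed layout: each recording folder (e.g. 'normal_1') holds a
# tab-separated log of the car's state alongside its images.
current_df = pd.read_csv('data_raw/normal_1/airsim_rec.txt', sep='\t')
print(current_df.head())
```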
181 | 182 | "cell_type": "markdown", |
182 | 183 | "metadata": {}, |
183 | 184 | "source": [ |
184 | | - "This dataset contains our label, the steering angle. It also has the name of the image taken at the time the steering angle was recorded. Let's look at a sample image - 'img_0.png' inside the 'normal_1' folder (more on folder naming later)." |
| 185 | + "This dataset contains our label, the steering angle. It also has the name of the image taken at the time the steering angle was recorded. Let's look at a sample image - 'img_0.png' inside the 'normal_1' folder (more on our folder naming style later)." |
185 | 186 | ] |
186 | 187 | }, |
187 | 188 | { |
188 | 189 | "cell_type": "code", |
189 | 190 | "execution_count": 3, |
190 | 191 | "metadata": { |
| 192 | + "collapsed": false, |
191 | 193 | "scrolled": true |
192 | 194 | }, |
193 | 195 | "outputs": [ |
|
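A minimal sketch of loading and displaying that sample frame, under the same path assumptions as above:

```python
import matplotlib.pyplot as plt
from PIL import Image

# 'img_0.png' from the 'normal_1' recording; adjust the path as needed.
sample_image = Image.open('data_raw/normal_1/images/img_0.png')
plt.imshow(sample_image)
plt.axis('off')
plt.show()
```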
220 | 222 | { |
221 | 223 | "cell_type": "code", |
222 | 224 | "execution_count": 4, |
223 | | - "metadata": {}, |
| 225 | + "metadata": { |
| 226 | + "collapsed": false |
| 227 | + }, |
224 | 228 | "outputs": [ |
225 | 229 | { |
226 | 230 | "data": { |
|
252 | 256 | "cell_type": "markdown", |
253 | 257 | "metadata": {}, |
254 | 258 | "source": [ |
255 | | - "**Extracting this ROI will both reduce the training time and the amount of data needed to train the model**. It will also prevent the model from getting confused by focusing on irrelevant features in the environment (e.g. clouds, birds, etc)\n", |
| 259 | + "**Extracting this ROI will both reduce the training time and the amount of data needed to train the model**. It will also prevent the model from getting confused by focusing on irrelevant features in the environment (e.g. mountains, trees, etc)\n", |
256 | 260 | "\n", |
257 | 261 | "Another observation we can make is that **the dataset exhibits a vertical flip tolerance**. That is, we get a valid data point if we flip the image around the Y axis if we also flip the sign of the steering angle. This is important as it effectively doubles the number of data points we have available. \n", |
258 | 262 | "\n", |
|
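To make both observations concrete, here is a minimal sketch of the two transformations. The crop bounds are illustrative placeholders, not the tutorial's actual ROI coordinates.

```python
import numpy as np

def extract_roi(image):
    """Keep only the road-level band of the frame. The row bounds
    here are placeholders; choose them so the crop frames the road."""
    return image[76:135, :, :]

def mirror_example(image, steering_angle):
    """Flip the frame around the Y axis and negate the label, yielding
    a second valid (image, steering_angle) pair for free."""
    return np.fliplr(image), -steering_angle
```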
274 | 278 | { |
275 | 279 | "cell_type": "code", |
276 | 280 | "execution_count": 5, |
277 | | - "metadata": {}, |
| 281 | + "metadata": { |
| 282 | + "collapsed": false |
| 283 | + }, |
278 | 284 | "outputs": [ |
279 | 285 | { |
280 | 286 | "name": "stdout", |
|
415 | 421 | "cell_type": "markdown", |
416 | 422 | "metadata": {}, |
417 | 423 | "source": [ |
418 | | - "Let us first address the naming of the dataset folders. You will notice that we have two types of folders in our dataset - 'normal', and 'swerve'. These names refer to two different driving strategies. Let's begin by attempting to get an understanding of the differences between these two styles of driving. First, we'll plot a portion of datapoints from each of the driving styles against each other." |
| 424 | + "Let us now address the naming of the dataset folders. You will notice that we have two types of folders in our dataset - 'normal', and 'swerve'. These names refer to two different driving strategies. Let's begin by attempting to get an understanding of the differences between these two styles of driving. First, we'll plot a portion of datapoints from each of the driving styles against each other." |
419 | 425 | ] |
420 | 426 | }, |
421 | 427 | { |
422 | 428 | "cell_type": "code", |
423 | 429 | "execution_count": 6, |
424 | | - "metadata": {}, |
| 430 | + "metadata": { |
| 431 | + "collapsed": false |
| 432 | + }, |
425 | 433 | "outputs": [ |
426 | 434 | { |
427 | 435 | "data": { |
|
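A sketch of the kind of plot described above. The file paths and the `Steering` column name are assumptions about the log schema:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Read one log from each driving strategy (paths assumed as before).
normal_df = pd.read_csv('data_raw/normal_1/airsim_rec.txt', sep='\t')
swerve_df = pd.read_csv('data_raw/swerve_1/airsim_rec.txt', sep='\t')

# Plot the first 100 steering readings of each strategy against each other.
plt.plot(normal_df['Steering'].head(100).values, label='normal')
plt.plot(swerve_df['Steering'].head(100).values, label='swerve')
plt.xlabel('data point index')
plt.ylabel('steering angle')
plt.legend()
plt.show()
```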
471 | 479 | { |
472 | 480 | "cell_type": "code", |
473 | 481 | "execution_count": 7, |
474 | | - "metadata": {}, |
| 482 | + "metadata": { |
| 483 | + "collapsed": false |
| 484 | + }, |
475 | 485 | "outputs": [ |
476 | 486 | { |
477 | 487 | "data": { |
|
510 | 520 | "So, roughly a quarter of the data points are collected with the swerving driving strategy, and the rest are collected with the normal strategy. We also see that we have almost 47,000 data points to work with. This is nearly not enough data, hence our network cannot be too deep. \n", |
511 | 521 | "\n", |
512 | 522 | "> **Thought Exercise 0.4:**\n", |
513 | | - "Like many things in the field of Machine Learning, the ideal blend of number of datapoints in each category here is something that is problem specific, and can only be optimized by trial and error. Can you find a split that works better than ours?\n", |
| 523 | + "Like many things in the field of Machine Learning, the ideal ratio of number of datapoints in each category here is something that is problem specific, and can only be optimized by trial and error. Can you find a split that works better than ours?\n", |
514 | 524 | "\n", |
515 | 525 | "Let's see what the distribution of labels looks like for the two strategies." |
516 | 526 | ] |
517 | 527 | }, |
518 | 528 | { |
519 | 529 | "cell_type": "code", |
520 | 530 | "execution_count": 8, |
521 | | - "metadata": {}, |
| 531 | + "metadata": { |
| 532 | + "collapsed": false |
| 533 | + }, |
522 | 534 | "outputs": [ |
523 | 535 | { |
524 | 536 | "data": { |
|
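A matching sketch for comparing the label distributions, reusing the hypothetical `normal_df` / `swerve_df` frames from the previous sketch:

```python
import matplotlib.pyplot as plt

# Overlaid histograms expose the heavier steering-angle tails that the
# swerve strategy is expected to produce.
plt.hist(normal_df['Steering'], bins=50, alpha=0.5, label='normal')
plt.hist(swerve_df['Steering'], bins=50, alpha=0.5, label='swerve')
plt.xlabel('steering angle')
plt.ylabel('count')
plt.legend()
plt.show()
```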
570 | 582 | "\n", |
571 | 583 | "The code for cooking the dataset is straightforward, but long. When it terminates, the final dataset will have 4 parts:\n", |
572 | 584 | "\n", |
573 | | - "* image: a numpy array containing the image data\n", |
574 | | - "* previous_state: a numpy array containing the last known state of the car. This is a (steering, throttle, brake, speed) tuple\n", |
575 | | - "* label: a numpy array containing the steering angles that we wish to predict (normalized on the range -1..1)\n", |
576 | | - "* metadata: a numpy array containing metadata about the files (which folder they came from, etc)\n", |
| 585 | + "* **image**: a numpy array containing the image data\n", |
| 586 | + "* **previous_state**: a numpy array containing the last known state of the car. This is a (steering, throttle, brake, speed) tuple\n", |
| 587 | + "* **label**: a numpy array containing the steering angles that we wish to predict (normalized on the range -1..1)\n", |
| 588 | + "* **metadata**: a numpy array containing metadata about the files (which folder they came from, etc)\n", |
577 | 589 | "\n", |
578 | | - "The processing may take some time. We will also divide the datasets into train/test/validation datasets." |
| 590 | + "The processing may take some time. We will also combine all the datasets into one and then split it into train/test/validation datasets." |
579 | 591 | ] |
580 | 592 | }, |
581 | 593 | { |
582 | 594 | "cell_type": "code", |
583 | 595 | "execution_count": 9, |
584 | 596 | "metadata": { |
| 597 | + "collapsed": false, |
585 | 598 | "scrolled": true |
586 | 599 | }, |
587 | 600 | "outputs": [ |
|
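As a sketch of that final step, here is one way to carve aligned train/test/validation splits; the 70/20/10 ratios and the use of scikit-learn are illustrative choices, not the tutorial's fixed recipe.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in for the cooked label array; the other three parts (image,
# previous_state, metadata) would be indexed with the same splits.
labels = np.random.uniform(-1, 1, size=47000)

# Split indices once so all four arrays stay row-aligned.
indices = np.arange(len(labels))
train_idx, rest_idx = train_test_split(indices, test_size=0.3, random_state=42)
test_idx, val_idx = train_test_split(rest_idx, test_size=1/3, random_state=42)
print(len(train_idx), len(test_idx), len(val_idx))  # roughly 70/20/10
```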