
Commit 1b6c06c

final fixes e2e tutorial
1 parent 37efd70 commit 1b6c06c

5 files changed, with 65 additions and 37 deletions

AirSimE2EDeepLearning/DataExplorationAndPreparation.ipynb

Lines changed: 32 additions & 19 deletions
@@ -11,13 +11,13 @@
 "Our goal is to train a deep learning model that can make steering angle predictions based on an input that comprises of camera images and the vehicle's last known state. In this notebook, we will prepare the data for our end-to-end deep learning model. Along the way, we will also make some useful observations about the dataset that will aid us when it comes time to train the model. \n",
 "\n",
 "\n",
-"## What is End-to-End Deep Learning?\n",
+"## What is end-to-end deep learning?\n",
 "\n",
-"End-to-end deep learning is a modeling strategy that is a response to the success of deep neural networks. Unlike traditional methods, this strategy is not built on feature engineering. Instead, it leverages the power of deep neural networks, along with recent hardware advances (GPUs, FPGAs etc.) to harness the incredible potential of large amounts of data. It is closer to a human-like learning approach than traditional ML as it lets a neural network map raw input to direct outputs. A big downside to this approach is that it requires a very large amount of training data which makes it unsuitable for many common applications. Since simulators can (potentially) generate infinite amounts of data, they are a perfect data source for end-to-end deep learning algorithms. If you wish to learn more, [this video](https://www.coursera.org/learn/machine-learning-projects/lecture/k0Klk/what-is-end-to-end-deep-learning) by Andrew Ng provides a nice overview of the topic.\n",
+"End-to-end deep learning is a modeling strategy that is a response to the success of deep neural networks. Unlike traditional methods, this strategy is not built on feature engineering. Instead, it leverages the power of deep neural networks, along with recent hardware advances (GPUs, FPGAs etc.) to harness the incredible potential of large amounts of data. It is closer to a human-like learning approach than traditional ML as it lets a neural network map raw input to direct output. A big downside to this approach is that it requires a very large amount of training data which makes it unsuitable for many common applications. Since simulators can (potentially) generate data in infinite amounts, they are a perfect data source for end-to-end deep learning algorithms. If you wish to learn more, [this video](https://www.coursera.org/learn/machine-learning-projects/lecture/k0Klk/what-is-end-to-end-deep-learning) by Andrew Ng provides a nice overview of the topic.\n",
 "\n",
-"Autonomous driving is a field that can highly benefit from the power of end-to-end deep learning. In order to achieve SAE Level 4 Autonomy, cars need to be trained on copious amounts of data (it is not uncommon for car manufacturers to collect hundreds of petabytes of data every week), something that is virtually impossible without a simulator. \n",
+"Autonomous driving is a field that can highly benefit from the power of end-to-end deep learning. In order to achieve SAE Level 4 or 5 Autonomy, cars need to be trained on copious amounts of data (it is not uncommon for car manufacturers to collect hundreds of petabytes of data every week), something that is virtually impossible without a simulator. \n",
 "\n",
-"With photo-realistic simulators like [AirSim](https://github.com/Microsoft/AirSim), it is now possible to collect a large amount of data to train your autonomous driving models without having to use an actual car. These models can then be fine tuned using a comparably lesser amount of real-world data and used on actual cars. This technique is called Behavioral Cloning. In this tutorial, you will train a model to learn how to steer a car through a portion of the Landscape map in AirSim using only one of the front facing webcams as visual input. Our strategy will be to perform some basic data analysis to get a feel for the dataset, and then train an end-to-end deep learning model to predict the correct steering control signals (a.k.a. \"steering angle\") given a frame from the webcam, and the car's current state parameters (speed, steering angle, throttle etc.).\n",
+"With photo-realistic simulators like [AirSim](https://github.com/Microsoft/AirSim), it is now possible to collect a large amount of data to train your autonomous driving models without having to use an actual car. These models can then be fine tuned using a comparably lesser amount of real-world data and used on actual cars. This technique is called Behavioral Cloning. In this tutorial, you will train a model to learn how to steer a car through a portion of the Landscape map in AirSim using only one of the front facing webcams on the car as visual input. Our strategy will be to perform some basic data analysis to get a feel for the dataset, and then train an end-to-end deep learning model to predict the correct driving control signal (in this case the steering angle) given a frame from the webcam, and the car's current state parameters (speed, steering angle, throttle etc.).\n",
 "\n",
 "Before you begin, please make sure you have the dataset for the tutorial downloaded. If you missed the instructions in the readme file, [you can download the dataset from here](https://aka.ms/AirSimTutorialDataset).\n",
 "\n",
@@ -62,13 +62,14 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"First, let's take a look at the raw data. There are two parts to the dataset - the images and the .tsv file. First, let us read one of the .tsv files."
+"Let's take a look at the raw data. There are two parts to the dataset - the images and the .tsv file. First, let us read one of the .tsv files."
 ]
 },
 {
 "cell_type": "code",
 "execution_count": 2,
 "metadata": {
+"collapsed": false,
 "scrolled": true
 },
 "outputs": [
@@ -181,13 +182,14 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"This dataset contains our label, the steering angle. It also has the name of the image taken at the time the steering angle was recorded. Let's look at a sample image - 'img_0.png' inside the 'normal_1' folder (more on folder naming later)."
+"This dataset contains our label, the steering angle. It also has the name of the image taken at the time the steering angle was recorded. Let's look at a sample image - 'img_0.png' inside the 'normal_1' folder (more on our folder naming style later)."
 ]
 },
 {
 "cell_type": "code",
 "execution_count": 3,
 "metadata": {
+"collapsed": false,
 "scrolled": true
 },
 "outputs": [
@@ -220,7 +222,9 @@
 {
 "cell_type": "code",
 "execution_count": 4,
-"metadata": {},
+"metadata": {
+"collapsed": false
+},
 "outputs": [
 {
 "data": {
@@ -252,7 +256,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"**Extracting this ROI will both reduce the training time and the amount of data needed to train the model**. It will also prevent the model from getting confused by focusing on irrelevant features in the environment (e.g. clouds, birds, etc)\n",
+"**Extracting this ROI will both reduce the training time and the amount of data needed to train the model**. It will also prevent the model from getting confused by focusing on irrelevant features in the environment (e.g. mountains, trees, etc)\n",
 "\n",
 "Another observation we can make is that **the dataset exhibits a vertical flip tolerance**. That is, we get a valid data point if we flip the image around the Y axis if we also flip the sign of the steering angle. This is important as it effectively doubles the number of data points we have available. \n",
 "\n",
@@ -274,7 +278,9 @@
 {
 "cell_type": "code",
 "execution_count": 5,
-"metadata": {},
+"metadata": {
+"collapsed": false
+},
 "outputs": [
 {
 "name": "stdout",
@@ -415,13 +421,15 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Let us first address the naming of the dataset folders. You will notice that we have two types of folders in our dataset - 'normal', and 'swerve'. These names refer to two different driving strategies. Let's begin by attempting to get an understanding of the differences between these two styles of driving. First, we'll plot a portion of datapoints from each of the driving styles against each other."
+"Let us now address the naming of the dataset folders. You will notice that we have two types of folders in our dataset - 'normal', and 'swerve'. These names refer to two different driving strategies. Let's begin by attempting to get an understanding of the differences between these two styles of driving. First, we'll plot a portion of datapoints from each of the driving styles against each other."
 ]
 },
 {
 "cell_type": "code",
 "execution_count": 6,
-"metadata": {},
+"metadata": {
+"collapsed": false
+},
 "outputs": [
 {
 "data": {
@@ -471,7 +479,9 @@
 {
 "cell_type": "code",
 "execution_count": 7,
-"metadata": {},
+"metadata": {
+"collapsed": false
+},
 "outputs": [
 {
 "data": {
@@ -510,15 +520,17 @@
 "So, roughly a quarter of the data points are collected with the swerving driving strategy, and the rest are collected with the normal strategy. We also see that we have almost 47,000 data points to work with. This is nearly not enough data, hence our network cannot be too deep. \n",
 "\n",
 "> **Thought Exercise 0.4:**\n",
-"Like many things in the field of Machine Learning, the ideal blend of number of datapoints in each category here is something that is problem specific, and can only be optimized by trial and error. Can you find a split that works better than ours?\n",
+"Like many things in the field of Machine Learning, the ideal ratio of number of datapoints in each category here is something that is problem specific, and can only be optimized by trial and error. Can you find a split that works better than ours?\n",
 "\n",
 "Let's see what the distribution of labels looks like for the two strategies."
 ]
 },
 {
 "cell_type": "code",
 "execution_count": 8,
-"metadata": {},
+"metadata": {
+"collapsed": false
+},
 "outputs": [
 {
 "data": {
@@ -570,18 +582,19 @@
 "\n",
 "The code for cooking the dataset is straightforward, but long. When it terminates, the final dataset will have 4 parts:\n",
 "\n",
-"* image: a numpy array containing the image data\n",
-"* previous_state: a numpy array containing the last known state of the car. This is a (steering, throttle, brake, speed) tuple\n",
-"* label: a numpy array containing the steering angles that we wish to predict (normalized on the range -1..1)\n",
-"* metadata: a numpy array containing metadata about the files (which folder they came from, etc)\n",
+"* **image**: a numpy array containing the image data\n",
+"* **previous_state**: a numpy array containing the last known state of the car. This is a (steering, throttle, brake, speed) tuple\n",
+"* **label**: a numpy array containing the steering angles that we wish to predict (normalized on the range -1..1)\n",
+"* **metadata**: a numpy array containing metadata about the files (which folder they came from, etc)\n",
 "\n",
-"The processing may take some time. We will also divide the datasets into train/test/validation datasets."
+"The processing may take some time. We will also combine all the datasets into one and then split it into train/test/validation datasets."
 ]
 },
 {
 "cell_type": "code",
 "execution_count": 9,
 "metadata": {
+"collapsed": false,
 "scrolled": true
 },
 "outputs": [

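The cooking code itself lives in the tutorial's helper files; as a rough, hypothetical sketch of the splitting step only (not the tutorial's actual implementation), the idea is to shuffle once and write one HDF5 file per split, with dataset names matching the four parts listed above. This assumes the four parts are already numpy arrays that h5py can store, and the split fractions are illustrative.

```python
import numpy as np
import h5py

def split_and_save(image, previous_state, label, metadata,
                   train_frac=0.7, eval_frac=0.2, prefix='cooked_'):
    # Shuffle once so train/eval/test are drawn from the same distribution.
    order = np.random.permutation(len(label))
    n_train = int(train_frac * len(order))
    n_eval = int(eval_frac * len(order))
    splits = {
        'train': order[:n_train],
        'eval': order[n_train:n_train + n_eval],
        'test': order[n_train + n_eval:],
    }
    # One HDF5 file per split, each holding the four parts described above.
    for name, idx in splits.items():
        with h5py.File('{}{}.h5'.format(prefix, name), 'w') as f:
            f.create_dataset('image', data=image[idx])
            f.create_dataset('previous_state', data=previous_state[idx])
            f.create_dataset('label', data=label[idx])
            f.create_dataset('metadata', data=metadata[idx])
```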
AirSimE2EDeepLearning/README.md

Lines changed: 9 additions & 2 deletions
@@ -34,7 +34,7 @@ If you have never worked with Python notebooks before, we highly recommend [chec
 
 ### Background needed
 
-At the very least, you need to be familiar with the basics neural networks. You are not required to know advanced concepts like LSTMs or Reinforcement Learning but you should know how Convolutional Networks work. A really good starting point to get a strong background in a very short amount of time is [this great book](http://neuralnetworksanddeeplearning.com/) written by Michael Nielsen. It is free, very short and available online. It can provide you a solid foundation in less than a week's time.
+You should be familiar with the basics of neural networks and deep learning. You are not required to know advanced concepts like LSTMs or Reinforcement Learning but you should know how Convolutional Neural Networks work. A really good starting point to get a strong background in a short amount of time is [this highly recommended book on the topic](http://neuralnetworksanddeeplearning.com/) written by Michael Nielsen. It is free, very short and available online. It can provide you a solid foundation in less than a week's time.
 
 You should also be comfortable with Python. At the very least, you should be able to read and understand code written in Python.
 
@@ -44,10 +44,17 @@ You should also be comfortable with Python. At the very least, you should be abl
 2. [Install CNTK](https://docs.microsoft.com/en-us/cognitive-toolkit/Setup-CNTK-on-your-machine) or [install Tensorflow](https://www.tensorflow.org/install/install_windows)
 3. [Install h5py](http://docs.h5py.org/en/latest/build.html)
 4. [Install Keras](https://keras.io/#installation) and [configure the Keras backend](https://keras.io/backend/) to work with TensorFlow (default) or CNTK.
+5. [Install AzCopy](https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy). Be sure to add the location for the AzCopy executable to your system path.
 
 ### Simulator Package
 
-We have created a standalone build of the AirSim simulation environment for the tutorials in this cookbook. [You can download the build package from here](https://aka.ms/ADCookbookAirSimPackage). After downloading the package, unzip it and run the **ADCookbook_Airsim_Mountain.bat** file to start the simulator.
+We have created a standalone build of the AirSim simulation environment for the tutorials in this cookbook. [You can download the build package from here](https://airsimtutorialdataset.blob.core.windows.net/e2edl/AD_Cookbook_AirSim.7z). Consider using [AzCopy](https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy), as the file size is large. After downloading the package, unzip it and run the PowerShell command
+
+`
+.\AD_Cookbook_Start_AirSim.ps1 landscape
+`
+
+to start the simulator in the landscape environment.
 
 ### Hardware
 
AirSimE2EDeepLearning/TestModel.ipynb

Lines changed: 6 additions & 4 deletions
@@ -41,7 +41,9 @@
 {
 "cell_type": "code",
 "execution_count": 2,
-"metadata": {},
+"metadata": {
+"collapsed": false
+},
 "outputs": [
 {
 "name": "stdout",
@@ -153,7 +155,7 @@
 "> ** Thought Exercise 2.2**:\n",
 "The car seems to crash when it tries to climb one of those hills. Can you think of a reason why? How can you fix this? (Hint: You might want to take a look at what the car is seeing when it is making that ascent)\n",
 "\n",
-"AirSim opens up a world of possibilities. There is no limit to the new things you can try as you train even more complex models and use other learning techniques. Here are a few immediate things you could try that might require modifying some of the code provided in this tutorials (including the helper files) but won't require modifying any Unreal assets.\n",
+"AirSim opens up a world of possibilities. There is no limit to the new things you can try as you train even more complex models and use other learning techniques. Here are a few immediate things you could try that might require modifying some of the code provided in this tutorial (including the helper files) but won't require modifying any Unreal assets.\n",
 "\n",
 "> ** Exploratory Idea 2.1**:\n",
 "If you have a background in Machine Learning, you might have asked the question: why did we train and test in the same environment? Isn't that overfitting? Well, you can make arguments on both sides. While using the same environment for both training and testing might seem like you are overfitting to that environment, it can also be seen as drawing examples from the same probability distribution. The data used for training and testing is not the same, even though it is coming from the same distribution. So that brings us to the question: how will this model fare in a different environment, one it hasn't seen before? \n",
@@ -167,7 +169,7 @@
 "The model currently views a single image and a single state for each prediction. However, we have access to historical data. Can we extend the model to make predictions using the previous N images and states (e.g. given the past 3 images and past 3 states, predict the next steering angle)? (Hint: This will possibly require you to use recurrent neural network techniques)\n",
 "\n",
 "> ** Exploratory Idea 2.4**:\n",
-"AirSim is much more than the dataset we provided you. For starters, we only used one camera and used it only in RGB mode. AirSim lets you collect data in depth view, segmentation view, surface normal view etc for each of the cameras available. So you can potentially have 20 different images (for 5 cameras operating in all 4 modes) for each instance (we only used 1 image here). How can combining all this information help us improve the model we just trained?"
+"AirSim is a lot more than the dataset we provided you. For starters, we only used one camera and used it only in RGB mode. AirSim lets you collect data in depth view, segmentation view, surface normal view etc for each of the cameras available. So you can potentially have 20 different images (for 5 cameras operating in all 4 modes) for each instance (we only used 1 image here). How can combining all this information help us improve the model we just trained?"
 ]
 }
 ],
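As a starting point for Exploratory Idea 2.3 above, here is a small hypothetical sketch of how existing arrays could be reshaped into short histories before feeding a recurrent model. The window length, array names, and shapes are assumptions for illustration only.

```python
import numpy as np

def make_sequences(images, states, labels, window=3):
    # Stack the previous `window` frames and states for each prediction target,
    # so a recurrent model can look at a short history instead of a single step.
    image_seqs, state_seqs, targets = [], [], []
    for i in range(window, len(labels)):
        image_seqs.append(images[i - window:i])
        state_seqs.append(states[i - window:i])
        targets.append(labels[i])
    return np.array(image_seqs), np.array(state_seqs), np.array(targets)

# Example with dummy data: 100 frames of 59x255 RGB ROIs and 4-value states.
images = np.zeros((100, 59, 255, 3), dtype=np.float32)
states = np.zeros((100, 4), dtype=np.float32)
labels = np.zeros(100, dtype=np.float32)
img_seq, state_seq, y = make_sequences(images, states, labels)
print(img_seq.shape)  # (97, 3, 59, 255, 3)
```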
@@ -187,7 +189,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.5.3"
+"version": "3.6.0"
 }
 },
 "nbformat": 4,
