AREAS OF DISCUSSION
• What is a decision tree and how does it work?
• Key terms
• How does a decision tree work?
• What is Random Forest?
• Why use Random Forest?
• How Random Forest works
• Applications of Random Forest
• Use case
WHAT IS A DECISION TREE AND HOW DOES IT WORK?
• A tree-shaped diagram used to determine a course of action.
• Each branch of the tree represents a possible decision, occurrence, or reaction.
KEY TERMS
1. Entropy: a measure of the randomness or unpredictability in the dataset
2. Information gain: the decrease in entropy after the dataset is split
3. Leaf node: carries the classification or the decision
4. Decision node: has two or more branches
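To make the first two terms concrete, here is a minimal Python sketch. It uses the usual Shannon definition, entropy H = -sum over classes of p_i * log2(p_i), and computes information gain as the parent's entropy minus the size-weighted entropy of the child nodes. The bowl of fruit is hypothetical, not taken from the slides.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical bowl: two apples, two grapes, two lemons.
bowl = ["apple", "apple", "grape", "grape", "lemon", "lemon"]
print(round(entropy(bowl), 3))  # 1.585, the maximum for three equally common classes

# Splitting the grapes away from the other fruit:
left, right = ["grape", "grape"], ["apple", "apple", "lemon", "lemon"]
print(round(information_gain(bowl, [left, right]), 3))  # 0.918
```

A perfectly mixed bowl has the highest entropy; a pure node (a single fruit type) has entropy 0.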
HOW DOES A DECISION TREE WORK?
Problem statement: classify the different types of fruit in a bowl based on their features.
• The dataset (the bowl) is quite messy, i.e. it has high entropy.
• To split the data, we frame conditions that split it in the way that gives the highest information gain.
NB: Gain is the decrease in entropy after splitting.
• We try to choose the condition that gives the highest gain.
• We do that by splitting the data on each candidate condition and checking the gain each one yields.
NB: The condition that gives the highest gain is used to make the first split.
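A minimal sketch of this search, assuming a small invented bowl of fruit and three hand-picked candidate conditions (neither comes from the slides): compute the gain of each condition and keep the best one. The helper name `split_gain` is illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def split_gain(rows, condition):
    """Information gain of splitting rows into (condition true, condition false)."""
    labels = [r["label"] for r in rows]
    yes = [r["label"] for r in rows if condition(r)]
    no = [r["label"] for r in rows if not condition(r)]
    weighted = sum(len(part) / len(rows) * entropy(part) for part in (yes, no) if part)
    return entropy(labels) - weighted


# Hypothetical bowl (values invented for illustration).
bowl = (
    [{"diameter": 3, "color": "red", "label": "apple"}] * 2
    + [{"diameter": 1, "color": "purple", "label": "grape"}] * 3
    + [{"diameter": 3, "color": "yellow", "label": "lemon"}]
)

# Hypothetical candidate conditions to compare.
conditions = {
    "diameter >= 3": lambda r: r["diameter"] >= 3,
    "color == yellow": lambda r: r["color"] == "yellow",
    "color == red": lambda r: r["color"] == "red",
}

gains = {name: split_gain(bowl, cond) for name, cond in conditions.items()}
for name, gain in gains.items():
    print(f"{name}: gain = {gain:.3f}")

best = max(gains, key=gains.get)
print("first split:", best)  # diameter >= 3
```

With this data, splitting on diameter separates all the grapes at once, so it gives the highest gain (1.000, versus 0.650 and 0.918) and becomes the first split.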
• We then split the right node further based on color.
• We can then predict a lemon with 100% accuracy.
• An apple can also be predicted with 100% accuracy.
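Repeating the same search on each child node until every node is pure gives a complete tree. The sketch below is a simplified, ID3-style recursion over the same kind of hypothetical bowl and hand-picked conditions; the names `build`, `predict`, and `show` are illustrative, not from any library.

```python
import math
from collections import Counter


def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def gain(rows, condition):
    """Information gain of splitting rows on `condition`; 0 if nothing is split."""
    yes = [r["label"] for r in rows if condition(r)]
    no = [r["label"] for r in rows if not condition(r)]
    if not yes or not no:
        return 0.0
    weighted = (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(rows)
    return entropy([r["label"] for r in rows]) - weighted


def build(rows, conditions):
    """Split recursively until each node is pure (or no condition helps)."""
    labels = [r["label"] for r in rows]
    if len(set(labels)) == 1:
        return labels[0]  # leaf node
    name, cond = max(conditions.items(), key=lambda item: gain(rows, item[1]))
    if gain(rows, cond) <= 0:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    return {  # decision node
        "condition": name,
        "test": cond,
        "yes": build([r for r in rows if cond(r)], conditions),
        "no": build([r for r in rows if not cond(r)], conditions),
    }


def predict(node, row):
    while isinstance(node, dict):  # walk down until a leaf label is reached
        node = node["yes"] if node["test"](row) else node["no"]
    return node


def show(node, indent=""):
    if not isinstance(node, dict):
        print(indent + "-> " + node)
        return
    print(indent + "if " + node["condition"] + ":")
    show(node["yes"], indent + "    ")
    print(indent + "else:")
    show(node["no"], indent + "    ")


# Hypothetical bowl and candidate conditions (invented for illustration).
bowl = (
    [{"diameter": 3, "color": "red", "label": "apple"}] * 2
    + [{"diameter": 1, "color": "purple", "label": "grape"}] * 3
    + [{"diameter": 3, "color": "yellow", "label": "lemon"}]
)
conditions = {
    "diameter >= 3": lambda r: r["diameter"] >= 3,
    "color == yellow": lambda r: r["color"] == "yellow",
    "color == red": lambda r: r["color"] == "red",
}

tree = build(bowl, conditions)
show(tree)
print(predict(tree, {"diameter": 3, "color": "yellow"}))  # lemon
print(predict(tree, {"diameter": 3, "color": "red"}))  # apple
```

Here the root splits on diameter and the large-fruit branch is then split on color, after which every leaf is pure, so lemons and apples in this toy bowl are classified with 100% accuracy.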
WHAT IS RANDOM FOREST?
• A method that operates by constructing multiple decision trees
• A bunch of decision trees bundled together
• Based on the idea of “the wisdom of the crowd”
• Gets a prediction from each tree and selects the final answer by voting
• The decision of the majority of the trees is chosen by the random forest as the final decision
• Example: getting vacation destination recommendations from several friends and going with the most popular one
• Can be used for both classification and regression (for regression, the trees' predictions are averaged rather than voted on)
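The combining step can be sketched in a few lines of Python. The "friends" (trees) and their recommendations below are hypothetical, and the function names are ours.

```python
from collections import Counter
from statistics import mean


def majority_vote(predictions):
    """Classification: the label predicted by the most trees wins."""
    return Counter(predictions).most_common(1)[0][0]


def average(predictions):
    """Regression: the forest's output is the mean of the trees' outputs."""
    return mean(predictions)


# Hypothetical friends (trees) each recommending a vacation destination.
print(majority_vote(["Paris", "Rome", "Paris", "Lisbon", "Paris"]))  # Paris
print(average([2.0, 3.0, 4.0]))  # 3.0
```

In this sketch a tie goes to whichever label was seen first. Real libraries have their own rules: scikit-learn's `RandomForestClassifier`, for example, picks the class with the highest mean predicted probability across the trees rather than counting hard votes.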
WHY DO WE USE RANDOM FOREST?
1. Less overfitting
• In overfitting, the model learns “too much” from the training dataset.
• Overfitting is the case where the cost on the training data is very small, but the model does not generalize reliably to new data.
• What use is a model that has learned the training data very well but still can’t make reliable predictions for new inputs?
• We always want to find the trend, not fit the line through every data point.
• Combining the votes of many trees, each trained on a different random sample, reduces the overfitting of any single tree.
• Training time is relatively short: the trees are independent of each other, so they can be trained in parallel.
2. High accuracy
• Random forest runs efficiently on large databases.
• Produces highly accurate predictions on large datasets.
3. Estimates missing data
• Maintains accuracy when a large proportion of the data is missing.
• E.g. different sets of demographic statistics coming in from various areas, where one set is missing the number of children in the house and another is missing the size of the house. Random forest looks at the sets separately, builds a different tree for each, and then judges which one fits better.
HOW RANDOM FOREST WORKS
• Step 1: Select random samples from the given dataset (drawn with replacement, known as bootstrap samples).
• Step 2: Construct a decision tree from each sample and get a prediction from each tree.
• Step 3: Perform a vote over the predicted results.
• Step 4: Select the prediction with the most votes as the final prediction.
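The four steps can be sketched end to end in plain Python, without a machine-learning library. This is a simplified illustration under our own assumptions: the two-feature fruit data (diameter in cm, weight in g) is invented, trees split on midpoint thresholds chosen by information gain, each split considers one randomly chosen feature (`n_features=1`), and all function names are hypothetical. Libraries such as scikit-learn's `RandomForestClassifier` implement the same idea far more efficiently.

```python
import math
import random
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(X, y, features):
    """Best (gain, feature, threshold) for a split of the form x[feature] <= threshold."""
    parent, best = entropy(y), None
    for f in features:
        values = sorted({x[f] for x in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between neighbouring values
            left = [yi for xi, yi in zip(X, y) if xi[f] <= t]
            right = [yi for xi, yi in zip(X, y) if xi[f] > t]
            gain = parent - (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
            if best is None or gain > best[0]:
                best = (gain, f, t)
    return best


def build_tree(X, y, rng, n_features, depth=0, max_depth=5):
    """Grow a tree in which every split looks at `n_features` random features."""
    split = None
    if len(set(y)) > 1 and depth < max_depth:
        features = rng.sample(range(len(X[0])), n_features)
        split = best_split(X, y, features)
    if split is None or split[0] <= 0:
        return Counter(y).most_common(1)[0][0]  # leaf node: majority label
    _, f, t = split

    def grow(keep):
        idx = [i for i, x in enumerate(X) if keep(x[f])]
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          rng, n_features, depth + 1, max_depth)

    return (f, t, grow(lambda v: v <= t), grow(lambda v: v > t))  # decision node


def predict_tree(node, x):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        # Step 1: bootstrap sample, i.e. len(X) rows drawn with replacement.
        idx = [rng.randrange(len(X)) for _ in X]
        # Step 2: build a decision tree from that sample.
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx], rng, n_features))
    return trees


def predict_forest(trees, x):
    # Step 3: every tree votes; Step 4: the label with the most votes wins.
    return Counter(predict_tree(tree, x) for tree in trees).most_common(1)[0][0]


# Hypothetical data: [diameter in cm, weight in g] per fruit (values invented).
X = [[7, 150], [7.5, 170], [8, 160],  # apples
     [1.5, 5], [1.8, 6], [2, 4],      # grapes
     [5, 100], [5.5, 110], [6, 90]]   # lemons
y = ["apple"] * 3 + ["grape"] * 3 + ["lemon"] * 3

forest = fit_forest(X, y)
print(predict_forest(forest, [5.2, 105]))
print(predict_forest(forest, [7.5, 160]))
```

Because each tree sees a different bootstrap sample and different features at each split, individual trees can disagree (a tree whose sample happens to contain no lemons can never predict "lemon"); the vote smooths those differences out.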
• Let's take this blacked-out fruit and try to classify it.
• This is an example of random forest working well even when some data is missing.
Diameter = 3
Colour = Orange
Grows in summer = Yes
Shape = Circle
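The final vote for this fruit can be sketched as below. The three trees and their rules are invented for illustration (the slides' trees are not reproduced here); only the fruit's four feature values come from the slide.

```python
from collections import Counter

# The blacked-out fruit's known features, as listed on the slide.
fruit = {"diameter": 3, "colour": "orange", "grows_in_summer": True, "shape": "circle"}


# Three hypothetical trees, each built on a different subset of features.
# Their rules are invented for illustration only.
def tree_1(f):  # uses diameter and colour
    if f["diameter"] >= 3:
        return "orange" if f["colour"] == "orange" else "apple"
    return "cherry"


def tree_2(f):  # uses shape and whether it grows in summer
    if f["shape"] != "circle":
        return "banana"
    return "orange" if f["grows_in_summer"] else "cherry"


def tree_3(f):  # uses shape and diameter
    if f["shape"] != "circle":
        return "banana"
    return "apple" if f["diameter"] >= 3 else "cherry"


votes = Counter(tree(fruit) for tree in (tree_1, tree_2, tree_3))
print(votes)  # Counter({'orange': 2, 'apple': 1})
print(votes.most_common(1)[0][0])  # orange
```

Even though one tree disagrees, the majority (two of three trees) classifies the fruit as an orange.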
APPLICATIONS
1. Kinect
• Motion-sensing input device developed by Microsoft for its Xbox game consoles.
• Uses infrared to track body movements and recreate them in the game.
2. Remote sensing
• Used to classify images from Enhanced Thematic Mapper (ETM) sensors on satellites, which acquire high-resolution images of the Earth’s surface
• Less training time
• Higher accuracy
3. Object detection
• Multiclass object detection, e.g. in traffic, where the algorithm is used to sort different types of vehicles such as buses and lorries
• Provides better detection in complicated environments