RANDOM FOREST
GONZAGA EDWARD OKANE 2019/HD05/25243U
EGOR ELEAZAR 2019/HD05/25247U

AREAS OF DISCUSSION
• What is a decision tree and how does it work
• Key terms
• How does a decision tree work
• What is Random Forest
• Why use Random Forest
• How Random Forest works
• Applications of Random Forest
• Use case

WHAT IS A DECISION TREE AND HOW DOES IT WORK?
• A decision tree is a tree-shaped diagram used to determine a course of action.
• Each branch of the tree represents a possible decision, occurrence, or reaction.

KEY TERMS
1. Entropy – a measure of the randomness or unpredictability in the dataset
2. Information gain – the decrease in entropy after the dataset is split
3. Leaf node – carries the classification or the decision
4. Decision node – has two or more branches
5. Root node – the topmost decision node
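
The first two key terms can be made concrete with a short sketch. It assumes the usual Shannon entropy in bits, which the slides do not state explicitly; the function names and the tiny `bowl` list are illustrative only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(parent, children):
    """Decrease in entropy after splitting `parent` into the `children` groups."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical bowl with two apples and two lemons (not data from the slides).
bowl = ["apple", "apple", "lemon", "lemon"]

# A split that leaves both groups mixed gains nothing.
mixed_gain = information_gain(bowl, [["apple", "lemon"], ["apple", "lemon"]])

# A split that separates the classes completely removes all the entropy.
clean_gain = information_gain(bowl, [["apple", "apple"], ["lemon", "lemon"]])
```

Here the bowl starts with 1 bit of entropy; the mixed split gains 0 bits and the clean split gains the full 1 bit.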

HOW DOES A DECISION TREE WORK?
Problem statement: classify the different types of fruit in a bowl based on their features.
• The dataset (the bowl) is quite mixed, so it has high entropy.
• To split the data, we frame conditions that split it in such a way that the information gain is the highest.
NB: Gain is the measure of the decrease in entropy after splitting.
• We try to choose the condition that gives the highest gain.
• We do that by splitting the data with each candidate condition and checking the gain each one produces.
NB: The condition that gives the highest gain is used to make the first split.
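
A minimal sketch of this selection step follows. The fruit rows and the three candidate conditions are invented for illustration (the slides show the bowl only as a picture), and entropy is assumed to be Shannon entropy in bits.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain(rows, condition):
    """Information gain from splitting `rows` into condition-true and condition-false groups."""
    labels = [r["label"] for r in rows]
    yes = [r["label"] for r in rows if condition(r)]
    no = [r["label"] for r in rows if not condition(r)]
    weighted = (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(labels)
    return entropy(labels) - weighted


# Hypothetical bowl: two apples, one lemon, three grapes.
bowl = [
    {"diameter": 3, "color": "red", "label": "apple"},
    {"diameter": 3, "color": "red", "label": "apple"},
    {"diameter": 3, "color": "yellow", "label": "lemon"},
    {"diameter": 1, "color": "green", "label": "grape"},
    {"diameter": 1, "color": "green", "label": "grape"},
    {"diameter": 1, "color": "green", "label": "grape"},
]

# Hypothetical candidate conditions for the first split.
conditions = {
    "diameter >= 3": lambda r: r["diameter"] >= 3,
    "color == red": lambda r: r["color"] == "red",
    "color == yellow": lambda r: r["color"] == "yellow",
}

# Split with every condition, record the gain, and keep the best one.
gains = {name: gain(bowl, cond) for name, cond in conditions.items()}
best = max(gains, key=gains.get)
```

With this made-up bowl the diameter condition has the highest gain (about 1.0 bit, against roughly 0.92 and 0.65 for the two color conditions), so it would be chosen for the first split.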
• After splitting on diameter, the entropy is lower than before.
• We then split the right node further based on color.
• After this split, a lemon can be predicted with 100% accuracy.
• An apple can also be predicted with 100% accuracy.
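
The drop in entropy across the two splits can be traced numerically. This is a sketch on a hypothetical bowl (the tuples below are not taken from the slides), assuming Shannon entropy in bits and assuming the "right node" is the group of larger fruits.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def weighted_entropy(groups):
    """Average entropy of the groups, weighted by group size."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * entropy(g) for g in groups)


def labels_of(fruits):
    return [label for _, _, label in fruits]


# Hypothetical bowl as (diameter, color, label) tuples.
bowl = [
    (3, "red", "apple"), (3, "red", "apple"), (3, "yellow", "lemon"),
    (1, "green", "grape"), (1, "green", "grape"), (1, "green", "grape"),
]

start = entropy(labels_of(bowl))  # about 1.46 bits

# First split: diameter.
small = [f for f in bowl if f[0] < 3]
large = [f for f in bowl if f[0] >= 3]  # the "right node"
after_diameter = weighted_entropy([labels_of(small), labels_of(large)])  # about 0.46 bits

# Second split: color, applied only to the right node.
yellow = [f for f in large if f[1] == "yellow"]
not_yellow = [f for f in large if f[1] != "yellow"]
after_color = weighted_entropy([labels_of(small), labels_of(yellow), labels_of(not_yellow)])
```

Once `after_color` reaches zero every leaf holds a single fruit type, which is why the lemon and the apple can each be predicted with 100% accuracy on this data.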

WHAT IS RANDOM FOREST?
• A method that operates by constructing multiple decision trees
• In other words, a bunch of decision trees bundled together
• Based on the idea of "the wisdom of the crowd"
• Gets a prediction from each tree and selects the best solution by voting
• The decision of the majority of the trees is chosen by the random forest as the final decision
• Example: getting recommendations from several friends for a vacation destination
• Can be used for both regression and classification
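
The voting idea can be sketched in a few lines. The per-tree predictions below are made up for illustration. Averaging the tree outputs for regression is the usual convention and is an assumption here, since the slides describe only voting.

```python
from collections import Counter
from statistics import mean


def majority_vote(predictions):
    """Return the label predicted by the largest number of trees."""
    return Counter(predictions).most_common(1)[0][0]


# Classification: hypothetical predictions from five trees for one fruit.
tree_votes = ["apple", "lemon", "apple", "apple", "lemon"]
final_class = majority_vote(tree_votes)  # "apple" wins 3 votes to 2

# Regression: hypothetical numeric estimates from three trees, combined by averaging.
tree_estimates = [2.9, 3.1, 3.0]
final_value = mean(tree_estimates)
```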

WHY DO WE USE RANDOM FOREST?
1. Reduced overfitting
• In overfitting, the model learns "too much" from the training data set.
• Overfitting is the case where the overall cost on the training data is very small, but the generalization of the model is unreliable.
• What use is a model that has learned very well from the training data but still cannot make reliable predictions for new inputs?
• We always want to find the trend, not fit the line to every data point.
• Combining the votes of many trees reduces the risk of overfitting compared with a single decision tree.
• Training time is comparatively low.
2. High accuracy
• Random forest runs efficiently on large datasets.
• Produces highly accurate predictions for large data.
3. Estimates missing data
• Maintains accuracy when a large proportion of the data is missing.
• E.g. different sets of demographic statistics coming in from various areas, where:
  • one set is missing the number of children in the house
  • another set is missing the size of the house
• Random forest looks at the sets differently, builds two different trees, and then guesses which one fits better.

HOW RANDOM FOREST WORKS
• Step 1: Select random samples from a given dataset
• Step 2: Construct a decision tree from each sample and get a prediction result from each decision tree
• Step 3: Perform a vote for each predicted result
• Step 4: Select the prediction result with the most votes as the final prediction
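
The four steps can be sketched from scratch as below. This is an illustrative sketch, not a reference implementation: the fruit measurements are invented, "random samples" is assumed to mean sampling rows with replacement (bootstrap), and the random choice of features at each split that random forests commonly add is left out because the steps above do not mention it.

```python
import math
import random
from collections import Counter


def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow a small tree; a leaf is a label, a decision node is (feature, threshold, left, right)."""
    if depth == max_depth or len(set(labels)) == 1:
        return Counter(labels).most_common(1)[0][0]
    best = None  # (weighted child entropy, feature index, threshold)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows})[:-1]:  # skip the max so both sides are non-empty
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
            if best is None or score < best[0]:  # lowest child entropy = highest gain
                best = (score, f, t)
    if best is None:  # all rows identical, so no split is possible
        return Counter(labels).most_common(1)[0][0]
    _, f, t = best
    lo = [i for i, r in enumerate(rows) if r[f] <= t]
    hi = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            build_tree([rows[i] for i in lo], [labels[i] for i in lo], depth + 1, max_depth),
            build_tree([rows[i] for i in hi], [labels[i] for i in hi], depth + 1, max_depth))


def predict_tree(node, row):
    while isinstance(node, tuple):  # walk down until a leaf label is reached
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def random_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # Step 1: random sample with replacement
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx]))  # Step 2
    return forest


def predict_forest(forest, row):
    votes = Counter(predict_tree(tree, row) for tree in forest)  # Step 3: collect the votes
    return votes.most_common(1)[0][0]  # Step 4: the most-voted label wins


# Hypothetical data: [diameter in cm, weight in g].
rows = [[1.5, 5], [1.8, 6], [2.0, 7], [1.6, 5], [1.9, 6],
        [7.0, 150], [7.5, 170], [8.0, 190], [7.2, 160], [7.8, 180]]
labels = ["grape"] * 5 + ["apple"] * 5

forest = random_forest(rows, labels)
small_fruit = predict_forest(forest, [1.0, 4])
large_fruit = predict_forest(forest, [7.5, 170])
```

Each tree sees a slightly different sample, so individual trees may place their thresholds differently; the final answer depends only on which label collects the most votes.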
• Let's take this blackened fruit and try to classify it.
• This is an example where random forest works really well when some data is missing.
Features of the fruit:
  Diameter = 3
  Colour = Orange
  Grows in summer = Yes
  Shape = Circle

APPLICATIONS
1. Kinect
• Motion-sensing device developed by Microsoft for its Xbox game consoles.
• Uses infrared to track body movements and recreates them in the game.
2. Remote sensing
• Used with the Enhanced Thematic Mapper (ETM) on satellites to acquire high-resolution imaging information of the Earth's surface
• Less training time
• Higher accuracy
3. Object detection
• Multiclass object detection, e.g. in traffic, where the algorithm is used to sort out different types of vehicles such as buses, lorries, etc.
• Provides better detection in complicated environments
