Naive Bayes Classifier Tutorial | Naive Bayes Classifier Example | Naive Bayes in R | Edureka
The document details a data science certification training course focusing on machine learning, specifically classification algorithms and the Naive Bayes classifier. It explains concepts such as supervised vs. unsupervised learning, provides examples of classification problems, and demonstrates how the Naive Bayes algorithm can be used to predict outcomes, such as employee salary based on various attributes. Additionally, it highlights the implementation process, model optimization, and validation for accurate predictions.
Introduction to Edureka's Data Science Certification Training, covering Machine Learning, classification, Naive Bayes, and a demo.
An overview of Machine Learning, its relationship with statistics, and applications in predictive analytics.
Comparison between supervised learning (classification) and unsupervised learning (clustering) with examples.
Introduction to classification, defining its purpose to categorize observations based on training data.
List of various classification algorithms including Naive Bayes, SVM, and Neural Networks.
Explanation of the Naive Bayes algorithm, its practical application using weather conditions for football game prediction.
Probabilistic modeling using Naive Bayes, categorizing days based on weather attributes for decision-making.
Introduction to Bayes’ theorem and its application in Naive Bayes classification with mathematical representation.
Using Bayes' theorem to calculate posterior probabilities for predicting whether to play football on specific days.
Application of Naive Bayes in various domains including email spam detection, sentiment analysis, and medical diagnosis.
A comprehensive demo using Naive Bayes to predict employee salaries, detailing data acquisition, model building, optimization, and validation processes.
Summarization of the presentation covering all key points regarding Machine Learning, classification, and the Naive Bayes algorithm.
What to expect?
What is Machine Learning?
Introduction to Classification
Classification Algorithms
What is Naive Bayes?
Use Cases of Naive Bayes
Demo – Employee Salary Prediction
What is Machine Learning?
Machine Learning explores the study and construction of algorithms that can learn from and make predictions on data. It is closely related to computational statistics and is used to devise complex models and algorithms that lend themselves to prediction, which in commercial use is known as predictive analytics. Typical applications include speech recognition, face recognition, anti-virus software and weather prediction.
Supervised vs Unsupervised Learning
Supervised Learning: Classification is the result of supervised learning, which means there is a known label that you want the system to generate. E.g. if you build a fruit classifier, the labels will be "this is an orange, this is an apple and this is a banana", based on showing the classifier examples of apples, oranges and bananas.
Unsupervised Learning: Clustering is the result of unsupervised learning, which means you have seen lots of examples but do not have labels. E.g. in the same example, a fruit clustering will form categories such as "fruits with soft skin and lots of dimples", "fruits with shiny hard skin" and "elongated yellow fruits".
Introduction to Classification
Classification is the problem of identifying to which of a set of categories a new observation belongs, based on a training set of data containing observations whose categories are known.
Figure: Examples of Classification
What is Naive Bayes?
Let us understand Naive Bayes with the help of an example.
"Hi! I just cannot seem to figure out which are the best days to play football with my friends. Can you help me out?"
All possible weather combinations are built from three attributes: Season (Summer, Monsoon, Winter), Sun (Sunny, No Sun) and Wind (Windy, No Wind).
What is Naive Bayes?
"That is perfect. We will be using the Naive Bayes algorithm to predict if you should play on a particular day or not. I have noted down all the days it was good or bad to play football, together with the combination of weather metrics on each of those days."
What is Naive Bayes?
Moving further, we can draw charts based on the probabilities of days favouring games. We categorize the probability to play into "High" (P > 0.5) and "Low" (P < 0.5): big circles represent "High", i.e. a probability greater than 0.5, and small circles represent "Low", i.e. a probability less than 0.5.
Case 1 – Sunny: a chart of the probability of playing on sunny days, plotted per season (Summer, Monsoon, Winter).
What is Naive Bayes?
The second attribute is the wind on a particular day. Let us look at how wind affects the chances of playing football.
Case 2 – Windy: a chart of the probability of playing on windy days, plotted per season (Summer, Monsoon, Winter). Here, we look at days when there was wind and whether it was good to play.
What is Naive Bayes?
Here, we have the complete set of attribute combinations and whether to play on each kind of day: charts plotted per season (Summer, Monsoon, Winter) for (Sunny = Yes, Windy = Yes), (Sunny = Yes, Windy = No), (Sunny = No, Windy = Yes) and (Sunny = No, Windy = No).
What is Naive Bayes?
Notice that in summer it appears advisable to play when there is no sun, but the second set of charts shows a different picture. This is because a summer day that is not sunny may have P > 0.5 on its own, yet once we also know there is no wind, the posterior probability drops below 0.5.
What is Naive Bayes?
The Naive Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem with strong (naive) independence assumptions between the features. Bayes' theorem is stated mathematically as the following equation:
P(A | B) = P(B | A) P(A) / P(B)
where A and B are events and P(B) ≠ 0.
Understanding Bayes' Theorem
Let us understand how Bayes' theorem is used in the Naive Bayes classifier:
P(c | x) = P(x | c) P(c) / P(x)
where P(c | x) is the posterior probability, P(x | c) is the likelihood, P(c) is the class prior probability and P(x) is the predictor prior probability.
Understanding Bayes' Theorem
In Figure 1, we have the posterior probability for Sunny across the seasons, excluding wind speed. In Figure 2, we have the posterior probabilities for full attribute combinations (e.g. Sunny = No, Windy = Yes and Season = Summer).
Understanding Bayes' Theorem
We can use the Naive Bayes classifier to predict whether to play football on a day with (Season = Winter, Sunny = No, Windy = Yes). Our demo will help you clearly understand Naive Bayes.
Understanding Bayes' Theorem
From the dataset we have obtained, we will populate frequency tables for each of the attributes.

Season frequency table:
Season     Play = Yes   Play = No
Summer     3            2
Monsoon    4            0
Winter     2            3

Sunny frequency table:
Sunny      Play = Yes   Play = No
Yes        3            4
No         6            1

Windy frequency table:
Windy      Play = Yes   Play = No
Yes        6            2
No         3            3
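The same frequency tables can be produced in R with table(). The toy 14-day data frame below is a hypothetical reconstruction whose marginal counts match the tables above; the day-by-day combinations are illustrative, not the original dataset.

# A minimal sketch: a toy dataset whose per-class counts match the tables above
football <- data.frame(
  season = c("Summer","Summer","Summer","Monsoon","Monsoon","Monsoon","Monsoon","Winter","Winter",
             "Summer","Summer","Winter","Winter","Winter"),
  sunny  = c("Yes","Yes","Yes","No","No","No","No","No","No",
             "Yes","Yes","Yes","Yes","No"),
  windy  = c("Yes","Yes","Yes","Yes","Yes","Yes","No","No","No",
             "Yes","Yes","No","No","No"),
  play   = c(rep("Yes", 9), rep("No", 5))
)

table(football$season, football$play)  # Season frequency table
table(football$sunny,  football$play)  # Sunny frequency table
table(football$windy,  football$play)  # Windy frequency table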
Understanding Bayes' Theorem
For each of the frequency tables, we will find the likelihoods for each of the cases. Here, c = Play and x = variables such as Season, Sunny and Windy.

Season likelihood table:
Season     P(Season | Yes)   P(Season | No)   P(Season)
Summer     3/9               2/5              5/14
Monsoon    4/9               0/5              4/14
Winter     2/9               3/5              5/14
Class priors: P(Yes) = 9/14, P(No) = 5/14

Likelihood of 'Yes' given Summer:
P(x | c) = P(Summer | Yes) = 3/9 = 0.33
P(c)     = P(Yes) = 9/14 = 0.64
P(x)     = P(Summer) = 5/14 = 0.36
P(c | x) = P(Yes | Summer) = P(Summer | Yes) * P(Yes) / P(Summer) = (0.33 x 0.64) / 0.36 = 0.60
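In R, the corresponding likelihood tables can be derived from the toy football data frame above with prop.table(); margin = 2 normalizes each Play column so the entries are conditional probabilities P(attribute | Play).

# Conditional probabilities P(Season | Play): each Play column sums to 1
prop.table(table(football$season, football$play), margin = 2)

# Class priors P(Play = Yes) and P(Play = No)
prop.table(table(football$play))

# Predictor prior P(Season = Summer), etc.
prop.table(table(football$season))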
Understanding Bayes' Theorem
Let us use the likelihood tables to predict whether to play football on (Season = Winter, Sunny = No, Windy = Yes):
P(c | x) = P(Play = Yes | Winter, Sunny = No, Windy = Yes)
         = [ P(Winter | Yes) * P(Sunny = No | Yes) * P(Windy = Yes | Yes) * P(Yes) ] / [ P(Winter) * P(Sunny = No) * P(Windy = Yes) ]
         = [ (2/9) * (6/9) * (6/9) * (9/14) ] / [ (5/14) * (7/14) * (8/14) ]
         = 0.6223
Since the probability is greater than 0.5, we should play football on that day.
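As a quick check, the same arithmetic can be reproduced in a few lines of R; this is a plain numeric sketch of the calculation above, not a call to any library.

# Likelihoods and priors read off the tables above
p_winter_yes <- 2/9;  p_sunnyNo_yes <- 6/9;  p_windyYes_yes <- 6/9;  p_yes <- 9/14
p_winter     <- 5/14; p_sunnyNo     <- 7/14; p_windyYes     <- 8/14

posterior_yes <- (p_winter_yes * p_sunnyNo_yes * p_windyYes_yes * p_yes) /
                 (p_winter * p_sunnyNo * p_windyYes)
posterior_yes  # ~0.62, greater than 0.5, so play football on that day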
Demo – Problem Statement
Problem Statement: To devise a model that predicts an employee's salary from a given set of attributes using the Naive Bayes classifier. We have an employee dataset with 14 input attributes, and our output variable is the employee's salary. We will use the Naive Bayes classifier to predict an employee's salary as high (>50k) or low (<50k) by finding the probabilities for the given attribute combination.
Demo – Employee Salary Prediction
The demo walks through the following steps:
Data Acquisition -> Feature Selection -> Divide Dataset -> Implement Model -> Optimize Model -> Model Validation -> Prediction
Demo – Employee Salary Prediction
Field               Description
Age_Of_emp          Age of the employee
Emp_Stat_type       Type of the employment industry
srnumber            Serial number of the employee
Edu_of_Emp          Employee education details
Edu_Cat             Employee's education category
marital_Status      Employee marital status
Occ_Of_Emp          Job description of the employee
Emp_rel_status      Employee relationship status
Emp_race_type       Race of the employee
sex_of_emp          Sex of the employee
capital_gain        Income from investment sources apart from wages/salary
capital_loss        Losses from investment sources apart from wages/salary
Work_hour_in_week   Number of weekly working hours
country_of_res      Country of residence
Emp_sal             Employee's salary
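A minimal data-acquisition sketch in R; the file name employee_data.csv is an assumption, not taken from the deck.

# Read the employee dataset into a data frame (file name is hypothetical)
emp_data <- read.csv("employee_data.csv", stringsAsFactors = TRUE)

str(emp_data)      # inspect the 15 fields listed above
summary(emp_data)  # quick sanity check of ranges and factor levels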
Demo – Employee Salary Prediction
From these fields, we need to filter out unnecessary columns that will not affect the employee's salary. We will remove the fields srnumber, marital_Status, Emp_rel_status, Emp_race_type, sex_of_emp, capital_gain and capital_loss, because these fields are treated as factors that do not affect a person's salary. The remaining fields will be used to build our model.
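A sketch of the column removal in R, assuming the emp_data frame from the acquisition step above.

# Drop the columns judged not to affect salary
drop_cols <- c("srnumber", "marital_Status", "Emp_rel_status",
               "Emp_race_type", "sex_of_emp", "capital_gain", "capital_loss")
emp_data  <- emp_data[ , !(names(emp_data) %in% drop_cols)]

names(emp_data)  # remaining fields used to build the model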
Demo – Employee Salary Prediction
We will divide our entire dataset into two subsets:
Training dataset -> to train the model
Testing dataset -> to validate the model and make predictions
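A sketch of a random train/test split in R; the 70/30 ratio and the seed are assumptions, as the deck does not state the split it used.

set.seed(123)                                            # reproducible split (assumed)
train_idx  <- sample(nrow(emp_data), size = floor(0.7 * nrow(emp_data)))
train_data <- emp_data[train_idx, ]                      # ~70% to train the model
test_data  <- emp_data[-train_idx, ]                     # ~30% to validate and predict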
Demo – Employee Salary Prediction
We build the Naive Bayes model using the library 'e1071' on the training dataset that we just created. The model is called emp_nb.
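A sketch of the model-building call with e1071's naiveBayes(); the formula assumes Emp_sal is the target and all remaining columns are predictors.

library(e1071)

# Train a Naive Bayes classifier on the training subset
emp_nb <- naiveBayes(Emp_sal ~ ., data = train_data)

emp_nb  # printing the model shows the class priors and the conditional probability tables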
Demo – Employee Salary Prediction
The following is the output from the emp_nb model: the likelihood of high and low salaries, and conditional tables such as the likelihood of each employee department against high and low salaries.
Demo – Employee Salary Prediction
Optimizing the model refers to modifying it so as to achieve the highest accuracy. If the p-value is greater than 0.05, we should reject the model; our p-value is less than 0.05, so our model is acceptable. Kappa is obtained as:
Kappa = (totalAccuracy - randomAccuracy) / (1 - randomAccuracy)
The Naive Bayes classifier can be further improved using the following steps: include Laplace correction (see the sketch below) and normalization.
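A sketch of the Laplace correction with e1071: laplace = 1 adds one pseudo-count to every frequency cell so that attribute levels unseen in the training data do not force a zero probability (the value 1 is an assumption, not stated in the deck).

# Retrain with Laplace (additive) smoothing to avoid the zero-frequency problem
emp_nb_laplace <- naiveBayes(Emp_sal ~ ., data = train_data, laplace = 1)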
Demo – Employee Salary Prediction
We can now validate the predictions. We will populate the confusion matrix, which reports the metrics used to measure accuracy, sensitivity, specificity, prevalence, etc.
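A validation sketch using the caret package's confusionMatrix(), which reports accuracy, kappa, sensitivity, specificity and prevalence; using caret here is an assumption, as the deck does not name the package behind its confusion matrix.

library(caret)

# Predict salary classes for the held-out test set and compare with the true labels
test_pred <- predict(emp_nb, newdata = test_data)
confusionMatrix(test_pred, test_data$Emp_sal)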
Demo – Employee Salary Prediction
The final step in our project is to predict the salary of an employee based on the Naive Bayes model that we have created. The prediction for our specific input is Low.
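A prediction sketch for a single employee record; the record is taken from the test set as a placeholder, since the deck does not show the specific input it used.

# A single record with the same predictor columns as the training data (hypothetical input)
new_emp <- test_data[1, setdiff(names(test_data), "Emp_sal")]

predict(emp_nb, newdata = new_emp)                 # predicted class: high or low salary
predict(emp_nb, newdata = new_emp, type = "raw")   # posterior probabilities for each class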
Summary
What is Machine Learning?
Introduction to Classification
Classification Algorithms
What is Naive Bayes?
Use Cases of Naive Bayes
Demo