Classification: Basic Concepts and
Decision Trees
 and Decision Rules
Data Classification and Prediction
 Data classification
  classification
  prediction
 Methods of classification
  decision tree induction
  Forward and Back-propagation (Neural Network)
  Bayesian classification
  Association rule mining
Classification: Definition
  Given a collection of records (training set )
  Each record contains a set of attributes, one of the attributes is
 the class.
  Find a model for class attribute as a function of
 the values of other attributes.
  Goal: previously unseen records should be
 assigned a class as accurately as possible.
  A test set is used to determine the accuracy of the model.
 Usually, the given data set is divided into training and test sets,
 with training set used to build the model and test set used to
 validate it.
Illustrating Classification Task
 Tid Attrib1 Attrib2 Attrib3 Class Learning
 1 Yes Large 125K No
 algorithm
 2 No Medium 100K No
 3 No Small 70K No
 4 Yes Medium 120K No
 Induction
 5 No Large 95K Yes
 6 No Medium 60K No
 7 Yes Large 220K No Learn
 8 No Small 85K Yes Model
 9 No Medium 75K No
 10 No Small 90K Yes
 Model
 10
 Training Set
 Apply
 Tid Attrib1 Attrib2 Attrib3 Class Model
 11 No Small 55K ?
 12 Yes Medium 80K ?
 13 Yes Large 110K ? Deduction
 14 No Small 95K ?
 15 No Large 67K ?
 10
 Test Set
Examples of Classification Task
 Predicting tumor cells as benign or malignant
 Classifying credit card transactions
 as legitimate or fraudulent
 Classifying secondary structures of protein
 as alpha-helix, beta-sheet, or random
 coil
 Categorizing news stories as finance,
 weather, entertainment, sports, etc.
Classification Techniques
 Decision Tree based Methods
 Rule-based Methods / Decision Rules
 Neural Networks (NN)
  Artificial Neural Network (ANN)
  Recurrence Neural Network (RNN)
 Association Rules (Apriori Algorithm)
 Naïve Bayes and Bayesian Belief Networks
 Memory based reasoning(LSTM)
 Support Vector Machines (SVM)
 Description of
 Decision Rules or Trees
 Native appeal for users
 Presentation Forms
  graphically –decision trees (ID3: Iterative Dichotomiser 3)
 (dīˈkädəməser)
  “if, then” statements (decision rules)
 What They Look Like
 Works like a flow chart
 Looks like an upside down tree
 Nodes
  appear as rectangles or circles
  represent test or decision
 Lines or branches - represent outcome of a test
 Circles - terminal (leaf) nodes
 Top or starting node- root node
 Internal nodes - rectangles
 Example of a Decision Tree
 Tid Refund Marital Taxable
 Splitting Attributes
 Status Income Cheat
 1 Yes Single 125K No
 2 No Married 100K No Refund
 3 No Single 70K No
 Yes No
 4 Yes Married 120K No NO MarSt
 5 No Divorced 95K Yes Married
 Single, Divorced
 6 No Married 60K No
 7 Yes Divorced 220K No TaxInc NO
 8 No Single 85K Yes < 80K > 80K
 9 No Married 75K No
 NO YES
 10 No Single 90K Yes
10
 Training Data Model: Decision Tree
 Another Example of Decision Tree
 MarSt Single,
 Married Divorced
 Tid Refund Marital Taxable
 Status Income Cheat
 NO Refund
 1 Yes Single 125K No
 Yes No
 2 No Married 100K No
 3 No Single 70K No NO TaxInc
 4 Yes Married 120K No < 80K > 80K
 5 No Divorced 95K Yes
 NO YES
 6 No Married 60K No
 7 Yes Divorced 220K No
 8 No Single 85K Yes
 9 No Married 75K No There could be more than one tree that
 10 No Single 90K Yes fits the same data!
10
Decision Tree Classification Task
 Tid Attrib1 Attrib2 Attrib3 Class
 Tree
 1 Yes Large 125K No Induction
 2 No Medium 100K No algorithm
 3 No Small 70K No
 4 Yes Medium 120K No
 Induction
 5 No Large 95K Yes
 6 No Medium 60K No
 7 Yes Large 220K No Learn
 8 No Small 85K Yes Model
 9 No Medium 75K No
 10 No Small 90K Yes
 Model
 10
 Training Set
 Apply Decision
 Model Tree
 Tid Attrib1 Attrib2 Attrib3 Class
 11 No Small 55K ?
 12 Yes Medium 80K ?
 13 Yes Large 110K ?
 Deduction
 14 No Small 95K ?
 15 No Large 67K ?
 10
 Test Set
 How They Work
 Decision rules - partition sample of data
 Terminal node (leaf) indicates the class assignment
 Tree partitions samples into mutually exclusive groups
 One group for each terminal node
 All paths
  start at the root node
  end at a leaf
 Each path represents a decision rule
  joining (AND) of all the tests along that path
  separate paths that result in the same class are disjunctions (ORs)
 All paths - mutually exclusive
  for any one case - only one path will be followed
  false decisions on the left branch
  true decisions on the right branch
Apply Model to Test Data
 Test Data
 Start from the root of tree. Refund Marital Taxable
 Status Income Cheat
 No Married 80K ?
 Refund 10
 Yes No
 NO MarSt
 Single, Divorced Married
 TaxInc NO
 < 80K > 80K
 NO YES
Apply Model to Test Data
 Test Data
 Refund Marital Taxable
 Status Income Cheat
 No Married 80K ?
 Refund 10
 Yes No
 NO MarSt
 Single, Divorced Married
 TaxInc NO
 < 80K > 80K
 NO YES
Apply Model to Test Data
 Test Data
 Refund Marital Taxable
 Status Income Cheat
 No Married 80K ?
 Refund 10
 Yes No
 NO MarSt
 Single, Divorced Married
 TaxInc NO
 < 80K > 80K
 NO YES
Apply Model to Test Data
 Test Data
 Refund Marital Taxable
 Status Income Cheat
 No Married 80K ?
 Refund 10
 Yes No
 NO MarSt
 Single, Divorced Married
 TaxInc NO
 < 80K > 80K
 NO YES
Apply Model to Test Data
 Test Data
 Refund Marital Taxable
 Status Income Cheat
 No Married 80K ?
 Refund 10
 Yes No
 NO MarSt
 Single, Divorced Married
 TaxInc NO
 < 80K > 80K
 NO YES
Apply Model to Test Data
 Test Data
 Refund Marital Taxable
 Status Income Cheat
 No Married 80K ?
 Refund 10
 Yes No
 NO MarSt
 Single, Divorced Married Assign Cheat to “No”
 TaxInc NO
 < 80K > 80K
 NO YES
Decision Tree Classification Task
 Tid Attrib1 Attrib2 Attrib3 Class
 Tree
 1 Yes Large 125K No Induction
 2 No Medium 100K No algorithm
 3 No Small 70K No
 4 Yes Medium 120K No
 Induction
 5 No Large 95K Yes
 6 No Medium 60K No
 7 Yes Large 220K No Learn
 8 No Small 85K Yes Model
 9 No Medium 75K No
 10 No Small 90K Yes
 Model
 10
 Training Set
 Apply Decision
 Tid Attrib1 Attrib2 Attrib3 Class
 Model Tree
 11 No Small 55K ?
 12 Yes Medium 80K ?
 13 Yes Large 110K ?
 Deduction
 14 No Small 95K ?
 15 No Large 67K ?
 10
 Test Set
Entropy
 S is a sample of training examples
 p is the proportion of positive examples
 n is the proportion of negative examples
 Calculate
  I(pi,ni)=((-p/p+n)log2(p/p+n))–((n/p+n)log2(n/p+n))
 Calculate
  Entropy(S) = pi+ni/P I(pi,ni)
 20
Splitting Based on INFO...
 Information Gain:
  Entropy ( p)    Entropy (i ) 
  n
 k
 GAIN i
  n 
 split i 1
 Parent Node, p is split into k partitions;
 ni is number of records in partition i
Building Decision Tree
Step – 1 Refund
 Yes
 Marital Status
 Single
 Taxable Inc.
 125K
 Cheat
 No
 Calculate “Class” attribute Entropy.
 No Married 100K No
 No Single 70K No
 Class attribute is “Cheat” Yes
 No
 Married
 Divorced
 120K
 95K
 No
 Yes
 No Married 60K No
 Yes Divorced 220K No
 No Single 85K Yes
 No Married 75K No
 p=3, n=7, p + n=10
 No Single 90K Yes
 
Entropy(Cheet) = (-p/p+n)log2(p/p+n)–(n/p+n)log2(n/p+n)
 =-3/10 log2 3/10 – 7/10 log2 7/10
 = ((-3/10)(log(3/10) / log(2))) – ((7/10)(log (7/10) / log(2)))
 = 0.88
Building Decision Tree
 Step – 2 Refund
 Yes
 Marital Status
 Single
 Taxable Inc.
 125K
 Cheat
 No
Calculate Entropy of other attributes
 No Married 100K No
 No Single 70K No
such as Refund, Marital Status and Yes
 No
 Married
 Divorced
 120K
 95K
 No
 Yes
Taxable Income. No Married 60K No
 Yes Divorced 220K No
 Entropy (Refund) No
 No
 Single
 Married
 85K
 75K
 Yes
 No
 No Single 90K Yes
 pi ni I(pi,ni)
 Yes 0 3 0
 No 3 4 0.98
I(3,4)=-3/7log23/7-4/7log24/7 I(0,3)=-0/3log20/3-3/3log23/3
 =0.98 =0
E(Refund)=7/10x0 + 7/10 x 0.98=0.686
Gain(Refund)=E(Cheat)-E(Refund)
 =0.88-0.686=0.19
 Refund Marital Status Taxable Inc. Cheat
 Yes Single 125K No
 No Married 100K No
Building Decision Tree
 No Single 70K No
 Yes Married 120K No
 No Divorced 95K Yes
 No Married 60K No
 Yes Divorced 220K No
 Entropy (Marital St.) No
 No
 Single
 Married
 85K
 75K
 Yes
 No
 No Single 90K Yes
 pi ni I(pi,ni)
 Single 2 2 1
 Married 0 4 0
 Divorced 1 1 1
I(2,2)=-2/4log22/4-2/4log22/4 I(2,2)=-1/2log21/2-1/2log21/2
 =1 =1
I(2,2)=-0/4log20/4-4/4log24/4
 =0
E(Marital Status)=4/10x1 + 4/10 x 0 + 4/10 x 1 = 0.6
Gain(Marital Status)=E(Cheat)-E(Refund)
 =0.88-0.6=0.28
 Refund Marital Status Taxable Inc. Cheat
 Yes Single 125K No
 No Married 100K No
Building Decision Tree
 No Single 70K No
 Yes Married 120K No
 No Divorced 95K Yes
 No Married 60K No
 Yes Divorced 220K No
 Entropy (Taxable Inc.) No
 No
 Single
 Married
 85K
 75K
 Yes
 No
 No Single 90K Yes
 pi ni I(pi,ni)
 >80 3 4 1
 <80 0 3 0
I(3,4)=-3/7log23/7-4/7log24/7
 =1
I(0,3)=-0/3log20/3-3/3log23/3
 =0
E(Marital Status)=7/10x1 + 3/10 x 0 = 0.686
Gain(Marital Status)=E(Cheat)-E(Refund)
 =0.88-0.686=0.19
 Refund Marital Status Taxable Inc. Cheat
 Yes Single 125K No
 No Married 100K No
Building Decision Tree
 No Single 70K No
 Yes Married 120K No
 No Divorced 95K Yes
 No Married 60K No
 Yes Divorced 220K No
 No Single 85K Yes
Which attribute got larger Gain? No
 No
 Married
 Single
 75K
 90K
 No
 Yes
  Gain(Refund) =0.19
  Gain(Marital Status) =0.28 (Greater than other)
  Gain(Taxable Inc.) =0.19
 Refund Marital Status Taxable Inc. Cheat
 Yes Single 125K No
 No Married 100K No
Building Decision Tree
 No Single 70K No
 Yes Married 120K No
 No Divorced 95K Yes
 No Married 60K No
 Yes Divorced 220K No
 No Single 85K Yes
 Step – 2 No
 No
 Married
 Single
 75K
 90K
 No
 Yes
 Split the attributes according to the Marital Status
 Marital Status Refund Taxable Inc. Cheat
 MarSt Married No 100K No
 Married Single, Married Yes 120K No
 Divorced Married No 60K No
 Married No 75K No
 NO Single Yes 125K No
 Single No 70K No
 Single No 85K Yes
 Single No 90K Yes
 Divorced No 95K Yes
 Divorced Yes 220K No
 Marital Status Refund Taxable Inc. Cheat
 Single Yes 125K No
 Single No 70K No
 Single No 85K Yes
Building Decision Tree Single
 Divorced
 Divorced
 No
 No
 Yes
 90K
 95K
 220K
 Yes
 Yes
 No
Step – 3
 Calculate the Entropy with respect to Marital
 Status=“Single” Refund Taxable Inc. Cheat
 Yes 125K No
 Entropy (MS=Single =>Refund) No
 No
 70K
 85K
 No
 Yes
 No 90K Yes
 pi ni I(pi,ni)
 Yes 0 1 0
 No 2 1 0.91
I(0,1)=-0/1log20/1-1/1log21/1 I(2,1)=-2/3log22/3-1/3log21/3
 =0 =0.91
E(MS.single=Refund)=1/10x0 + 3/10 x 0.91=0.27
Gain(MS.single=Refund)=E(Cheat)-E(Refund)
 =0.88-0.27=0.60
 Marital Status Refund Taxable Inc. Cheat
 Single Yes 125K No
 Single No 70K No
 Single No 85K Yes
Building Decision Tree Single
 Divorced
 Divorced
 No
 No
 Yes
 90K
 95K
 220K
 Yes
 Yes
 No
Step – 3
 Calculate the Entropy with respect to Marital
 Status=“Single” Taxable Inc. Cheat
 125K No
 Entropy (MS=Single =>Taxable Inc.) 70K
 85K
 No
 Yes
 90K Yes
 pi ni I(pi,ni)
 <80 0 1 0
 >80 2 1 0.91
I(0,1)=-0/1log20/1-1/1log21/1 I(2,1)=-2/3log22/3-1/3log21/3
 =0 =0.91
E(MS.single=Refund)=1/10x0 + 3/10 x 0.91=0.27
Gain(MS.single=Refund)=E(Cheat)-E(Refund)
 =0.88-0.27=0.60
 Building Decision Tree
 Marital Status Refund Taxable Inc. Cheat
 Single Yes 125K No
 MarSt Single No 70K No
 Married Single,
 Single No 85K Yes
 Divorced Single No 90K Yes
 Divorced No 95K Yes
 NO Refund Divorced Yes 220K No
 Yes No Marital Status Refund Taxable Inc. Cheat
 Single Yes 125K No
 NO TaxInc Divorced Yes 220K No
 Marital Status Refund Taxable Inc. Cheat
 < 80K > 80K
 Single No 70K <80 No
 Single No 85K >80 Yes
 NO YES Single No 90K >80 Yes
 Divorced No 95K >80 Yes
Decision Rules:
 Marital St.= Married then Cheat= No
 Marital St.= (Single Or Divorced) And Refund = Yes then Cheat = No
 Marital St.= (Single Or Divorced) And Refund = No And TaxInc <= 80K then Cheat = No
 Marital St.= (Single Or Divorced) And Refund = No And TaxInc >= 80K then Cheat = Yes
Building Decision Tree
Step – 1 Age
 Adult
 Income
 low
 Cartype
 Family
 Class
 High
 Calculate “Class” attribute Entropy. Adult high Sports High
 Younger low Sports Low
 Old low Family Low
 Younger high Truck High
 p=4, n=2, p + n=6 Old high Truck High
Entropy(Cheet) = (-p/p+n)log2(p/p+n)–(n/p+n)log2(n/p+n)
 =-4/6 log2 4/6 – 2/6 log2 2/6
 = ((-4/6)(log(4/6) / log(2))) – ((2/6)(log (2/6) / log(2)))
 = 0.92
Building Decision Tree
 Age Income Cartype Class
Step – 2 Adult low Family High
 Adult high Sports High
 Entropy of Age? Gain of Age ? Younger low Sports Low
 Old low Family Low
 Entropy of Income? Gain of Income ? Younger
 Old
 high
 high
 Truck
 Truck
 High
 High
 Entropy of Cartype? Gain of Cartype ?
Which attribute’s Gain is larger…?
 Split the table according to the larger Gain.
 Calculate Entropy and Gain if needed
 Build Tree
 Age Income Cartype Class
 Adult low Family High
 Adult high Sports High
Building Decision Tree
 Younger low Sports Low
 Old low Family Low
 Younger high Truck High
 Old high Truck High
Step – 2
 Entropy of Age=0.667 Gain of Age=0.256
 Entropy of Income=0.459 Gain of Income=0.46
 Entropy of Cartype=0.667 Gain of Cartype=0.256
Which attribute’s Gain is larger…?
 Split the table according to the larger Gain.
 Calculate Entropy and Gain Income Age Cartype Class
 low Adult Family High
 Build Tree low Younger Sports Low
 Income low Old Family Low
 High Low high Adult Sports High
 high Younger Truck High
 High high Old Truck High
 Building Decision Tree
 Income Age Cartype Class
 Income
 High Low low Adult Family High
 low Younger Sports Low
 low Old Family Low
 High Age
 Adult Younger, Old
 High Low
Decision Rules:
 Income = High then Class = High
 Income = Low And Age = Adult then Class = High
 Income = Low And (Age = Younger Or Old) then Class = Low
Building Decision Tree
 Try Yourself:
 Building Decision Tree
  Try Yourself:
 https://kindsonthegenius.com/blog/2018/04/how-to-build-a-decision-tree-for-
 classification-step-by-step-procedure-using-entropy-and-gain.html
Under-fitting and Over-fitting
Producing a model that doesn’t perform well even
on the training data, is called under-fitting.
Although typically when this happens we decide
our model isn’t good enough and keep looking for
a better one.
Producing a model that performs well on the
training data but that generalizes poorly to any
new data, is called over-fitting. This could involve
learning noise in the data.
Underfitting and Overfitting
 Under-fitting Just Right Over-fitting
 • A very small model • A very large model
 • Too few samples • Too many samples