Decision Trees
Decision tree induction is the
learning of decision trees from
class-labelled training examples.
A decision tree is a flowchart-like tree structure, where each internal
node denotes a test on an attribute,
each branch represents an outcome
of the test, and each leaf node holds
a class label.
Decision Tree Induction
ID3 (Iterative Dichotomiser 3), by J. Ross Quinlan
C4.5 is a successor of ID3
Classification and Regression Trees (CART) – generates binary trees
ID3, C4.5 and CART adopt a greedy (non-backtracking) approach in
which decision trees are constructed in a
top-down recursive divide-and-conquer
manner.
The algorithm starts with a training set of
tuples and their associated class labels.
The training set is recursively partitioned
into smaller subsets as the tree is being
built.
The Algorithm
Generate_decision_tree(D, attribute_list):
Create a node N
If all tuples in D are of the same class C, then return N as a leaf node labelled with class C
If attribute_list is empty, then return N as a leaf node labelled with the majority class in D (majority voting)
Apply Attribute_selection_method(D, attribute_list) to find the "best" splitting_criterion
Label node N with splitting_criterion
If splitting_attribute is discrete-valued and multiway splits are allowed, then
    attribute_list ← attribute_list − splitting_attribute
For each outcome j of splitting_criterion:
    Let Dj be the set of data tuples in D satisfying outcome j
    If Dj is empty, then attach a leaf labelled with the majority class in D to node N
    Else attach the node returned by Generate_decision_tree(Dj, attribute_list) to node N
Endfor
Return N
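The pseudocode can be turned into a short Python sketch. This is a minimal illustration (not the exact ID3/C4.5/CART implementation), assuming D is a list of dicts with a "class" key and that an attribute-selection function (for example, one based on information gain, introduced later) is passed in as select_attribute:

```python
from collections import Counter

def majority_class(D):
    """Most common class label in partition D (used for majority voting)."""
    return Counter(t["class"] for t in D).most_common(1)[0][0]

def generate_decision_tree(D, attribute_list, select_attribute):
    """Top-down, recursive, divide-and-conquer construction with multiway discrete splits."""
    classes = {t["class"] for t in D}
    if len(classes) == 1:                      # all tuples in the same class -> leaf
        return {"leaf": classes.pop()}
    if not attribute_list:                     # no attributes left -> majority voting
        return {"leaf": majority_class(D)}
    A = select_attribute(D, attribute_list)    # "best" splitting attribute
    node = {"attribute": A, "branches": {}}
    remaining = [a for a in attribute_list if a != A]   # discrete multiway split: drop A
    for value in {t[A] for t in D}:            # one branch per known value of A
        Dj = [t for t in D if t[A] == value]
        if not Dj:   # cannot occur with this enumeration; kept to mirror the pseudocode
            node["branches"][value] = {"leaf": majority_class(D)}
        else:
            node["branches"][value] = generate_decision_tree(Dj, remaining, select_attribute)
    return node
```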
To partition the tuples in D on the splitting attribute, there are three possible
scenarios. Let A be the splitting attribute. A has v
distinct values, {a1, a2, …, av}, based on the training
data.
A is discrete-valued: a branch is created for each known value aj of A, and Dj contains the tuples in D with A = aj.
A is continuous-valued: two branches are created, for A ≤ split_point and A > split_point.
A is discrete-valued and a binary tree must be produced: the test has the form "A ∈ SA?", where SA is a subset of the known values of A.
The recursive partitioning stops only when one of
the following terminating conditions is true:
All the tuples in partition D belong to the same class
There are no remaining attributes on which the tuples
may be further partitioned. In this case majority
voting is employed.
There are no tuples for a given branch, that is, a
partition Dj is empty. In this case, a leaf is created
with the majority class in D.
What is the computational complexity?
Attribute Selection Procedure
It is a heuristic for selecting the splitting criterion
that “best” separates a given data partition, D, of
class-labelled training tuples into individual
classes.
It is also known as a splitting rule because it
determines how the tuples at a given node are to be
split.
It provides a ranking for each attribute describing the
given training tuples. The attribute having the best
score for the measure is chosen as the splitting
attribute for the given tuples.
Well-known attribute selection measures are:
information gain, gain ratio, and Gini index.
Information Gain (ID3/C4.5)
Select the attribute with the highest information
gain.
Let pi be the probability that an arbitrary tuple in D
belongs to class Ci, estimated by |Ci, D|/|D|
Expected information (entropy) needed to classify a
tuple in D:
Info(D) = − Σ_{i=1}^{m} pi × log2(pi)
Information needed (after using A to split D into v
partitions) to classify D:
Info_A(D) = Σ_{j=1}^{v} (|Dj| / |D|) × Info(Dj)
Information gained by branching on attribute A
Gain(A) = Info(D) − Info_A(D)
Exercise
Calculate the information(entropy)
Given:
Set S contains 14 examples
9 Positive values
5 Negative values
Exercise
Info(S) = - (9/14) log2 (9/14) - (5/14) log2 (5/14)
= 0.940
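A quick way to check this arithmetic is a small entropy helper (a sketch; the function name is our own):

```python
import math

def entropy(counts):
    """Info(S) = -sum pi * log2(pi), computed from per-class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

print(round(entropy([9, 5]), 3))   # 0.94, matching Info(S) above
```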
Information Gain
Information gain is based on the decrease in
entropy after a dataset is split on an attribute.
We look for the attribute that creates the most
homogeneous branches.
Information Gain Example
14 examples, 9 positive 5 negative
The attribute is Wind.
Values of wind are Weak and Strong
Exercise (cont.)
8 occurrences of weak winds
6 occurrences of strong winds
For the weak winds, 6 are positive and 2 are
negative
For the strong winds, 3 are positive and 3 are
negative
Exercise (cont.)
Gain(S,Wind) =
Info(S) - (8/14) * Info(Weak)
-(6/14) * Info(Strong)
Info(Weak) = - (6/8)*log2(6/8) - (2/8)*log2(2/8) =
0.811
Info(Strong) = - (3/6)*log2(3/6) - (3/6)*log2(3/6) =
1.00
Exercise (cont.)
So Gain(S,Wind) = 0.940 - (8/14)*0.811 - (6/14)*1.00
= 0.048
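The same helper reproduces the Wind example (reusing the entropy function from the sketch above):

```python
# Gain(S, Wind) = Info(S) - (8/14)*Info(Weak) - (6/14)*Info(Strong)
info_s      = entropy([9, 5])      # 0.940
info_weak   = entropy([6, 2])      # 0.811
info_strong = entropy([3, 3])      # 1.000
gain_wind = info_s - (8/14) * info_weak - (6/14) * info_strong
print(round(gain_wind, 3))         # 0.048
```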
age income student credit_rating buys_computer
youth high no fair no
youth high no excellent no
middle_aged high no fair yes
senior medium no fair yes
senior low yes fair yes
senior low yes excellent no
middle_aged low yes excellent yes
youth medium no fair no
youth low yes fair yes
senior medium yes fair yes
youth medium yes excellent yes
middle_aged medium no excellent yes
middle_aged high yes fair yes
senior medium no excellent no
Sample training data to determine
whether an animal lays eggs.
Independent/condition attributes: Warm-blooded, Feathers, Fur, Swims; dependent/decision attribute: Lays Eggs

Animal     Warm-blooded  Feathers  Fur  Swims  Lays Eggs
Ostrich    Yes           Yes       No   No     Yes
Crocodile  No            No        No   Yes    Yes
Raven      Yes           Yes       No   No     Yes
Albatross  Yes           Yes       No   No     Yes
Dolphin    Yes           No        No   Yes    No
Koala      Yes           No        Yes  No     No
Entropy(4Y,2N): -(4/6)log2(4/6) – (2/6)log2(2/6)
= 0.91829
Now we have to find the information gain (IG) for all four attributes:
Warm-blooded, Feathers, Fur, Swims
For attribute ‘Warm-blooded’:
Values(Warm-blooded) : [Yes,No]
S = [4Y,2N]
SYes = [3Y,2N] E(SYes) = 0.97095
SNo = [1Y,0N] E(SNo) = 0 (all members belong to same class)
Gain(S,Warm-blooded) = 0.91829 – [(5/6)*0.97095 + (1/6)*0]
= 0.10916
For attribute ‘Feathers’:
Values(Feathers) : [Yes,No]
S = [4Y,2N]
SYes = [3Y,0N] E(SYes) = 0
SNo = [1Y,2N] E(SNo) = 0.91829
Gain(S,Feathers) = 0.91829 – [(3/6)*0 + (3/6)*0.91829]
= 0.45914
For attribute ‘Fur’:
Values(Fur) : [Yes,No]
S = [4Y,2N]
SYes = [0Y,1N] E(SYes) = 0
SNo = [4Y,1N] E(SNo) = 0.7219
Gain(S,Fur) = 0.91829 – [(1/6)*0 + (5/6)*0.7219]
= 0.3167
For attribute ‘Swims’:
Values(Swims) : [Yes,No]
S = [4Y,2N]
SYes = [1Y,1N] E(SYes) = 1 (equal members in both classes)
SNo = [3Y,1N] E(SNo) = 0.81127
Gain(S,Swims) = 0.91829 – [(2/6)*1 + (4/6)*0.81127] = 0.04411
Gain(S,Warm-blooded) = 0.10916
Gain(S,Feathers) = 0.45914
Gain(S,Fur) = 0.31670
Gain(S,Swims) = 0.04411
Gain(S,Feathers) is maximum, so Feathers is chosen as the root node
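As a cross-check, here is a sketch that recomputes all four gains from the lays-eggs table (attribute names and values are spelled as in the table; the entropy helper from the earlier sketch is reused):

```python
from collections import Counter

animals = [  # Warm-blooded, Feathers, Fur, Swims, LaysEggs
    {"Warm-blooded": "Yes", "Feathers": "Yes", "Fur": "No",  "Swims": "No",  "LaysEggs": "Yes"},  # Ostrich
    {"Warm-blooded": "No",  "Feathers": "No",  "Fur": "No",  "Swims": "Yes", "LaysEggs": "Yes"},  # Crocodile
    {"Warm-blooded": "Yes", "Feathers": "Yes", "Fur": "No",  "Swims": "No",  "LaysEggs": "Yes"},  # Raven
    {"Warm-blooded": "Yes", "Feathers": "Yes", "Fur": "No",  "Swims": "No",  "LaysEggs": "Yes"},  # Albatross
    {"Warm-blooded": "Yes", "Feathers": "No",  "Fur": "No",  "Swims": "Yes", "LaysEggs": "No"},   # Dolphin
    {"Warm-blooded": "Yes", "Feathers": "No",  "Fur": "Yes", "Swims": "No",  "LaysEggs": "No"},   # Koala
]

def class_counts(rows, target="LaysEggs"):
    """Per-class counts of the target attribute in a set of rows."""
    return list(Counter(r[target] for r in rows).values())

def gain(rows, attribute, target="LaysEggs"):
    """Gain(S, A) = Info(S) - sum_j (|Sj|/|S|) * Info(Sj)."""
    base, total = entropy(class_counts(rows, target)), len(rows)
    expected = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r for r in rows if r[attribute] == value]
        expected += len(subset) / total * entropy(class_counts(subset, target))
    return base - expected

for a in ["Warm-blooded", "Feathers", "Fur", "Swims"]:
    print(a, round(gain(animals, a), 5))   # Feathers has the largest gain (~0.459)
```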
Splitting on Feathers:
Feathers = Y: [Ostrich, Raven, Albatross] – this descendant has only positive examples and becomes a leaf node with classification ‘Lays Eggs’
Feathers = N: [Crocodile, Dolphin, Koala] – this branch is still mixed and must be split further
The remaining tuples for the Feathers = N branch are:
Animal     Warm-blooded  Feathers  Fur  Swims  Lays Eggs
Crocodile  No            No        No   Yes    Yes
Dolphin    Yes           No        No   Yes    No
Koala      Yes           No        Yes  No     No
We now repeat the procedure for the Feathers = N branch:
S: [Crocodile, Dolphin, Koala]
S: [1+,2-]
Entropy(S) = -(1/3)log2(1/3) – (2/3)log2(2/3)
= 0.91829
For attribute ‘Warm-blooded’:
Values(Warm-blooded) : [Yes,No]
S = [1Y,2N]
SYes = [0Y,2N] E(SYes) = 0
SNo = [1Y,0N] E(SNo) = 0
Gain(S,Warm-blooded) = 0.91829 – [(2/3)*0 + (1/3)*0] = 0.91829
For attribute ‘Fur’:
Values(Fur) : [Yes,No]
S = [1Y,2N]
SYes = [0Y,1N] E(SYes) = 0
SNo = [1Y,1N] E(SNo) = 1
Gain(S,Fur) = 0.91829 – [(1/3)*0 + (2/3)*1] = 0.25162
For attribute ‘Swims’:
Values(Swims) : [Yes,No]
S = [1Y,2N]
SYes = [1Y,1N] E(SYes) = 1
SNo = [0Y,1N] E(SNo) = 0
Gain(S,Swims) = 0.91829 – [(2/3)*1 + (1/3)*0] = 0.25162
Gain(S,Warm-blooded) is maximum, so Warm-blooded is chosen for the next split
The final decision tree will be:
Feathers
  Y → Lays Eggs
  N → Warm-blooded
        Y → Does not lay eggs
        N → Lays Eggs
Factors affecting sunburn
Name Hair Height Weight Lotion Sunburned
Sarah Blonde Average Light No Yes
Dana Blonde Tall Average Yes No
Alex Brown Short Average Yes No
Annie Blonde Short Average No Yes
Emily Red Average Heavy No Yes
Pete Brown Tall Heavy No No
John Brown Average Heavy No No
Katie Blonde Short Light Yes No
S = [3+, 5-]
Entropy(S) = -(3/8)log2(3/8) – (5/8)log2(5/8)
= 0.95443
Find IG for all 4 attributes: Hair, Height, Weight, Lotion
For attribute ‘Hair’:
Values(Hair) : [Blonde, Brown, Red]
S = [3+,5-]
SBlonde = [2+,2-] E(SBlonde) = 1
SBrown = [0+,3-] E(SBrown) = 0
SRed = [1+,0-] E(SRed) = 0
Gain(S,Hair) = 0.95443 – [(4/8)*1 + (3/8)*0 + (1/8)*0]
= 0.45443
For attribute ‘Height’:
Values(Height) : [Average, Tall, Short]
SAverage = [2+,1-] E(SAverage) = 0.91829
STall = [0+,2-] E(STall) = 0
SShort = [1+,2-] E(SShort) = 0.91829
Gain(S,Height) = 0.95443 – [(3/8)*0.91829 + (2/8)*0 + (3/8)*0.91829] = 0.26571
For attribute ‘Weight’:
Values(Weight) : [Light, Average, Heavy]
SLight = [1+,1-] E(SLight) = 1
SAverage = [1+,2-] E(SAverage) = 0.91829
SHeavy = [1+,2-] E(SHeavy) = 0.91829
Gain(S,Weight) = 0.95443 – [(2/8)*1 + (3/8)*0.91829 + (3/8)*0.91829] = 0.01571
For attribute ‘Lotion’:
Values(Lotion) : [Yes, No]
SYes = [0+,3-] E(SYes) = 0
SNo = [3+,2-] E(SNo) = 0.97095
Gain(S,Lotion) = 0.95443 – [(3/8)*0 + (5/8)*0.97095] = 0.3475
Gain(S,Hair) = 0.45443
Gain(S,Height) = 0.26571
Gain(S,Weight) = 0.01571
Gain(S,Lotion) = 0.3475
Gain(S,Hair) is maximum, so Hair is chosen as the root node
Splitting on Hair:
Hair = Blonde: [Sarah, Dana, Annie, Katie] – still mixed, so this branch must be split further
Hair = Red: [Emily] – leaf node with classification ‘Sunburned’
Hair = Brown: [Alex, Pete, John] – leaf node with classification ‘Not Sunburned’
The remaining tuples for the Hair = Blonde branch are:
Name Hair Height Weight Lotion Sunburned
Sarah Blonde Average Light No Yes
Dana Blonde Tall Average Yes No
Annie Blonde Short Average No Yes
Katie Blonde Short Light Yes No
Repeating the procedure for the Hair = Blonde branch:
S = [Sarah, Dana, Annie, Katie]
S: [2+,2-]
Entropy(S) = 1
Find IG for remaining 3 attributes Height, Weight, Lotion
For attribute ‘Height’:
Values(Height) : [Average, Tall, Short]
S = [2+,2-]
SAverage = [1+,0-] E(SAverage) = 0
STall = [0+,1-] E(STall) = 0
SShort = [1+,1-] E(SShort) = 1
Gain(S,Height) = 1 – [(1/4)*0 + (1/4)*0 + (2/4)*1] = 0.5
For attribute ‘Weight’:
Values(Weight) : [Average, Light]
S = [2+,2-]
SAverage = [1+,1-] E(SAverage) = 1
SLight = [1+,1-] E(SLight) = 1
Gain(S,Weight) = 1 – [(2/4)*1 + (2/4)*1]= 0
For attribute ‘Lotion’:
Values(Lotion) : [Yes, No]
S = [2+,2-]
SYes = [0+,2-] E(SYes) = 0
SNo = [2+,0-] E(SNo) = 0
Gain(S,Lotion) = 1 – [(2/4)*0 + (2/4)*0]= 1
Therefore, Gain(S,Lotion) is maximum
In this case, the final decision tree will be
Hair
  Blonde → Lotion
        Y → Not Sunburned
        N → Sunburned
  Red → Sunburned
  Brown → Not Sunburned
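One convenient way to use the finished tree is to encode it as nested dictionaries and walk it; this representation is our own choice for illustration, not part of ID3 itself:

```python
sunburn_tree = {
    "attribute": "Hair",
    "branches": {
        "Blonde": {"attribute": "Lotion",
                   "branches": {"Yes": "Not Sunburned", "No": "Sunburned"}},
        "Red":    "Sunburned",
        "Brown":  "Not Sunburned",
    },
}

def classify(tree, example):
    """Follow branches until a leaf (a plain string label) is reached."""
    while isinstance(tree, dict):
        tree = tree["branches"][example[tree["attribute"]]]
    return tree

print(classify(sunburn_tree, {"Hair": "Blonde", "Lotion": "No"}))   # Sunburned (e.g. Sarah)
```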
Computing Information-Gain for
Continuous-Valued Attributes
Let attribute A be a continuous-valued attribute
Must determine the best split point for A
Sort the values of A in increasing order
Typically, the midpoint between each pair of adjacent values is
considered as a possible split point
(ai + ai+1)/2 is the midpoint between the values ai and ai+1
The point with the minimum expected information requirement for
A is selected as the split-point for A
Split:
D1 is the set of tuples in D satisfying A ≤ split-point, and D2 is the set
of tuples in D satisfying A > split-point
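A sketch of this midpoint search, assuming numeric values paired with class labels and reusing the entropy helper from earlier:

```python
from collections import Counter

def best_split_point(values, labels):
    """Return (split_point, Info_A(D)): the midpoint with the minimum expected
    information requirement among all adjacent-value midpoints."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(len(pairs) - 1):
        if pairs[i][0] == pairs[i + 1][0]:
            continue                                   # no midpoint between equal values
        split = (pairs[i][0] + pairs[i + 1][0]) / 2
        left  = [lab for v, lab in pairs if v <= split]
        right = [lab for v, lab in pairs if v > split]
        info = (len(left) / len(pairs)) * entropy(list(Counter(left).values())) \
             + (len(right) / len(pairs)) * entropy(list(Counter(right).values()))
        if info < best[1]:
            best = (split, info)
    return best

print(best_split_point([45, 50, 60, 70], ["no", "no", "yes", "yes"]))   # (55.0, 0.0)
```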
Gain Ratio for Attribute Selection (C4.5)
Information gain measure is biased towards attributes
with a large number of values
C4.5 (a successor of ID3) uses gain ratio to overcome the
problem (normalization to information gain)
SplitInfo_A(D) = − Σ_{j=1}^{v} (|Dj| / |D|) × log2(|Dj| / |D|)
GainRatio(A) = Gain(A)/SplitInfo(A)
Ex.
gain_ratio(income) = 0.029/1.557 = 0.019
The attribute with the maximum gain ratio is selected as
the splitting attribute
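A sketch of SplitInfo and gain ratio; the income partition sizes (4 high, 6 medium, 4 low) are counted from the table below, and Gain(income) = 0.029 is the value quoted above:

```python
import math

def split_info(partition_sizes):
    """SplitInfo_A(D) = -sum (|Dj|/|D|) * log2(|Dj|/|D|)."""
    total = sum(partition_sizes)
    return -sum((s / total) * math.log2(s / total) for s in partition_sizes if s > 0)

si = split_info([4, 6, 4])          # income: 4 high, 6 medium, 4 low -> ~1.557
print(round(0.029 / si, 3))         # gain_ratio(income) ~ 0.019
```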
age income student credit_rating buys_computer
youth high no fair no
youth high no excellent no
middle_aged high no fair yes
senior medium no fair yes
senior low yes fair yes
senior low yes excellent no
middle_aged low yes excellent yes
youth medium no fair no
youth low yes fair yes
senior medium yes fair yes
youth medium yes excellent yes
middle_aged medium no excellent yes
middle_aged high yes fair yes
senior medium no excellent no
Gini Index (CART, IBM IntelligentMiner)
If a data set D contains examples from n classes, gini index,
gini(D) is defined as
gini(D) = 1 − Σ_{j=1}^{n} pj²
where pj is the relative frequency of class j in D
If a data set D is split on A into two subsets D1 and D2, the gini
index gini_A(D) is defined as
gini_A(D) = (|D1| / |D|) × gini(D1) + (|D2| / |D|) × gini(D2)
Reduction in impurity:
Δgini(A) = gini(D) − gini_A(D)
The attribute that provides the smallest gini_split(D) (or, equivalently, the largest
reduction in impurity) is chosen to split the node (we need to
enumerate all possible splitting points for each attribute)
Computation of Gini Index
Ex. D has 9 tuples in buys_computer = “yes” and 5 in “no”
gini(D) = 1 − (9/14)² − (5/14)² = 0.459
Suppose the attribute income partitions D into 10 tuples in D1: {low, medium}
and 4 tuples in D2: {high}
gini_{income ∈ {low,medium}}(D) = (10/14) × Gini(D1) + (4/14) × Gini(D2) = 0.443
Gini_{low,high} is 0.458 and Gini_{medium,high} is 0.450. Thus, split on the
{low, medium} (and {high}) partitioning, since it has the lowest Gini index
All attributes are assumed continuous-valued
May need other tools, e.g., clustering, to get the possible split values
Can be modified for categorical attributes
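The arithmetic above can be reproduced with a small sketch (the class counts per income partition are tallied from the buys_computer table):

```python
def gini(counts):
    """gini(D) = 1 - sum pj^2, computed from per-class counts."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

gini_d = gini([9, 5])                                                 # ~0.459
# D1 = {low, medium}: 10 tuples (7 yes, 3 no); D2 = {high}: 4 tuples (2 yes, 2 no)
gini_income_split = (10/14) * gini([7, 3]) + (4/14) * gini([2, 2])    # ~0.443
print(round(gini_d, 3), round(gini_income_split, 3))
```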
Comparing Attribute Selection Measures
The three measures, in general, return good results but
Information gain:
biased towards multivalued attributes
Gain ratio:
tends to prefer unbalanced splits in which one partition is much smaller than the others
Gini index:
biased towards multivalued attributes
has difficulty when # of classes is large
tends to favor tests that result in equal-sized partitions and purity in both partitions
Other Attribute Selection Measures
CHAID: a popular decision tree algorithm, measure based on χ2 test for
independence
C-SEP: performs better than info. gain and gini index in certain cases
G-statistic: has a close approximation to χ2 distribution
MDL (Minimal Description Length) principle (i.e., the simplest solution is
preferred):
The best tree as the one that requires the fewest # of bits to both (1)
encode the tree, and (2) encode the exceptions to the tree
Multivariate splits (partition based on multiple variable combinations)
CART: finds multivariate splits based on a linear comb. of attrs.
Which attribute selection measure is the best?
Most give good results; none is significantly superior to the others
Preparing Data for
Classification and Prediction
Data Cleaning: remove or reduce noise (apply smoothing) and
treat missing values (replace a missing value with the most
commonly occurring value for that attribute, or with the most
probable value based on statistics)
Relevance Analysis: 1) Correlation analysis is used to check
whether any two given attributes are statistically related. 2)
The dataset may also contain irrelevant attributes; these kinds of
attributes may otherwise slow down and possibly mislead the
learning step.
Data Transformation and Reduction: data may be transformed by
normalization (scaling attribute values so that they fall within a
small specified range) or by generalizing it to higher-level concepts.
Advantages of ID3
Understandable prediction rules are created from
the training data.
Builds the fastest tree.
Builds a short tree.
Only need to test enough attributes until all data is
classified.
Finding leaf nodes enables test data to be pruned,
reducing number of tests.
Disadvantages of ID3
Data may be over-fitted or over-classified, if a small
sample is tested.
Only one attribute at a time is tested for making a
decision.
Classifying continuous data may be
computationally expensive, as many trees must be
generated to see where to break the continuum.
Bayes’ Theorem: Basics
Total probability Theorem:
P(B) = Σ_{i=1}^{M} P(B | Ai) × P(Ai)
Bayes’ Theorem:
P(H | X) = P(X | H) P(H) / P(X)
Let X be a data sample (“evidence”): class label is unknown
Let H be a hypothesis that X belongs to class C
Classification is to determine P(H|X) (i.e., the posteriori probability): the
probability that the hypothesis holds given the observed data sample X
P(H) (prior probability): the initial probability
E.g., X will buy computer, regardless of age, income, …
P(X): probability that sample data is observed
P(X|H) (likelihood): the probability of observing the sample X, given that
the hypothesis holds
E.g., Given that X will buy computer, the prob. that X is 31..40,
medium income
Prediction Based on Bayes’ Theorem
Given training data X, posteriori probability of a
hypothesis H, P(H|X), follows the Bayes’ theorem
P(H | X) = P(X | H) P(H) / P(X)
Informally, this can be viewed as
posteriori = likelihood x prior/evidence
Predicts X belongs to Ci iff the probability P(Ci|X) is the
highest among all the P(Ck|X) for all the k classes
Practical difficulty: It requires initial knowledge of many
probabilities, involving significant computational cost
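A tiny numeric illustration of "posteriori = likelihood × prior / evidence"; the likelihood values here are placeholders chosen for the example, not figures from the slides:

```python
p_h            = 9 / 14    # prior P(H), e.g. fraction of "buys computer = yes" tuples
p_x_given_h    = 0.044     # P(X | H)      (placeholder value)
p_x_given_noth = 0.019     # P(X | not H)  (placeholder value)

# Total probability theorem: P(X) = sum_i P(X | Ai) P(Ai)
p_x = p_x_given_h * p_h + p_x_given_noth * (1 - p_h)

# Bayes' theorem: P(H | X) = P(X | H) P(H) / P(X)
posterior = p_x_given_h * p_h / p_x
print(round(posterior, 3))
```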
Classification Is to Derive the Maximum Posteriori
Let D be a training set of tuples and their associated class
labels, and each tuple is represented by an n-D attribute
vector X = (x1, x2, …, xn)
Suppose there are m classes C1, C2, …, Cm.
Classification is to derive the maximum posteriori, i.e., the
maximal P(Ci|X)
This can be derived from Bayes’ theorem
P(Ci | X) = P(X | Ci) P(Ci) / P(X)
Since P(X) is constant for all classes, only
P(Ci | X) ∝ P(X | Ci) P(Ci)
needs to be maximized
Naïve Bayes Classifier
A simplified assumption: attributes are conditionally
independent (i.e., no dependence relation between
attributes):
P(X | Ci) = ∏_{k=1}^{n} P(xk | Ci) = P(x1 | Ci) × P(x2 | Ci) × … × P(xn | Ci)
This greatly reduces the computation cost: Only counts
the class distribution
If Ak is categorical, P(xk|Ci) is the # of tuples in Ci having
value xk for Ak divided by |Ci, D| (# of tuples of Ci in D)
If Ak is continuous-valued, P(xk|Ci) is usually computed
based on a Gaussian distribution with mean μ and standard deviation σ:
g(x, μ, σ) = (1 / (√(2π) σ)) × e^(−(x − μ)² / (2σ²))
and P(xk|Ci) is
P(xk | Ci) = g(xk, μCi, σCi)
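A minimal naïve Bayes sketch for categorical attributes, assuming training tuples are dicts with a "class" key; Laplace smoothing and the Gaussian case for continuous attributes are omitted to keep it short:

```python
from collections import Counter, defaultdict

def train_naive_bayes(D, attributes, target="class"):
    """Estimate P(Ci) and P(xk | Ci) by simple counting (categorical attributes only)."""
    class_counts = Counter(t[target] for t in D)
    priors = {c: n / len(D) for c, n in class_counts.items()}
    cond = defaultdict(lambda: defaultdict(Counter))        # cond[c][attr][value] = count
    for t in D:
        for a in attributes:
            cond[t[target]][a][t[a]] += 1
    return priors, cond, class_counts

def predict(x, priors, cond, class_counts, attributes):
    """Return the class Ci maximizing P(Ci) * prod_k P(xk | Ci)."""
    best_class, best_score = None, -1.0
    for c, prior in priors.items():
        score = prior
        for a in attributes:
            score *= cond[c][a][x[a]] / class_counts[c]     # 0 if the value was never seen with class c
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```

On the buys_computer table above, train_naive_bayes simply tallies the counts described in the categorical case, and predict picks the class with the largest P(X|Ci)P(Ci).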