Decision Trees and Modeling

Data Mining Methods According to the Nature of the Data

Continuous predictors
  - Continuous response: linear regression, neural nets, k-nearest neighbors
  - Categorical response: logistic regression, neural nets, discriminant analysis, k-nearest neighbors
  - No response: principal components, cluster analysis

Categorical predictors
  - Continuous response: linear regression, neural nets, regression trees
  - Categorical response: neural nets, classification trees, logistic regression, naïve Bayes
  - No response: association rules
Types of Data Mining Methods

Supervised
  - Algorithms used in classification and prediction
  - Classification: models discrete-valued functions
  - Prediction: models continuous-valued functions
  - Require data in which the value of the outcome of interest (e.g., purchase or no purchase) is known
  - Use training data, validation data, and test data

Unsupervised
  - Algorithms where there is no outcome variable to predict or classify
  - Include association rules, data reduction methods, and clustering techniques
Supervised Learning Process: Two Steps

1. Learning (training): learn a model from the training data (induction).
2. Testing: apply the model to unseen test data to assess model accuracy (deduction). A minimal code sketch of these two steps follows the source note below.

[Figure 3.1: the learning algorithm induces a model from the training data; the model is then applied to test data.]
Source: Bing Liu, Web Data Mining, 2nd ed., Fig. 3.1., p. 66, Springer, 2011.
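A minimal sketch of the two steps in Python with scikit-learn (an assumption made purely for illustration; the hands-on exercises in this deck use IBM SPSS Modeler, not Python):

# Minimal sketch of the learn-then-test process using scikit-learn.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)          # any labeled data set works here

# Step 1 (induction / training): learn a model from the training data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
model = DecisionTreeClassifier().fit(X_train, y_train)

# Step 2 (deduction / testing): apply the model to unseen test data
# to estimate its accuracy.
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))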
Supervised Learning: Classification

The task of learning a target function f that maps each attribute set x to one of the predefined class labels y.

[Diagram: attribute set (x) -> classification model -> class label (y)]

The target function is also known informally as a classification model, which is useful for predicting or describing data with binary or nominal categories.
Classification: Explanatory vs. Predictive Modeling

Descriptive models
  - Serve as an explanatory tool to distinguish between objects of different classes
  - Fit the data closely
  - Use the entire dataset for estimating the best-fit model to maximize information about the hypothesized relationship in the population
  - Performance measures assess how well the model approximates the data

Predictive models
  - Predict the class label of new cases accurately
  - Data is split into a training set and a validation (holdout) set
  - Training set: estimate the model
  - Holdout set: test the model's performance on new, unobserved data
  - Performance measures assess predictive accuracy
Introducing Decision Trees

Decision trees are the workhorses of many practical applications of machine learning because they are fast and produce intelligible output that is often surprisingly accurate.

Aim: partition the data into smaller, more homogeneous groups (i.e., the nodes resulting from a split are purer than their parent).

  - Fast training performance
  - High degree of accuracy
  - Relatively simple computing methodology
  - Easy to build
  - Easy to interpret
  - Do not require any assumptions about the distribution of the measurements in each group, which can be categorical, discrete numeric, or continuous
A Simple Decision Tree for Buying a Car

  - Each node specifies a test (question) of an attribute.
  - Each branch from a node is a possible value of this attribute.
Source: IBM SPSS Modeler 14.2 Modeling Nodes, 2011, Figure 6-2, p. 115.
Decision Tree Goals

1. Accurate classification: minimize error
2. Efficient decision making: fewest number of decisions/tests
How Decision Trees Work

Generate a tree structure of training data subsets by splitting a dataset on selected input attributes in a top-down recursive manner (a toy sketch of this procedure follows the list):

1. Begin with a single node (the root) composed of all observations in the training set.
2. Choose the best input attribute, i.e., the one whose values within the node can be used to split the node into subsets that are more homogeneous with respect to the classification attribute than the unsplit node. (Attributes are selected on the basis of an impurity function.)
3. For each subset of the split, determine whether that subset can or should be split further. If so, return to step 2 for that subset, and continue until the stopping criteria are reached.
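A toy sketch of this top-down recursive procedure, assuming categorical attributes only and using Gini impurity as the impurity function; real algorithms (C4.5, CART, CHAID) add numeric splits, pruning, and more refined stopping rules:

# Toy top-down recursive partitioner mirroring the three steps above.
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_attribute(rows, labels, attributes):
    """Pick the attribute whose split gives the lowest weighted impurity."""
    def weighted_impurity(attr):
        groups = {}
        for row, lab in zip(rows, labels):
            groups.setdefault(row[attr], []).append(lab)
        return sum(len(g) / len(labels) * gini(g) for g in groups.values())
    return min(attributes, key=weighted_impurity)

def build_tree(rows, labels, attributes):
    # Stop: node is pure or no attributes remain -> leaf with the majority class.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    attr = best_attribute(rows, labels, attributes)
    node = {"split_on": attr, "children": {}}
    remaining = [a for a in attributes if a != attr]
    groups = {}
    for row, lab in zip(rows, labels):
        groups.setdefault(row[attr], ([], []))
        groups[row[attr]][0].append(row)
        groups[row[attr]][1].append(lab)
    for value, (sub_rows, sub_labels) in groups.items():
        node["children"][value] = build_tree(sub_rows, sub_labels, remaining)
    return node

# Tiny made-up demo: three records, two categorical attributes.
rows = [{"colour": "red", "size": "big"},
        {"colour": "red", "size": "small"},
        {"colour": "blue", "size": "big"}]
print(build_tree(rows, ["yes", "yes", "no"], ["colour", "size"]))
# {'split_on': 'colour', 'children': {'red': 'yes', 'blue': 'no'}}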
How a Decision Tree Works: Three Types of Nodes

Root node
  - Has no incoming edges and zero or more outgoing edges

Internal nodes
  - Each has exactly one incoming edge and two or more outgoing edges

Leaf (terminal) nodes
  - Each has exactly one incoming edge and no outgoing edges
  - Each has a class label
The Play Golf Training Data
Day Outlook Temperature Humidity Windy Play Golf
D1 sunny 85 85 false no
D2 sunny 80 90 true no
D3 overcast 83 86 false yes
D4 rainy 70 96 false yes
D5 rainy 68 80 false yes
D6 rainy 65 70 true no
D7 overcast 64 65 true yes
D8 sunny 72 95 false no
D9 sunny 69 70 false yes
D10 rainy 75 80 false yes
D11 sunny 75 70 true yes
D12 overcast 72 90 true yes
D13 overcast 81 75 false yes
D14 rainy 71 91 true no
Decision Tree Example: Play Golf

[Tree diagram annotated with the root node, internal nodes, and leaf nodes]

  - Each node specifies a test of an attribute.
  - Each branch from a node is a possible value of this attribute.

Decision trees seek to create a set of leaf nodes that are as "pure" as possible!
Source: http://gautam.lis.illinois.edu/monkmiddleware/public/analytics/decisiontree.html
What do we mean by "pure"?

[Figure: three example groups, ranging from a very impure group to a less impure group to a very pure group]
Three requirements that must be met before DT algorithms may be applied:

1. DTs represent supervised learning and thus require pre-classified target variables. A training data set must be supplied which provides the algorithm with the values of the target variable.
Three requirements that must be met before DT algorithms may be applied:

2. The training set should be rich and varied, providing the algorithm with a healthy cross section of the types of records for which classification may be needed in the future.
  - Training set too large: the created tree may overfit.
  - DTs learn by example: if examples are systematically lacking for a definable subset of records, classification and prediction for this subset will be problematic or impossible.
  - Training data set too small: the tree might not be specific enough to work with more general data (underfit - both training and test errors are large).
Three requirements that must be met before DT algorithms may be applied:

3. The target attribute classes must be discrete; DT algorithms cannot be applied to a continuous target variable. The target variable must take on values that are clearly demarcated as either belonging to a particular class or not belonging.
  - Play golf (yes, no)
  - Credit risk (high, low)
  - Loan applications (accept, reject)
  - Credit card transaction classification (fraudulent, legitimate)
  - Medical diagnosis (cancer, no cancer)
  - Spam filters (spam, no spam)
  - Security screening (threat, no threat)
Decision Tree Algorithms

DT algorithms mainly differ on:
  - Splitting criteria
    - Which variable to split first?
    - What values to use to split?
    - How many splits to form for each node?
  - Stopping criteria
    - When to stop building the tree
  - Pruning (generalization method)
    - Pre-pruning versus post-pruning

Most popular DT algorithms:
  - ID3, C4.5, C5
  - CART (Classification And Regression Tree - binary decision tree)
  - CHAID, M5
SPSS Modeler Decision Tree Nodes

C&RT
  - Target field measurement type: continuous, categorical, flag, nominal, or ordinal
  - Number of splits: binary
  - Method used for splitting: dispersion measure (Gini) or least squared deviation (continuous target) to minimize impurity at each step

CHAID
  - Target field measurement type: continuous, categorical, flag, nominal, or ordinal
  - Number of splits: multiple
  - Method used for splitting: chi-square statistics to identify optimal splits

QUEST
  - Target field measurement type: categorical, flag, or nominal
  - Number of splits: binary
  - Method used for splitting: chi-square test (categorical predictors), ANOVA (continuous inputs)

C5.0
  - Target field measurement type: flag, nominal, or ordinal
  - Number of splits: multiple
  - Method used for splitting: information theory measure - information gain ratio
Alternative Measures for Selecting the Best Split

Gini index
  - Determines the purity of a specific class as a result of a decision to branch along a particular attribute (used in CART)
  - Chooses the attribute that maximizes the reduction in impurity as the splitting attribute at a particular node
  - See the video: https://www.youtube.com/watch?v=Zze7SKuz9QQ

Information gain
  - Most common splitting criterion, based on entropy, a measure of the extent of uncertainty or randomness of a particular attribute/value split (used in ID3, C4.5, C5)
  - Measures the change in entropy due to new information being added: how much would be gained by branching on an attribute
  - Chooses the attribute with the highest information gain as the splitting attribute at a particular node

Chi-square statistics (used in CHAID)

(A small sketch computing these measures on the Play Golf data follows.)
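For concreteness, a small Python sketch computing entropy, the Gini index, and the information gain of splitting the Play Golf training data (shown earlier) on the Outlook attribute; the class distributions below are read directly from that table:

# Entropy, Gini, and the information gain of splitting the Play Golf data on Outlook.
from math import log2
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

parent = ["yes"] * 9 + ["no"] * 5                  # 9 yes, 5 no overall
split = {                                          # child nodes after the Outlook split
    "sunny":    ["no", "no", "no", "yes", "yes"],
    "overcast": ["yes", "yes", "yes", "yes"],
    "rainy":    ["yes", "yes", "yes", "no", "no"],
}

weighted_child_entropy = sum(
    len(part) / len(parent) * entropy(part) for part in split.values()
)
info_gain = entropy(parent) - weighted_child_entropy
print(f"parent entropy = {entropy(parent):.3f}, parent Gini = {gini(parent):.3f}")
print(f"information gain of Outlook = {info_gain:.3f}")   # about 0.247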
A Loan Application Training Data Set

[Table 3.1 from the source: 15 loan applications with the attributes Age, Has_job, Own_house, Credit_rating, and the class attribute Approved]

Class attribute: indicates whether the loan application was approved (Yes) in the past or not (No).

Source: Bing Liu, Web Data Mining, 2nd ed., Table 3.1., p. 64, Springer, 2011.
Chapter 3 can be downloaded from: http://www.springer.com/cda/content/document/cda_downloaddocument/9783642194597-c3.pdf?SGWID=0-0-45-1229046-p174122057
Loan Application Training Data Set

  - Which attributes should be used for splitting?
  - In which order?
  - What splits are needed; what is the test condition?

[Figure: the training data table, with the class attribute "Application Approved" highlighted]
Selecting the Best Split

The objective is to reduce the impurity or uncertainty in the data as much as possible.
  - A subset of data is pure if all instances belong to the same class.

An attribute is good when:
  - The resulting node contains records of mostly one class, i.e., it is "pure" or homogeneous.

An attribute is poor when:
  - It provides no discrimination - the attribute is immaterial to the decision.
  - For each value we have the same number of positive and negative instances.
Choosing Splitting Attributes: Which one discriminates best vis-à-vis the class attribute "Approved"?

[Figure 3.5: class distributions (percentages of No and Yes) in the child nodes of a multi-way split versus a two-way split, ranging from 0% No / 100% Yes to 66.67% No / 33.33% Yes]
Source: Bing Liu, Web Data Mining, 2nd ed., Fig. 3.5., p. 71, Springer, 2011.
One Possible Tree

The tree generalizes the data: it captures the key regularities in the data.

[Figure 3.2: a decision tree for the loan data; dependent variable Approved (Yes: 9, No: 6)]
Source: Bing Liu, Web Data Mining, 2nd ed., Fig. 3.2., p. 67, Springer, 2011.
Using the Tree to Predict the Class of a New Loan Applicant

New applicant: Age = young, Has_job = false, Own_house = false, Credit_rating = good, Class = ?

Decision rule:
  IF Age = young AND Has_job = false THEN Class = No
  Support = 3/15 = 20%
  Confidence = 3/3 = 100%

Support: the proportion of all records in the data set that rest in that particular terminal leaf node.
Confidence of the rule: the proportion of records in the leaf node for which the decision rule is true (here the leaf nodes are pure).
Is the first tree unique?

No, there are many possible trees that can be learned from the data. Here is a simpler, smaller tree.

[Figure 3.3: a smaller decision tree for the loan data; dependent variable Approved (Yes: 9, No: 6)]

In practice, one wants a small and accurate tree:
  - It is more general.
  - It is easier for human users to understand.
Source: Bing Liu, Web Data Mining, 2nd ed., Fig. 3.3., p. 68, Springer, 2011.
Using the Smaller Tree to Predict the Class of a New Loan Applicant

New applicant: Age = young, Has_job = false, Own_house = false, Credit_rating = good, Class = ?

It is possible to read a set of rules directly off a decision tree. Preconditions are logically ANDed together, and all the tests must succeed if the rule is to fire.

  IF Own_house = false AND Has_job = false THEN Class = No
  Support = 6/15 = 40%
  Confidence = 6/6 = 100%
C5.0 Tree Built with SPSS Modeler 18

One of the most attractive aspects of decision trees lies in their interpretability, especially with respect to the construction of decision rules.
From Trees to Rules

Each path from the root to a leaf is a rule. The tree paths or rules are mutually exclusive and exhaustive: every data instance is covered by a single rule (a tree path).

Rule 1: IF Own_house = true THEN Yes (loan approved) - Support 6/15, Confidence 6/6 = 100%
Rule 2: IF Own_house = false AND Has_job = true THEN Yes (loan approved) - Support 3/15, Confidence 3/3 = 100%
Rule 3: IF Own_house = false AND Has_job = false THEN No (loan not approved) - Support 6/15, Confidence 6/6 = 100%
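A small sketch of reading such rules off a tree programmatically; the nested dictionary is a hand-coded stand-in for the smaller loan-approval tree above, not output from any particular tool:

# Reading rules off a tree: each root-to-leaf path becomes one IF ... THEN rule.
tree = {
    "split_on": "Own_house",
    "children": {
        "true": "Yes",
        "false": {
            "split_on": "Has_job",
            "children": {"true": "Yes", "false": "No"},
        },
    },
}

def tree_to_rules(node, conditions=()):
    if not isinstance(node, dict):                      # leaf: emit one rule
        antecedent = " AND ".join(f"{a} = {v}" for a, v in conditions)
        return [f"IF {antecedent} THEN Class = {node}"]
    rules = []
    for value, child in node["children"].items():
        rules += tree_to_rules(child, conditions + ((node["split_on"], value),))
    return rules

for rule in tree_to_rules(tree):
    print(rule)
# IF Own_house = true THEN Class = Yes
# IF Own_house = false AND Has_job = true THEN Class = Yes
# IF Own_house = false AND Has_job = false THEN Class = No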
Classifier Evaluation

Predictive accuracy
  - A simple measure of classification error
Speed/efficiency
  - Time to construct the model
  - Time to use the model
Robustness
  - Ability to make reasonably accurate predictions given noisy data or data with missing and erroneous values
Scalability
  - Ability to construct a prediction model given a rather large amount of data
Interpretability
  - Level of understanding and insight provided by the model
Compactness of the model
  - Size of the tree, or the number of rules
Judging Classification Performance: Evaluation Methods

The available data set is divided into two disjoint subsets:
  - the training set (for learning a model)
  - the test set (for testing the model)

The training set should not be used in testing, and the test set should not be used in learning. An unseen test set provides an unbiased estimate of accuracy. The test set is also called the holdout set. This approach is mainly used when the data set D is large.
Judging Classification Performance

Errors committed by a classification model:
  - Training errors: the number of misclassification errors committed on training records
  - Generalization errors: the expected error of the model on previously unseen records (testing records)

The goal is a good classification model with:
  - Low training error: fits the training data well, AND
  - Low generalization error: accurately classifies records it has never seen before
Classifier Evaluation: Confusion Matrix

A confusion matrix shows the predicted and actual classifications. It is of size L x L, where L is the number of different label values. The accuracy measures are defined for a 2 x 2 confusion matrix (L = 2); a sketch of the standard definitions follows the source note below.
Source: Kohavi & Provost, Glossary of Terms, Machine Learning, 30, 271-274 (1998),
http://robotics.stanford.edu/~ronnyk/glossary.html
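The 2 x 2 matrix itself is not reproduced here; the sketch below lists the usual textbook definitions (accuracy, error rate, true/false positive rates, precision), using Model 1 from the later "Which Model Performs Better?" slide as an example:

# Standard measures computed from the four cells of a 2 x 2 confusion matrix.
def confusion_measures(tp, fn, fp, tn):
    total = tp + fn + fp + tn
    return {
        "accuracy":            (tp + tn) / total,
        "error rate":          (fp + fn) / total,
        "true positive rate":  tp / (tp + fn),   # recall / sensitivity
        "false positive rate": fp / (fp + tn),
        "true negative rate":  tn / (fp + tn),   # specificity
        "precision":           tp / (tp + fp),
    }

# Example: Model 1 from the "Which Model Performs Better?" slide (accuracy 0.80).
print(confusion_measures(tp=150, fn=40, fp=60, tn=250))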
Evaluating Performance When One Class Is More Important

Data sets with imbalanced class distributions are quite common in many real applications. In many such cases it is more important to identify members of one class:
  - Tax fraud
  - Credit card fraud
  - Credit default
  - Response to a promotional offer
  - Detecting electronic network intrusion
  - Predicting delayed flights

In such cases, we are willing to tolerate greater overall error in return for better identifying the important class for further attention.
Judging Classification Performance: Different Kinds of Wrong Predictions

False positive
  - A record that is classified as positive but is actually negative
False negative
  - A record that is classified as negative but is actually positive

Example: an insurance company wants to predict whether an uninvestigated claim is fraudulent.
  - False positive: a claim predicted as fraudulent that is actually legitimate. Cost: additional money to investigate a claim that should be paid.
  - False negative: a claim predicted as legitimate that is actually fraudulent. Cost: payoff of a claim that should not be paid.
Judging Classification Performance: Which Model Performs Better?

Good results correspond to large numbers down the main diagonal (from top left to bottom right) and small, ideally zero, off-diagonal elements.

Model 1 (N = 500)
  - Actual Yes: 150 predicted Yes, 40 predicted No
  - Actual No: 60 predicted Yes, 250 predicted No

Model 2 (N = 500)
  - Actual Yes: 250 predicted Yes, 45 predicted No
  - Actual No: 5 predicted Yes, 200 predicted No
Judging Classification Performance: A Numeric Example with 3 Classes

How accurate is the model that produced this classification matrix (N = 200)?

            Predicted a   Predicted b   Predicted c   Total
Actual a         88            10             2         100
Actual b         14            40             6          60
Actual c         18            10            12          40
Total           120            60            20         200
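Answer: accuracy is the sum of the main diagonal divided by the total number of records:

  accuracy = (88 + 40 + 12) / 200 = 140 / 200 = 0.70

i.e., 70% of the 200 records are classified correctly; the error rate is 60/200 = 0.30.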
The Cost of Classification: A Targeted Marketing Example

False positive: a consumer classified as a likely responder who does not respond. Cost: preparing and mailing marketing materials.
False negative: a consumer predicted not to be a likely responder who would have bought if offered. No money was spent, nothing was gained.
True positive: a consumer is offered the product and buys it. Profit based on revenue minus costs.
True negative: a consumer who was not offered a deal and would not have bought it even if it had been offered. No cost incurred, no profit.
If costs are known, they can be incorporated into a financial analysis of the decision-making process.

Cost matrix (actual vs. predicted class):
  - Actual Yes: predicted Yes = -1, predicted No = 100
  - Actual No: predicted Yes = 1, predicted No = 0

Model 1 (N = 500)
  - Actual Yes: 150 predicted Yes, 40 predicted No
  - Actual No: 60 predicted Yes, 250 predicted No
  - Overall accuracy: 400/500 = .80
  - FP rate: 60/310 = .1935; FN rate: 40/190 = .2105

Model 2 (N = 500)
  - Actual Yes: 250 predicted Yes, 45 predicted No
  - Actual No: 5 predicted Yes, 200 predicted No
  - Overall accuracy: 450/500 = .90
  - FP rate: 5/205 = .0243; FN rate: 45/295 = .1525

Cost M1 = 150 * (-1) + 40 * 100 + 60 * 1 + 250 * 0 = 3910
Cost M2 = 250 * (-1) + 45 * 100 + 5 * 1 + 200 * 0 = 4255

  - A standard accuracy measure would have preferred model M2 over M1.
  - Incorporating costs, model M2 is inferior, since the improvement comes at the expense of increasing the more costly false negative errors.
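The same cost calculation as a small Python sketch (cost weights and cell counts copied from the matrices above):

# Weighing each confusion-matrix cell by the cost matrix; a lower total cost is better.
cost = {("yes", "yes"): -1, ("yes", "no"): 100, ("no", "yes"): 1, ("no", "no"): 0}

def total_cost(confusion):
    """confusion maps (actual, predicted) -> count."""
    return sum(count * cost[cell] for cell, count in confusion.items())

m1 = {("yes", "yes"): 150, ("yes", "no"): 40, ("no", "yes"): 60, ("no", "no"): 250}
m2 = {("yes", "yes"): 250, ("yes", "no"): 45, ("no", "yes"): 5, ("no", "no"): 200}
print("cost of Model 1:", total_cost(m1))   # 3910
print("cost of Model 2:", total_cost(m2))   # 4255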
Benchmarking the Model's Performance to Morrison's Proportional Chance Criterion

Morrison's likelihood analysis (1969) provides a criterion that may be used to compare the proportion of correctly classified observations with the proportion expected by chance.

The overall percentage of objects likely to be classified correctly by chance alone:
  Cpro = alpha^2 + (1 - alpha)^2
  where alpha is the proportion of cases in class = Yes and (1 - alpha) is the proportion of cases in class = No.

The predictive accuracy of the model should be higher than the proportional chance criterion!

Example: Class = Yes in 73% of the cases, Class = No in 27% of the cases:
  Cpro = 0.73^2 + 0.27^2 = 0.5329 + 0.0729 = 0.61 (approximately)
The accuracy expected by chance, about 61%, serves as the cutoff that the model's estimated accuracy must exceed.
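As a tiny sketch, the same criterion as a Python function (alpha is the positive-class proportion):

# Morrison's proportional chance criterion for a two-class problem.
def c_pro(alpha):
    """alpha = proportion of cases in the positive class."""
    return alpha ** 2 + (1 - alpha) ** 2

print(round(c_pro(0.73), 3))   # about 0.606 -> the model's accuracy should beat this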
Cutoff for Classification

Most DM algorithms classify via a two-step process. For each record:
  1. Compute the probability of belonging to class "1".
  2. Compare it to the cutoff value and classify accordingly (see the sketch below).

The default cutoff value is 0.50:
  - If the probability >= 0.50, classify as "1".
  - If the probability < 0.50, classify as "0".

Different cutoff values can be used; typically, the error rate is lowest for cutoff = 0.50.
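A short sketch of applying different cutoff values; the probability scores here are made up purely for illustration:

# Turning estimated class-1 probabilities into class labels at different cutoffs.
probabilities = [0.90, 0.80, 0.75, 0.60, 0.55, 0.45, 0.30, 0.20]   # illustrative scores

def classify(probs, cutoff=0.50):
    return [1 if p >= cutoff else 0 for p in probs]

for cutoff in (0.75, 0.50, 0.25):
    labels = classify(probabilities, cutoff)
    print(f"cutoff {cutoff}: {sum(labels)} records classified as 1")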
Cutoff Table Example

  - Cutoff value = 0.75: 8 records are classified as "1"
  - Cutoff value = 0.50: 13 records are classified as "1"
  - Cutoff value = 0.25: 15 records are classified as "1"
Source: Figure 5.7 in Shmueli et al. (2010), Data Mining for Business Intelligence, p. 104.
Classification Confusion Matrices for Various Cutoff Values

Cutoff value = 0.75
  - Actual 0: 11 predicted 0, 1 predicted 1
  - Actual 1: 5 predicted 0, 7 predicted 1
  - True positive rate: 7/12 = .5833; false positive rate: 1/12 = .0833

Cutoff value = 0.50
  - Actual 0: 10 predicted 0, 2 predicted 1
  - Actual 1: 1 predicted 0, 11 predicted 1
  - True positive rate: 11/12 = .9166; false positive rate: 2/12 = .1667

Cutoff value = 0.25
  - Actual 0: 8 predicted 0, 4 predicted 1
  - Actual 1: 1 predicted 0, 11 predicted 1
  - True positive rate: 11/12 = .9166; false positive rate: 4/12 = .3333
The Class Imbalance Problem

An imbalance occurs when one or more classes have very low proportions in the training data as compared to the other classes.

Example: 1% of credit card transactions are fraudulent; Fraudulent = yes is a "rare" event/class.

Detecting instances of the rare class is akin to finding a needle in a haystack. The likely prediction result is that every record is predicted as non-fraudulent (predicted target = 0), making the result insufficiently informative, i.e., the model does not detect any class 1 (fraudulent) cases.
Accuracy Limitations with Imbalanced Data Sets

The accuracy measure may be misleading and not well suited for evaluating models derived from imbalanced data sets. In the credit card example above, a model that labels every transaction as legitimate is 99% accurate yet detects no fraud at all.
Remedies for Severe Class Imbalance: Sampling Methods

Up-sampling (oversampling)
  - Simulates/imputes additional data points to improve balance across the classes
  - Replicates the positive cases (the "rare" event cases)

Down-sampling (undersampling)
  - Reduces the number of samples to improve balance across the classes
  - Eliminates negative examples, i.e., majority event cases
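A possible sketch of simple random over- and under-sampling using scikit-learn's resample utility (one of several ways to do this outside SPSS Modeler, whose Balance node is shown next):

# Simple random over-/under-sampling to balance a rare positive class.
from sklearn.utils import resample

def rebalance(majority_rows, minority_rows, strategy="up"):
    if strategy == "up":     # replicate minority cases until the classes are even
        minority_rows = resample(minority_rows, replace=True,
                                 n_samples=len(majority_rows), random_state=1)
    else:                    # drop majority cases until the classes are even
        majority_rows = resample(majority_rows, replace=False,
                                 n_samples=len(minority_rows), random_state=1)
    return majority_rows + minority_rows

majority = [{"fraud": 0}] * 99
minority = [{"fraud": 1}] * 1
print(len(rebalance(majority, minority, "up")))     # 198 rows, 50/50 split
print(len(rebalance(majority, minority, "down")))   # 2 rows, 50/50 split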
Remedying the Imbalance Problem with SPSS Modeler's Balance Node (Record Ops)

  - Performs random undersampling by assigning a factor smaller than 1 to the majority class.
  - Carries out simple random oversampling by assigning a factor larger than 1 to the minority class.
Avoid Overfitting in Classification

Model overfitting
  - The model fits the training data too well: good accuracy on training data but poor accuracy on test data
  - Symptoms: the tree is too large - too deep and with too many branches; some branches may reflect anomalies in the training data due to noise or outliers

Trees are pruned to avoid overfitting
  - Pruning removes the weakest, least reliable branches so a tree captures the patterns but not the noise in the training data
  - The resulting tree is smaller and less complex, and thus easier to understand
  - It is faster and better at correctly classifying independent test data (i.e., previously unseen tuples) than an unpruned tree
How Does Tree Pruning Work?

Prepruning approach
  - Tree construction is halted early, e.g., no further splits or partitions at a given node
  - Based on measures of statistical significance, information gain, Gini index, etc.
  - Upon halting, the node becomes a leaf

Postpruning approach
  - Successively selects a decision node and redesignates it as a leaf node
  - Removes subtrees from a fully grown tree, i.e., lops off the branches extending beyond a particular decision node
Pruning a Decision Tree: When to Stop Splitting Data

  - All examples for a given node belong to the same class
  - There are no examples left
  - There are no remaining attributes for further partitioning - the majority class becomes the leaf
  - No attribute satisfies a minimum information gain threshold
  - A maximal depth is reached
  - There are fewer than a certain number of examples in the current subtree - again a mechanism to prevent overfitting

(A sketch of how such stopping and pruning criteria appear as parameters follows.)
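A sketch of how such criteria appear as pre-pruning stopping parameters and a post-pruning parameter in scikit-learn's DecisionTreeClassifier (parameter names are scikit-learn's, used here for illustration, not SPSS Modeler's):

# Pre-pruning via stopping criteria and post-pruning via cost-complexity pruning.
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

pre_pruned = DecisionTreeClassifier(
    max_depth=4,                  # stop when a maximal depth is reached
    min_samples_leaf=10,          # stop when a node would have too few examples
    min_impurity_decrease=0.01,   # require a minimum impurity reduction per split
).fit(X, y)

post_pruned = DecisionTreeClassifier(
    ccp_alpha=0.01,               # grow fully, then lop off weak subtrees
).fit(X, y)

print("pre-pruned depth:", pre_pruned.get_depth())
print("post-pruned depth:", post_pruned.get_depth())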
IBM SPSS Modeler 18: Building Decision Trees - Hands-On Exercise

  - Drug Reaction (C&RT, C5.0)
  - Survival of Passengers on the Titanic (CHAID)

CHAID Tree without Partition
  - Stream 1: CHAID Tree - Table node
  - $R-Survived: the predicted score
  - $RC-Survived: the confidence value (between 0.0 and 1.0), the model's own estimate of how accurate each predicted value is

CHAID Tree - Partition Node
CHAID Tree - Analysis Node