Machine Learning
Lecture # 1
Introduction & Fundamentals
Courtesy: Dr Usman Akram (NUST)
What is Machine Learning?
• Machine Learning
– Study of algorithms that improve their performance at some task with experience
• Optimize a performance criterion using example data or past experience.
What is Machine Learning?
• Role of Statistics: Inference from a sample
• Role of Computer science: Efficient algorithms to
– Solve the optimization problem
– Represent and evaluate the model for inference
What is Machine Learning?
• Adapt to / learn from data
– To optimize a performance function
• Can be used to:
– Extract knowledge from data
– Learn tasks that are difficult to formalise
– Create software that improves over time
An Introduction to Machine Learning
• According to Herbert Simon, learning is "any change in a system that allows it to perform better the second time on repetition of the same task or on another task drawn from the same population." [G. F. Luger and W. A. Stubblefield, Artificial Intelligence: Structures and Strategies for Complex Problem Solving, The Benjamin/Cummings Publishing Company, Inc., 1989]
Why "Learn"?
• Machine learning is programming computers to optimize a performance criterion using example data or past experience.
• Learning is used when:
– Human expertise does not exist (navigating on Mars)
– Humans are unable to explain their expertise (speech recognition)
– Solution changes in time (routing on a computer network)
– Solution needs to be adapted to particular cases (user biometrics)
What We Talk About When We Talk About "Learning"
• Learning general models from data of particular examples
• Example in retail: customer transactions to consumer behavior:
People who bought "Da Vinci Code" also bought "The Five People You Meet in Heaven" (www.amazon.com)
• Build a model that is a good and useful approximation to the data.
Machine Learning
Applications:
• Retail: Market basket analysis, Customer
relationship management (CRM)
• Finance: Credit scoring, fraud detection
• Manufacturing: Optimization, troubleshooting
• Medicine: Medical diagnosis
• Telecommunications: Quality of service optimization
• Web mining: Search engines
Growth of Machine Learning
• Machine learning is preferred approach to
– Speech recognition, Natural language processing
– Computer vision
– Medical outcomes analysis
– Robot control
– Computational biology
Growth of Machine Learning
• This trend is accelerating
– Improved machine learning algorithms
– Improved data capture, networking, faster computers
– New sensors / IO devices
– Demand for self-customization to user, environment
Categories
• Association Analysis
• Supervised Learning
– Classification
– Regression/Prediction
• Unsupervised Learning
• Reinforcement Learning
Learning Associations
• Basket analysis:
P(Y | X): probability that somebody who buys X also buys Y, where X and Y are products/services.
Example: P(bread | cold drink) = 0.7
Market-Basket transactions
TID Items
1 Bread, Milk
2 Bread, Diaper, Cold Drink, Eggs
3 Milk, Diaper, Cold Drink
4 Bread, Milk, Diaper, Cold Drink
5 Bread, Milk, Diaper, Water
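This conditional probability can be estimated directly from the transaction table above. A minimal sketch in Python (the `confidence` helper is illustrative, not a library function):

```python
# Estimate P(Y | X) from the five market-basket transactions above.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Cold Drink", "Eggs"},
    {"Milk", "Diaper", "Cold Drink"},
    {"Bread", "Milk", "Diaper", "Cold Drink"},
    {"Bread", "Milk", "Diaper", "Water"},
]

def confidence(x, y, baskets):
    """P(y | x): fraction of baskets containing x that also contain y."""
    with_x = [b for b in baskets if x in b]
    return sum(y in b for b in with_x) / len(with_x)

# 2 of the 3 Cold Drink baskets also contain Bread.
print(confidence("Cold Drink", "Bread", transactions))  # 0.666...
```

On this five-row toy table the estimate is 2/3 ≈ 0.67, close to the 0.7 quoted above.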
Classification
• Example: Credit scoring
• Differentiating between low-risk and high-risk customers from their income and savings
Discriminant: IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk
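The discriminant translates directly into code. The threshold values θ1 and θ2 below are placeholders for illustration, not values from the lecture:

```python
# Two-threshold discriminant: IF income > theta1 AND savings > theta2
# THEN low-risk ELSE high-risk.  Thresholds are assumed, not fitted.
THETA1, THETA2 = 30_000, 10_000

def credit_risk(income, savings):
    return "low-risk" if income > THETA1 and savings > THETA2 else "high-risk"

print(credit_risk(50_000, 20_000))  # low-risk
print(credit_risk(20_000, 20_000))  # high-risk
```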
Prediction: Regression
• Example: Price of a used car
• x: car attributes
y: price
Linear model: y = wx + w₀
General model: y = g(x | θ), where g(·) is the model and θ its parameters
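Fitting the linear model y = wx + w₀ by least squares can be sketched as follows; the (mileage, price) pairs are made-up illustrative data, not from the lecture:

```python
import numpy as np

# Toy used-car data: x = mileage (thousands of km), y = price (thousands).
x = np.array([20, 40, 60, 80, 100], dtype=float)
y = np.array([9.0, 7.1, 5.2, 3.0, 1.1])

# np.polyfit with deg=1 returns the slope w and intercept w0 that
# minimize the squared error of y = w*x + w0.
w, w0 = np.polyfit(x, y, deg=1)
print(w, w0)  # slope is negative: price drops as mileage grows
```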
Pattern
A pattern is an entity that can be given a
name
Recognition
• Identification of a pattern as a member of a
category
Classification
• You had some training examples, or 'training data'
• The examples were 'labeled'
• You used those examples to make the kid 'learn' the difference between an apple and an orange
Pattern Recognition
Given an input pattern, make a decision about
the “category” or “class” of the pattern
Unsupervised Learning
• Learning “what normally happens”
• No output
• Clustering: Grouping similar instances
• Other applications: Summarization,
Association Analysis
• Example applications
– Image compression: Color quantization
– Bioinformatics: Learning motifs
Reinforcement Learning
• Topics:
– Policies: what actions should an agent take in a
particular situation
– Utility estimation: how good is a state (used by
policy)
• Credit assignment problem (what was responsible for the outcome)
• Applications:
– Game playing
– Robot in a maze
– Multiple agents, partial observability, ...
Model Choice
– What type of classifier shall we use? How shall we select its parameters? Is there a best classifier?
– How do we train? How do we adjust the parameters of the model (classifier) we picked so that the model fits the data?
Features
• Features: a set of variables believed to carry discriminating
and characterizing information about the objects under
consideration
• Feature vector: A collection of d features, ordered in some meaningful way into a d-dimensional column vector, that represents the signature of the object to be identified.
• Feature space: The d-dimensional space in which the feature
vectors lie. A d-dimensional vector in a d-dimensional space
constitutes a point in that space.
Features
• Feature choice
– Good Features
• Ideally, for a given group of patterns coming from the
same class, feature values should all be similar
• For patterns coming from different classes, the feature
values should be different.
– Bad Features
• irrelevant, noisy, outlier?
Classification
Learn a method for predicting the instance class from pre-labeled (classified) instances
Many approaches: Statistics, Decision Trees, Neural Networks, ...
Clustering
Find "natural" grouping of instances given unlabeled data
Supervised Learning
We knew the correct answers
[Scatter plot of labeled points in (x1, x2) feature space]
Unsupervised Learning
We need to figure out the patterns
[Scatter plot of unlabeled points in (x1, x2) feature space]
Example Applications
Examples: Medicine
Microscopic tissue data – Cancer Detection
Examples: GIS
Geographic Information Systems: Manipulation of Satellite Imagery, Terrain Classification, Meteorology
Examples: Industrial Inspection
Human operators are expensive, slow and unreliable. Make machines do the job instead.
Examples: HCI
Try to make human-computer interfaces more natural: gesture recognition, facial expression recognition, lip reading
Examples: Sign Language/Gesture Recognition
British Sign Language Alphabet
Examples: Facial Expression Recognition
Implicit customer feedback
Examples: Biometrics
Biometrics – Authentication techniques
Physiological Biometrics: Face, Iris, DNA, Finger Prints
Behavioral Biometrics: Typing Rhythm, Handwriting, Gait
Face Recognition
Training examples of a person; test images
AT&T Laboratories, Cambridge UK
http://www.uk.research.att.com/facedatabase.html
Examples: Biometrics – Face Recognition
Faces and Digital Cameras
Setting camera focus via face detection. Camera waits for everyone to smile to take a photo. Automatic lighting correction based on face detection. [Canon]
Examples: Biometrics – Finger Print Recognition
Examples: Biometrics – Signature Verification
Examples: Robotics
Examples: Optical Character Recognition
Convert document image into text
Examples: Optical Character Recognition
License Plate Recognition
Examples: Optical Character Recognition
Automatic Mail Sorting
Example: Hand-written digits
Data representation: Greyscale images
Task: Classification (0, 1, 2, 3, ..., 9)
Problem features:
• Highly variable inputs from the same class, including some "weird" inputs
• Imperfect human classification
• High cost associated with errors, so "don't know" may be useful
A classic example of a task that requires machine learning: it is very hard to say what makes a 2.
Safety and Security
Autonomous robots, driver assistance, monitoring pools (Poseidon), pedestrian detection, surveillance
[MERL, Viola et al.]
Summary of Applications

Problem Domain | Application | Input Pattern | Output Class
Document Image Analysis | Optical Character Recognition | Document Image | Characters/words
Document Classification | Internet search | Text Document | Semantic categories
Document Classification | Junk mail filtering | Email | Junk/Non-Junk
Multimedia retrieval | Internet search | Video clip | Video genres
Speech Recognition | Telephone directory assistance | Speech waveform | Spoken words
Natural Language Processing | Information extraction | Sentence | Parts of Speech
Biometric Recognition | Personal identification | Face, finger print, Iris | Authorized users for access control
Medical | Computer aided diagnosis | Microscopic Image | Healthy/cancerous cell
Military | Automatic target recognition | Infrared image | Target type
Industrial automation | Fruit sorting | Images taken on conveyor belt | Grade of quality
Bioinformatics | Sequence analysis | DNA sequence | Known types of genes
Resources: Datasets
• UCI Repository:
http://www.ics.uci.edu/~mlearn/MLRepository.html
• UCI KDD Archive:
http://kdd.ics.uci.edu/summary.data.application.html
• Statlib: http://lib.stat.cmu.edu/
• Delve: http://www.cs.utoronto.ca/~delve/
Resources: Journals
• Journal of Machine Learning Research
www.jmlr.org
• Machine Learning
• IEEE Transactions on Neural Networks
• IEEE Transactions on Pattern Analysis and
Machine Intelligence
• Annals of Statistics
• Journal of the American Statistical Association
• ...
Resources: Conferences
• International Conference on Machine Learning (ICML)
• European Conference on Machine Learning (ECML)
• Neural Information Processing Systems (NIPS)
• Computational Learning Theory (COLT)
• International Joint Conference on Artificial Intelligence (IJCAI)
• ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD)
• IEEE Int. Conf. on Data Mining (ICDM)
Course Information
Books
Pattern Recognition & Machine Learning, 1st Edition, Chris Bishop
Pattern Classification, Duda, Hart & Stork
Introduction to Machine Learning, Alpaydin
Course Contents
• Introduction to Machine Learning
• K-Nearest Neighbor (KNN)
• Minimum Distance Classifier
• Single and Multi-layer Neural Network (Perceptron)
• Decision Trees
• Naive Bayes Classifier
• Bayesian Decision Theory
• Maximum Likelihood Estimation
• Gaussian Mixture Models (GMM) and Expectation Maximization
• Reinforcement Learning
• Clustering
• PCA and Autoencoders
• Ensemble Networks
Grading Policy
Mid Term Exam: 25%
Quizzes (4-6): 10%
Computer and numerical assignments: 10%
Final Project: 10%
Final Exam: 40%
A Classification Problem
Example: Katydids vs. Grasshoppers
Given a collection of annotated data (in this case, 5 instances of Katydids and 5 of Grasshoppers), decide what type of insect the unlabeled example is: Katydid or Grasshopper?
For any domain of interest, we can measure features:
Color {Green, Brown, Gray, Other}, Has Wings?, Abdomen Length, Thorax Length, Antennae Length, Mandible Size, Spiracle Diameter, Leg Length
My_Collection
We can store features in a database.

Insect ID | Abdomen Length | Antennae Length | Insect Class
1 | 2.7 | 5.5 | Grasshopper
2 | 8.0 | 9.1 | Katydid
3 | 0.9 | 4.7 | Grasshopper
4 | 1.1 | 3.1 | Grasshopper
5 | 5.4 | 8.5 | Katydid
6 | 2.9 | 1.9 | Grasshopper
7 | 6.1 | 6.6 | Katydid
8 | 0.5 | 1.0 | Grasshopper
9 | 8.3 | 6.6 | Katydid
10 | 8.1 | 4.7 | Katydid
11 | 5.1 | 7.0 | ???????

The classification problem can now be expressed as: given a training database (My_Collection), predict the class label of a previously unseen instance (here, instance 11).
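One simple way to predict the unseen instance is the nearest-neighbour rule (covered later in the course); a minimal sketch over the My_Collection rows:

```python
# 1-nearest-neighbour prediction for instance 11 (abdomen 5.1, antennae 7.0)
# from the ten labeled rows of My_Collection.
data = [
    (2.7, 5.5, "Grasshopper"), (8.0, 9.1, "Katydid"),
    (0.9, 4.7, "Grasshopper"), (1.1, 3.1, "Grasshopper"),
    (5.4, 8.5, "Katydid"),     (2.9, 1.9, "Grasshopper"),
    (6.1, 6.6, "Katydid"),     (0.5, 1.0, "Grasshopper"),
    (8.3, 6.6, "Katydid"),     (8.1, 4.7, "Katydid"),
]

def nearest_neighbor(abdomen, antennae):
    """Class of the closest training point by squared Euclidean distance."""
    return min(data, key=lambda r: (r[0] - abdomen) ** 2 + (r[1] - antennae) ** 2)[2]

print(nearest_neighbor(5.1, 7.0))  # Katydid (closest row is insect 7: 6.1, 6.6)
```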
[Scatter plot: Antenna Length vs. Abdomen Length (both 1-10) for the labeled Grasshoppers and Katydids]
Pigeon Problem 1
Each example is a pair of bar heights (left bar, right bar).
Examples of class A: (3, 4), (1.5, 5), (6, 8), (2.5, 5)
Examples of class B: (5, 2.5), (5, 2), (8, 3), (4.5, 3)
Pigeon Problem 1
What class is this object: (8, 1.5)? What about this one: (4.5, 7), A or B?
Pigeon Problem 1
(8, 1.5) is a B! Here is the rule: if the left bar is smaller than the right bar, it is an A; otherwise it is a B.
Pigeon Problem 2
Examples of class A: (4, 4), (5, 5), (6, 6), (3, 3)
Examples of class B: (5, 2.5), (2, 5), (5, 3), (2.5, 3)
Oh! This one's hard: (8, 1.5). Even I know this one: (7, 7).
Pigeon Problem 2
The rule is as follows: if the two bars are equal sizes, it is an A; otherwise it is a B. So (7, 7) is an A.
Pigeon Problem 3
Examples of class A: (4, 4), (1, 5), (6, 3), (3, 7)
Examples of class B: (5, 6), (7, 5), (4, 8), (7, 7)
This one is really hard! What is this, A or B: (6, 6)?
Pigeon Problem 3
(6, 6) is a B! The rule is as follows: if the sum of the two bars is less than or equal to 10, it is an A; otherwise it is a B.
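The three rules can be written as one-line classifiers, each taking the (left bar, right bar) heights of an example:

```python
def rule1(left, right):
    """Problem 1: left bar smaller than right bar -> A, otherwise B."""
    return "A" if left < right else "B"

def rule2(left, right):
    """Problem 2: the two bars are equal sizes -> A, otherwise B."""
    return "A" if left == right else "B"

def rule3(left, right):
    """Problem 3: sum of the two bars <= 10 -> A, otherwise B."""
    return "A" if left + right <= 10 else "B"

print(rule1(8, 1.5), rule2(7, 7), rule3(6, 6))  # B A B
```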
Pigeon Problem 1
Examples of class A: (3, 4), (1.5, 5), (6, 8), (2.5, 5)
Examples of class B: (5, 2.5), (5, 2), (8, 3), (4.5, 3)
Here is the rule again: if the left bar is smaller than the right bar, it is an A; otherwise it is a B.
[Scatter plot: Left Bar vs. Right Bar, both axes 1-10]
Pigeon Problem 2
Examples of class A: (4, 4), (5, 5), (6, 6), (3, 3)
Examples of class B: (5, 2.5), (2, 5), (5, 3), (2.5, 3)
Let me look it up... here it is: the rule is, if the two bars are equal sizes, it is an A; otherwise it is a B.
[Scatter plot: Left Bar vs. Right Bar, both axes 1-10]
Pigeon Problem 3
Examples of class A: (4, 4), (1, 5), (6, 3), (3, 7)
Examples of class B: (5, 6), (7, 5), (4, 8), (7, 7)
The rule again: if the square of the sum of the two bars is less than or equal to 100, it is an A; otherwise it is a B.
[Scatter plot: Left Bar vs. Right Bar, both axes 10-100]
Grasshoppers vs. Katydids
[Scatter plot: Antenna Length vs. Abdomen Length, both axes 1-10]
previously unseen instance = (11, 5.1, 7.0, ???????)
• We can "project" the previously unseen instance into the same space as the database.
• We have now abstracted away the details of our particular problem. It will be much easier to talk about points in space.
[Scatter plot: Antenna Length vs. Abdomen Length with the unseen instance plotted among the Grasshoppers and Katydids]
Simple Linear Classifier
R. A. Fisher (1890-1962)
If the previously unseen instance is above the line, then class is Katydid; else class is Grasshopper.
[Scatter plot with a decision line separating Katydids from Grasshoppers]
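A linear classifier decides by which side of a line a point falls on. The particular line below (abdomen + antennae = 10) is an assumption for illustration, not the line fitted on the slide, but it does separate the ten My_Collection examples:

```python
# Classify by the sign of a linear function of the two features.
# The line abdomen + antennae = 10 is an assumed decision boundary.
def linear_classify(abdomen, antennae):
    return "Katydid" if abdomen + antennae - 10 > 0 else "Grasshopper"

print(linear_classify(5.1, 7.0))  # unseen instance 11 -> Katydid
print(linear_classify(2.7, 5.5))  # training row 1 -> Grasshopper
```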
Which of the "Pigeon Problems" can be solved by the Simple Linear Classifier?
1) Perfect
2) Useless
3) Pretty Good
[Three scatter plots, one per Pigeon Problem]
Problems that can be solved by a linear classifier are called linearly separable.
[Side-by-side scatter plots of a linearly separable problem and a non-separable one]
MULTI-CLASS CLASSIFICATION
A Famous Problem: R. A. Fisher's Iris Dataset
• 3 classes (Iris Setosa, Iris Versicolor, Iris Virginica)
• 50 of each class
The task is to classify Iris plants into one of 3 varieties using the Petal Length and Petal Width.
We can generalize the piecewise linear classifier to N classes by fitting N-1 lines. In this case we first learned the line to (perfectly) discriminate between Setosa and Virginica/Versicolor, then we learned to approximately discriminate between Virginica and Versicolor.
Binary classification vs. multi-class classification
One-vs-all (one-vs-rest): train one binary classifier per class (Class 1, Class 2, Class 3), each separating that class from all the others.
[Scatter plots in (x1, x2) illustrating binary, multi-class, and one-vs-all decision boundaries]
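A one-vs-rest scheme can be sketched with any binary learner; here a tiny perceptron (an assumed stand-in, the slides do not prescribe one) is trained per class on made-up 2-D data:

```python
import numpy as np

def train_binary_perceptron(X, t, epochs=50):
    """Tiny perceptron: t in {+1, -1}; returns weights w for score w @ [x, 1]."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias feature
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for xi, ti in zip(Xb, t):
            if ti * (w @ xi) <= 0:  # misclassified -> nudge weights
                w += ti * xi
    return w

# Made-up, linearly separable 2-D data with three classes.
X = np.array([[1, 1], [1, 2], [5, 5], [6, 5], [9, 1], [9, 2]], dtype=float)
y = np.array([0, 0, 1, 1, 2, 2])

# One-vs-rest: one binary classifier per class (that class = +1, rest = -1).
W = {c: train_binary_perceptron(X, np.where(y == c, 1.0, -1.0))
     for c in np.unique(y)}

def predict(x):
    xb = np.append(np.asarray(x, dtype=float), 1.0)
    return max(W, key=lambda c: W[c] @ xb)  # class with the highest score

print(predict([5.5, 5.0]))  # 1
```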
SPLITTING OF TRAINING AND TEST DATA
Dividing Up Data
• We need independent data sets to train, set parameters, and test performance
• Thus we will often divide a data set into three:
– Training set
– Parameter selection set
– Test set
• These must be independent
• Data set 2 (the parameter selection set) is not always necessary
Dataset

Inputs | Labels
15 95 | 1
33 90 | 1
78 70 | 0
70 45 | 0
80 18 | 0
35 65 | 1
45 70 | 1
31 61 | 1
50 63 | 1
98 80 | 0
73 81 | 0
50 18 | 0

A 50:50 split sends the first six rows to one set and the last six to the other.
• Can be 70:30 or any other split
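The split above can be sketched as follows (the first half becomes one set, the second half the other; in practice rows are usually shuffled first):

```python
# 50:50 split of the 12-row toy dataset (two inputs, one label per row).
rows = [
    (15, 95, 1), (33, 90, 1), (78, 70, 0), (70, 45, 0),
    (80, 18, 0), (35, 65, 1), (45, 70, 1), (31, 61, 1),
    (50, 63, 1), (98, 80, 0), (73, 81, 0), (50, 18, 0),
]

split = len(rows) // 2            # change the ratio for 70:30 etc.
train, test = rows[:split], rows[split:]
print(len(train), len(test))      # 6 6
```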
Estimating the Generalisation Error
• We have a dilemma if we have limited data
– We want to use as much data as possible for
training
– We need lots of data for estimating the
generalisation error
• Obtaining a good estimate of generalisation
performance is important for selecting the best
parameter values
Cross Validation
• We can solve our dilemma by repeating the training many times on different partitionings of the data
• This is known as K-fold cross validation
Leave-one-out cross-validation is the extreme case, where K equals the number of data points
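K-fold cross validation can be sketched as an index-splitting loop; each fold is held out once while the rest is used for training:

```python
def kfold_indices(n, k):
    """Yield (train_idx, test_idx) pairs: each fold is held out once."""
    folds = [list(range(i, n, k)) for i in range(k)]  # round-robin folds
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

# With k = n this degenerates to leave-one-out cross-validation.
for train, test in kfold_indices(6, 3):
    print(sorted(test))  # [0, 3] then [1, 4] then [2, 5]
```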
Price of Cross Validation
• Cross-validation is computationally expensive (K-fold
cross-validation requires K times as much work)
• Cross-validation is only necessary when you have
little data
• Re-running code is usually cheap compared with
writing the code
PERFORMANCE MEASUREMENTS
R.O.C. Analysis (Receiver Operating Characteristics)
False positives – i.e. falsely predicting an event
False negatives – i.e. missing an incoming event
Similarly, we have “true positives” and “true negatives”
Confusion matrix:

Truth \ Prediction | 0 | 1
0 | TN | FP
1 | FN | TP
Accuracy Measures
• Accuracy
– = (TP+TN)/(P+N)
• Sensitivity or true positive rate (TPR)
– = TP/(TP+FN) = TP/P
• Specificity or TNR
– = TN/(FP+TN) = TN/N
• Positive Predictive Value (Precision, PPV)
– = TP/(TP+FP)
• Recall
– = TP/(TP+FN)
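All of these measures follow directly from the four confusion-matrix counts; the counts below are assumed illustrative values:

```python
# Assumed confusion-matrix counts, for illustration only.
TP, FP, TN, FN = 40, 10, 45, 5

accuracy    = (TP + TN) / (TP + TN + FP + FN)  # (TP+TN)/(P+N)
sensitivity = TP / (TP + FN)                   # true positive rate (= recall)
specificity = TN / (TN + FP)                   # true negative rate
precision   = TP / (TP + FP)                   # positive predictive value

print(accuracy, sensitivity, specificity, precision)
```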
Acknowledgements
Material in these slides has been taken from the following resources:
Pattern Recognition & Machine Learning, 1st Edition, Chris Bishop
Introduction to Machine Learning, Alpaydin
Statistical Pattern Recognition: A Review – A. K. Jain et al., PAMI (22) 2000
Pattern Recognition and Analysis Course – A. K. Jain, MSU
"Pattern Classification" by Duda et al., John Wiley & Sons
Some material adapted from Dr Ali Hassan's slides