Ch4 Machine Learning

Machine Learning is the process of programming computers to learn from data, improving performance on tasks through experience. It is categorized into supervised, unsupervised, semi-supervised, and reinforcement learning, with applications including classification, regression, clustering, and anomaly detection. Different algorithms are used for each type, and systems can learn in batch or online modes depending on the data flow and resource constraints.

Uploaded by

pereirajoshnatba

MACHINE LEARNING WITH PYTHON
WHAT IS MACHINE LEARNING?

• Machine Learning is the science (and art) of programming computers so they
can learn from data.
• A more engineering-oriented definition:
• "A computer program is said to learn from experience E with respect
to some task T and some performance measure P, if its performance
on T, as measured by P, improves with experience E."
(Tom Mitchell, 1997)
EXAMPLE

• For example, your spam filter is a Machine Learning program that can learn to
flag spam given examples of spam emails (e.g., flagged by users) and examples of
regular (nonspam, also called “ham”) emails.
• The examples that the system uses to learn are called the training set. Each
training example is called a training instance (or sample).
• In this case, the task T is to flag spam for new emails, the experience E is
the training data, and the performance measure P needs to be defined.
• For example, you can use the ratio of correctly classified emails. This particular
performance measure is called accuracy, and it is often used in classification
tasks.
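The accuracy measure described above can be sketched in a few lines of Python. The labels below are made up for illustration; they are not from any real spam filter.

```python
def accuracy(predicted, actual):
    """Return the fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one example")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical labels for five emails: what the filter said vs. the truth.
predicted = ["spam", "ham", "ham", "spam", "ham"]
actual = ["spam", "ham", "spam", "spam", "ham"]

score = accuracy(predicted, actual)  # 4 of the 5 emails are classified correctly
print(score)
```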
TYPES OF MACHINE LEARNING SYSTEMS

• There are so many different types of Machine Learning systems that it is useful
to classify them in broad categories based on:
1. Whether or not they are trained with human supervision (supervised,
unsupervised, semisupervised, and Reinforcement Learning)

2. Whether or not they can learn incrementally on the fly (online versus batch
learning)

3. Whether they work by simply comparing new data points to known data points, or
instead detect patterns in the training data and build a predictive model, much like
scientists do (instance-based versus model-based learning)
SUPERVISED/UNSUPERVISED LEARNING

• Machine Learning systems can be classified according to the amount and
type of supervision they get during training.
• There are four major categories:
1. Supervised learning,
2. Unsupervised learning,
3. Semi-supervised learning, and
4. Reinforcement Learning.
SUPERVISED LEARNING

• In supervised learning, the training data you feed to the algorithm
includes the desired solutions, called labels.
CLASSIFICATION

• A typical supervised learning task is classification.
• The spam filter is a good example of this: it is trained with many example
emails along with their class (spam or ham), and it must learn how to
classify new emails.

Figure: A labeled training set for supervised learning (e.g., spam classification)
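As a rough illustration of classification, here is a minimal sketch of a k-Nearest Neighbors classifier in plain Python. The two features (number of links and number of exclamation marks per email) and all the data points are assumptions invented for this example.

```python
from collections import Counter
import math


def knn_predict(training_set, new_point, k=3):
    """Classify new_point by majority vote among its k nearest labeled neighbors."""
    by_distance = sorted(training_set, key=lambda item: math.dist(item[0], new_point))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labeled training set: (links, exclamation marks) -> class.
training_set = [
    ((8, 5), "spam"),
    ((7, 6), "spam"),
    ((9, 4), "spam"),
    ((1, 0), "ham"),
    ((0, 1), "ham"),
    ((2, 1), "ham"),
]

print(knn_predict(training_set, (6, 5)))  # close to the spam examples
print(knn_predict(training_set, (1, 1)))  # close to the ham examples
```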
REGRESSION
• Another supervised learning task is regression.
• A typical task is to predict a target numeric value, such as the price of
a car, given a set of features (mileage, age, brand, etc.) called
predictors. This sort of task is called regression.
• To train the system, you need to give it many examples of cars, including
both their predictors and their labels (i.e., their prices).
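A minimal sketch of regression with a single predictor, using the closed-form least-squares line. The mileage and price figures are invented, and a real car-price model would use several predictors rather than mileage alone.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training examples: predictor (mileage) and label (price).
mileage = [20, 40, 60, 80, 100]  # thousands of km
price = [18, 16, 14, 12, 10]     # thousands of dollars

slope, intercept = fit_line(mileage, price)
predicted_price = slope * 50 + intercept  # prediction for a car with 50,000 km
print(slope, intercept, predicted_price)
```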
IMPORTANT SUPERVISED LEARNING ALGORITHMS
1. k-Nearest Neighbors
2. Linear Regression
3. Logistic Regression
4. Support Vector Machines (SVMs)
5. Decision Trees and Random Forests
6. Neural networks
UNSUPERVISED LEARNING

• In unsupervised learning, as you might guess, the training data is unlabeled.
• The system tries to learn without a teacher.

Figure: An unlabeled training set for unsupervised learning
MOST IMPORTANT UNSUPERVISED LEARNING ALGORITHMS
• Clustering
1. k-Means
2. Hierarchical Cluster Analysis (HCA)
3. Expectation Maximization

• Visualization and dimensionality reduction
1. Principal Component Analysis (PCA)
2. Kernel PCA
3. Locally-Linear Embedding (LLE)
4. t-distributed Stochastic Neighbor Embedding (t-SNE)

• Association rule learning
1. Apriori
2. Eclat
CLUSTERING
• For example, say you have a lot of data about your
blog’s visitors.
• You may want to run a clustering algorithm to try
to detect groups of similar visitors.
• At no point do you tell the algorithm which group a
visitor belongs to: it finds those connections
without your help.
• For example, it might notice that 40% of your
visitors are males who love comic books and
generally read your blog in the evening, while 20%
are young sci-fi lovers who visit during the
weekends, and so on.
• If you use a hierarchical clustering algorithm, it
may also subdivide each group into smaller
groups. This may help you target your posts for
each group.
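The clustering idea can be sketched with a bare-bones k-Means loop. The visitor features (age, usual hour of visit), the data points, and the choice of starting centroids are all assumptions made for this example; note that no group labels are ever given to the algorithm.

```python
import math


def k_means(points, centroids, iterations=10):
    """Plain k-Means: alternate between assigning points and moving centroids."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical blog visitors: (age, usual hour of visit).
visitors = [(25, 20), (27, 21), (24, 22), (16, 10), (17, 11), (15, 9)]

# Start from two of the data points as initial centroids.
centroids, clusters = k_means(visitors, [visitors[0], visitors[3]])
print(centroids)
print(clusters)
```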
VISUALIZATION
• Visualization algorithms are also good examples of unsupervised learning
algorithms.
• You feed them a lot of complex and unlabeled data, and they output a 2D or
3D representation of your data that can easily be plotted.
• These algorithms try to preserve as much structure as they can (e.g., trying
to keep separate clusters in the input space from overlapping in the
visualization), so you can understand how the data is organized and perhaps
identify unsuspected patterns.

Figure: Example of a t-SNE visualization highlighting semantic clusters
DIMENSIONALITY REDUCTION

• A related task is dimensionality reduction, in which the goal is to simplify the
data without losing too much information.
• One way to do this is to merge several correlated features into one.
• For example, a car’s mileage may be very correlated with its age, so the
dimensionality reduction algorithm will merge them into one feature that
represents the car’s wear and tear. This is called feature extraction.
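One possible way to merge two correlated features is to project them onto their first principal component. The sketch below assumes exactly two positively correlated features; after standardizing both, that direction is (1, 1) / sqrt(2), so the merged feature is simply the scaled sum. The car data and the name `wear` are invented for illustration.

```python
import math
import statistics


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def merge_correlated(feature_a, feature_b):
    """Project two positively correlated features onto their first principal component.

    For two standardized features with positive correlation, that component
    points along (1, 1) / sqrt(2).
    """
    za, zb = standardize(feature_a), standardize(feature_b)
    return [(a + b) / math.sqrt(2) for a, b in zip(za, zb)]


# Hypothetical cars: mileage and age rise together.
mileage = [20, 45, 60, 85, 110]  # thousands of km
age = [1, 3, 4, 6, 8]            # years

wear = merge_correlated(mileage, age)  # one feature instead of two
print(wear)
```

The merged feature keeps almost all of the variation in the two originals here because they are so strongly correlated; with weakly correlated features much more information would be lost.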
ANOMALY DETECTION

• Another important unsupervised task is anomaly detection.
• For example: detecting unusual credit card transactions to prevent fraud,
catching manufacturing defects, or automatically removing outliers from a
dataset before feeding it to another learning algorithm.
• The system is trained with normal instances, and when it sees a new instance
it can tell whether it looks like a normal one or whether it is likely an anomaly.
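A very simple way to sketch this is a z-score detector: it learns the mean and spread of normal instances, then flags new values that fall too far from that mean. The transaction amounts and the threshold of 3 standard deviations are assumptions; real fraud detection uses many more features and more robust methods.

```python
import statistics


class ZScoreDetector:
    """Flags values that lie far from the mean of the normal training data."""

    def fit(self, normal_values):
        self.mean = statistics.fmean(normal_values)
        self.std = statistics.stdev(normal_values)
        return self

    def is_anomaly(self, value, threshold=3.0):
        # Distance from the mean, measured in standard deviations.
        return abs(value - self.mean) / self.std > threshold


# Hypothetical normal card transactions (amounts in dollars).
normal_amounts = [12.0, 15.5, 9.9, 14.2, 11.3, 13.8, 10.4, 12.9]
detector = ZScoreDetector().fit(normal_amounts)

print(detector.is_anomaly(13.0))   # looks like the training data
print(detector.is_anomaly(250.0))  # far outside the normal range
```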
ASSOCIATION RULE LEARNING

• Another common unsupervised task is association rule learning, in which the
goal is to dig into large amounts of data and discover interesting relations
between attributes.
• For example, suppose you own a supermarket.
• Running an association rule learning algorithm on your sales logs may reveal
that people who purchase barbecue sauce and potato chips also tend to buy steak.
• Thus, you may want to place these items close to each other.
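Algorithms such as Apriori are built on two basic measures, support and confidence. The sketch below only computes those two measures for one candidate rule; it is not an implementation of Apriori or Eclat, and the sales log is invented.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical sales log: one set of items per purchase.
transactions = [
    {"barbecue sauce", "potato chips", "steak"},
    {"barbecue sauce", "potato chips", "steak", "soda"},
    {"barbecue sauce", "potato chips"},
    {"bread", "milk"},
    {"steak", "bread"},
]

# Candidate rule: {barbecue sauce, potato chips} -> {steak}
rule_support = support(transactions, {"barbecue sauce", "potato chips", "steak"})
rule_confidence = confidence(transactions, {"barbecue sauce", "potato chips"}, {"steak"})
print(rule_support, rule_confidence)
```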
SEMISUPERVISED LEARNING

• Some algorithms can deal with partially labeled training data, usually a lot
of unlabeled data and a little bit of labeled data. This is called
semisupervised learning.
• Some photo-hosting services, such as Google Photos, are good examples of this.
• Once you upload all your family photos to the service, it automatically
recognizes that the same person A shows up in photos 1, 5, and 11, while
another person B shows up in photos 2, 5, and 7. This is the unsupervised
part of the algorithm (clustering).
• Now all the system needs is for you to tell it who these people are. Just one
label per person, and it is able to name everyone in every photo, which is
useful for searching photos.
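The labeling step can be sketched as follows, assuming the clustering has already been done. The face identifiers, cluster ids, and the names "Alice" and "Bob" are hypothetical; the point is only that one label per cluster is enough to name every face.

```python
def propagate_labels(cluster_of, known_labels):
    """Spread each known label to every item in the same cluster.

    cluster_of: item -> cluster id (output of an unsupervised clustering step)
    known_labels: item -> name, for the few items a person has labeled
    """
    name_of_cluster = {cluster_of[item]: name for item, name in known_labels.items()}
    return {item: name_of_cluster.get(cluster) for item, cluster in cluster_of.items()}


# Hypothetical clustering result: person A in photos 1, 5, 11; person B in photos 2, 5, 7.
cluster_of = {
    "photo1/face0": "A",
    "photo5/face0": "A",
    "photo11/face0": "A",
    "photo2/face0": "B",
    "photo5/face1": "B",
    "photo7/face0": "B",
}

# The user supplies just one label per person.
known_labels = {"photo1/face0": "Alice", "photo2/face0": "Bob"}

names = propagate_labels(cluster_of, known_labels)
print(names)
```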
REINFORCEMENT LEARNING
• The learning system, called an
agent in this context, can observe
the environment, select and
perform actions, and get rewards
in return (or penalties in the form
of negative rewards).
• It must then learn by itself what is
the best strategy, called a policy, to
get the most reward over time. A
policy defines what action the agent
should choose when it is in a given
situation.
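To make the agent, reward, and policy vocabulary concrete, here is a small sketch using tabular Q-learning, which is one common Reinforcement Learning method (the slides do not name a specific algorithm, so this choice is an assumption). The environment is a made-up corridor of five cells where reaching the last cell pays a reward of 1.

```python
import random

N_STATES = 5          # cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)    # move left or move right
GAMMA, ALPHA = 0.9, 0.5


def step(state, action):
    """Environment: move along the corridor; reaching the goal pays a reward of 1."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def q_learning(episodes=200, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            action = rng.choice(ACTIONS)  # explore at random while learning
            next_state, reward = step(state, action)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            # Move the estimate toward reward + discounted value of the next state.
            q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
            state = next_state
    return q


q = q_learning()
# The policy: for each situation (state), the action with the highest estimated value.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)
```

The agent is never told that moving right is correct; it only observes rewards, and the learned policy ends up choosing +1 (right) in every cell.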
BATCH AND ONLINE LEARNING
• In batch learning, the system is incapable of learning incrementally: it must
be trained using all the available data. This will generally take a lot of time
and computing resources, so it is typically done offline.
• First the system is trained, and then it is launched into production and runs
without learning anymore; it just applies what it has learned. This is called
offline learning.
• If you want a batch learning system to know about new data (such as a new
type of spam), you need to train a new version of the system from scratch on
the full dataset (not just the new data, but also the old data), then stop the
old system and replace it with the new one.
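The retrain-and-replace cycle can be sketched with a toy keyword-based spam filter. The model, the emails, and the function names are all invented for illustration; the point is that the new version is trained from scratch on the old and new data together.

```python
from collections import Counter


def train(dataset):
    """Batch training: build a fresh model from the full dataset of (text, label) pairs."""
    spam_words, ham_words = Counter(), Counter()
    for text, label in dataset:
        (spam_words if label == "spam" else ham_words).update(text.lower().split())
    # The toy "model" is the set of words seen more often in spam than in ham.
    return {word for word, count in spam_words.items() if count > ham_words[word]}


def predict(model, text):
    return "spam" if any(word in model for word in text.lower().split()) else "ham"


old_data = [
    ("win free prize", "spam"),
    ("meeting at noon", "ham"),
    ("free lunch at noon", "ham"),
]
model_v1 = train(old_data)

# A new type of spam appears: retrain on old + new data, then swap the models.
new_data = [("crypto giveaway today", "spam")]
model_v2 = train(old_data + new_data)

print(predict(model_v1, "crypto giveaway inside"))  # old model has never seen these words
print(predict(model_v2, "crypto giveaway inside"))  # new model has
```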
ONLINE LEARNING

• In online learning, you train the system incrementally by feeding it data instances
sequentially, either individually or by small groups called mini-batches.
• Each learning step is fast and cheap, so the system can learn about new data on
the fly.
• Online learning is great for systems that receive data as a continuous flow (e.g.,
stock prices) and need to adapt to change rapidly or autonomously.
• It is also a good option if you have limited computing resources: once an online
learning system has learned about new data instances, it does not need them
anymore, so you can discard them (unless you want to be able to roll back to a
previous state and “replay” the data). This can save a huge amount of space.
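A minimal sketch of online learning: a one-feature linear model updated by stochastic gradient descent, one instance at a time. The class name, the learning rate, and the synthetic data stream (points on the line y = 2x + 1) are assumptions for illustration.

```python
class OnlineLinearModel:
    """Tiny linear model y = w * x + b trained one instance at a time with SGD."""

    def __init__(self, learning_rate=0.1):
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.w * x + self.b

    def learn_one(self, x, y):
        # One cheap gradient step on the squared error for this single instance.
        error = self.predict(x) - y
        self.w -= self.learning_rate * error * x
        self.b -= self.learning_rate * error


def data_stream(n):
    """Hypothetical stream: yields (x, y) pairs from the line y = 2x + 1."""
    xs = [0.0, 0.25, 0.5, 0.75, 1.0]
    for i in range(n):
        x = xs[i % len(xs)]
        yield x, 2 * x + 1


model = OnlineLinearModel()
for x, y in data_stream(5000):
    model.learn_one(x, y)  # the instance can be discarded after this step

print(model.w, model.b)
```

Because the model keeps only `w` and `b`, memory use stays constant no matter how many instances flow through.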
INSTANCE-BASED VERSUS MODEL-BASED LEARNING

• One more way to categorize Machine Learning systems is by how they
generalize.
• Most Machine Learning tasks are about making predictions.
• This means that given a number of training examples, the system needs to
be able to generalize to examples it has never seen before.
