Machine Learning
By,
Dr. Pallavi Adke,
Research Assistant Professor,
Institute of Artificial Intelligence
Definition
Machine learning is a subset of Artificial Intelligence (AI) that enables
computers to learn from data and make predictions without being explicitly
programmed.
An Analogy from Human Brain
The brain is the most complex part of our body, and how it works is still the subject of major ongoing research. To build some intuition, let's take a very simple example.
Suppose we want to catch a ball coming towards us. What would be our first reaction? We would use our hands to catch it, right?
But were you able to catch it in your childhood days?
In our childhood days, we might have struggled to catch balls. We practiced more and more, and finally our brains learned how to catch a ball.
More explicitly, the brain learns a mapping function from an input set to an output set.
Machine Learning
This learning of mapping function through several experiences is
known as intelligence in living things. But what if some non-living
things started doing the same?
Can we provide this same capability to our computers anyway?
That's where the buzzword "Machine Learning" comes into the picture.
Let's move towards understanding it in detail.
Can you guess "how" we can do that?
● We saw that human brains
have a unique characteristic of
learning from different
experiences and evolving to do
their best for similar
experiences.
● Continuing the same example of
catching the ball, suppose we
want to make a Robot catch
the ball by providing human
intelligence.
There are two possibilities
Method 1:
● We can write a function-based computer program that will take the
ball's location as the input parameter and give us the output position
to place our robot to catch the ball. This method can also be called
traditional programming.
Method 2:
● We recorded some data for the ball positions and corresponding robot
positions (placed manually) to catch the ball perfectly.
● Let's say we asked our robot to guess where it should stand to catch the ball, given that the ball position is X1. Initially, the robot made very poor guesses. Say it guessed Y1', while the correct position was Y1, and the difference between Y1 and Y1' was very large.
What is Machine Learning?
● But our robot did not lose hope. It learned from its mistakes and corrected its predictions for every recorded ball position (X1, X2, …, Xn). By iterating over all these instances, it became accurate and started predicting correct or near-correct positions (Y1', Y2', …, Yn').
● This kind of computer intelligence, gained by learning from different recorded experiences, is known as Machine Learning.
● Once this learning is done, the robot can predict its position for similar ball positions that were not recorded earlier. For a new position Xn+1, it will output some Yn+1', and we will judge success by the outcome: whether it caught the ball or not.
Difference Between Machine Learning and Traditional Programming
● In machine learning, we do not write the program/function ourselves. Instead, the machine learns it by itself from historical data, and that is why the definition includes the phrase "without being explicitly programmed".
● Here the computer takes input data and output data as its parameters and tries to produce the function that best maps the input data to the output data, learning from existing experiences (similar to the brain's functioning). But one question is still not straightforward:
● Why do we even need Machine Learning?
Example Applications that can not be programmed traditionally
1. Self-driving vehicles: There is a practically unlimited number of possible scenarios on the road, and it is not feasible to write explicit programs for all of them.
2. Handwritten character recognition: There are billions of people out there, and no two people's handwriting is quite the same. If we opted for traditional programming to identify the characters in handwritten documents, we could never cover all the cases in the world.
3. Email Spam/Non-spam classification: Based on the content of an email, we classify it as spam or non-spam. An average professional receives around 150 emails a day. Suppose we traditionally programmed our computer to mark an email as spam if it contains certain words/links/phrases. Don't you think spammers would quickly notice that their emails are being marked as spam when they use those sets? They would easily find alternatives, and we would have to reprogram our computer for each of them.
4. Face Recognition: All computer vision tasks involve billions of image pixels (a pixel is the smallest unit of an image). We cannot explicitly write programs that handle every possible pixel pattern.
ML is Automating the Process of Automation
Types of Learning in ML
Supervised Learning
Supervised machine learning is a fundamental approach for machine learning and
artificial intelligence. It involves training a model using labeled data, where each input
comes with a corresponding correct output. The process is like a teacher guiding a
student—hence the term “supervised” learning.
What is Supervised Machine Learning?
● Supervised learning is a type of machine learning where a model is trained on
labeled data—meaning each input is paired with the correct output.
● The model learns by comparing its predictions with the actual answers
provided in the training data. Over time, it adjusts itself to minimize errors and
improve accuracy.
● The goal of supervised learning is to make accurate predictions when given
new, unseen data. For example, if a model is trained to recognize handwritten
digits, it will use what it learned to correctly identify new numbers it hasn’t seen
before.
● Supervised learning can be applied in various forms, including classification and regression.
● A fundamental concept in supervised machine learning is learning a class from
examples.
● This involves providing the model with examples where the correct label is
known, such as learning to classify images of cats and dogs by being shown
labeled examples of both.
How Supervised Learning Works?
● First step is that we need to
train the model for each
shape.
● After training, we test our
model using the test set, and
the task of the model is to
identify the shape.
● The machine is already trained on all types of shapes, and when it finds a new shape, it classifies the shape on the basis of its number of sides and predicts the output.
Steps Involved in Supervised Learning
1. First, determine the type of training dataset.
2. Collect/gather the labelled training data.
3. Split the dataset into a training set, a test set, and a validation set.
4. Determine the input features of the training dataset, which should carry enough information for the model to accurately predict the output.
5. Determine a suitable algorithm for the model, such as support vector machine, decision tree, etc.
6. Execute the algorithm on the training set. Sometimes we need validation sets as control parameters; these are subsets of the training data.
7. Evaluate the accuracy of the model using the test set. If the model predicts the correct outputs, the model is accurate.
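The splitting in steps 2–3 can be sketched in plain Python. This is a minimal illustration; the 60/20/20 ratio and the toy dataset are assumptions chosen for the example, not fixed rules.

```python
import random

def split_dataset(data, train_frac=0.6, val_frac=0.2, seed=42):
    """Shuffle and split labeled data into train / validation / test sets."""
    data = list(data)
    random.Random(seed).shuffle(data)      # reproducible shuffle
    n_train = int(len(data) * train_frac)
    n_val = int(len(data) * val_frac)
    train = data[:n_train]
    val = data[n_train:n_train + n_val]
    test = data[n_train + n_val:]          # remainder becomes the test set
    return train, val, test

# 10 labeled examples: (feature, label)
examples = [(i, i % 2) for i in range(10)]
train, val, test = split_dataset(examples)
print(len(train), len(val), len(test))     # 6 2 2
```

In practice a library helper (e.g. scikit-learn's `train_test_split`) is normally used, but the idea is the same: shuffle first, then slice.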
Types of Supervised Learning Algorithms
There are two types of supervised
learning algorithms
1. Regression
2. Classification
The main difference between regression
and classification models is that
regression algorithms are used to
predict continuous values (test
scores), while classification
algorithms predict discrete values
(spam/not spam, male/female,
true/false).
Types of Regression
1. Linear Regression
2. Logistic Regression
3. Non-Linear Regression
4. Regression Trees
5. Bayesian Linear Regression
6. Polynomial Regression
1. Regression
Regression is used to model the relationship between independent variables and a dependent variable. In its simplest (linear) form, it is based on the idea that the dependent variable can be modeled as a linear combination of the independent variables plus an error term.
Fig shows a simple example of how values of a property increase as its
surface area increases, creating a linear relationship between both
characteristics.
Linear Regression
● It is used for solving regression problems.
● It is used for predicting the continuous
dependent variable with the help of
independent variables.
● The goal of the Linear regression is to find the
best fit line that can accurately predict the
output for the continuous dependent variable.
● If a single independent variable is used for prediction, it is called Simple Linear Regression.
● If there is more than one independent variable, the regression is called Multiple Linear Regression.
● By finding the best fit line, the algorithm establishes the relationship between the dependent variable and the independent variable(s), and this relationship should be linear in nature:
y = ax + b
where a is the slope of the line, b is the intercept, x is the independent variable, and y is the dependent variable.
● The output of linear regression should only be continuous values such as price, age, salary, etc.
Logistic Regression:
● It can be used for Classification as well as for
Regression problems, but mainly used for
Classification problems.
● Logistic regression is used to predict the
categorical dependent variable with the
help of independent variables.
● The output of a logistic regression model can only be between 0 and 1.
● Logistic regression can be used wherever the probability of belonging to one of two classes is required,
● such as whether it will rain today or not: either 0 or 1, true or false, etc.
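The 0-to-1 output comes from applying the sigmoid (logistic) function to a linear combination of the inputs. A minimal sketch, where the coefficients a = 2.0 and b = -1.0 are made-up values for illustration:

```python
import math

def sigmoid(z):
    """Squash any real number into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_prob(x, a, b):
    """Logistic regression: estimated probability of class 1 for input x."""
    return sigmoid(a * x + b)

# Hypothetical model: a = 2.0, b = -1.0
p = predict_prob(1.5, a=2.0, b=-1.0)   # sigmoid(2.0) ≈ 0.88
label = 1 if p >= 0.5 else 0           # threshold at 0.5 for the class decision
print(round(p, 2), label)
```

The threshold 0.5 is the conventional default; it can be moved when the two kinds of misclassification have different costs.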
Non-Linear Regression:
● Nonlinear regression models can capture more complex relationships. They are not linear in their parameters, which necessitates iterative methods for parameter estimation.
● These models provide the flexibility to fit data better, but they are more complex and computationally intensive, with coefficients that are harder to interpret directly.
Difference between Linear Regression and Logistic Regression
Linear Regression vs. Logistic Regression:
1. Linear regression is used to predict a continuous dependent variable using a given set of independent variables; logistic regression predicts a categorical dependent variable.
2. Linear regression is used for solving regression problems; logistic regression is used for solving classification problems.
3. In linear regression, we predict the values of continuous variables; in logistic regression, we predict the values of categorical variables.
4. In linear regression, we find the best fit line, by which we can easily predict the output; in logistic regression, we find the S-curve, by which we can classify the samples.
5. Linear regression uses the least squares estimation method; logistic regression uses the maximum likelihood estimation method.
6. The output of linear regression must be a continuous value, such as price or age; the output of logistic regression must be a categorical value, such as 0 or 1, Yes or No.
7. In linear regression, the relationship between the dependent and independent variables must be linear; in logistic regression, a linear relationship is not required.
8. In linear regression, there may be collinearity between the independent variables; in logistic regression, there should not be collinearity between the independent variables.
Problem on Linear Regression
Find the linear regression equation y = ax + b for the given data, where a is the slope of the line, b is the intercept, x is the independent variable, and y is the dependent variable.

x    y    x²    xy
3    8    9     24
9    6    81    54
5    4    25    20
3    2    9     6

∑x = 20, ∑y = 20, ∑x² = 124, ∑xy = 104, n = 4

Slope: a = (n∑xy − ∑x∑y) / (n∑x² − (∑x)²) = (4·104 − 20·20) / (4·124 − 20²) = 16 / 96 ≈ 0.167
Intercept: b = (∑y − a∑x) / n = (20 − 0.167·20) / 4 ≈ 4.167

So the regression equation is y ≈ 0.167x + 4.167.
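The hand computation above can be checked with a few lines of Python implementing the same least-squares formulas:

```python
xs = [3, 9, 5, 3]
ys = [8, 6, 4, 2]
n = len(xs)

sx = sum(xs)                               # Σx  = 20
sy = sum(ys)                               # Σy  = 20
sxx = sum(x * x for x in xs)               # Σx² = 124
sxy = sum(x * y for x, y in zip(xs, ys))   # Σxy = 104

a = (n * sxy - sx * sy) / (n * sxx - sx ** 2)   # slope
b = (sy - a * sx) / n                           # intercept
print(round(a, 3), round(b, 3))                 # 0.167 4.167
```

The same numbers can be reproduced with `numpy.polyfit(xs, ys, 1)` if NumPy is available.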
2. Classification
● The Classification algorithm is a Supervised Learning technique that is
used to identify the category of new observations on the basis of
training data.
● In classification, a program learns from the given dataset or observations and then classifies new observations into one of a number of classes or groups, such as Yes or No, 0 or 1, Spam or Not Spam, cat or dog, etc.
Classes can also be called targets, labels, or categories.
● Unlike regression, the output variable of Classification is a category,
not a value, such as "Green or Blue", "fruit or animal", etc. Since the
Classification algorithm is a Supervised learning technique, hence it takes
labeled input data, which means it contains input with the corresponding
output.
● In classification algorithm, a discrete output function(y) is mapped to input
variable(x).
● y=f(x), where y = categorical output
Classification
Classification is used to assign data to one of several categories (classes) based on certain characteristics. Once a classifier has been trained, it can be used to make predictions about new, unseen data.
Binary classification is a type of classification where each data sample is assigned to one of two mutually exclusive classes. In multiclass classification, on the other hand, each data sample is assigned to one of more than two classes.
Types of Classification Algorithm
Linear Models
● Logistic Regression
● Support Vector Machines
Non-linear Models
● K-Nearest Neighbours
● Kernel SVM
● Naive Bayes
● Decision Tree Classification
● Random Forest Classification
K-Nearest Neighbor(KNN) Algorithm
● K-Nearest Neighbour is one of the simplest Machine Learning algorithms
based on Supervised Learning technique.
● The K-NN algorithm assumes similarity between the new case/data and the available cases, and puts the new case into the category that is most similar to the available categories.
● The K-NN algorithm stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can easily be classified into a well-suited category using the K-NN algorithm.
● K-NN algorithm can be used for Regression as well as for
Classification but mostly it is used for the Classification problems.
● It is also called a lazy learner algorithm because it does not learn from
the training set immediately instead it stores the dataset and at the time
of classification, it performs an action on the dataset.
● KNN algorithm at the training phase just stores the dataset and when it
gets new data, then it classifies that data into a category that is much
similar to the new data.
How does K-NN work?
● Step-1: Select the number K of neighbors.
● Step-2: Calculate the Euclidean distance from the new data point to each training point.
● Step-3: Take the K nearest neighbors as per the calculated Euclidean distances.
● Step-4: Among these k neighbors, count the number of the data
points in each category.
● Step-5: Assign the new data points to that category for which the
number of the neighbor is maximum.
● Step-6: Our model is ready.
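The steps above can be sketched directly in Python. The 2-D dataset here is a toy example invented for illustration; `math.dist` computes the Euclidean distance between two points.

```python
import math
from collections import Counter

def knn_predict(training_data, new_point, k):
    """Classify new_point by majority vote among its k nearest neighbors."""
    # Steps 2-3: compute Euclidean distances and take the k smallest
    neighbors = sorted(training_data,
                       key=lambda item: math.dist(item[0], new_point))[:k]
    # Steps 4-5: count labels among the k neighbors and pick the majority
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy 2-D dataset: ((x, y), class)
data = [((1, 1), "A"), ((2, 1), "A"), ((1, 2), "A"),
        ((8, 8), "B"), ((9, 8), "B")]
print(knn_predict(data, (2, 2), k=3))   # "A"
```

Note there is no training step at all, which is exactly why K-NN is called a lazy learner: all the work happens at prediction time.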
K-Nearest Neighbor(KNN) Algorithm
● Firstly, we will choose the number of neighbors, say k = 5.
● Next, we will calculate the Euclidean distance between the data points.
The Euclidean distance is the distance between two points, which we have
already studied in geometry. It can be calculated as:
● By calculating the Euclidean distances, we find the nearest neighbors: three of the five nearest neighbors are in category A and two are in category B.
● Since the majority of the nearest neighbors are from category A, the new data point is assigned to category A.
How to select the value of K in the K-NN Algorithm?
● There is no particular way to determine the best value for "K", so we need to try some values to find the best among them. A commonly preferred value for K is 5, or sqrt(number of samples).
● A very low value for K, such as K=1 or K=2, can be noisy and expose the model to the effects of outliers.
● Large values for K smooth out noise, but if K is too large the neighborhood may include points from other classes and miss local patterns.
Statistical Methods for Selecting k:
● Cross-Validation: A robust method for selecting the best k is k-fold cross-validation. This involves splitting the data into folds, training the model on some folds and testing it on the remaining ones, and repeating this for each fold. The value of k (the number of neighbors) that yields the highest average validation accuracy is usually the best choice.
● Elbow Method: In the elbow method, we plot the model's error rate or accuracy for different values of k. As we increase k, the error usually decreases at first; after a certain point, it changes much more slowly. The point where the curve forms an "elbow" is considered the best k.
● Odd Values for k: It is also recommended to choose an odd value for k, especially in binary classification tasks, to avoid ties when deciding the majority class.
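A cross-validation loop for choosing k can be sketched with leave-one-out validation (the extreme case of k-fold where each fold has one sample). The dataset below is a made-up toy example with two well-separated clusters:

```python
import math
from collections import Counter

def knn_predict(train, point, k):
    """Majority vote among the k nearest training points."""
    neighbors = sorted(train, key=lambda it: math.dist(it[0], point))[:k]
    return Counter(lbl for _, lbl in neighbors).most_common(1)[0][0]

def loocv_accuracy(data, k):
    """Leave-one-out: predict each point from all the others."""
    hits = sum(knn_predict(data[:i] + data[i + 1:], pt, k) == lbl
               for i, (pt, lbl) in enumerate(data))
    return hits / len(data)

data = [((1, 1), "A"), ((2, 1), "A"), ((1, 2), "A"), ((2, 2), "A"),
        ((8, 8), "B"), ((9, 8), "B"), ((8, 9), "B"), ((9, 9), "B")]

# Try odd k values and keep the one with the best validation accuracy
best_k = max([1, 3, 5, 7], key=lambda k: loocv_accuracy(data, k))
print(best_k, loocv_accuracy(data, best_k))   # 1 1.0
```

Note how k=7 fails here: with only 3 same-cluster neighbors left after holding a point out, the other cluster's 4 points outvote them, which is the "too-large K" problem described above.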
Advantages of KNN Algorithm
● It is simple to implement.
● It is robust to the noisy training data
● It can be more effective if the training data is large.
Disadvantages of KNN Algorithm
● It always needs a value of K to be determined, which can sometimes be complex.
● The computation cost is high because the distance to every training sample must be calculated.
Problem on KNN
Apply K nearest neighbor classifier to predict the diabetic patient with the given features BMI = 43.6, Age =
40. If the training examples are as shown in table. Assume K=3
BMI     Age    Sugar
33.6    50     1
26.6    30     0
23.4    40     0
43.1    67     0
35.3    23     1
35.9    67     1
36.7    45     1
25.7    46     0
23.3    29     0
31      56     1
Solution
The given training dataset has 10 instances with two features, BMI (Body Mass Index) and Age. Sugar is the target label, with two possible values: 0 means the patient does not have sugar (diabetes), and 1 means the patient has sugar.
Given the dataset and new test instance, we need to find the distance from the new test instance to every
training example. Here we use the euclidean distance formula to find the distance.
In the next table, you can see the calculated distance from the test example to each training instance.
Calculated distance
BMI     Age    Sugar    Formula                        Distance
33.6    50     1        √((43.6-33.6)²+(40-50)²)       14.14
26.6    30     0        √((43.6-26.6)²+(40-30)²)       19.72
23.4    40     0        √((43.6-23.4)²+(40-40)²)       20.20
43.1    67     0        √((43.6-43.1)²+(40-67)²)       27.00
35.3    23     1        √((43.6-35.3)²+(40-23)²)       18.92
35.9    67     1        √((43.6-35.9)²+(40-67)²)       28.08
36.7    45     1        √((43.6-36.7)²+(40-45)²)       8.52
25.7    46     0        √((43.6-25.7)²+(40-46)²)       18.88
23.3    29     0        √((43.6-23.3)²+(40-29)²)       23.09
31      56     1        √((43.6-31)²+(40-56)²)         20.37
Find 3 nearest neighbors.
Once you calculate the distance, the next step is to find the nearest neighbors based on the value of k. In
this case, the value of k is 3. Hence we need to find 3 nearest neighbors.
BMI     Age    Sugar    Distance    Rank
33.6    50     1        14.14       2
26.6    30     0        19.72
23.4    40     0        20.20
43.1    67     0        27.00
35.3    23     1        18.92
35.9    67     1        28.08
36.7    45     1        8.52        1
25.7    46     0        18.88       3
23.3    29     0        23.09
31      56     1        20.37
Majority voting technique
Now, we need to apply the majority voting technique to decide the resulting label for the new example. Here the 1st and 2nd nearest neighbors have target label 1, and the 3rd nearest neighbor has target label 0. Target label 1 has the majority, so the new example is classified as 1; that is, the diabetic patient has sugar.
Test Example BMI=43.6, Age=40, Sugar=1
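The worked example can be verified in a few lines of Python using the same data and the Euclidean distance:

```python
import math
from collections import Counter

# Training data from the table: ((BMI, Age), Sugar)
train = [((33.6, 50), 1), ((26.6, 30), 0), ((23.4, 40), 0),
         ((43.1, 67), 0), ((35.3, 23), 1), ((35.9, 67), 1),
         ((36.7, 45), 1), ((25.7, 46), 0), ((23.3, 29), 0),
         ((31.0, 56), 1)]
query = (43.6, 40)   # BMI = 43.6, Age = 40

# Rank every training instance by distance to the query and take the top 3
ranked = sorted(train, key=lambda it: math.dist(it[0], query))
top3 = ranked[:3]
print([round(math.dist(f, query), 2) for f, _ in top3])   # [8.52, 14.14, 18.88]

# Majority vote among the 3 nearest neighbors
prediction = Counter(lbl for _, lbl in top3).most_common(1)[0][0]
print(prediction)   # 1 -> the patient is predicted to have sugar
```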
Hierarchical clustering
Hierarchical clustering is a clustering analysis technique that places data points into hierarchical
groups, or hierarchical clusters, based on similarities.
How Does Hierarchical Clustering Work?
1. The hierarchical clustering process involves finding the two data points closest to
each other and combining the two most similar ones.
2. After repeating this process until all data points are grouped into clusters, the end
result is a hierarchical tree of related groups known as a dendrogram.
3. Hierarchical clustering is based on the core idea that similar objects lie near each other in the data space, while dissimilar objects lie far apart.
4. It uses distance functions to find nearby data points and group the data points
together as clusters.
1. Agglomerative Clustering
1. Agglomerative clustering is a bottom-up approach.
2. It starts by treating each individual data point as its own cluster; clusters are then merged successively based on similarity until one big cluster containing all objects is formed.
3. It is good at identifying small clusters.
4. Example:
Imagine you have four fruits with different weights: an apple (100g), a banana (120g), a cherry
(50g), and a grape (30g). Hierarchical clustering starts by treating each fruit as its own group.
● It then merges the closest groups based on their weights.
● First, the cherry and grape are grouped together because they are the lightest.
● Next, the apple and banana are grouped together.
Finally, all the fruits are merged into one large group, showing how hierarchical clustering progressively
combines the most similar data points.
Workflow for the Hierarchical Agglomerative Clustering
1. Start with individual points: Each data point is its own cluster. For example, if you have 5 data points, you start with 5 clusters, each containing just one data point.
2. Calculate distances between clusters: Calculate the distance between every pair of clusters. Initially, since each cluster has one point, this is just the distance between the two data points.
3. Merge the closest clusters: Identify the two clusters with the smallest distance and merge them into a single cluster.
4. Update the distance matrix: After merging, you have one less cluster. Recalculate the distances between the new cluster and the remaining clusters.
5. Repeat steps 3 and 4: Keep merging the closest clusters and updating the distance matrix until only one cluster is left.
6. Create a dendrogram: As the process continues, you can visualize the merging of clusters using a tree-like diagram called a dendrogram. It shows the hierarchy of how clusters are merged.
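The fruit example can be traced with a small single-linkage sketch. Note that apple–banana and cherry–grape are both 20 g apart, so which pair merges first depends on tie-breaking; this toy implementation breaks ties by scan order.

```python
def agglomerate(items):
    """Single-linkage agglomerative clustering on 1-D values.
    items: dict of name -> value. Returns the merge history (sets of names)."""
    clusters = [({name}, [value]) for name, value in items.items()]
    history = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest single-linkage distance
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: min(abs(a - b)
                                      for a in clusters[ij[0]][1]
                                      for b in clusters[ij[1]][1]))
        names = clusters[i][0] | clusters[j][0]
        values = clusters[i][1] + clusters[j][1]
        history.append(names)
        # Replace the two merged clusters with their union
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append((names, values))
    return history

fruits = {"apple": 100, "banana": 120, "cherry": 50, "grape": 30}
for step, merged in enumerate(agglomerate(fruits), 1):
    print(step, sorted(merged))
```

For real data, SciPy's `scipy.cluster.hierarchy.linkage` and `dendrogram` implement this far more efficiently and draw the tree.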
2. Dimensionality Reduction
This is a technique used in machine learning to reduce the number of features (variables) in a
data set, while keeping the most important information. This is achieved by removing redundant
or irrelevant features, allowing machine learning models to be trained faster and with better
results.
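One of the simplest forms of this idea is dropping near-constant (low-variance) features, since they carry almost no information. A minimal sketch; the threshold of 0.01 and the toy data are arbitrary choices for illustration:

```python
def variance(column):
    """Population variance of a sequence of numbers."""
    mean = sum(column) / len(column)
    return sum((v - mean) ** 2 for v in column) / len(column)

def drop_low_variance(rows, threshold=0.01):
    """Keep only the feature columns whose variance exceeds the threshold."""
    columns = list(zip(*rows))                 # transpose rows -> columns
    keep = [i for i, col in enumerate(columns) if variance(col) > threshold]
    return [[row[i] for i in keep] for row in rows], keep

# Feature 0 is constant, feature 1 varies: only feature 1 survives
data = [[1.0, 0.2], [1.0, 0.9], [1.0, 0.5], [1.0, 0.1]]
reduced, kept = drop_low_variance(data)
print(kept, reduced[0])   # [1] [0.2]
```

More powerful techniques such as PCA go further by combining correlated features into a smaller set of new ones, rather than just dropping columns.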
Reinforcement learning
This is a technique in which an agent interacts with its environment and receives rewards or punishments based on its
actions. The agent learns through exploration and experimentation, striving to maximize the total reward received over
time. This is accomplished by selecting actions that the agent believes are most likely to produce a reward.
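A minimal illustration of this reward-driven loop is an epsilon-greedy agent on a two-armed bandit. The arm rewards (0.2 and 1.0) and epsilon = 0.1 are made-up values for the sketch; with probability epsilon the agent explores a random arm, otherwise it exploits the arm with the best estimated reward so far.

```python
import random

def run_bandit(arm_rewards, steps=500, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a bandit with fixed per-arm rewards.
    Returns how often each arm was pulled."""
    rng = random.Random(seed)
    estimates = [0.0] * len(arm_rewards)   # running mean reward per arm
    counts = [0] * len(arm_rewards)
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(arm_rewards))   # explore
        else:
            arm = estimates.index(max(estimates))   # exploit
        reward = arm_rewards[arm]                   # deterministic reward here
        counts[arm] += 1
        # Incremental mean update of this arm's estimated reward
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts

counts = run_bandit([0.2, 1.0])
print(counts)   # the agent ends up pulling the better arm far more often
```

Even though the agent starts out exploiting the worse arm, a single exploration of the better arm is enough to flip its estimates, after which it exploits the better arm for the rest of the run.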
What is the Difference Between Supervised, Unsupervised, and Reinforcement Learning?