Logistic Regression: Data Science and AI Certification Course
Introduction. Logistic Regression is a classification algorithm. It is used to predict a binary outcome (1 / 0, Yes / No, True / False) given a set of independent variables. You can also think of logistic regression as a special case of linear regression where the outcome variable is categorical and the log of odds is used as the dependent variable. In simple words, it predicts the probability of occurrence of an event by fitting the data to a logit function.
Logit/Sigmoid function. The logistic function, also called the sigmoid function, was developed by statisticians to describe properties of population growth in ecology: rising quickly and maxing out at the carrying capacity of the environment. It is an S-shaped curve that can take any real-valued number and map it to a value between 0 and 1, but never exactly at those limits: 1 / (1 + e^-value)
where e is the base of the natural logarithm (Euler's number, or the EXP() function in your spreadsheet) and value is the actual numerical value you want to transform. Below is a plot of the numbers between -5 and 5 transformed into the range 0 to 1 using the logistic function.
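As an illustration, here is a minimal Python sketch (assuming NumPy is available) of the transformation just described; the function and variable names are ours, not part of the original slides.

```python
import numpy as np

def sigmoid(value):
    """Logistic (sigmoid) function: 1 / (1 + e^-value)."""
    return 1.0 / (1.0 + np.exp(-value))

# Reproduce the transformation described above: numbers between -5 and 5
# squashed into the range 0 to 1.
for x in np.linspace(-5, 5, 11):
    print(f"{x:+.1f} -> {sigmoid(x):.4f}")
```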
Figure: plot of the logit/sigmoid function.
Generalized Linear Model: g(E(y)) = α + βx1 + γx2. Here, g() is the link function, E(y) is the expectation of the target variable, and α + βx1 + γx2 is the linear predictor (α, β, γ are to be estimated). The role of the link function is to 'link' the expectation of y to the linear predictor. A small sketch of this idea follows.
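A minimal sketch of how the link function connects E(y) to the linear predictor; the coefficient values here are made up purely for illustration.

```python
import math

# Illustrative (made-up) coefficients for the linear predictor alpha + beta*x1 + gamma*x2.
alpha, beta, gamma = -1.0, 0.5, 0.25

def linear_predictor(x1, x2):
    return alpha + beta * x1 + gamma * x2

def inverse_logit(eta):
    # With the logit link, the inverse link maps the linear predictor
    # back onto E(y) = P(y = 1), which lies between 0 and 1.
    return 1.0 / (1.0 + math.exp(-eta))

# The link function "links" E(y) to the linear predictor:
# g(E(y)) = linear predictor  <=>  E(y) = inverse_logit(linear predictor).
print(inverse_logit(linear_predictor(2.0, 3.0)))
```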
Problem statement. We are provided a sample of 1000 customers. We need to predict the probability that a customer will buy (y) a particular magazine or not. Since we have a categorical outcome variable, we'll use logistic regression: g(y) = βo + β(Age) ---- (a)
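Since no actual customer data is given, here is a hedged sketch of fitting this model with scikit-learn on simulated data; the "true" coefficients and the age range are invented for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Simulated stand-in for the 1000-customer sample (no real data is provided).
age = rng.uniform(18, 70, size=(1000, 1))
true_log_odds = -6.0 + 0.15 * age            # made-up "true" coefficients
buy = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_log_odds))).ravel()

model = LogisticRegression()
model.fit(age, buy)

# Estimated beta_o and beta(Age) from equation (a), plus P(buy | Age = 40).
print(model.intercept_, model.coef_)
print(model.predict_proba([[40.0]])[:, 1])
```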
Derivation. In logistic regression, we are only concerned with the probability of the outcome of the dependent variable (success or failure). As described above, g() is the link function. This function is established using two things: the probability of success (p) and the probability of failure (1 - p). p should meet the following criteria: 1. It must always be positive (since p >= 0). 2. It must always be less than or equal to 1 (since p <= 1).
Since the probability must always be positive, we put the linear equation in exponential form. For any value of the slope and the dependent variable, the exponential of this equation will never be negative: p = exp(βo + β(Age)) = e^(βo + β(Age)) ------- (b)
To make the probability less than 1, we must divide p by a number greater than p. This can simply be done by: p = exp(βo + β(Age)) / (exp(βo + β(Age)) + 1) = e^(βo + β(Age)) / (e^(βo + β(Age)) + 1) ----- (c). Using (a), (b) and (c), we can redefine the probability as:
p = e^y / (1 + e^y) --- (d), where p is the probability of success. Equation (d) is the logistic (sigmoid) function. If p is the probability of success, 1 - p is the probability of failure, which can be written as: q = 1 - p = 1 - e^y / (1 + e^y) --- (e)
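A quick numeric check of equations (d) and (e), showing that p always stays between 0 and 1 and that p + q = 1; the sample values of y are arbitrary.

```python
import math

def p_success(y):
    # Equation (d): p = e^y / (1 + e^y)
    return math.exp(y) / (1.0 + math.exp(y))

for y in (-10.0, -1.0, 0.0, 1.0, 10.0):
    p = p_success(y)
    q = 1.0 - p                               # equation (e)
    print(f"y = {y:+.1f}   p = {p:.6f}   q = {q:.6f}   p + q = {p + q:.1f}")
```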
On dividing (d) by (e), we get p / (1 - p) = e^y. After taking the log on both sides, we get log(p / (1 - p)) = y. Here log(p / (1 - p)) is the link function. The logarithmic transformation of the outcome variable allows us to model a non-linear association in a linear way.
Final equation. After substituting the value of y, we get: log(p / (1 - p)) = βo + β(Age). This is the equation used in logistic regression. Here p / (1 - p) is the odds of success. A typical logistic model plot is shown next; you can see that the probability never goes below 0 or above 1.
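A small verification of the final equation, showing that log(p / (1 - p)) recovers the linear predictor βo + β(Age) exactly; the coefficient values and age are made up for illustration.

```python
import math

# Illustrative (made-up) values for the coefficients in equation (a).
beta_0, beta_age = -6.0, 0.15
age = 40.0

y = beta_0 + beta_age * age                   # linear predictor, equation (a)
p = math.exp(y) / (1.0 + math.exp(y))         # probability, equation (d)

# Final equation: log(p / (1 - p)) equals the linear predictor.
print(math.log(p / (1.0 - p)), y)
```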
Figure: logit function graph (typical logistic model plot).
Logistic regression models the probability of the default class (e.g. the first class). For example, if we are modeling people's gender as male or female from their height, then the first class could be male and the logistic regression model could be written as the probability of male given a person's height, or more formally: P(gender = male | height). Written another way, we are modeling the probability that an input (X) belongs to the default class (Y = 1), which we can write formally as: P(X) = P(Y = 1 | X)
ln(p(X) / (1 - p(X))) = b0 + b1 * X. This equation is useful because we can see that the right-hand side is linear again (just like linear regression), while the left-hand side is the log of the odds of the default class. The ratio p(X) / (1 - p(X)) is called the odds of the default class.
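One practical consequence of the log-odds being linear in X is that a one-unit increase in X multiplies the odds by a constant factor of e^b1. A minimal check, with made-up coefficients:

```python
import math

# Illustrative (made-up) coefficients for ln(p(X) / (1 - p(X))) = b0 + b1 * X.
b0, b1 = -2.0, 0.3

def odds(x):
    return math.exp(b0 + b1 * x)              # odds of the default class at X = x

# A one-unit increase in X multiplies the odds by e^b1.
print(odds(6.0) / odds(5.0), math.exp(b1))
```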
Learning the logistic model. The coefficients (beta values, b) of the logistic regression algorithm must be estimated from your training data. This is done using maximum-likelihood estimation. Maximum-likelihood estimation is a common learning approach used by a variety of machine learning algorithms, although it does make assumptions about the distribution of your data.
The best coefficients would result in a model that predicts a value very close to 1 (e.g. male) for the default class and a value very close to 0 (e.g. female) for the other class. The intuition for maximum likelihood in logistic regression is that a search procedure seeks values for the coefficients (beta values) that minimize the error between the probabilities predicted by the model and those in the data (e.g. a probability of 1 if the data point belongs to the primary class).
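As a hedged sketch of that search procedure, the following Python code maximizes the Bernoulli log-likelihood by plain gradient ascent; real libraries use faster optimizers, and the tiny hours/passed data set is invented for illustration.

```python
import numpy as np

def fit_logistic_mle(X, y, lr=0.5, n_iter=10000):
    """Estimate the coefficients by maximizing the Bernoulli log-likelihood
    with plain gradient ascent: a simple stand-in for the search procedure
    described above."""
    Xb = np.column_stack([np.ones(len(X)), X])    # prepend intercept column
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))      # predicted probabilities
        gradient = Xb.T @ (y - p)                 # gradient of the log-likelihood
        beta += lr * gradient / len(y)
    return beta

# Tiny made-up example: hours studied vs. pass (1) / fail (0).
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
passed = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])
print(fit_logistic_mle(hours, passed))            # [b0, b1]
```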
Let's say we have a model that can predict whether a person is male or female based on their height (completely fictitious). Given a height of 150 cm, is the person male or female? We have learned the coefficients b0 = -100 and b1 = 0.6. Using the equation above, we can calculate the probability of male given a height of 150 cm, or more formally P(male | height = 150). We will use EXP() for e, because that is what you can use if you type this example into your spreadsheet: y = e^(b0 + b1*X) / (1 + e^(b0 + b1*X)) y = EXP(-100 + 0.6*150) / (1 + EXP(-100 + 0.6*150)) y = 0.0000453978687
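The same worked example in Python, reproducing the spreadsheet calculation above:

```python
import math

b0, b1 = -100.0, 0.6            # coefficients from the example above
height = 150.0

# y = EXP(b0 + b1*X) / (1 + EXP(b0 + b1*X)) with X = 150
p_male = math.exp(b0 + b1 * height) / (1.0 + math.exp(b0 + b1 * height))
print(p_male)                   # ~0.0000453978687: almost certainly not male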
We can snap the probabilities to a binary class value, for example: 0 if p(male) < 0.5, 1 if p(male) >= 0.5.
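A one-line illustration of that thresholding step, using the probability computed in the previous example:

```python
def snap_to_class(p_male, threshold=0.5):
    # Snap the predicted probability to a 0/1 class label.
    return 1 if p_male >= threshold else 0

print(snap_to_class(0.0000453978687))   # -> 0, i.e. predicted female
```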
