LOGISTIC
REGRESSION
1
What is logistic regression?
Logit for short, a specialized form of regression
used when the dependent variable is dichotomous and categorical while the independent
variable(s) could be any type.
Dependent variable is binary (yes/no, Male/Female )
outcome.
Independent variables are continuous interval
Examples:
Relation of weight and BP to 10 year risk of death
Relation of CD4 count to 1 year risk of AIDS diagnosis
2
Logistic Regression
Form of regression that allows the prediction of
discrete variables by a mix of continuous and
discrete/nominal predictors.
Addresses the same questions that discriminant
function analysis and multiple regression do but
with no distributional assumptions on the
predictors (the predictors do not have to be
normally distributed, linearly related or have
equal variance in each group)
Does not assume a linear relationship between
3
Why we use logistic ?
No assumptions about the distributions of the predictor variables.
Predictors do not have to be normally distributed
Does not have to be linearly related.
When equal variances , covariance doesn't exist across the
groups.
Predictor variables is not parametric and there is no
homoscedasticity. E.g variance of dependent and independent
variables are not equal. 4
DECISION PROCESS
Stage 1:
Objectives Of logistic regression
Identify the independent variable that impact
in the dependent variable
Establishing classification system based on the
logistic model for determining the group
membership
Types of logistic regression
BINARY LOGISTIC REGRESSION
It is used when the dependent variable is dichotomous.
MULTINOMIAL LOGISTIC REGRESSION
It is used when the dependent or outcomes variable has more than
two categories.
Linear Regression
Independent Variable Dependent Variable
7
Logistic Regression
Independent Variable Dependent Variable
8
Binary logistic regression
expression
BINAR
Y
Y = Dependent Variables
ß˚ = Constant
ß1 = Coefficient of
variable X1
X1 = Independent
Variables
SAMPLE SIZE
Verysmall samples have so much
sampling errors.
Verylarge sample size decreases the
chances of errors.
Logistic
requires larger sample size
than multiple regression.
Hosmer and Lamshow recommended
sample size greater than 400.
SAMPLE SIZE PER CATEGORY OF THE INDEPENDENT
VARIABLE
The recommended sample size for each
group is at least 10 observations per
estimated parameters.
Estimation of logistic regression
model assessing overall fit
Logistic relationship describe earlier in both estimating the logistic model and
establishing the relationship between the dependent and independent variables.
Result is a unique transformation of dependent variables which impacts not only
the estimation process but also the resulting coefficients of independent variables.
Transforming the dependent variable
S-shaped
Range (0-1)
Description of the data
The data used to conduct logistic regression is from a survey of 30
homeowners conducted by an electricity company about an offer of
roof solar panels with a 50% subsidy from the state government as
part of the state’s environmental policy.
The variables are:
IVs: household income measured in units of a thousand dollars
age of householder
monthly mortgage
size of family household
DV: whether the householder would take or decline the offer. Take
the offer was coded as 1 and decline the offer was coded as 0.
WHAT IS THE RESEARCH QUESTION?
To
determine whether household income and
monthly mortgage will predict taking or
declining the solar panel offer
IndependentVariables: household income and
monthly mortgage
Dependent Variables: Take the offer or decline
the offer
Two hypotheses to be tested
There are two hypotheses to test in relation to the overall fit of the
model:
H0: The model is a good fitting model
H1: The model is not a good fitting model (i.e. the predictors have a
significant effect)