Linear models
#############

The linear models module contains several popular instances of the generalized
linear model (GLM).

.. raw:: html

    <h2>Ordinary and Weighted Linear Least Squares</h2>

In weighted linear least-squares regression (WLS), a real-valued target
:math:`y_i` is modeled as a linear combination of covariates
:math:`\mathbf{x}_i` and model coefficients **b**:

.. math::

    y_i = \mathbf{b}^\top \mathbf{x}_i + \epsilon_i

In the above equation, :math:`\epsilon_i \sim \mathcal{N}(0, \sigma_i^2)` is a
normally distributed error term with variance :math:`\sigma_i^2`. Ordinary
least squares (OLS) is a special case of this model where the variance is fixed
across all examples, i.e., :math:`\sigma_i = \sigma_j \ \forall i,j`. The
maximum likelihood model parameters, :math:`\hat{\mathbf{b}}_{WLS}`, are those
that minimize the weighted squared error between the model predictions and the
true values:

.. math::

    \mathcal{L} = ||\mathbf{W}^{0.5}(\mathbf{y} - \mathbf{bX})||_2^2

where :math:`\mathbf{W}` is a diagonal matrix of the example weights. In OLS,
:math:`\mathbf{W}` is the identity matrix. The maximum likelihood estimate for
the model parameters can be computed in closed form using the normal equations:

.. math::

    \hat{\mathbf{b}}_{WLS} =
        (\mathbf{X}^\top \mathbf{WX})^{-1} \mathbf{X}^\top \mathbf{Wy}

where, in the OLS case, :math:`(\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top`
is known as the pseudoinverse / Moore-Penrose inverse of :math:`\mathbf{X}`.
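
As a quick illustration of these equations, the snippet below evaluates the
weighted normal equations directly with NumPy. It is a minimal sketch rather
than the module's own linear regression implementation; the ``fit_wls`` helper,
the synthetic data, and the choice of example weights are all illustrative
assumptions.

.. code-block:: python

    import numpy as np

    def fit_wls(X, y, weights=None):
        """Closed-form WLS fit; reduces to OLS when ``weights`` is None."""
        N = X.shape[0]
        W = np.eye(N) if weights is None else np.diag(weights)
        # b_hat = (X^T W X)^{-1} X^T W y  (the weighted normal equations)
        return np.linalg.pinv(X.T @ W @ X) @ X.T @ W @ y

    # Illustrative synthetic data: y = 2 * x1 - 3 * x2 + 1 + Gaussian noise
    rng = np.random.RandomState(0)
    X = np.hstack([rng.randn(100, 2), np.ones((100, 1))])  # bias column
    y = X @ np.array([2.0, -3.0, 1.0]) + 0.1 * rng.randn(100)

    b_ols = fit_wls(X, y)                         # OLS: W is the identity
    b_wls = fit_wls(X, y, weights=rng.rand(100))  # WLS: per-example weights
    print(np.round(b_ols, 2), np.round(b_wls, 2))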

**Models**

- :class:`~numpy_ml.linear_models.LinearRegression`

.. raw:: html

    <h2>Ridge Regression</h2>

Ridge regression uses the same simple linear regression model but adds an
additional penalty on the `L2`-norm of the coefficients to the loss function.
This is sometimes known as Tikhonov regularization.

In particular, the ridge model is the same as the OLS model:

.. math::

    \mathbf{y} = \mathbf{bX} + \mathbf{\epsilon}

where :math:`\epsilon \sim \mathcal{N}(0, \sigma^2 I)`, except now the error
for the model is calculated as

.. math::

    \mathcal{L} = ||\mathbf{y} - \mathbf{bX}||_2^2 + \alpha ||\mathbf{b}||_2^2

The MLE for the model parameters **b** can be computed in closed form via
the adjusted normal equation:

.. math::

    \hat{\mathbf{b}}_{Ridge} =
        (\mathbf{X}^\top \mathbf{X} + \alpha I)^{-1} \mathbf{X}^\top \mathbf{y}

where :math:`(\mathbf{X}^\top \mathbf{X} + \alpha I)^{-1} \mathbf{X}^\top` is
the pseudoinverse / Moore-Penrose inverse adjusted for the `L2` penalty.
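
The adjusted normal equation above can likewise be evaluated directly; the
sketch below is an illustrative NumPy implementation, not the module's own
ridge model, and the ``fit_ridge`` helper, the data, and the values of
``alpha`` are assumptions. Note how the `L2` norm of the coefficients shrinks
as :math:`\alpha` grows.

.. code-block:: python

    import numpy as np

    def fit_ridge(X, y, alpha=1.0):
        """Closed-form ridge fit: b = (X^T X + alpha * I)^{-1} X^T y."""
        D = X.shape[1]
        return np.linalg.solve(X.T @ X + alpha * np.eye(D), X.T @ y)

    # Illustrative data; larger alpha values shrink the coefficient norm
    rng = np.random.RandomState(1)
    X = rng.randn(50, 5)
    y = X @ rng.randn(5) + 0.5 * rng.randn(50)

    for alpha in (0.0, 1.0, 10.0):
        print(alpha, np.round(np.linalg.norm(fit_ridge(X, y, alpha)), 3))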

We can also compute a closed-form solution for the posterior predictive
distribution of the Bayesian linear regression model.

**Models**

- :class:`~numpy_ml.linear_models.BayesianLinearRegressionUnknownVariance`

.. raw:: html

    <h2>Naive Bayes Classifier</h2>

The naive Bayes model assumes the features of a training example
:math:`\mathbf{x}` are mutually independent given the example label :math:`y`:

.. math::

    P(\mathbf{x}_i \mid y_i) = \prod_{j=1}^M P(x_{i,j} \mid y_i)

where :math:`M` is the rank (i.e., the number of features) of the
:math:`i^{th}` example :math:`\mathbf{x}_i` and :math:`y_i` is the label
associated with the :math:`i^{th}` example.

Combining this conditional independence assumption with a simple application of
Bayes' theorem gives the naive Bayes classification rule:

.. math::

    \hat{y} &= \arg \max_y P(y \mid \mathbf{x}) \\
            &= \arg \max_y P(y) P(\mathbf{x} \mid y) \\
            &= \arg \max_y P(y) \prod_{j=1}^M P(x_j \mid y)

The prior class probability :math:`P(y)` can be specified in advance or
estimated empirically from the training data.
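
To make the decision rule concrete, the sketch below implements it with
per-feature Gaussian class conditionals, in the spirit of
:class:`~numpy_ml.linear_models.GaussianNBClassifier` (though it is not that
class's implementation). The ``eps`` variance-smoothing term and the synthetic
two-class data are illustrative assumptions.

.. code-block:: python

    import numpy as np

    def fit_gaussian_nb(X, y, eps=1e-6):
        """Estimate class priors and per-feature Gaussian means/variances."""
        classes = np.unique(y)
        priors = np.array([np.mean(y == c) for c in classes])
        means = np.array([X[y == c].mean(axis=0) for c in classes])
        variances = np.array([X[y == c].var(axis=0) + eps for c in classes])
        return classes, priors, means, variances

    def predict_gaussian_nb(X, classes, priors, means, variances):
        """Return argmax_y [ log P(y) + sum_j log P(x_j | y) ] for each row."""
        log_joint = []
        for prior, mu, var in zip(priors, means, variances):
            # Sum of per-feature Gaussian log densities (independence assumption)
            ll = -0.5 * np.sum(np.log(2 * np.pi * var) + (X - mu) ** 2 / var, axis=1)
            log_joint.append(np.log(prior) + ll)
        return classes[np.argmax(log_joint, axis=0)]

    # Two well-separated Gaussian blobs as illustrative training data
    rng = np.random.RandomState(2)
    X = np.vstack([rng.randn(50, 3) - 2, rng.randn(50, 3) + 2])
    y = np.array([0] * 50 + [1] * 50)

    params = fit_gaussian_nb(X, y)
    print(np.mean(predict_gaussian_nb(X, *params) == y))  # training accuracy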

**Models**

- :class:`~numpy_ml.linear_models.GaussianNBClassifier`

.. raw:: html

    <h2>Generalized Linear Model</h2>

The generalized linear model (GLM) assumes that each target/dependent variable
:math:`y_i` in the target vector :math:`\mathbf{y} = (y_1, \ldots, y_n)` has
been drawn independently from a pre-specified distribution in the exponential
family with unknown mean :math:`\mu_i`. The GLM models a (one-to-one,
continuous, differentiable) function, *g*, of this mean value as a linear
combination of the model parameters :math:`\mathbf{b}` and observed covariates
:math:`\mathbf{x}_i`:

.. math::

    g(\mathbb{E}[y_i \mid \mathbf{x}_i]) =
        g(\mu_i) = \mathbf{b}^\top \mathbf{x}_i

where *g* is known as the link function. The choice of link function is
informed by the instance of the exponential family the target is drawn from.

**Models**

- :class:`~numpy_ml.linear_models.GeneralizedLinearModel`

.. toctree::
    :maxdepth: 2
    :hidden: