
Commit 672dd2b

update documentation for linear models
1 parent 96c70d2 commit 672dd2b

File tree

6 files changed: +125 -66 lines changed

docs/conf.py

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@
 # -- Project information -----------------------------------------------------

 project = "numpy-ml"
-copyright = "2020, David Bourgin"
+copyright = "2022, David Bourgin"
 author = "David Bourgin"

 # The short X.Y version

docs/numpy_ml.linear_models.lm.rst

Lines changed: 16 additions & 0 deletions
@@ -37,3 +37,19 @@
     :members:
     :undoc-members:
     :inherited-members:
+
+``GaussianNBClassifier``
+-----------------------------------------
+
+.. autoclass:: numpy_ml.linear_models.GaussianNBClassifier
+    :members:
+    :undoc-members:
+    :inherited-members:
+
+``GeneralizedLinearModel``
+-----------------------------------------
+
+.. autoclass:: numpy_ml.linear_models.GeneralizedLinearModel
+    :members:
+    :undoc-members:
+    :inherited-members:

docs/numpy_ml.linear_models.rst

Lines changed: 77 additions & 35 deletions
@@ -1,47 +1,39 @@
 Linear models
 #############

-The linear models module contains several popular instances of the generalized linear model (GLM).
-
 .. raw:: html

-    <h2>Linear Regression</h2>
-
-The simple linear regression model is
-
-.. math::
-
-    \mathbf{y} = \mathbf{bX} + \mathbf{\epsilon}
-
-where
-
-.. math::
-
-    \epsilon \sim \mathcal{N}(0, \sigma^2 I)
+    <h2>Ordinary and Weighted Linear Least Squares</h2>

-In probabilistic terms this corresponds to
+In weighted linear least-squares regression (WLS), a real-valued target
+:math:`y_i` is modeled as a linear combination of covariates
+:math:`\mathbf{x}_i` and model coefficients **b**:

 .. math::

-    \mathbf{y} - \mathbf{bX} &\sim \mathcal{N}(0, \sigma^2 I) \\
-    \mathbf{y} \mid \mathbf{X}, \mathbf{b} &\sim \mathcal{N}(\mathbf{bX}, \sigma^2 I)
+    y_i = \mathbf{b}^\top \mathbf{x}_i + \epsilon_i

-The loss for the model is simply the squared error between the model
-predictions and the true values:
+In the above equation, :math:`\epsilon_i \sim \mathcal{N}(0, \sigma_i^2)` is a
+normally distributed error term with variance :math:`\sigma_i^2`. Ordinary
+least squares (OLS) is a special case of this model where the variance is fixed
+across all examples, i.e., :math:`\sigma_i = \sigma_j \ \forall i,j`. The
+maximum likelihood model parameters, :math:`\hat{\mathbf{b}}_{WLS}`, are those
+that minimize the weighted squared error between the model predictions and the
+true values:

 .. math::

-    \mathcal{L} = ||\mathbf{y} - \mathbf{bX}||_2^2
+    \mathcal{L} = ||\mathbf{W}^{0.5}(\mathbf{y} - \mathbf{bX})||_2^2

-The MLE for the model parameters **b** can be computed in closed form via
-the normal equation:
+where :math:`\mathbf{W}` is a diagonal matrix of the example weights. In OLS,
+:math:`\mathbf{W}` is the identity matrix. The maximum likelihood estimate for
+the model parameters can be computed in closed-form using the normal equations:

 .. math::

-    \mathbf{b}_{MLE} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}
+    \hat{\mathbf{b}}_{WLS} =
+        (\mathbf{X}^\top \mathbf{WX})^{-1} \mathbf{X}^\top \mathbf{Wy}

-where :math:`(\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top` is known
-as the pseudoinverse / Moore-Penrose inverse.

 **Models**
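
To make the normal equations above concrete, the closed-form WLS estimate can be reproduced in a few lines of NumPy. This is only an illustrative sketch with made-up toy data; it is not code from this commit or from the numpy-ml implementation::

    import numpy as np

    # Toy data: N examples, M covariates, plus a per-example weight vector.
    rng = np.random.default_rng(0)
    N, M = 100, 3
    X = rng.normal(size=(N, M))
    b_true = np.array([2.0, -1.0, 0.5])
    y = X @ b_true + rng.normal(scale=0.1, size=N)
    weights = rng.uniform(0.5, 2.0, size=N)

    # b_WLS = (X^T W X)^{-1} X^T W y, where W is the diagonal weight matrix.
    # With W equal to the identity this reduces to the ordinary least-squares fit.
    W = np.diag(weights)
    b_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)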

@@ -55,19 +47,14 @@ Ridge regression uses the same simple linear regression model but adds an
 additional penalty on the `L2`-norm of the coefficients to the loss function.
 This is sometimes known as Tikhonov regularization.

-In particular, the ridge model is still simply
+In particular, the ridge model is the same as the OLS model:

 .. math::

     \mathbf{y} = \mathbf{bX} + \mathbf{\epsilon}

-where
-
-.. math::
-
-    \epsilon \sim \mathcal{N}(0, \sigma^2 I)
-
-except now the error for the model is calcualted as
+where :math:`\epsilon \sim \mathcal{N}(0, \sigma^2 I)`, except now the error
+for the model is calculated as

 .. math::

@@ -78,7 +65,8 @@ the adjusted normal equation:

 .. math::

-    \mathbf{b}_{MLE} = (\mathbf{X}^\top \mathbf{X} + \alpha I)^{-1} \mathbf{X}^\top \mathbf{y}
+    \hat{\mathbf{b}}_{Ridge} =
+        (\mathbf{X}^\top \mathbf{X} + \alpha I)^{-1} \mathbf{X}^\top \mathbf{y}

 where :math:`(\mathbf{X}^\top \mathbf{X} + \alpha I)^{-1}
 \mathbf{X}^\top` is the pseudoinverse / Moore-Penrose inverse adjusted for
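
Following the adjusted normal equation in the hunk above, the ridge estimate is equally short in NumPy. This is a minimal sketch with placeholder data, not the numpy-ml implementation::

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(50, 4))
    y = X @ np.array([1.0, 0.0, -2.0, 0.5]) + rng.normal(scale=0.2, size=50)

    alpha = 1.0  # strength of the L2 penalty on the coefficients
    # b_Ridge = (X^T X + alpha * I)^{-1} X^T y
    b_ridge = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)
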
@@ -235,6 +223,60 @@ We can also compute a closed-form solution for the posterior predictive distribu

 - :class:`~numpy_ml.linear_models.BayesianLinearRegressionUnknownVariance`

+.. raw:: html
+
+    <h2>Naive Bayes Classifier</h2>
+
+The naive Bayes model assumes the features of a training example
+:math:`\mathbf{x}` are mutually independent given the example label :math:`y`:
+
+.. math::
+
+    P(\mathbf{x}_i \mid y_i) = \prod_{j=1}^M P(x_{i,j} \mid y_i)
+
+where :math:`M` is the rank of the :math:`i^{th}` example :math:`\mathbf{x}_i`
+and :math:`y_i` is the label associated with the :math:`i^{th}` example.
+
+Combining this conditional independence assumption with a simple application of
+Bayes' theorem gives the naive Bayes classification rule:
+
+.. math::
+
+    \hat{y} &= \arg \max_y P(y \mid \mathbf{x}) \\
+            &= \arg \max_y P(y) P(\mathbf{x} \mid y) \\
+            &= \arg \max_y P(y) \prod_{j=1}^M P(x_j \mid y)
+
+The prior class probability :math:`P(y)` can be specified in advance or
+estimated empirically from the training data.
+
+**Models**
+
+- :class:`~numpy_ml.linear_models.GaussianNBClassifier`
+
+.. raw:: html
+
+    <h2>Generalized Linear Model</h2>
+
+The generalized linear model (GLM) assumes that each target/dependent variable
+:math:`y_i` in target vector :math:`\mathbf{y} = (y_1, \ldots, y_n)`, has been
+drawn independently from a pre-specified distribution in the exponential family
+with unknown mean :math:`\mu_i`. The GLM models a (one-to-one, continuous,
+differentiable) function, *g*, of this mean value as a linear combination of
+the model parameters :math:`\mathbf{b}` and observed covariates,
+:math:`\mathbf{x}_i`:
+
+.. math::
+
+    g(\mathbb{E}[y_i \mid \mathbf{x}_i]) =
+        g(\mu_i) = \mathbf{b}^\top \mathbf{x}_i
+
+where *g* is known as the link function. The choice of link function is
+informed by the instance of the exponential family the target is drawn from.
+
+**Models**
+
+- :class:`~numpy_ml.linear_models.GeneralizedLinearModel`
+
 .. toctree::
     :maxdepth: 2
     :hidden:
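
The classification rule added in this hunk (an argmax over the log prior plus the summed per-feature log likelihoods) can be sketched directly in NumPy. The snippet below assumes Gaussian class-conditional densities with diagonal covariance; the helper names are invented for the example and are not the ``GaussianNBClassifier`` API::

    import numpy as np

    def gaussian_log_likelihood(X, mu, var):
        """Sum of per-feature Gaussian log densities (the naive independence assumption)."""
        return np.sum(-0.5 * np.log(2 * np.pi * var) - (X - mu) ** 2 / (2 * var), axis=1)

    def nb_predict(X, log_prior, mu, var):
        """y_hat = argmax_c [ log P(y = c) + sum_j log P(x_j | y = c) ]"""
        scores = np.stack(
            [log_prior[c] + gaussian_log_likelihood(X, mu[c], var[c]) for c in range(len(log_prior))],
            axis=1,
        )
        return np.argmax(scores, axis=1)

    # Toy usage: two well-separated classes, uniform class prior.
    rng = np.random.default_rng(2)
    X = np.vstack([rng.normal(-1.0, 1.0, size=(20, 2)), rng.normal(1.0, 1.0, size=(20, 2))])
    mu = np.array([[-1.0, -1.0], [1.0, 1.0]])
    var = np.ones((2, 2))
    log_prior = np.log(np.array([0.5, 0.5]))
    y_hat = nb_predict(X, log_prior, mu, var)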

numpy_ml/linear_models/glm.py

Lines changed: 18 additions & 18 deletions
@@ -1,7 +1,7 @@
 """A module for the generalized linear model."""
 import numpy as np

-from numpy_ml.linear_models import LinearRegression
+from numpy_ml.linear_models.linear_regression import LinearRegression

 eps = np.finfo(float).eps

@@ -48,20 +48,20 @@
 class GeneralizedLinearModel:
     def __init__(self, link, fit_intercept=True, tol=1e-5, max_iter=100):
         r"""
-        A generalized linear model [1]_ [2]_ with maximum likelihood fit via
-        iteratively reweighted least squares (IRLS) [3]_.
+        A generalized linear model with maximum likelihood fit via
+        iteratively reweighted least squares (IRLS).

         Notes
         -----
-        The generalized linear model (GLM) assumes that each target/dependent
+        The generalized linear model (GLM) [a]_ [b]_ assumes that each target/dependent
         variable :math:`y_i` in target vector :math:`\mathbf{y} = (y_1, \ldots,
         y_n)`, has been drawn independently from a pre-specified distribution
-        in the exponential family [5]_ with unknown mean :math:`\mu_i`. The GLM
+        in the exponential family [e]_ with unknown mean :math:`\mu_i`. The GLM
         models a (one-to-one, continuous, differentiable) function, *g*, of
         this mean value as a linear combination of the model parameters
         :math:`\mathbf{b}` and observed covariates, :math:`\mathbf{x}_i`:

-        .. math:
+        .. math::

             g(\mathbb{E}[y_i \mid \mathbf{x}_i]) =
                 g(\mu_i) = \mathbf{b}^\top \mathbf{x}_i
@@ -70,31 +70,31 @@ def __init__(self, link, fit_intercept=True, tol=1e-5, max_iter=100):
         choice of link function is informed by the instance of the exponential
         family the target is drawn from. Common examples:

-        .. csv-table:: Distributions and their canonical link functions
-            :header: "Distribution", "Link Name", "Description"
-            :widths: auto
+        .. csv-table::
+            :header: "Distribution", "Link", "Formula"
+            :widths: 25, 20, 30

             "Normal", "Identity", ":math:`g(x) = x`"
             "Bernoulli", "Logit", ":math:`g(x) = \log(x) - \log(1 - x)`"
             "Binomial", "Logit", ":math:`g(x) = \log(x) - \log(n - x)`"
             "Poisson", "Log", ":math:`g(x) = \log(x)`"

-        An iteratively re-weighted least squares (IRLS) algorithm [3]_ can be
+        An iteratively re-weighted least squares (IRLS) algorithm [c]_ can be
         employed to find the maximum likelihood estimate for the model
         parameters :math:`\beta` in any instance of the generalized linear
-        model. IRLS is equivalent to Fisher scoring [1]_ [4]_, which itself is
+        model. IRLS is equivalent to Fisher scoring [d]_, which itself is
         a slight modification of classic Newton-Raphson for finding the zeros
         of the first derivative of the model log-likelihood.

         References
         ----------
-        .. [1] Nelder, J., & Wedderburn, R. (1972). "Generalized linear
-           models," _Journal of the Royal Statistical Society, Series A
-           (General),_ 135(3): 370–384.
-        .. [2] https://en.wikipedia.org/wiki/Generalized_linear_model
-        .. [3] https://en.wikipedia.org/wiki/Iteratively_reweighted_least_squares
-        .. [4] https://en.wikipedia.org/wiki/Scoring_algorithm
-        .. [5] https://en.wikipedia.org/wiki/Exponential_family
+        .. [a] Nelder, J., & Wedderburn, R. (1972). Generalized linear
+           models. *Journal of the Royal Statistical Society, Series A
+           (General), 135(3)*: 370–384.
+        .. [b] https://en.wikipedia.org/wiki/Generalized_linear_model
+        .. [c] https://en.wikipedia.org/wiki/Iteratively_reweighted_least_squares
+        .. [d] https://en.wikipedia.org/wiki/Scoring_algorithm
+        .. [e] https://en.wikipedia.org/wiki/Exponential_family

         Parameters
         ----------
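
For intuition about the IRLS procedure referenced in the docstring, here is a minimal sketch for one concrete case, a Poisson GLM with the canonical log link. The function ``irls_poisson`` and the toy data are invented for this illustration and do not mirror the ``GeneralizedLinearModel`` API::

    import numpy as np

    def irls_poisson(X, y, max_iter=100, tol=1e-5):
        """Fit a Poisson GLM (log link) by iteratively reweighted least squares."""
        b = np.zeros(X.shape[1])
        for _ in range(max_iter):
            eta = X @ b                              # linear predictor, g(mu) = eta
            mu = np.maximum(np.exp(eta), 1e-10)      # inverse link, clipped away from zero
            W = mu                                   # IRLS weights for the canonical log link
            z = eta + (y - mu) / mu                  # working response
            b_new = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
            if np.linalg.norm(b_new - b) < tol:
                return b_new
            b = b_new
        return b

    # Toy usage: counts drawn from a known log-linear model.
    rng = np.random.default_rng(3)
    X = np.column_stack([np.ones(200), rng.normal(size=200)])  # intercept + one covariate
    y = rng.poisson(np.exp(X @ np.array([0.5, 0.3])))
    b_hat = irls_poisson(X, y)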

numpy_ml/linear_models/linear_regression.py

Lines changed: 8 additions & 8 deletions
@@ -66,7 +66,7 @@ def update(self, X, y, weights=None):

         Notes
         -----
-        The recursive least-squares algorithm [1]_ [2]_ is used to efficiently
+        The recursive least-squares algorithm [3]_ [4]_ is used to efficiently
         update the regression parameters as new examples become available. For
         a single new example :math:`(\mathbf{x}_{t+1}, \mathbf{y}_{t+1})`, the
         parameter updates are
@@ -84,20 +84,20 @@ def update(self, X, y, weights=None):
         examples observed from timestep 1 to *t*.

         In the single-example case, the RLS algorithm uses the Sherman-Morrison
-        formula [3]_ to avoid re-inverting the covariance matrix on each new
+        formula [5]_ to avoid re-inverting the covariance matrix on each new
         update. In the multi-example case (i.e., where :math:`\mathbf{X}_{t+1}`
         and :math:`\mathbf{y}_{t+1}` are matrices of `N` examples each), we use
-        the generalized Woodbury matrix identity [4]_ to update the inverse
+        the generalized Woodbury matrix identity [6]_ to update the inverse
         covariance. This comes at a performance cost, but is still more
         performant than doing multiple single-example updates if *N* is large.

         References
         ----------
-        .. [1] Gauss, C. F. (1821) _Theoria combinationis observationum
-           erroribus minimis obnoxiae_, Werke, 4. Gottinge
-        .. [2] https://en.wikipedia.org/wiki/Recursive_least_squares_filter
-        .. [3] https://en.wikipedia.org/wiki/Sherman%E2%80%93Morrison_formula
-        .. [4] https://en.wikipedia.org/wiki/Woodbury_matrix_identity
+        .. [3] Gauss, C. F. (1821) *Theoria combinationis observationum
+           erroribus minimis obnoxiae*, Werke, 4. Gottinge
+        .. [4] https://en.wikipedia.org/wiki/Recursive_least_squares_filter
+        .. [5] https://en.wikipedia.org/wiki/Sherman%E2%80%93Morrison_formula
+        .. [6] https://en.wikipedia.org/wiki/Woodbury_matrix_identity

         Parameters
         ----------
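
The single-example RLS update described in the docstring can also be sketched in a few lines. The function ``rls_update`` and the toy data below are invented for illustration; they do not reproduce the signature of the ``update`` method itself::

    import numpy as np

    def rls_update(b, P, x, y):
        """Fold one new example (x, y) into the fit, with P = (X^T X)^{-1}."""
        Px = P @ x
        k = Px / (1.0 + x @ Px)      # gain vector
        b_new = b + k * (y - x @ b)  # correct the coefficients by the prediction error
        P_new = P - np.outer(k, Px)  # Sherman-Morrison rank-one update of the inverse
        return b_new, P_new

    # Toy usage: fit a small batch in closed form, then fold in one new example.
    rng = np.random.default_rng(4)
    X = rng.normal(size=(30, 3))
    y = X @ np.array([1.0, -0.5, 2.0]) + rng.normal(scale=0.1, size=30)
    P = np.linalg.inv(X.T @ X)
    b = P @ X.T @ y
    b, P = rls_update(b, P, rng.normal(size=3), 0.7)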

numpy_ml/linear_models/naive_bayes.py

Lines changed: 5 additions & 4 deletions
@@ -1,3 +1,4 @@
+"""A module for naive Bayes classifiers"""
 import numpy as np


@@ -10,14 +11,15 @@ def __init__(self, eps=1e-6):
         -----
         The naive Bayes model assumes the features of each training example
         :math:`\mathbf{x}` are mutually independent given the example label
-        :math:`y`:
+        *y*:

         .. math::

             P(\mathbf{x}_i \mid y_i) = \prod_{j=1}^M P(x_{i,j} \mid y_i)

-        where :math:`M` is the rank of the `i`th example :math:`\mathbf{x}_i`
-        and :math:`y_i` is the label associated with the `i`th example.
+        where :math:`M` is the rank of the :math:`i^{th}` example
+        :math:`\mathbf{x}_i` and :math:`y_i` is the label associated with the
+        :math:`i^{th}` example.

         Combining the conditional independence assumption with a simple
         application of Bayes' theorem gives the naive Bayes classification
@@ -186,7 +188,6 @@ def _log_class_posterior(self, X, class_idx):

            \mathbf{x}_i \mid y_i = c, \theta \sim \mathcal{N}(\mu_c, \Sigma_c)

-
         Parameters
         ----------
         X: :py:class:`ndarray <numpy.ndarray>` of shape `(N, M)`
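
As a complement to the class-conditional density above, the fitting side of Gaussian naive Bayes (per-class means, diagonal variances, and empirical priors) can be sketched as follows. The helper ``fit_gaussian_nb`` is invented for the example and is not the ``GaussianNBClassifier`` implementation, although the ``eps`` smoothing term echoes the constructor argument shown earlier in this diff::

    import numpy as np

    def fit_gaussian_nb(X, y, eps=1e-6):
        """Estimate per-class means, variances, and empirical class priors."""
        classes = np.unique(y)
        mu = np.array([X[y == c].mean(axis=0) for c in classes])
        var = np.array([X[y == c].var(axis=0) + eps for c in classes])  # eps keeps variances positive
        prior = np.array([np.mean(y == c) for c in classes])
        return classes, mu, var, prior

    # Toy usage with two classes and three features.
    rng = np.random.default_rng(5)
    X = np.vstack([rng.normal(-1.0, 1.0, size=(25, 3)), rng.normal(1.0, 1.0, size=(25, 3))])
    y = np.array([0] * 25 + [1] * 25)
    classes, mu, var, prior = fit_gaussian_nb(X, y)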
