Linear Models

Linear (regression) models for Python. Extends statsmodels with panel regression, instrumental variable estimators, system estimators, and models for estimating asset prices:

  • Panel models:

    • Fixed effects (maximum two-way)
    • First difference regression
    • Between estimator for panel data
    • Pooled regression for panel data
    • Fama-MacBeth estimation of panel models
  • High-dimensional Regression:

    • Absorbing Least Squares
  • Instrumental Variable estimators:

    • Two-stage Least Squares
    • Limited Information Maximum Likelihood
    • k-class Estimators
    • Generalized Method of Moments, including the continuously updating estimator
  • Factor Asset Pricing Models:

    • 2- and 3-step estimation
    • Time-series estimation
    • GMM estimation
  • System Regression:

    • Seemingly Unrelated Regression (SUR/SURE)
    • Three-Stage Least Squares (3SLS)
    • Generalized Method of Moments (GMM) System Estimation

Designed to work equally well with NumPy, pandas, or xarray data.

Panel models

Like statsmodels, linearmodels accepts data directly and also supports formulas for specifying models. For example, the classic Grunfeld regression can be specified as follows:

    import numpy as np
    from statsmodels.datasets import grunfeld
    from linearmodels import PanelOLS

    data = grunfeld.load_pandas().data
    data.year = data.year.astype(np.int64)
    # MultiIndex, entity - time
    data = data.set_index(['firm', 'year'])
    mod = PanelOLS(data.invest, data[['value', 'capital']], entity_effects=True)
    res = mod.fit(cov_type='clustered', cluster_entity=True)

Models can also be specified using the formula interface.

    from linearmodels import PanelOLS

    mod = PanelOLS.from_formula('invest ~ value + capital + EntityEffects', data)
    res = mod.fit(cov_type='clustered', cluster_entity=True)

The formula interface for PanelOLS supports the special values EntityEffects and TimeEffects which add entity (fixed) and time effects, respectively.
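To make concrete what an entity effect does, here is a minimal sketch in plain Python. It is an illustration only, not how linearmodels implements the estimator: the toy data, `ols_slope`, and `demean_by_group` are all made up for this example. Each entity has its own intercept, and removing entity means before fitting (the "within" transformation) recovers the common slope that a pooled fit misses.

```python
from collections import defaultdict


def ols_slope(x, y):
    """Slope of a one-regressor OLS fit that includes an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def demean_by_group(groups, values):
    """Subtract each group's mean from its own observations."""
    totals, counts = defaultdict(float), defaultdict(int)
    for g, v in zip(groups, values):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for g, v in zip(groups, values)]


# Hypothetical panel: y = alpha_entity + 1.5 * x, with alpha_A = 10, alpha_B = -5
entity = ["A", "A", "A", "B", "B", "B"]
x = [1.0, 2.0, 3.0, 2.0, 4.0, 6.0]
y = [11.5, 13.0, 14.5, -2.0, 1.0, 4.0]

# Pooled fit ignores the entity intercepts and gets the slope badly wrong
pooled = ols_slope(x, y)

# Within fit: demean x and y by entity, then regress
within = ols_slope(demean_by_group(entity, x), demean_by_group(entity, y))

print(f"pooled slope: {pooled:.4f}")
print(f"within slope: {within:.4f}")
```

Time effects work the same way in this simplified picture, with the demeaning done by period instead of by entity.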

Formula support comes from the formulaic package, which is a replacement for patsy.

Instrumental Variable Models

IV regression models can be similarly specified.

    import numpy as np
    from linearmodels.iv import IV2SLS
    from linearmodels.datasets import mroz

    data = mroz.load()
    mod = IV2SLS.from_formula('np.log(wage) ~ 1 + exper + exper ** 2 + [educ ~ motheduc + fatheduc]', data)

The expression inside the [ ] lists the endogenous regressors (before the ~) and the instruments (after the ~).
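The roles of the two sides of the bracket can be illustrated with a hand-rolled two-stage least squares fit. This is a sketch in plain Python on made-up data, not the linearmodels implementation: `x` plays the role of the endogenous regressor (like educ), `z` the instrument (like motheduc or fatheduc), and `u` is a hypothetical unobserved confounder that biases ordinary least squares.

```python
def demean(values):
    """Center a list of numbers on its mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def slope(x, y):
    """No-intercept OLS slope of y on x; inputs are expected to be demeaned."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


# Toy data (all values are assumptions made for this illustration)
z = [-1.0, -1.0, 1.0, 1.0]                          # instrument
u = [-1.0, 1.0, -1.0, 1.0]                          # confounder, uncorrelated with z
x = [zi + ui for zi, ui in zip(z, u)]               # endogenous regressor
y = [2.0 * xi + 3.0 * ui for xi, ui in zip(x, u)]   # true coefficient on x is 2

zd, xd, yd = demean(z), demean(x), demean(y)

# OLS of y on x picks up the confounder and overstates the effect
ols = slope(xd, yd)

# Stage 1: project the endogenous regressor on the instrument
first_stage = slope(zd, xd)
x_hat = [first_stage * v for v in zd]

# Stage 2: regress y on the fitted values from stage 1
tsls = slope(x_hat, yd)

print(f"OLS estimate:  {ols:.2f}")
print(f"2SLS estimate: {tsls:.2f}")
```

Because the instrument is uncorrelated with the confounder in this toy sample, the second stage recovers the true coefficient while the OLS estimate does not.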

Installing

The latest release can be installed using pip

pip install linearmodels

The main branch can be installed by cloning the repo and running pip from the repository root

    git clone https://github.com/bashtage/linearmodels
    cd linearmodels
    pip install .

Documentation

Stable Documentation is built on every tagged version using doctr. Development Documentation is automatically built on every successful build of main.

Plan and status

The package should eventually add some useful linear model estimators, such as panel regression. Currently only the single-variable IV estimators are polished.

  • Linear Instrumental variable estimation - complete
  • Linear Panel model estimation - complete
  • Fama-MacBeth regression - complete
  • Linear Factor Asset Pricing - complete
  • System regression - complete
  • Linear IV Panel model estimation - not started
  • Dynamic Panel model estimation - not started

Requirements

Running

  • Python 3.10+
  • NumPy (1.22+)
  • SciPy (1.8+)
  • pandas (1.4+)
  • statsmodels (0.13.1+)
  • formulaic (1.0.0+)
  • xarray (0.16+, optional)
  • Cython (3.0.10+, optional)
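One way to check an environment against the minimum versions listed above is a short standard-library script. This is a sketch, not a tool shipped with linearmodels: `MINIMUMS`, `as_tuple`, `meets`, and `check` are names invented here, and the version comparison only looks at the leading numeric components, so pre-release suffixes are ignored.

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the list above: name -> (minimum, optional)
MINIMUMS = {
    "numpy": ("1.22", False),
    "scipy": ("1.8", False),
    "pandas": ("1.4", False),
    "statsmodels": ("0.13.1", False),
    "formulaic": ("1.0.0", False),
    "xarray": ("0.16", True),
    "Cython": ("3.0.10", True),
}


def as_tuple(text):
    """Leading numeric components of a version string as a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    return tuple(int(part) for part in match.group().split(".")) if match else ()


def meets(installed, minimum):
    """True if the installed version is at least the minimum."""
    a, b = as_tuple(installed), as_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that "1.22" and "1.22.0" compare as equal
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check():
    """Return a status string for Python and each listed package."""
    report = {"python": "ok" if sys.version_info >= (3, 10) else "too old"}
    for name, (minimum, optional) in MINIMUMS.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            report[name] = "missing (optional)" if optional else "missing"
            continue
        report[name] = "ok" if meets(found, minimum) else f"{found} < {minimum}"
    return report


if __name__ == "__main__":
    for name, status in check().items():
        print(f"{name}: {status}")
```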

Testing

  • pytest

Documentation

  • sphinx
  • sphinx-immaterial
  • nbsphinx
  • nbconvert
  • nbformat
  • ipython
  • jupyter