A multiple linear regression plugin for Elasticsearch

The linear regression model has been a mainstay of statistics and machine learning for decades and remains one of the most important tools in supervised learning. It is a powerful technique for predicting the value of a dependent variable y (called the response variable) from the values of independent variables x = (x₁, x₂,…,x_C) (called explanatory variables), based on a training data set. The prediction of the response variable for given values of the explanatory variables is described by the linear hypothesis function h(x) with

$h(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_C x_C$

This plugin extends Elasticsearch’s query engine with two new aggregations that use the indexed data at search time to estimate a linear regression model. The estimated model supports predicting the value of the target variable, detecting anomalies, and measuring the accuracy (predictiveness) of the model. Estimation follows the OLS (ordinary least squares) approach over the search result set.

Aggregations

Both aggregations are numeric aggregations that estimate the linear regression coefficients $\theta_0, \theta_1, \theta_2, \dots$ from the documents returned by a search query. Each result document is treated as one observation, and its numeric fields as the (explanatory and response) variables of the linear model.
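To make the estimation step concrete, here is a minimal pure-Python sketch of OLS via the normal equations (XᵀX)θ = Xᵀy. It is an illustration under our own assumptions, not the plugin's actual implementation: each document is modelled as a row of C explanatory values followed by the response value (the same order as the `fields` array), and the names `fit_ols` and `_solve` are hypothetical.

```python
from typing import List, Sequence, Tuple


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a · x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the caller's lists stay untouched.
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            # A (near-)zero pivot means X^T X is singular.
            raise ValueError("singular system: too few documents or collinear fields")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / m[r][r]
    return x


def fit_ols(rows: Sequence[Sequence[float]]) -> Tuple[float, List[float]]:
    """Return (intercept, slope coefficients) of the least-squares fit.

    Each row holds C explanatory values followed by the response value.
    """
    c = len(rows[0]) - 1
    k = c + 1  # theta_0 (intercept) plus C slopes
    xtx = [[0.0] * k for _ in range(k)]
    xty = [0.0] * k
    for row in rows:
        x = [1.0, *row[:c]]  # leading 1.0 pairs with the intercept theta_0
        y = row[c]
        for i in range(k):
            xty[i] += x[i] * y
            for j in range(k):
                xtx[i][j] += x[i] * x[j]
    theta = _solve(xtx, xty)
    return theta[0], theta[1:]


# Toy documents generated from y = 1 + 2*x1 + 3*x2 (no noise).
docs = [(0, 0, 1), (1, 0, 3), (0, 1, 4), (1, 1, 6), (2, 1, 8)]
intercept, slopes = fit_ols(docs)
# intercept ≈ 1.0, slopes ≈ [2.0, 3.0]
```

Accumulating XᵀX and Xᵀy row by row only needs a (C + 1) × (C + 1) matrix regardless of how many documents match, which is one reason the normal-equation form suits an aggregation over a result set.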

Aggregation for prediction

The `linreg_predict` aggregation computes the predicted value of the response variable from the estimated model for a given set of input values for the explanatory variables.

`value`	The predicted value for the response variable computed using the estimated linear hypothesis function `h(x)` with `x` given by `C` input values for the explanatory variables `x = [x₁, x₂,…,x_C]`.
`coefficients`	Estimated slope coefficients $\theta_1, \theta_2, \dots, \theta_C$ of the linear hypothesis function `h(x)`.
`intercept`	Estimated intercept coefficient $\theta_0$ of the linear hypothesis function `h(x)`.

Assume the data consists of documents representing sold houses, with features such as size, number of bedrooms and number of bathrooms, plus the sale price. We can use this data to predict, or validate, the price of our house in Morro Bay with 2000 square feet, 4 bedrooms and 2 bathrooms.

```
/houses/_search?size=0
{
  "query": {
    "match": { "location": "Morro Bay" }
  },
  "aggs": {
    "house_prices": {
      "linreg_predict": {
        "fields": ["size", "bedrooms", "bathrooms", "price"], (1)
        "inputs": [2000, 4, 2] (2)
      }
    }
  }
}
```

(1) `fields` tells this aggregation to use the house feature fields size, bedrooms and bathrooms as explanatory variables and the price field as the response variable. The `fields` array has C + 1 entries: C for the explanatory variables and one for the response variable.
(2) `inputs` passes the feature values of the house whose price we want to predict. The numeric input values must be passed as an array in the same order as the corresponding features in `fields`. The `inputs` array has C entries, one per explanatory variable.
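Because `fields` and `inputs` are positional, it is easy to get their order or count wrong. The following Python sketch builds the request body shown above and enforces the C + 1 / C size rule before anything is sent. The helper names `linreg_predict_body` and `search`, the default aggregation name, and the node address `http://localhost:9200` are assumptions for illustration, not part of the plugin.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional


def linreg_predict_body(fields: List[str], inputs: List[float],
                        query: Optional[Dict[str, Any]] = None,
                        name: str = "house_prices") -> Dict[str, Any]:
    """Build a search body for the linreg_predict aggregation (hypothetical helper)."""
    if len(fields) < 2:
        raise ValueError("need at least one explanatory field plus the response field")
    # fields = C explanatory fields + 1 response field (last); inputs = C values.
    if len(inputs) != len(fields) - 1:
        raise ValueError(f"expected {len(fields) - 1} inputs, got {len(inputs)}")
    body: Dict[str, Any] = {
        "aggs": {name: {"linreg_predict": {"fields": fields, "inputs": inputs}}}
    }
    if query is not None:
        body["query"] = query
    return body


def search(index: str, body: Dict[str, Any],
           host: str = "http://localhost:9200") -> Dict[str, Any]:
    """POST the body to /<index>/_search?size=0 and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{host}/{index}/_search?size=0",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


body = linreg_predict_body(
    ["size", "bedrooms", "bathrooms", "price"],
    [2000, 4, 2],
    query={"match": {"location": "Morro Bay"}},
)
# search("houses", body) would send it to a running node with the plugin installed.
```

Validating on the client side gives a clear error message early instead of relying on whatever the server reports for a malformed aggregation.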

The response with the estimated price for our house may look like this:

```
{
  ...
  "aggregations": {
    "house_prices": {
      "value": 581458.3087492324,
      "coefficients": [
        248.92285661317254,
        -68297.7720278421,
        64406.52205356777
      ],
      "intercept": 227990.63952712028
    }
  }
}
```
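The returned `value` is the hypothesis function h(x) evaluated at `inputs`: the intercept plus the dot product of the coefficients with the input values. The sketch below recomputes it from the example response; the function name `predict` is our own, and the coefficients are listed in the order of the explanatory `fields` (size, bedrooms, bathrooms).

```python
from typing import Sequence


def predict(intercept: float, coefficients: Sequence[float],
            inputs: Sequence[float]) -> float:
    """Evaluate h(x) = theta_0 + theta_1*x_1 + ... + theta_C*x_C."""
    if len(coefficients) != len(inputs):
        raise ValueError("one input value is required per slope coefficient")
    return intercept + sum(t * x for t, x in zip(coefficients, inputs))


# Numbers copied from the example response above.
coefficients = [248.92285661317254, -68297.7720278421, 64406.52205356777]
intercept = 227990.63952712028
value = predict(intercept, coefficients, [2000, 4, 2])
# value ≈ 581458.31, matching "value" up to floating-point rounding
```

Reading the coefficients this way also explains the prediction: each extra square foot adds about 249 to the estimated price in this particular model.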

Aggregation for linear regression statistics

The linreg_stats aggregation computes statistics for the estimated linear regression model.

`rss`	Residual sum of squares, a measure of the discrepancy between the data and the estimated model. The lower the `rss`, the smaller the prediction error on the documents used for estimation and the better the model fits them.
`mse`	Mean squared error, i.e. `rss` divided by the number of documents used to estimate the model.
`coefficients`	Slope coefficients $\theta_1, \theta_2, \dots, \theta_C$ of the linear hypothesis function `h(x)`.
`intercept`	Intercept coefficient $\theta_0$ of the linear hypothesis function `h(x)`.
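For reference, `rss` and `mse` follow the textbook definitions sketched below, with n the number of documents used for estimation: rss = Σ (yᵢ − h(xᵢ))² and mse = rss / n. The function name `rss_mse` and the toy data are hypothetical, and the plugin may compute these quantities differently internally.

```python
from typing import Sequence, Tuple


def rss_mse(observed: Sequence[float],
            predicted: Sequence[float]) -> Tuple[float, float]:
    """Return (residual sum of squares, mean squared error)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need equally long, non-empty sequences")
    # Sum of squared differences between actual and predicted responses.
    rss = sum((y - h) ** 2 for y, h in zip(observed, predicted))
    return rss, rss / len(observed)


# Toy data: actual responses of three documents and the model's predictions.
rss, mse = rss_mse([3.0, 5.0, 7.0], [2.5, 5.0, 8.0])
# rss = 0.25 + 0.0 + 1.0 = 1.25, mse = 1.25 / 3
```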

Assuming the data consists of documents representing house prices, we can compute statistics for the best-fitting linear hypothesis function that predicts the price from the number of bedrooms, the number of bathrooms and the size with:

```
/houses/_search?size=0
{
  "aggs": {
    "house_prices": {
      "linreg_stats": {
        "fields": ["bedrooms", "bathrooms", "size", "price"]
      }
    }
  }
}
```

The aggregation type is `linreg_stats`, and the `fields` setting defines the array of fields used to build the linear model: all fields except the last one are the explanatory variables, and the last one is the response variable. The above request returns the following response:

```
{
  ...
  "aggregations": {
    "house_prices": {
      "rss": 49523788338938.734,
      "mse": 63410740510.80504,
      "coefficients": [
        -100544.0725894584,
        45981.15827544966,
        309.6013051477475
      ],
      "intercept": 47553.18737564783
    }
  }
}
```
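Since `mse` is `rss` divided by the number of documents, the two numbers in the example response also reveal how many documents the model was estimated from. Taking the square root of `mse` gives the root mean squared error (RMSE), a typical prediction error in the same unit as the `price` field; the aggregation does not report RMSE itself, so this is a derived quantity computed on the client.

```python
import math

# Values copied from the example linreg_stats response above.
rss = 49523788338938.734
mse = 63410740510.80504

n_docs = round(rss / mse)  # number of documents the model was estimated from
rmse = math.sqrt(mse)      # root mean squared error, in the unit of the price field
print(n_docs, round(rmse))  # → 781 251815
```

An RMSE of roughly 250,000 in price units suggests that, for this data, bedrooms, bathrooms and size alone explain house prices only loosely.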

Installation

Elasticsearch 5.x

To install this plugin, first pick the plugin version in the compatibility matrix below that matches your Elasticsearch version, then use its download link in the following command:

./bin/elasticsearch-plugin install https://github.com/scaleborn/elasticsearch-linear-regression/releases/download/5.3.0.1/elasticsearch-linear-regression-5.3.0.1.zip

The plugin will be installed under the name "linear-regression". Do not forget to restart the node after installing.

Table 1. Compatibility matrix

Plugin version	Elasticsearch version	Release date
5.3.0.1	5.3.0	Jun 1, 2017
