
A machine learning plugin for Elasticsearch providing aggregations to compute multiple linear regression on search results in real-time for predictive analytics.


mbok/elasticsearch-linear-regression


A multiple linear regression plugin for Elasticsearch


The linear regression model has been a mainstay of statistics and machine learning for decades and remains one of the most important tools in supervised learning. It is a powerful technique for predicting the value of a dependent variable y (the response variable) from the values of independent variables x = (x1, x2, …, xC) (the explanatory variables), based on a training data set. The prediction of the response variable for given values of the explanatory variables is described by the linear hypothesis function h(x) with

$$h(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_C x_C$$
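As a small illustration of the hypothesis function, the sketch below evaluates h(x) as an intercept plus a weighted sum of the explanatory variables. The function name `predict` and the house-price coefficients are hypothetical and are not part of the plugin.

```python
def predict(theta0, thetas, x):
    """Evaluate h(x) = theta0 + theta1*x1 + ... + thetaC*xC."""
    if len(thetas) != len(x):
        raise ValueError("need one slope coefficient per explanatory variable")
    return theta0 + sum(t * xi for t, xi in zip(thetas, x))


# Hypothetical model (assumed values): price = 50 + 20*bedrooms + 0.5*size
print(predict(50.0, [20.0, 0.5], [3, 120]))  # prints 170.0
```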

This plugin extends Elasticsearch’s query engine with two new aggregations that use the index data at search time to estimate a linear regression model. The estimated model can then be used to predict values of the target variable, detect anomalies, and measure the accuracy (predictiveness) of the model.

Aggregations

The aggregations estimate the linear regression coefficients θ0, θ1, θ2, … from the documents returned by a search query. Each search result document is treated as an observation, and its numeric fields are treated as the variables (explanatory and response) of the linear model.
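To make the estimation step concrete, here is a minimal sketch of ordinary least squares using the textbook normal equations, solved with Gauss-Jordan elimination. It only illustrates the idea of treating each document as an observation; it is not the plugin's implementation, which is not described here, and the function name `fit_least_squares` and the sample data are assumptions.

```python
def fit_least_squares(rows, y):
    """Return [theta0, theta1, ..., thetaC] minimizing the squared error.

    rows: one list of explanatory values per observation (document)
    y:    the response value of each observation
    """
    # Prepend a constant 1 so that theta0 acts as the intercept.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    n = len(X[0])
    # Augmented normal equations [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(n)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("explanatory variables are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


# Hypothetical observations generated from y = 1 + 2*x1 + 3*x2.
rows = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, 4, 6, 8]
theta = fit_least_squares(rows, y)
print([round(t, 6) for t in theta])  # approximately [1.0, 2.0, 3.0]
```

Solving the normal equations directly keeps the sketch short, but it can be numerically unstable when explanatory variables are strongly correlated.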

Linear Regression Stats

The linreg_stats aggregation is a numeric aggregation that computes statistics for the estimated linear regression model over a set of document fields representing the response and the explanatory variables. It reports the following values:

rss

Residual sum of squares as a measure of the discrepancy between the data and the estimated model. The lower the rss number, the smaller the error of the prediction, and the better the model.

mse

Mean squared error, that is, rss divided by the number of documents used for model estimation.

coefficients

Slope coefficients θ1, θ2, θ3, … of the linear function, estimated by least-squares regression.

intercept

Intercept coefficient θ0 of the linear function, estimated by least-squares regression.
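The relationship between these statistics can be sketched in a few lines: given an intercept and slope coefficients, rss is the sum of squared differences between observed and predicted responses, and mse is rss divided by the number of observations. The function name `regression_stats` and the sample values are hypothetical, and the returned dictionary is not the plugin's response format.

```python
def regression_stats(intercept, coefficients, rows, y):
    """Compute rss and mse of a linear model over a set of observations."""
    residuals = [
        yi - (intercept + sum(c * v for c, v in zip(coefficients, row)))
        for row, yi in zip(rows, y)
    ]
    rss = sum(r * r for r in residuals)
    return {"rss": rss, "mse": rss / len(y)}


# Hypothetical model y = 1 + 2*x, evaluated on three observations.
stats = regression_stats(1.0, [2.0], [[0.0], [1.0], [2.0]], [1.0, 3.0, 6.0])
print(stats["rss"])  # prints 1.0: only the last observation is off, by 1
print(stats["mse"])  # rss / 3
```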

Installation

Elasticsearch 5.x

To install this plugin, first choose the plugin version in the compatibility matrix that matches your Elasticsearch version, then use its download link in the following command.

./bin/elasticsearch-plugin install https://github.com/scaleborn/elasticsearch-linear-regression/releases/download/5.3.0.1/elasticsearch-linear-regression-5.3.0.1.zip

The plugin will be installed under the name "linear-regression". Do not forget to restart the node after installing.

Table 1. Compatibility matrix

| Plugin version | Elasticsearch version | Release date |
|----------------|-----------------------|--------------|
| 5.3.0.1        | 5.3.0                 | Jun 1, 2017  |
