Machine Learning with Spark MLlib

Overview
• MLlib is Spark’s library of machine learning (ML) functions, designed to run in parallel on clusters.
• MLlib contains a variety of learning algorithms.
• MLlib's algorithms operate on RDDs (Spark's resilient distributed datasets).
• Some classic ML algorithms are not included in Spark MLlib because they were not designed for parallel execution.
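What makes an algorithm a good fit for MLlib is that its work splits into independent per-partition computations followed by a cheap merge. The sketch below simulates that pattern in plain Python, with hypothetical in-memory lists standing in for RDD partitions. It has the same shape as PySpark's `rdd.aggregate(zeroValue, seqOp, combOp)`, but it does not use Spark itself.

```python
from functools import reduce

# Hypothetical "partitions": in Spark these would live on different executors.
partitions = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0],
    [6.0],
]

def seq_op(acc, x):
    """Fold one element into a partition-local (sum, count) accumulator."""
    s, n = acc
    return (s + x, n + 1)

def comb_op(a, b):
    """Merge two partition-local accumulators."""
    return (a[0] + b[0], a[1] + b[1])

zero = (0.0, 0)
# Step 1: each partition is reduced on its own (this is the parallel part).
partials = [reduce(seq_op, part, zero) for part in partitions]
# Step 2: the small partial results are combined in one place (the driver).
total, count = reduce(comb_op, partials, zero)
mean = total / count
print(mean)  # 3.5
```

Algorithms that need random access to the whole dataset at every step do not decompose this way, which is why they are harder to offer in a parallel library.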

Overview
• MLlib is divided into two packages:
  • spark.mllib contains the original API, built on top of RDDs.
  • spark.ml provides a higher-level API built on top of DataFrames.
• Using spark.ml is recommended because the DataFrame-based API is more versatile and flexible. The plan is to keep supporting spark.mllib alongside the development of spark.ml.

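Machine Learning Recap
• Machine learning algorithms try to make predictions or decisions based on training data.
• There are multiple types of learning problems, including classification, regression, and clustering, each with a different objective: classification predicts a discrete category, regression predicts a continuous value, and clustering groups unlabeled data by similarity.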

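Spark MLlib Data Types
• MLlib contains a few specific data types, including Vector (dense or sparse), LabeledPoint (a label paired with a feature vector), Rating (a user, product, rating triple used for recommendation), Matrix (local and distributed), and various Model classes.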
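To show what a sparse vector and a labeled point hold, here is a minimal plain-Python sketch. `SparseVec` and `LabeledPt` are hypothetical stand-ins written for illustration, not Spark classes. In PySpark the corresponding constructors are `Vectors.sparse(size, indices, values)`, `Vectors.dense(values)`, and `LabeledPoint(label, features)`.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class SparseVec:
    """Hypothetical stand-in for a sparse vector: a size plus (index, value) pairs."""
    size: int
    indices: Tuple[int, ...]
    values: Tuple[float, ...]

    def to_dense(self) -> List[float]:
        # Unstored positions are implicitly zero.
        dense = [0.0] * self.size
        for i, v in zip(self.indices, self.values):
            dense[i] = v
        return dense

    def dot(self, other: List[float]) -> float:
        # Only stored entries contribute, so the cost is O(nnz) rather than O(size).
        return sum(v * other[i] for i, v in zip(self.indices, self.values))

@dataclass(frozen=True)
class LabeledPt:
    """Hypothetical stand-in for LabeledPoint: a label plus a feature vector."""
    label: float
    features: SparseVec

x = SparseVec(size=5, indices=(0, 3), values=(1.0, 2.5))
p = LabeledPt(label=1.0, features=x)
print(p.features.to_dense())             # [1.0, 0.0, 0.0, 2.5, 0.0]
print(x.dot([2.0, 9.0, 9.0, 4.0, 9.0]))  # 12.0
```

Storing only indices and values is what makes sparse vectors worthwhile for high-dimensional features such as bag-of-words counts, where most entries are zero.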

MLlib Supported Supervised Algorithm Methods
• Binary classification problems
  • linear SVMs, logistic regression, decision trees, random forests, gradient-boosted trees, naive Bayes
• Multiclass classification problems
  • logistic regression, decision trees, random forests, naive Bayes
• Regression problems
  • linear least squares, Lasso, ridge regression, decision trees, random forests, gradient-boosted trees, isotonic regression
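Logistic regression from the list above shows why these methods parallelize well: the log-loss gradient is a sum over examples, so each partition can compute its share independently. The following is a minimal plain-Python sketch of batch gradient descent under that assumption, with a hypothetical toy dataset split into two simulated partitions. It illustrates the technique only and is not MLlib's implementation.

```python
import math

# Hypothetical toy data: (label, features) pairs split across two "partitions".
# features[0] is a constant 1.0 bias term; features[1] is the single real feature.
partitions = [
    [(0.0, [1.0, -2.0]), (0.0, [1.0, -1.0])],
    [(1.0, [1.0, 1.0]), (1.0, [1.0, 2.0])],
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def partition_gradient(part, w):
    """Summed log-loss gradient over one partition; partitions are independent."""
    g = [0.0] * len(w)
    for y, x in part:
        err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
        for j, xj in enumerate(x):
            g[j] += err * xj
    return g

def train(partitions, steps=500, lr=0.5):
    n = sum(len(p) for p in partitions)
    w = [0.0, 0.0]
    for _ in range(steps):
        partials = [partition_gradient(p, w) for p in partitions]  # per-partition "map"
        grad = [sum(col) / n for col in zip(*partials)]            # "reduce" to one gradient
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

def predict(w, x):
    return 1.0 if sigmoid(sum(wi * xi for wi, xi in zip(w, x))) >= 0.5 else 0.0

w = train(partitions)
print(predict(w, [1.0, -0.5]), predict(w, [1.0, 0.5]))  # 0.0 1.0
```

Only the small gradient vector crosses partition boundaries in each step, which is the property that lets this kind of training scale across a cluster.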

MLlib Supported Unsupervised Models
• K-means
• Gaussian mixture
• Power iteration clustering (PIC)
• Latent Dirichlet allocation (LDA)
• Bisecting k-means
• Streaming k-means
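K-means, the first model listed, alternates two steps: assign each point to its nearest center, then move each center to the mean of its points. Below is a minimal plain-Python sketch of this standard (Lloyd's) iteration on hypothetical 2-D data. For reproducibility it starts from two hand-picked centers, whereas Spark's KMeans defaults to the k-means|| initialization.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def sq_dist(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def kmeans(points: List[Point], centers: List[Point], max_iter: int = 20):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    clusters: List[List[Point]] = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[idx].append(p)
        # Update step: move each center to the mean of its points
        # (an empty cluster keeps its previous center).
        new_centers = [
            (sum(q[0] for q in c) / len(c), sum(q[1] for q in c) / len(c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:  # converged: no center moved
            break
        centers = new_centers
    return centers, clusters

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, clusters = kmeans(points, [points[0], points[3]])
print([(round(x, 2), round(y, 2)) for x, y in centers])  # [(0.33, 0.33), (10.33, 10.33)]
```

The assignment step is independent per point and the update step is a per-cluster sum and count, so both fit the partition-then-merge pattern that distributed k-means relies on.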

Recommender Systems
• Collaborative filtering is commonly used for recommender systems.
• spark.mllib currently supports model-based collaborative filtering, in which users and products are described by a small set of latent factors that can be used to predict missing entries.
• spark.mllib uses the alternating least squares (ALS) algorithm to learn these latent factors.
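ALS works by fixing the item factors and solving a regularized least-squares problem for each user, then doing the reverse, and repeating. The sketch below is a deliberately tiny rank-1 version in plain Python, where each solve has the closed form u_i = Σ r_ij v_j / (Σ v_j² + λ). The ratings, the value of λ, and the iteration count are hypothetical. Spark's ALS is far more elaborate (for example, it supports a higher `rank` and distributes blocks of users and items across the cluster).

```python
# Observed (user, item) -> rating; the entry (1, 2) is held out on purpose.
ratings = {(0, 0): 1.0, (0, 1): 2.0, (0, 2): 3.0,
           (1, 0): 2.0, (1, 1): 4.0}
n_users, n_items, lam = 2, 3, 0.01  # lam: hypothetical L2 regularization weight

u = [1.0] * n_users  # one latent factor per user (rank 1)
v = [1.0] * n_items  # one latent factor per item

for _ in range(50):
    # Fix v, then solve each user's regularized least-squares problem in closed form.
    for i in range(n_users):
        obs = [(j, r) for (ui, j), r in ratings.items() if ui == i]
        u[i] = sum(r * v[j] for j, r in obs) / (sum(v[j] ** 2 for j, _ in obs) + lam)
    # Fix u, then solve each item's problem the same way.
    for j in range(n_items):
        obs = [(i, r) for (i, vj), r in ratings.items() if vj == j]
        v[j] = sum(r * u[i] for i, r in obs) / (sum(u[i] ** 2 for i, _ in obs) + lam)

predicted = u[1] * v[2]  # estimate for the missing (user 1, item 2) entry
print(round(predicted, 2))  # a value close to 6, matching the pattern in the other ratings
```

Each per-user (or per-item) solve depends only on that user's ratings and the current factors of the other side, which is what lets ALS run its solves in parallel.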

For more, visit https://supergloo.com
