Scikit Learn: How to Deal with Missing Values

Missing Data: A Machine Learning Approach

DAMIAN MINGLE CHIEF DATA SCIENTIST, WPC Healthcare @DamianMingle

GET THE FULL STORY bit.ly/UseSciKitNow

What’s Imputation Anyway?
- Some models cannot handle missing values, so filling the gaps with estimated values can be useful.
- A missing value can be replaced by the mean, median, or most frequent value of its column.
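To make the three strategies concrete, here is a minimal sketch in plain Python. The `impute` helper and the sample column are hypothetical and exist only for illustration; they are not part of scikit-learn, which provides its own imputer class.

```python
from statistics import mean, median, mode

def impute(values, strategy="mean"):
    """Replace None entries with a statistic of the observed entries."""
    observed = [v for v in values if v is not None]
    # Pick the fill statistic that matches the requested strategy.
    statistic = {"mean": mean, "median": median, "most_frequent": mode}[strategy]
    fill = statistic(observed)
    return [fill if v is None else v for v in values]

# Made-up column with one missing entry.
column = [1.0, 2.0, 2.0, None, 7.0]

print(impute(column, "mean"))
print(impute(column, "median"))
print(impute(column, "most_frequent"))
```

The mean is pulled upward by the large value 7.0, while the median and the most frequent value are not, which is one reason the choice of strategy can matter for skewed data.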

Why Imputation Matters
- Imputing missing values can give better results than discarding every sample that contains a missing value.
- Imputing does not always improve predictions, so compare the alternatives with cross-validation.
- In some cases, dropping rows or using marker values is more effective.
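One way to run that comparison is sketched below, assuming a recent scikit-learn release where `SimpleImputer` in `sklearn.impute` replaces the older `Imputer` class. The synthetic data, the 10% missing rate, and the choice of `LogisticRegression` are all assumptions made for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic data (made up): 200 samples, 4 features, binary target.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Knock out roughly 10% of the entries to simulate missing values.
X_missing = X.copy()
X_missing[rng.uniform(size=X.shape) < 0.1] = np.nan

# Option 1: impute inside a pipeline so the means are learned per training fold.
model = make_pipeline(SimpleImputer(strategy="mean"), LogisticRegression())
imputed_scores = cross_val_score(model, X_missing, y, cv=5)

# Option 2: drop every row that contains at least one missing value.
complete = ~np.isnan(X_missing).any(axis=1)
dropped_scores = cross_val_score(
    LogisticRegression(), X_missing[complete], y[complete], cv=5
)

print("imputed:", imputed_scores.mean())
print("dropped:", dropped_scores.mean())
```

Putting the imputer inside the pipeline keeps the held-out fold from influencing the fill values, so the two scores are compared on fair terms.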

- Preprocessing
- Clustering
- Regression
- Classification
- Dimensionality Reduction
- Model Selection

Let’s Look at an ML Recipe: Imputation

The Imports
    import numpy as np
    import urllib.request
    from sklearn.impute import SimpleImputer

Load Dataset with Missing Values
    url = "https://goo.gl/3jvZXE"
    raw_data = urllib.request.urlopen(url)
    dataset = np.loadtxt(raw_data, delimiter=",")
    print(dataset.shape)

Separate Features from Target
    X = dataset[:, 0:7]
    y = dataset[:, 8]

Mark Zero Values as Missing
    X[X == 0] = np.nan

Impute Missing Values with Mean
    imp = SimpleImputer(missing_values=np.nan, strategy="mean")
    imputed_X = imp.fit_transform(X)
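The marking and imputing steps can be checked on a toy array without downloading anything. This is a sketch only: the numbers are made up, and it assumes a recent scikit-learn release where `SimpleImputer` replaces the older `Imputer` class. Note that the array must have a float dtype, because NaN cannot be stored in an integer array.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy data (made up for illustration); zeros stand in for unrecorded values.
X = np.array([[6.0, 148.0, 72.0],
              [1.0,   0.0, 66.0],
              [8.0, 183.0,  0.0]])

# Mark every zero as missing, then count the missing entries per column.
X[X == 0] = np.nan
missing_per_column = np.isnan(X).sum(axis=0)
print(missing_per_column)

# Replace each NaN with the mean of the observed values in its column.
imp = SimpleImputer(missing_values=np.nan, strategy="mean")
imputed_X = imp.fit_transform(X)
print(imputed_X)
```

Counting the NaN entries before imputing is a cheap sanity check that the zero-marking step flagged only the columns where zero really means "missing".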

Imputation Recipe
    # Impute missing values with the mean
    import numpy as np
    import urllib.request
    from sklearn.impute import SimpleImputer

    # Load dataset from UCI Machine Learning Repo
    url = "https://goo.gl/3jvZXE"
    raw_data = urllib.request.urlopen(url)
    dataset = np.loadtxt(raw_data, delimiter=",")
    print(dataset.shape)

    # Segregate the data by features and target
    X = dataset[:, 0:7]
    y = dataset[:, 8]

    # Mark every 0 as "not a number" (NaN)
    X[X == 0] = np.nan

    # Fill each missing value with the mean of its attribute
    imp = SimpleImputer(missing_values=np.nan, strategy="mean")
    imputed_X = imp.fit_transform(X)

Resources
- Society of Data Scientists
- SciKit Learn
Also:
- fit(X[, y]): fit the imputer on X
- fit_transform(X[, y]): fit to data, then transform it
- transform(X): impute all missing values in X
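The difference between the three methods shows up when the imputer is fitted on one dataset and applied to another. The sketch below uses made-up training and test arrays and assumes a recent scikit-learn release where `SimpleImputer` replaces the older `Imputer` class; the method names are the same.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Made-up training and test data; NaN marks a missing entry.
X_train = np.array([[1.0, 10.0],
                    [3.0, np.nan],
                    [np.nan, 30.0]])
X_test = np.array([[np.nan, np.nan]])

# fit: learn one fill value per column from the training data only.
imp = SimpleImputer(missing_values=np.nan, strategy="mean")
imp.fit(X_train)
print(imp.statistics_)

# transform: reuse the learned fill values on data the imputer has not seen.
X_test_filled = imp.transform(X_test)
print(X_test_filled)

# fit_transform: both steps in one call on the same data.
X_train_filled = SimpleImputer(strategy="mean").fit_transform(X_train)
print(X_train_filled)
```

Calling `fit` on the training data and `transform` on the test data keeps test-set values out of the learned means.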
