Python Pandas - Filling missing column values with median



The median is the value that separates the higher half of the data from the lower half. To fill the missing values in a column with that column's median, compute the median and pass it to the fillna() method. First, import the required libraries with their respective aliases −
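To illustrate the definition, the sketch below computes the median of an odd-length and an even-length pandas Series. The numbers are arbitrary sample values, not taken from the original example. With an even count there is no single middle value, so pandas averages the two middle ones.

```python
import pandas as pd

# Odd number of values: sorted they are 80, 100, 150, so the middle value is 100
odd = pd.Series([100, 150, 80])
print(odd.median())   # 100.0

# Even number of values: sorted they are 80, 90, 100, 150,
# so the median is the mean of the two middle values (90 + 100) / 2
even = pd.Series([100, 150, 80, 90])
print(even.median())  # 95.0
```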

import pandas as pd
import numpy as np

Create a DataFrame with 2 columns. The missing values are set using NumPy's np.nan −

dataFrame = pd.DataFrame(
   {
      "Car": ['Lexus', 'BMW', 'Audi', 'Bentley', 'Mustang', 'Tesla'],
      "Units": [100, 150, np.nan, 80, np.nan, np.nan]
   }
)
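Before filling anything, it can help to confirm which values are missing and what the median will be. The following minimal sketch rebuilds the same DataFrame, counts the NaNs per column with isna().sum(), and computes the median of Units. The variable names missing and units_median are illustrative only.

```python
import numpy as np
import pandas as pd

dataFrame = pd.DataFrame(
    {
        "Car": ['Lexus', 'BMW', 'Audi', 'Bentley', 'Mustang', 'Tesla'],
        "Units": [100, 150, np.nan, 80, np.nan, np.nan],
    }
)

# Count the missing entries in each column (expected: Car 0, Units 3)
missing = dataFrame.isna().sum()
print(missing)

# median() skips NaN by default (skipna=True), so only 100, 150 and 80 are used
units_median = dataFrame['Units'].median()
print(units_median)  # 100.0
```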

Find the median of the column that contains NaN values, i.e. the Units column here. median() skips NaN values by default, so the result is computed from the non-missing values only. Then replace the NaNs in Units with that median −

dataFrame['Units'] = dataFrame['Units'].fillna(dataFrame['Units'].median())
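Calling fillna() on a whole DataFrame with a single scalar fills every NaN in every column with that one value. That happens to work when only Units has gaps, but when several numeric columns contain NaN, each one should usually get its own median. A possible sketch, assuming a hypothetical Price column next to Units, passes a Series of per-column medians to fillna():

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame with two numeric columns that both contain NaN
df = pd.DataFrame(
    {
        "Car": ['Lexus', 'BMW', 'Audi', 'Bentley'],
        "Units": [100, 150, np.nan, 80],
        "Price": [40.0, np.nan, 55.0, 200.0],
    }
)

# One median per numeric column, returned as a Series indexed by column name
medians = df.median(numeric_only=True)

# fillna() with a Series fills each column using the value under that column's label;
# columns without an entry in medians (here Car) are left unchanged
filled = df.fillna(medians)
print(filled)
```

Here Units is filled with 100.0 and Price with 55.0, the medians of their non-missing values, while the text column Car is left untouched.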

Example

Following is the code −

import pandas as pd
import numpy as np

# Create DataFrame
dataFrame = pd.DataFrame(
   {
      "Car": ['Lexus', 'BMW', 'Audi', 'Bentley', 'Mustang', 'Tesla'],
      "Units": [100, 150, np.nan, 80, np.nan, np.nan]
   }
)

print("DataFrame ...\n", dataFrame)

# Find the median of the column that contains NaN values, i.e. the Units column here
# Replace the NaNs with the median of the column in which they are located
dataFrame['Units'] = dataFrame['Units'].fillna(dataFrame['Units'].median())

print("\nUpdated Dataframe after filling NaN values with median...\n", dataFrame)

Output

This will produce the following output −

DataFrame ...
        Car  Units
0    Lexus  100.0
1      BMW  150.0
2     Audi    NaN
3  Bentley   80.0
4  Mustang    NaN
5    Tesla    NaN

Updated Dataframe after filling NaN values with median...
        Car  Units
0    Lexus  100.0
1      BMW  150.0
2     Audi  100.0
3  Bentley   80.0
4  Mustang  100.0
5    Tesla  100.0
Updated on: 2021-09-21T07:03:26+05:30
