Slicing, Indexing, Manipulating, and Cleaning a Pandas DataFrame

Working with a Pandas DataFrame usually involves slicing, indexing, manipulating, and cleaning data. The sections below walk through each of these operations on a small sample DataFrame:

1. Setting Up:

First, let's import the necessary libraries and create a sample DataFrame:

import pandas as pd
import numpy as np

# Sample DataFrame
data = {
    'A': [1, 2, 3, 4, 5],
    'B': ['a', 'b', 'c', 'd', 'e'],
    'C': [1.1, 2.2, 3.3, 4.4, np.nan]
}
df = pd.DataFrame(data)

2. Slicing and Indexing:

Selecting Columns:

df['A']         # Select column A
df[['A', 'B']]  # Select multiple columns
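
The two forms return different types: a single label gives a Series, while a list of labels gives a DataFrame, even when the list holds one name. A minimal sketch that rebuilds the sample DataFrame so it runs on its own (the names `col_series` and `col_frame` are illustrative only):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': ['a', 'b', 'c', 'd', 'e'],
    'C': [1.1, 2.2, 3.3, 4.4, np.nan],
})

col_series = df['A']    # single label -> Series
col_frame = df[['A']]   # list of labels -> one-column DataFrame

print(type(col_series).__name__)  # Series
print(type(col_frame).__name__)   # DataFrame
print(col_frame.shape)            # (5, 1)
```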

Selecting Rows:

df[1:4]  # Select rows at positions 1 through 3 (the end position, 4, is excluded)
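
An integer slice inside plain brackets selects rows by position, following the usual Python half-open convention. The sketch below, which rebuilds the sample DataFrame for self-containment, shows the slice next to `head` and `tail`, two common shortcuts for the first and last rows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': ['a', 'b', 'c', 'd', 'e'],
    'C': [1.1, 2.2, 3.3, 4.4, np.nan],
})

rows = df[1:4]          # positions 1, 2, 3
first_two = df.head(2)  # first two rows
last_two = df.tail(2)   # last two rows

print(rows['A'].tolist())      # [2, 3, 4]
print(last_two['B'].tolist())  # ['d', 'e']
```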

Using `.loc` and `.iloc`:

df.loc[2, 'A']           # Get value at specific row and column using labels
df.iloc[2, 0]            # Get value using integer positions
df.loc[0:2, ['A', 'B']]  # Slice using labels (the end label, 2, is included)
df.iloc[0:2, 0:2]        # Slice using integer positions (the end position, 2, is excluded)
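
The difference in endpoint handling is easy to miss when the index is the default 0, 1, 2, ... range, because labels and positions look identical. The following sketch makes it visible; setting column `B` as the index is an assumption made here purely for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': ['a', 'b', 'c', 'd', 'e'],
    'C': [1.1, 2.2, 3.3, 4.4, np.nan],
})

by_label = df.loc[0:2, ['A', 'B']]  # labels 0, 1, 2 -> three rows
by_position = df.iloc[0:2, 0:2]     # positions 0, 1 -> two rows
print(len(by_label), len(by_position))  # 3 2

# With string labels the two accessors can no longer be confused.
named = df.set_index('B')
print(named.loc['b':'d', 'A'].tolist())  # [2, 3, 4] ('d' is included)
print(named.iloc[1:4, 0].tolist())       # [2, 3, 4] (position 4 is excluded)
```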

3. Manipulating Data:

Adding a Column:

df['D'] = [10, 20, 30, 40, 50]

Modifying a Column:

df['A'] = df['A'] * 10

Dropping a Column:

df.drop(columns=['D'], inplace=True)

Renaming Columns:

df.rename(columns={'A': 'X', 'B': 'Y'}, inplace=True)
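
The same steps can also be written without `inplace=True`, since each method returns a new DataFrame by default. The sketch below chains them using `assign`; this is one possible style rather than a replacement for the code above, and the name `result` is illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': ['a', 'b', 'c', 'd', 'e'],
    'C': [1.1, 2.2, 3.3, 4.4, np.nan],
})

result = (
    df.assign(D=[10, 20, 30, 40, 50])       # add column D
      .assign(A=lambda d: d['A'] * 10)      # modify column A
      .drop(columns=['D'])                  # drop column D again
      .rename(columns={'A': 'X', 'B': 'Y'}) # rename A -> X, B -> Y
)

print(list(result.columns))  # ['X', 'Y', 'C']
print(list(df.columns))      # ['A', 'B', 'C'] (the original is unchanged)
```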

Applying Functions:

df['X'] = df['X'].apply(lambda x: x + 5)
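
For simple arithmetic like this, a vectorized expression gives the same values and is usually faster than `apply`, which calls the Python function once per element. A small sketch on a stand-alone column (the values assume `X` holds 10 through 50, as it does at this point in the walkthrough):

```python
import pandas as pd

df = pd.DataFrame({'X': [10, 20, 30, 40, 50]})

via_apply = df['X'].apply(lambda x: x + 5)  # element-by-element
vectorized = df['X'] + 5                    # whole column at once

print(vectorized.tolist())                        # [15, 25, 35, 45, 55]
print(via_apply.tolist() == vectorized.tolist())  # True
```

`apply` remains useful when the transformation has no vectorized equivalent, for example custom branching logic per value.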

4. Cleaning Data:

Handling Missing Data:

# Choose one of the two; after dropna() there is nothing left for fillna() to fill
df.dropna(inplace=True)     # Remove rows with NaN values
df.fillna(0, inplace=True)  # Alternatively, replace NaN values with 0
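
Dropping and filling are alternatives, so it helps to compare their results side by side on copies. The sketch below rebuilds the sample DataFrame and also shows a per-column fill using the column mean, which is only one of several reasonable fill strategies:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': ['a', 'b', 'c', 'd', 'e'],
    'C': [1.1, 2.2, 3.3, 4.4, np.nan],
})

print(df['C'].isna().sum())  # 1 missing value in column C

dropped = df.dropna()                            # row with NaN removed
filled_zero = df.fillna(0)                       # NaN replaced by 0
filled_mean = df.fillna({'C': df['C'].mean()})   # NaN in C replaced by the mean of C

print(len(dropped), len(filled_zero))  # 4 5
```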

Removing Duplicates:

df.drop_duplicates(inplace=True)
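
The sample DataFrame has no duplicate rows, so the call above leaves it unchanged. To illustrate the behavior, the sketch below uses a small made-up DataFrame (`dup` and its values are hypothetical) and shows the `subset` and `keep` parameters:

```python
import pandas as pd

# Hypothetical data: rows 0 and 1 are identical
dup = pd.DataFrame({
    'X': [1, 1, 2, 2],
    'Y': ['a', 'a', 'b', 'c'],
})

print(dup.duplicated().tolist())  # [False, True, False, False]

unique_rows = dup.drop_duplicates()                           # compare all columns
unique_x = dup.drop_duplicates(subset=['X'], keep='last')     # compare X only, keep last match

print(list(unique_rows.index))  # [0, 2, 3]
print(list(unique_x.index))     # [1, 3]
```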

Resetting Index:

df.reset_index(drop=True, inplace=True)
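
Resetting matters mostly after rows have been removed, because the surviving rows keep their old labels. A short sketch, rebuilding the sample DataFrame and filtering on column `A` for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': ['a', 'b', 'c', 'd', 'e'],
    'C': [1.1, 2.2, 3.3, 4.4, np.nan],
})

subset = df[df['A'] > 2]
print(list(subset.index))  # [2, 3, 4] (old labels survive the filter)

renumbered = subset.reset_index(drop=True)
print(list(renumbered.index))  # [0, 1, 2]

# Without drop=True the old labels are kept as a new column named 'index'
with_old = subset.reset_index()
print(list(with_old.columns))  # ['index', 'A', 'B', 'C']
```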

Filtering Data:

filtered_df = df[df['X'] > 20]
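
Conditions can be combined with `&` (and) and `|` (or); each condition needs its own parentheses because these operators bind more tightly than comparisons. A minimal sketch on a stand-alone DataFrame whose values are assumed to match columns `X` and `Y` at this point in the walkthrough:

```python
import pandas as pd

df = pd.DataFrame({
    'X': [15, 25, 35, 45, 55],
    'Y': ['a', 'b', 'c', 'd', 'e'],
})

both = df[(df['X'] > 20) & (df['Y'] != 'e')]     # both conditions hold
either = df[(df['X'] < 20) | (df['Y'] == 'e')]   # at least one holds
members = df[df['Y'].isin(['a', 'c'])]           # membership test

print(both['X'].tolist())     # [25, 35, 45]
print(either['X'].tolist())   # [15, 55]
print(members['X'].tolist())  # [15, 35]
```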

These are just a few of the many operations you can perform on a Pandas DataFrame. Pandas offers a rich set of functionalities that can handle various tasks related to data analysis, manipulation, and cleaning.
