Python - Basics of Pandas using Iris Dataset

The Iris dataset is a classic in the machine learning world, often used for classification tasks. It contains measurements of 150 iris flowers, 50 from each of three species: Setosa, Versicolor, and Virginica. Four measurements are recorded for each flower, all in centimeters: sepal length, sepal width, petal length, and petal width.

Let's go through some basics of Pandas using the Iris dataset:

1. Loading the Iris dataset:

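You can load the Iris dataset directly from the seaborn library or from the datasets bundled with sklearn. Here, we'll use seaborn, which returns the data as a Pandas DataFrame. Note that seaborn downloads the file from its online data repository on first use, so this step needs an internet connection.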

    import seaborn as sns

    # Load the dataset
    iris = sns.load_dataset('iris')
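The sklearn route mentioned above can be sketched as follows. This is a minimal example, assuming scikit-learn 0.23 or later is installed (for the `as_frame` option); the names `bunch` and `iris_sk` are illustrative, and the column renaming is only there to match the seaborn layout used in the rest of this guide.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Load the copy of the dataset bundled with scikit-learn as a DataFrame
bunch = load_iris(as_frame=True)
iris_sk = bunch.frame.copy()

# Rename the columns to the snake_case names that seaborn uses
iris_sk.columns = ['sepal_length', 'sepal_width', 'petal_length',
                   'petal_width', 'target']

# Map the integer target codes (0, 1, 2) to the species names
iris_sk['species'] = pd.Categorical.from_codes(iris_sk['target'],
                                               bunch.target_names)
iris_sk = iris_sk.drop(columns='target')

print(iris_sk.shape)
```

Unlike the seaborn loader, this version reads a file shipped with the library, so it should work offline.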

2. Inspecting the dataset:

    # Display the first 5 rows
    print(iris.head())

    # Get info about the dataset: column data types, non-null counts, etc.
    # info() prints its report itself and returns None, so print() is not needed
    iris.info()

    # Describe the dataset: count, mean, std, min, 25th percentile, median, 75th percentile, max
    print(iris.describe())
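A few other quick checks are often useful at this stage. The sketch below uses a small hand-built DataFrame (named `sample` here purely for illustration) laid out like the seaborn Iris data, so it runs without downloading anything; the same calls should work on the full `iris` DataFrame.

```python
import pandas as pd

# Illustrative rows in the same layout as seaborn's Iris data
sample = pd.DataFrame({
    'sepal_length': [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    'sepal_width': [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    'petal_length': [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    'petal_width': [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    'species': ['setosa', 'setosa', 'versicolor',
                'versicolor', 'virginica', 'virginica'],
})

# Number of rows and columns as a (rows, columns) tuple
print(sample.shape)

# Data type of each column
print(sample.dtypes)

# How many rows belong to each species
counts = sample['species'].value_counts()
print(counts)
```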

3. Data Selection:

Using Pandas, you can select specific columns or rows based on conditions.

    # Select only the 'sepal_length' column
    sepal_length = iris['sepal_length']

    # Select rows where species is 'setosa'
    setosa = iris[iris['species'] == 'setosa']
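Conditions can also be combined, and rows and columns can be selected in one step with `.loc`. The following is a minimal sketch on a small illustrative DataFrame (`sample` and the result names are assumptions made for this example, not part of the dataset).

```python
import pandas as pd

# Illustrative rows in the same layout as seaborn's Iris data
sample = pd.DataFrame({
    'sepal_length': [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    'sepal_width': [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    'species': ['setosa', 'setosa', 'versicolor',
                'versicolor', 'virginica', 'virginica'],
})

# Combine conditions with & (and) or | (or); each condition needs parentheses.
# .loc takes the row filter first, then the columns to keep.
wide_setosa = sample.loc[
    (sample['species'] == 'setosa') & (sample['sepal_width'] > 3.2),
    ['sepal_length', 'sepal_width'],
]
print(wide_setosa)

# Keep rows whose species is any of several values
not_setosa = sample[sample['species'].isin(['versicolor', 'virginica'])]
print(len(not_setosa))
```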

4. Grouping and Aggregating:

With Pandas, you can easily group by a column and then perform aggregate functions on other columns.

    # Group by species and calculate the mean of other columns
    grouped = iris.groupby('species').mean()
    print(grouped)
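When different columns need different statistics, `agg` with named aggregations is one option. Here is a short sketch on illustrative data; the output column names (`mean_petal_length`, `max_sepal_width`, `n_flowers`) are chosen for this example only.

```python
import pandas as pd

# Illustrative rows in the same layout as seaborn's Iris data
sample = pd.DataFrame({
    'sepal_width': [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    'petal_length': [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    'species': ['setosa', 'setosa', 'versicolor',
                'versicolor', 'virginica', 'virginica'],
})

# Each keyword is an output column: (input column, aggregation function)
summary = sample.groupby('species').agg(
    mean_petal_length=('petal_length', 'mean'),
    max_sepal_width=('sepal_width', 'max'),
    n_flowers=('petal_length', 'count'),
)
print(summary)
```

The result is indexed by species, with one column per named aggregation.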

5. Visualization:

Pandas integrates well with Matplotlib to allow easy visualization.

    import matplotlib.pyplot as plt

    # Box plot to visualize the distribution of measurements by species
    iris.boxplot(by='species', figsize=(15, 10))
    plt.show()

6. Handling Missing Values (if any):

For a dataset like Iris, you typically don't have missing values. But if you did, here's how you'd handle them:

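    # Check for missing values
    print(iris.isnull().sum())

    # Drop rows with missing values
    iris_dropna = iris.dropna()

    # Or fill missing numeric values with the column mean (or median, mode, etc.)
    # numeric_only=True skips the text 'species' column, which has no mean
    iris_filled = iris.fillna(iris.mean(numeric_only=True))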
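Since the real Iris data has no gaps, the effect is easier to see on a small DataFrame with missing values put in on purpose. The sketch below is illustrative only: the `sample` data is made up for the demonstration, and filling with the per-species mean is one possible variation on the column-mean approach, not the only reasonable choice.

```python
import numpy as np
import pandas as pd

# Illustrative data with two deliberately missing measurements
sample = pd.DataFrame({
    'sepal_length': [5.1, np.nan, 7.0, 6.4],
    'petal_length': [1.4, 1.4, np.nan, 4.5],
    'species': ['setosa', 'setosa', 'versicolor', 'versicolor'],
})

# Count the missing values in each column
missing_per_column = sample.isnull().sum()
print(missing_per_column)

# Fill each gap with the mean of that column within the same species
numeric_cols = ['sepal_length', 'petal_length']
filled = sample.copy()
filled[numeric_cols] = (
    sample.groupby('species')[numeric_cols]
    .transform(lambda col: col.fillna(col.mean()))
)
print(filled)
```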

7. Applying Functions:

Using Pandas, you can apply a function to transform data.

    # Apply a function to convert sepal length from cm to mm
    iris['sepal_length_mm'] = iris['sepal_length'].apply(lambda x: x * 10)
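For simple arithmetic like this, a vectorized expression on the whole column gives the same result as `apply` and is usually faster, because it avoids calling a Python function once per row. A minimal comparison on illustrative data (the names `sample` and `via_apply` are assumptions for this example):

```python
import pandas as pd

# Illustrative sepal lengths in centimeters
sample = pd.DataFrame({'sepal_length': [5.1, 4.9, 7.0, 6.4]})

# Vectorized: the multiplication is applied to the whole column at once
sample['sepal_length_mm'] = sample['sepal_length'] * 10

# Element-wise with apply: same values, computed one row at a time
via_apply = sample['sepal_length'].apply(lambda x: x * 10)

# Both approaches agree on every row
print((sample['sepal_length_mm'] == via_apply).all())
```

`apply` remains the more flexible tool when the transformation cannot be written as a column-level expression.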

These are just some of the basic operations you can perform with Pandas using the Iris dataset. Pandas is an extensive library that also supports merging datasets, reshaping data, and much more.
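As a brief taste of the reshaping and merging mentioned above, here is a small sketch on illustrative data. The `codes` lookup table and its `code` column are hypothetical and exist only to give `merge` something to join.

```python
import pandas as pd

# Illustrative rows in the same layout as seaborn's Iris data
sample = pd.DataFrame({
    'sepal_length': [5.1, 7.0, 6.3],
    'petal_length': [1.4, 4.7, 6.0],
    'species': ['setosa', 'versicolor', 'virginica'],
})

# Reshape from wide to long: one row per (species, measurement) pair
long_form = sample.melt(id_vars='species',
                        var_name='measurement',
                        value_name='value_cm')

# Hypothetical lookup table keyed by species
codes = pd.DataFrame({
    'species': ['setosa', 'versicolor', 'virginica'],
    'code': ['S', 'Ve', 'Vi'],
})

# Merge the lookup table onto the long-form data by species
merged = long_form.merge(codes, on='species', how='left')
print(merged)
```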

