UNIVERSITY OF STEEL TECHNOLOGY
AND MANAGEMENT
Introduction to Data Science and
Data Analytics
Presented by:
Dr. Ravindra Singh Saluja
OP Jindal University, Raigarh
Introduction to Pandas
• Pandas is a Python library used primarily for working
with structured (tabular) data. It provides two
main data structures:
• Series: One-dimensional labeled array
capable of holding any data type.
• DataFrame: Two-dimensional labeled
data structure with columns of
potentially different types.
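The relationship between the two structures can be sketched as follows: selecting one column of a DataFrame yields a Series that shares the DataFrame's row index. The column names and values here are hypothetical and serve only as an illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({"price": [9.5, 12.0, 7.25], "item": ["pen", "book", "cup"]})

# Selecting a single column returns a Series
price = df["price"]
print(type(price).__name__)  # Series

# The Series carries the same row labels as the DataFrame
print(price.index.equals(df.index))  # True
```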
Creating a Series
• From a List:
import pandas as pd
data = [10, 20, 30, 40, 50]
series = pd.Series(data)
print(series)
• From a Dictionary:
import pandas as pd
data = {"a": 1, "b": 2, "c": 3}
series = pd.Series(data)
print(series)
• With Custom Index:
import pandas as pd
data = [10, 20, 30, 40, 50]
series = pd.Series(data, index=["a", "b", "c", "d", "e"])
print(series)
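Once a Series has a custom index, its values can be looked up by label or by position. A minimal sketch, reusing the values from the custom-index example:

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"])

# Label-based lookup
print(series["c"])  # 30

# Label-based slicing with .loc includes both endpoints
print(series.loc["b":"d"])

# Position-based lookup with .iloc
print(series.iloc[0])  # 10
```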
Creating a DataFrame
import pandas as pd
# Creating a DataFrame from a dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
print(df)
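After creating a DataFrame, it is often useful to check its size and column names before working with it. A short sketch using the same data:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

print(df.shape)             # (3, 3): three rows, three columns
print(df.columns.tolist())  # ['Name', 'Age', 'City']
print(df.head(2))           # first two rows
```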
Data Manipulation
Pandas provides a wide range of methods to manipulate data, such as filtering, sorting,
and grouping.
• Filtering: Select rows based on conditions.
# Filtering rows where Age is greater than 28
filtered_df = df[df['Age'] > 28]
print(filtered_df)
• Sorting: Sort the DataFrame by a specific column.
# Sorting by Age in descending order
sorted_df = df.sort_values(by='Age', ascending=False)
print(sorted_df)
• Grouping: Group data and perform aggregate functions.
# Grouping by City and calculating the mean age
grouped_df = df.groupby('City')['Age'].mean()
print(grouped_df)
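The three operations can also be combined. The sketch below uses a hypothetical DataFrame named people, with repeated cities so that grouping produces a meaningful average; the extra row is an assumption added for illustration.

```python
import pandas as pd

# Hypothetical data with repeated cities so that grouping is meaningful
people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Dana"],
    "Age": [25, 30, 35, 45],
    "City": ["New York", "Chicago", "Chicago", "New York"],
})

# Combine conditions with & (and) or | (or); wrap each condition in parentheses
older_chicago = people[(people["Age"] > 28) & (people["City"] == "Chicago")]
print(older_chicago)

# Mean age per city, sorted from highest to lowest
mean_age = people.groupby("City")["Age"].mean().sort_values(ascending=False)
print(mean_age)
```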
Handling Missing Data
Pandas makes it easy to handle missing data with methods like fillna() and dropna().
# Option 1: fill missing values with a default value
df_filled = df.fillna(0)
# Option 2: drop rows that contain any missing value
df_dropped = df.dropna()
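The effect of each method is easier to see on data that actually contains missing values. A minimal sketch, assuming a small hypothetical DataFrame named scores with one NaN in each column:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value per column
scores = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [4.0, 5.0, np.nan]})

# Count missing values in each column
print(scores.isna().sum())

# fillna returns a new DataFrame with every NaN replaced by 0
filled = scores.fillna(0)
print(filled)

# dropna returns a new DataFrame keeping only rows with no NaN
dropped = scores.dropna()
print(dropped)
```

Filling and dropping are alternatives: after every missing value has been filled, dropna() has nothing left to remove.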
Merging and Joining DataFrames
# Merging two DataFrames on a common column
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [1, 2, 4], 'Age': [25, 30, 40]})
merged_df = pd.merge(df1, df2, on='ID', how='inner')
print(merged_df)
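The how argument decides which IDs survive the merge. The sketch below reuses the same two DataFrames to compare an inner, a left, and an outer merge:

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Alice", "Bob", "Charlie"]})
df2 = pd.DataFrame({"ID": [1, 2, 4], "Age": [25, 30, 40]})

# inner: only IDs present in both DataFrames (1 and 2)
inner = pd.merge(df1, df2, on="ID", how="inner")

# left: every ID from df1 (1, 2, 3); Age is NaN where df2 has no match
left = pd.merge(df1, df2, on="ID", how="left")

# outer: every ID from either DataFrame (1, 2, 3 and 4)
outer = pd.merge(df1, df2, on="ID", how="outer")

print(inner)
print(left)
print(outer)
```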
Reading and Writing Data
Pandas can read from and write to many sources, including CSV files, Excel files, and SQL databases.
# Reading from a CSV file
df = pd.read_csv('data.csv')
# Writing to a CSV file
df.to_csv('output.csv', index=False)
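A round trip can be tried without creating any files by writing to an in-memory text buffer instead of a path. This is a sketch under that assumption; the sample data is hypothetical.

```python
import io
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"ID": [1, 2], "Name": ["Alice", "Bob"]})

# Write CSV text into an in-memory buffer instead of a file on disk
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)

# Read the same CSV text back into a new DataFrame
restored = pd.read_csv(io.StringIO(csv_text))
print(restored)
```

With index=False the row labels are not written, so the file contains only the ID and Name columns.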
String Manipulation
df['Name'] = df['Name'].str.upper() # Convert names to uppercase
df['Name_Length'] = df['Name'].str.len() # Find length of names
df['Name'] = df['Name'].str.replace('A', '@', regex=False) # Replace every 'A' with '@' (literal match)
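A self-contained sketch of the same string operations, writing each result to its own column so the original names stay intact. The column names Upper and Masked are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"]})

# Uppercase copy of each name
df["Upper"] = df["Name"].str.upper()

# Number of characters in each name
df["Name_Length"] = df["Name"].str.len()

# Literal replacement (regex=False) of 'A' with '@' in the uppercase names
df["Masked"] = df["Upper"].str.replace("A", "@", regex=False)

print(df)
```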
Descriptive Statistics
• Basic Statistical Measures:
# Creating a sample DataFrame
data = {'A': [1, 2, 3, 4, 5], 'B': [5, 10, 15, 20, 25]}
df = pd.DataFrame(data)
# Descriptive statistics
print(df.describe())
• Calculating Specific Statistics:
# Mean
mean = df['A'].mean()
print('Mean:', mean)
# Standard Deviation
std = df['A'].std()
print('Standard Deviation:', std)
# Correlation
correlation = df.corr()
print('Correlation:\n', correlation)
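Two details of these statistics are worth checking on the sample data. By default std() computes the sample standard deviation (ddof=1), and because column B is exactly five times column A, their Pearson correlation is 1. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [5, 10, 15, 20, 25]})

# Sample standard deviation (default, divides by n - 1)
sample_std = df["A"].std()

# Population standard deviation (divides by n)
population_std = df["A"].std(ddof=0)

# Pearson correlation between two columns
r = df["A"].corr(df["B"])

print("Sample std:", sample_std)
print("Population std:", population_std)
print("Correlation of A and B:", r)
```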