Python Advanced and Statistics
Today's Lecture Topics
Pandas Library
• pandas is one of the most widely used
open-source libraries for the Python
programming language; machine learning
and data science applications frequently
rely on it.
• # Install pandas using pip
• pip install pandas
(or)
• pip3 install pandas
Run Pandas From Command Line
• import pandas as pd
• pd.__version__
Output: '1.3.2'
Creating Series from NumPy Array
• import pandas as pd
• import numpy as np
• data = np.array(['python','php','java'])
• series = pd.Series(data)
• print(series)
Creating Series from Dict
• data = {'Courses': "pandas", 'Fees': 20000, 'Duration': "30days"}
• s2 = pd.Series(data)
• print(s2)
Creating Series from List
• data = ['python','php','java']
• s2 = pd.Series(data, index=['r1', 'r2','r3'])
• print(s2)
Statistics using Python (NumPy)
• NumPy provides powerful tools for basic
statistical operations like mean, median,
and standard deviation.
• Code:
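A minimal sketch of the three statistics named above. The `scores` values are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical sample data, used only for illustration
scores = np.array([72, 85, 90, 66, 85, 78])

mean = np.mean(scores)                # arithmetic mean
median = np.median(scores)            # middle value of the sorted data
std_pop = np.std(scores)              # population standard deviation (ddof=0 is the default)
std_sample = np.std(scores, ddof=1)   # sample standard deviation (divides by n - 1)

print("Mean:", mean)
print("Median:", median)              # 81.5
print("Population std:", std_pop)
print("Sample std:", std_sample)
```

Note that `np.std` divides by n by default; pass `ddof=1` when the data is a sample rather than the whole population.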
Why NumPy Matters
• Efficient Data Handling: NumPy provides
an efficient way to store and manipulate
data, making it ideal for tasks like data
cleaning and preprocessing.
• Speed: NumPy's core routines are
implemented in C, which gives it a
significant speed advantage over
pure-Python loops when performing
numerical computations.
• Compatibility: It seamlessly integrates
with other libraries like SciPy, Pandas, and
Matplotlib, forming the foundation of the
Python data stack.
• Array Operations: NumPy offers a wide
range of mathematical functions for
performing operations on arrays, such as
element-wise addition, subtraction, and
multiplication.
Getting Started with NumPy
• To begin using NumPy, you need to install
it first. You can install NumPy using the
following command:
• pip install numpy
• Once installed, you can import it into your
Python code using:
• import numpy as np
Creating NumPy Arrays
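The original slide content is not available here, so the following is only a sketch of common ways to create arrays. All variable names are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3])            # from a Python list
m = np.array([[1, 2], [3, 4]])     # 2-D array from nested lists
z = np.zeros((2, 3))               # 2x3 array filled with 0.0
o = np.ones(4)                     # 1-D array of four 1.0 values
r = np.arange(0, 10, 2)            # start, stop (exclusive), step -> [0 2 4 6 8]
l = np.linspace(0.0, 1.0, 5)       # 5 evenly spaced values from 0 to 1 inclusive

print(a.shape, m.shape, z.shape)   # (3,) (2, 2) (2, 3)
print(r)
print(l)
```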
Array Operations
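A short sketch of element-wise arithmetic on arrays of the same shape. The arrays `x` and `y` are made up for illustration.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([10, 20, 30])

total = x + y        # element-wise addition        -> [11 22 33]
diff = y - x         # element-wise subtraction     -> [ 9 18 27]
prod = x * y         # element-wise multiplication  -> [10 40 90]
dot = np.dot(x, y)   # dot product: 10 + 40 + 90 = 140

print(total, diff, prod, dot)
```

Note that `*` multiplies element by element; use `np.dot` (or the `@` operator) for the dot or matrix product.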
Advanced NumPy Techniques
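The slide does not say which techniques it covers. As an assumption, this sketch shows three that are commonly taught at this stage: reductions along an axis, broadcasting, and boolean masking.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

# Reduction along an axis: axis=0 collapses the rows, giving one mean per column
col_means = matrix.mean(axis=0)      # [2.5 3.5 4.5]

# Broadcasting: the (3,) vector is stretched across both rows of the (2, 3) matrix
centered = matrix - col_means

# Boolean masking: keep only the elements that satisfy a condition
selected = matrix[matrix > 3]        # [4 5 6]

# Reshaping: same six values, arranged as 3 rows x 2 columns
reshaped = matrix.reshape(3, 2)

print(col_means)
print(centered)
print(selected)
print(reshaped)
```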
Applications of NumPy
• Data Analysis: NumPy is used
extensively in data analysis to perform
operations like mean, median, and
standard deviation calculations.
• Machine Learning: Many machine
learning libraries, like scikit-learn, utilize
NumPy arrays to process and analyze
data.
• Image Processing: NumPy aids in
manipulating and processing images,
making it valuable in computer vision
tasks.
• Scientific Research: Scientists and
researchers use NumPy for simulations
and scientific computing.
Python Advanced: Functions
• Functions in Python help modularize code,
promote reuse, and improve readability.
• Code:
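A minimal sketch of a few function features: a docstring, a default argument, variable-length arguments, and a lambda. The function names `describe` and `total` are hypothetical.

```python
def describe(values, precision=2):
    """Return the (rounded mean, range) of a non-empty sequence of numbers."""
    mean = sum(values) / len(values)
    spread = max(values) - min(values)
    return round(mean, precision), spread


def total(*numbers):
    """Add up any number of positional arguments."""
    return sum(numbers)


# A lambda is a small anonymous function, handy as an argument to map() or sorted()
squares = list(map(lambda x: x * x, [1, 2, 3]))

print(describe([2, 4, 9]))   # (5.0, 7)
print(total(1, 2, 3))        # 6
print(squares)               # [1, 4, 9]
```

Keeping the logic in `describe` means it can be reused for any list of numbers instead of repeating the same calculation.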
Preprocessing with Pandas
• Pandas is used for loading, manipulating,
and cleaning datasets efficiently.
• Code:
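A hedged sketch of a small cleaning pipeline. The DataFrame `raw` is made-up data that reuses the Courses/Fees/Duration fields from the Series examples. Real datasets would usually be loaded with `pd.read_csv` instead.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: one missing fee, one missing duration, one duplicated row
raw = pd.DataFrame({
    "Courses": ["python", "php", "java", "python"],
    "Fees": [20000, np.nan, 25000, 20000],
    "Duration": ["30days", "40days", None, "30days"],
})

clean = (
    raw.drop_duplicates()                                          # remove the repeated row
       .assign(Fees=lambda d: d["Fees"].fillna(d["Fees"].mean()))  # fill missing fee with the mean
       .dropna(subset=["Duration"])                                # drop rows with no duration
       .assign(Duration_days=lambda d: d["Duration"]
               .str.replace("days", "", regex=False)
               .astype(int))                                       # "30days" -> 30
       .reset_index(drop=True)
)

print(clean)
```

Filling with the column mean is only one possible choice; whether to fill or drop missing values depends on the dataset.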
Maximum Likelihood Estimation (MLE)
• MLE estimates model parameters by
choosing the values that maximize the
likelihood of the observed data.
• Code:
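A minimal sketch, assuming the data follows a normal distribution. In that case the MLE has a closed form: the sample mean, and the standard deviation computed with 1/n. The data is synthetic, and the helper `log_likelihood` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic sample; the "true" mean 5.0 and std 2.0 are assumptions for illustration
data = rng.normal(loc=5.0, scale=2.0, size=1000)


def log_likelihood(mu, sigma, x):
    """Log-likelihood of i.i.d. normal data x under parameters mu and sigma."""
    n = x.size
    return (-0.5 * n * np.log(2 * np.pi * sigma ** 2)
            - np.sum((x - mu) ** 2) / (2 * sigma ** 2))


# Closed-form maximum likelihood estimates for a normal distribution
mu_hat = data.mean()
sigma_hat = np.sqrt(np.mean((data - mu_hat) ** 2))   # divides by n, not n - 1

best = log_likelihood(mu_hat, sigma_hat, data)
print("mu_hat:", mu_hat)
print("sigma_hat:", sigma_hat)
print("log-likelihood at the MLE:", best)

# Any other parameter values give a lower log-likelihood
print(log_likelihood(mu_hat + 0.1, sigma_hat, data) < best)   # True
```

For models without a closed-form solution, such as logistic regression, the same log-likelihood is maximized numerically instead.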
Real World Use Cases
Use Case / Model          Description
Logistic Regression       Parameters estimated via MLE
Naive Bayes Classifier    Probability estimation
Hidden Markov Models      Transition/emission probabilities
Variational Inference     Approximate MLE using variational bounds
Visualization using Seaborn and Matplotlib
• Seaborn (conventionally imported as sns)
and Matplotlib allow comprehensive data
visualization and plotting.
• Code:
Output
Seaborn vs. Matplotlib
Feature                    Seaborn                              Matplotlib
Ease of Use                High (concise, readable code)        Medium (verbose for complex plots)
Default Aesthetics         Beautiful by default                 Basic (requires styling)
Statistical Plots          Built-in (e.g., KDE, violin)         Limited (histogram, box, violin; no KDE plot)
Integration with Pandas    Excellent                            Good but manual sometimes
Customization              Limited (delegated to Matplotlib)    Very flexible
Performance                Slightly slower due to abstraction   Slightly faster and more raw