Special Topics of Machine Learning in Cyber Security
Lecture 05: Machine Learning Basics
Arslan Ali Khan (arslan.ali@riphah.edu.pk)
Department of Cyber-Security and Data Science, Riphah Institute of Systems Engineering (RISE), Riphah International University, Islamabad, Pakistan.
Feature Engineering
• Dealing with Missing Data
Missing values are data points that are absent for a specific
variable in a dataset. They can be represented in various ways,
such as blank cells, null values, or special symbols like “NA” or
“unknown.” These missing data points pose a significant
challenge in data analysis and can lead to inaccurate or biased
results.
Specifically, missing values can:
• Reduce the sample size: This can decrease the accuracy and
reliability of your analysis.
• Introduce bias: If the missing data is not handled properly, it
can bias the results of your analysis.
• Make it difficult to perform certain analyses: Some statistical
techniques require complete data for all variables, making
them inapplicable when missing values are present.
Using estimated values (imputation):
• Replacing missing values with estimated values.
• Preserves sample size: Doesn’t reduce data points.
• Can introduce bias: Estimated values might not be accurate.
Use of Mean, Median, and Mode:
• Replace missing values with the mean, median, or mode of the relevant variable.
• Simple and efficient: Easy to implement.
• Can be inaccurate: Doesn’t consider the relationships between variables.
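The mean/median/mode strategies above can be sketched in a few lines. This is a minimal pure-Python illustration; the `data` list and the use of `None` as the missing-value marker are assumptions for the example.

```python
# Mean/median imputation sketch (pure Python; `data` is a hypothetical column,
# with None marking a missing value).
from statistics import mean, median

data = [4.0, None, 6.0, None, 5.0]
observed = [v for v in data if v is not None]   # drop missing values first

mean_filled = [v if v is not None else mean(observed) for v in data]
median_filled = [v if v is not None else median(observed) for v in data]
# statistics.mode() would be used the same way for a categorical column.
```

Note that both fills use a single per-column statistic, which is why this method ignores relationships between variables.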
• Handling Categorical Data
Categorical data is data that can be divided into groups or
categories, such as gender, hair color, or product type.
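Most models need categorical values converted to numbers first. A common approach is one-hot encoding; here is a minimal pure-Python sketch, where the `colors` column is a made-up example.

```python
# One-hot encoding sketch: each category becomes its own binary column
# (`colors` is a hypothetical categorical feature).
colors = ["red", "green", "red", "blue"]
categories = sorted(set(colors))   # fixed column order: blue, green, red
encoded = [[1 if value == cat else 0 for cat in categories] for value in colors]
# Each row is now a binary vector with exactly one 1.
```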
• Normalizing Data
Normalization in machine learning is the process of translating
data into the range [0, 1] (or any other range).
• Feature Construction or Generation
Feature Generation (also known as feature construction, feature
extraction or feature engineering) is the process of transforming
features into new features that better relate to the target. This
can involve mapping a feature into a new feature using a
function like log, or creating a new feature from one or multiple
features using multiplication or addition.
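The two constructions mentioned above, mapping through a function like log and combining multiple features, can be sketched as follows; the column names (`income`, `debt`) are hypothetical.

```python
# Feature construction sketch: derive new features from existing ones
# (rows and column names are made up for illustration).
import math

rows = [{"income": 1000.0, "debt": 250.0},
        {"income": 20000.0, "debt": 500.0}]

for r in rows:
    r["log_income"] = math.log(r["income"])     # log transform of one feature
    r["debt_ratio"] = r["debt"] / r["income"]   # new feature from two features
```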
Feature Scaling
A technique often applied as part of data preparation for machine learning.
Goal: change the values of numeric columns in the dataset to a common scale, without distorting differences in the ranges of values.
Normalization
Min-max normalization: guarantees all features will have the exact same scale, but does not handle outliers well.
Z-score standardization: handles outliers, but does not produce normalized data with the exact same scale.
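Both techniques are one-line transformations. This is a minimal sketch on a made-up `values` column, using the population standard deviation for the z-score.

```python
# Min-max normalization and z-score standardization sketches (pure Python;
# `values` is a hypothetical numeric column).
from statistics import mean, pstdev

values = [2.0, 4.0, 6.0, 8.0]

lo, hi = min(values), max(values)
min_max = [(v - lo) / (hi - lo) for v in values]   # rescaled into [0, 1]

mu, sigma = mean(values), pstdev(values)
z_scores = [(v - mu) / sigma for v in values]      # mean 0, std 1
```

A single extreme value stretches the min-max range and squashes everything else toward 0, which is why min-max handles outliers poorly.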
Training, Testing and Validation Sets
K-Fold Cross Validation
K-fold cross-validation is a technique for evaluating predictive models. The dataset is divided into k subsets or folds. The model is trained and evaluated k times, using a different fold as the validation set each time. Performance metrics from each fold are averaged to estimate the model's generalization performance.
Under-fitting and Over-fitting
Overfitting:
• Occurs when the model fits the training data too well and does not generalize, so it performs badly on the test data.
• It is the result of an excessively complicated model.
Underfitting:
• Occurs when the model does not fit the data well enough.
• It is the result of an excessively simple model.
• Both overfitting and underfitting lead to poor predictions on new datasets.
• A learning model that overfits or underfits does not generalize well.
Supervised vs. Unsupervised Learning
• Supervised learning (classification)
Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations.
New data is classified based on the training set.
• Unsupervised learning (clustering)
The class labels of the training data are unknown.
Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data.
Machine Learning
• Supervised: We are given input samples (X) and output samples (y) of a function y = f(X). We would like to "learn" f, and evaluate it on new data. Types:
Classification: y is discrete (class labels).
Regression: y is continuous, e.g. linear regression.
• Unsupervised: Given only samples X of the data, we compute a function f such that y = f(X) is "simpler".
Clustering: y is discrete.
y is continuous: matrix factorization, Kalman filtering, unsupervised neural networks.
Techniques
• Supervised Learning:
Linear Regression
Logistic Regression
Decision Tree
Naïve Bayes
Random Forests
• Unsupervised Learning:
Clustering
Factor analysis
Topic Models
Regression
Regression Task
Linear Regression vs. Logistic Regression
Linear Regression
y = mx + c
Linear Regression Example
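Fitting the line y = mx + c to data points can be done with the closed-form least-squares formulas for the slope and intercept. This is a minimal pure-Python sketch; the sample points are made up so that the line y = 2x + 1 fits them exactly.

```python
# Least-squares fit of y = m*x + c (pure Python; the data points are
# a hypothetical example lying exactly on y = 2x + 1).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Slope: covariance of x and y divided by variance of x.
m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
c = mean_y - m * mean_x   # intercept: the line passes through the mean point

# For this data: m == 2.0 and c == 1.0.
```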