MACHINE LEARNING ACCELERATOR
Tabular Data – Lecture 3
Course Overview
Lecture 1:
• Introduction to ML
• Model Evaluation (Train-Validation-Test, Overfitting)
• Exploratory Data Analysis
• K Nearest Neighbors (KNN)
Lecture 2:
• Feature Engineering
• Tree-based Models (Decision Tree, Random Forest)
• Hyperparameter Tuning
• AWS AI/ML Services
Lecture 3:
• Optimization
• Regression Models
• Regularization
• Boosting
• Neural Networks
• AutoML
Optimization
Optimization in Machine Learning
• We build and train ML models, hoping for:
Features → ML Model (Rules) → Target
• In reality, the model's output comes with error:
Features → ML Model (Rules) → Prediction (Target + error)
• Learn better and better models, such that the overall model error gets smaller
and smaller … ideally, as small as possible!
Optimization
• In ML, we use optimization to minimize an error function of the ML model.
Error function: y = f(w), where w = input (the model parameters), f = function, y = output (the error value)
Optimizing the error function:
- Minimizing means finding the input w that results in the lowest value f(w)
- Maximizing means finding the w that gives the largest f(w)
Gradient Optimization
• Gradient: the direction and rate of the fastest increase of a function.
It can be calculated with the partial derivatives of the function with respect
to each input variable in w: ∇f(w) = [∂f/∂w1, …, ∂f/∂wn].
Because it has a direction, the gradient is a "vector".
Gradient Example
f(w), with gradient vector ∇f(w)
• The sign of the gradient shows the direction in which the function increases: + to the right and − to the left.
• As we go towards the bottom of the function, the gradient gets smaller and becomes zero
(i.e., the function can no longer change, can no longer decrease – it has reached the minimum!)
Gradient Descent Method
• The Gradient Descent method uses gradients to find the minimum of a
function iteratively.
• It takes steps (proportional to the gradient size) towards the minimum, in
the opposite direction of the gradient.
• Gradient Descent Algorithm:
Start at an initial point w_0
Update: w_(i+1) = w_i − α·∇f(w_i), where α is the step size (learning rate)
Gradient Descent Method
[Plot: gradient descent paths from different initial values, with large steps early on and smaller steps as the function approaches the Global Minimum]
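A minimal sketch of gradient descent in Python, minimizing f(w) = w² (an illustrative function, not from the lecture notebooks):

# Gradient descent on f(w) = w**2, whose gradient is df/dw = 2*w.
def gradient_descent(grad, w0, learning_rate=0.1, n_steps=50):
    w = w0
    for _ in range(n_steps):
        w = w - learning_rate * grad(w)   # step in the opposite direction of the gradient
    return w

w_min = gradient_descent(grad=lambda w: 2 * w, w0=5.0)
print(w_min)  # close to 0.0, the minimum of f(w) = w**2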
Regression Models
Linear Regression
We use (linear) regression for
numerical value prediction.
Example: How does the price of a
house (target, outcome y) relate to its
square footage of living space
(feature, attribute x)?
* Data source: King County, WA Housing Info.
Multiple Linear Regression
Example: How does the price of a house (target, outcome y) relate
to its square footage of living space (feature x1), its number of bedrooms
(feature x2), its zip code (x3), …? That is, using multiple features…
Using the multiple linear regression equation:
y = w0 + w1·x1 + w2·x2 + w3·x3 + …
• Assuming all other variables stay the same, an increase of x1 by 1 square
foot increases the price by w1
• Assuming all other variables stay the same, an increase of x2 by 1
bedroom increases the price by w2, and so on …
Linear Regression
Regression line ŷ = w0 + w1·x is
defined by: w0 (intercept), w1 (slope).
The vertical offset for each data point
from the line is the error between y (the
true label) and ŷ (the prediction based on
x).
The best "line" (best w0, w1) minimizes the
sum of squared errors (SSE): SSE = Σ_i (y_i − ŷ_i)²
Fitting a Model: Gradient Descent
• For a Linear Regression model:
ŷ = w0 + w1·x1 + … + wn·xn,
with features x1, …, xn, and parameters/weights w0, w1, …, wn
• Minimize the Mean Squared Error cost function:
MSE = (1/m) Σ_{i=1..m} (y_i − ŷ_i)²
i: index; m: number of samples
y_i: output; ŷ_i: model prediction
• Iteratively update parameters/weights with Gradient Descent:
w_j ← w_j − α·∂MSE/∂w_j
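A small NumPy sketch of this procedure on synthetic data (the data, learning rate, and number of steps are illustrative assumptions, not the lecture notebook's):

import numpy as np

# Fit y = w0 + w1*x by minimizing MSE with gradient descent.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 3.0 + 2.0 * x + rng.normal(0, 1, size=100)   # synthetic data

w0, w1, lr = 0.0, 0.0, 0.01
for _ in range(2000):
    y_hat = w0 + w1 * x
    error = y_hat - y
    grad_w0 = 2 * error.mean()          # dMSE/dw0
    grad_w1 = 2 * (error * x).mean()    # dMSE/dw1
    w0 -= lr * grad_w0
    w1 -= lr * grad_w1

print(w0, w1)  # close to the true intercept 3.0 and slope 2.0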
From Regression to Classification
Linear regression was useful for predicting continuous values.
Can we use a similar approach to solve classification problems?
The simplest classification problem is binary classification, where y ∈ {0, 1}.
Examples:
Email: Spam or Not Spam
Text: Positive or Negative product review
Image: Cat or Not Cat
Logistic Regression
Idea: We can apply the Sigmoid function to the output of the linear model.
• The Sigmoid (Logistic) function, σ(z) = 1 / (1 + e^(−z)),
"squishes" values to the 0–1 range.
• Can define a "decision boundary" at 0.5:
- if p < 0.5, round down (class 0)
- if p ≥ 0.5, round up (class 1)
• Our regression equation becomes: p = σ(w0 + w1·x1 + … + wn·xn)
Log-Loss (Binary Cross-Entropy)
Log-Loss: A numeric value that measures the performance of a binary
classifier when the model output is a probability between 0 and 1:
LogLoss = −[ y·log(p) + (1 − y)·log(1 − p) ]
y: true class {0, 1}, p: predicted probability of class 1, log: logarithm
• As the output of Logistic Regression is between 0 and 1, Log-Loss is a
suitable cost function for Logistic Regression.
• To improve how the Logistic Regression model learns from the data, minimize the Log-Loss.
Log-Loss (Binary Cross-Entropy)
Example: Let's calculate the Log-Loss
for the following scenarios (true class y = 1):
• y = 1, p = 0.3:
LogLoss = −log(0.3) ≈ 1.20
• y = 1, p = 0.8:
LogLoss = −log(0.8) ≈ 0.22
A better prediction (a higher predicted probability for the true class) gives a smaller Log-Loss.
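A quick check of the two scenarios in Python (natural logarithm assumed):

import numpy as np

def log_loss(y, p):
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

print(log_loss(y=1, p=0.3))  # ~1.20: poor prediction -> large loss
print(log_loss(y=1, p=0.8))  # ~0.22: good prediction -> small loss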
Fitting a Model: Gradient Descent
• For a Logistic Regression model:
p = σ(w0 + w1·x1 + … + wn·xn),
with features x1, …, xn, and parameters/weights w0, w1, …, wn
• Minimize the Log-Loss cost function:
LogLoss = −(1/m) Σ_{i=1..m} [ y_i·log(p_i) + (1 − y_i)·log(1 − p_i) ]
i: index; m: number of samples
y_i: output
p_i: model prediction (probability)
• Iteratively update parameters/weights with Gradient Descent:
w_j ← w_j − α·∂LogLoss/∂w_j
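A minimal NumPy sketch of this (the synthetic data and hyperparameters are illustrative assumptions; the lecture itself uses sklearn's LogisticRegression):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)       # synthetic binary labels

w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(500):
    p = sigmoid(X @ w + b)
    grad_w = X.T @ (p - y) / len(y)             # gradient of the average Log-Loss
    grad_b = (p - y).mean()
    w -= lr * grad_w
    b -= lr * grad_b

pred = (sigmoid(X @ w + b) >= 0.5).astype(int)  # decision boundary at 0.5
print((pred == y).mean())                       # training accuracy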
Regularization
Regularization
Underfitting: Model too simple, fewer features,
smaller weights, weak learning.
Overfitting: Model too complex, too many features,
larger weights, weak generalization.
'Good Fit' Model: Compromise between fit and
complexity (drop features, reduce weights).
Regularization does both: it penalizes large weights,
sometimes reducing them all the way to zero!
Regularization
• Tune model complexity by adding a penalty score for complexity to the
cost function (think of the error function, minimized towards the best fit):
Regularized cost = Cost + α · penalty(weights)
• Calibrate the regularization strength by using a regularizer parameter, α
• Standard regularization types:
L2 regularization (Ridge): penalty = Σ_j w_j² (L2: popular choice)
L1 regularization (LASSO): penalty = Σ_j |w_j| (L1: useful for feature selection,
since most weights shrink to 0 – sparsity)
Both L2 and L1 (ElasticNet)
• Note: It is important to scale features first!
Regression in sklearn
LinearRegression: sklearn Linear Regression (and its regularized variants)
LinearRegression()
Ridge(alpha=1.0), RidgeCV(alphas=(0.1, 1.0, 10.0), cv=5)
Lasso(alpha=1.0), LassoCV(cv=5)
ElasticNet(alpha=1.0, l1_ratio=0.5), ElasticNetCV(cv=5)
LogisticRegression: sklearn Logistic Regression (and regularization)
LogisticRegression(penalty='l2', C=1.0, l1_ratio=None)
LogisticRegressionCV(penalty='l2', Cs=10, cv=5)
Note: the CV estimators take a list of candidate regularization strengths (alphas / Cs) rather than a single alpha / C.
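A short usage sketch on synthetic data (the data and the alpha/C values are illustrative assumptions), with feature scaling first, as recommended on the regularization slide:

import numpy as np
from sklearn.linear_model import Ridge, Lasso, LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y_reg = 2 * X[:, 0] - X[:, 1] + rng.normal(0, 0.1, size=100)  # regression target
y_clf = (y_reg > 0).astype(int)                               # binary target

ridge = make_pipeline(StandardScaler(), Ridge(alpha=1.0)).fit(X, y_reg)
lasso = make_pipeline(StandardScaler(), Lasso(alpha=0.1)).fit(X, y_reg)
logreg = make_pipeline(StandardScaler(), LogisticRegression(penalty='l2', C=1.0)).fit(X, y_clf)

print(ridge.predict(X[:3]), logreg.predict(X[:3]))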
Ensemble Methods: Boosting
Boosting
Boosting method: build multiple weak models sequentially, each
subsequent model attempting to boost the overall performance by
overcoming/reducing the errors of the previous model.
[Diagram: Data → Weak Model 1, Weak Model 2, Weak Model 3, … → Prediction 1, Prediction 2, Prediction 3, …
Each weak model's prediction on its own is far from the target (large error);
combined, the predictions form the Ensemble Prediction.]
Gradient Boosting Machines (GBM)
Gradient Boosting Machines (GBM): Boosting with trees
• Train a weak model on the given data, and make predictions with it
• Iteratively create a new model to learn to overcome the prediction errors of the
previous model (use the previous prediction error as the new target)
[Diagram: Features + Target 1 → Tree 1 → Prediction 1;
Target 2 = Target 1 − Prediction 1 → Tree 2 → Prediction 2;
Target 3 = Target 2 − Prediction 2 → Tree 3 → Prediction 3; … → Tree N → Prediction N
Final prediction = Prediction 1 + Prediction 2 + Prediction 3 + … + Prediction N]
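A toy sketch of this residual-fitting idea for regression (squared-error case) using sklearn decision trees; the data, tree depth, and learning rate are illustrative assumptions, not a full GBM implementation:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)

learning_rate = 0.1
trees = []
target = y.copy()                                        # Target 1 = original target
for _ in range(50):
    tree = DecisionTreeRegressor(max_depth=3).fit(X, target)
    trees.append(tree)
    target = target - learning_rate * tree.predict(X)    # next target = remaining error

y_pred = sum(learning_rate * t.predict(X) for t in trees)  # ensemble = sum of tree predictions
print(np.mean((y - y_pred) ** 2))                           # training MSE shrinks as trees are added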
Gradient Boosting in Python
• sklearn GBM algorithms:
GradientBoostingClassifier (and Regressor)
HistGradientBoostingClassifier (and Regressor) – faster; experimental at the time of writing
• Additional third-party libraries provide computationally efficient alternative
GBM implementations, often with better results in practice (see the sketch below):
XGBoost (Extreme Gradient Boosting): efficient compute and memory use
LightGBM: much faster
CatBoost (Category Gradient Boosting): fast, supports categorical features
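A hedged sketch of the three third-party APIs (each library must be installed separately; the constructor arguments shown are a small, illustrative subset):

from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier

models = {
    "xgboost": XGBClassifier(n_estimators=100, learning_rate=0.1),
    "lightgbm": LGBMClassifier(n_estimators=100, learning_rate=0.1),
    "catboost": CatBoostClassifier(iterations=100, learning_rate=0.1, verbose=0),
}
# All three follow the familiar fit/predict pattern, e.g.:
# models["xgboost"].fit(X_train, y_train); models["xgboost"].predict(X_test)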
Gradient Boosting in sklearn
GradientBoostingClassifier: sklearn's Gradient Boosting classifier
(there is also a Regressor version) - .fit(), .predict()
GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
min_samples_split=2, min_samples_leaf=1, max_depth=3)
The full interface is larger.
Notice the mix of boosting-specific and tree-specific parameters.
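A minimal usage sketch on a synthetic dataset (the dataset and split are illustrative assumptions):

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gbm = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)
gbm.fit(X_train, y_train)
print(gbm.score(X_test, y_test))   # accuracy on the test set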
Gradient Boosting in sklearn
HistGradientBoostingClassifier: sklearn's LightGBM-inspired, histogram-based classifier (there
is also a Regressor version), in experimental stage at the time of writing - .fit(), .predict()
from sklearn.experimental import enable_hist_gradient_boosting
HistGradientBoostingClassifier(max_iter=100, learning_rate=0.1,
max_leaf_nodes=31, min_samples_leaf=20, max_depth=None)
The full interface is larger.
Neural Networks
Looking back at Regression Models
Linear Regression*: Given {x1, …, xn},
predict ŷ = w0 + w1·x1 + … + wn·xn
[Diagram: Input (x1, …, xn) → weights (w1, …, wn) → sum → Output ŷ]
* Basically assuming that the output depends only on
first-order interactions of the inputs
Looking back at Regression Models
Linear Regression*: Given {x1, …, xn},
predict ŷ = g(w0 + w1·x1 + … + wn·xn),
where g is the linear (identity) function: g(z) = z
[Diagram: Input → weights → sum → activation function g → Output]
* Linear activation function
Looking back at Regression Models
Logistic Regression*: Given {x1, …, xn},
predict ŷ ∈ {0, 1}, where ŷ = g(w0 + w1·x1 + … + wn·xn)
and g is the logistic function: g(z) = 1 / (1 + e^(−z))
[Diagram: Input → weights → sum → activation function g → Output]
* Non-linear activation function / binary classifier
Perceptron (Rosenblatt, 1957)
Perceptron*: Given {x1, …, xn}, predict ŷ ∈ {0, 1},
where ŷ = g(w0 + w1·x1 + … + wn·xn)
and g is the step function: g(z) = 1 if z ≥ 0, else 0
[Diagram: Input → weights → sum → activation function g → Output]
* Non-linear activation function / binary classifier
Artificial Neuron
Artificial Neuron*: Given {x1, …, xn},
predict ŷ = g(w0 + w1·x1 + … + wn·xn),
where g is a nonlinear activation
function (sigmoid, tanh, ReLU, …)
[Diagram: Input → weights → sum → activation function g → Output]
* Similar to how neurons in the brain function
Artificial Neuron
Artificial Neuron: Captures mostly
linear interactions in the data.
Question: Can we use a similar
approach to capture non-linear
interactions in the data?
[Diagram: a single neuron produces a linear decision boundary – not a very good classifier for this data]
Neural Network/Multilayer Perceptron
Artificial Neuron: Captures mostly
linear interactions in the data.
Question: Can we use a similar
approach to capture non-linear
interactions in the data?
Neural Network/Multilayer
Perceptron (MLP): Use more
Artificial Neurons, stacked in a layer!
[Diagram: Input Layer → Hidden Layer (6 weights) → Output Layer (3 weights);
the stacked network produces a non-linear decision boundary – much better!]
Neural Network/Multilayer Perceptron
• A neural network consists of
input, hidden, and output layers.
• Each layer is connected to the next
layer.
• An activation function is applied at
each hidden layer (and at the output layer).
[Diagram: Input Layer → Hidden Layer (6 weights) → Output Layer (3 weights)]
Neural Network/Multilayer Perceptron
• The same structure, with more neurons in the hidden layer:
[Diagram: Input Layer → Hidden Layer (12 weights) → Output Layer (5 weights)]
Neural Networks
MultiLayer Network: Two layers (one hidden layer plus the output layer), with five
hidden neurons in the hidden layer, and one output neuron.
MultiLayer Network: Two layers (one hidden layer plus the output layer), with five
hidden neurons in the hidden layer, and three output neurons.
MultiLayer Network: Four layers (three hidden layers plus the output layer), with five, three, and
two hidden neurons in the hidden layers, and two output neurons.
Build and Train a Neural Network
We build a neural network for a binary
classification task, with:
• no bias terms (for simplicity)
• 2 inputs: x1 = 0.5 and x2 = 0.1
• 1 hidden layer with 2 neurons (h1, h2)
• 1 output neuron (o) in the output layer
[Diagram: Input Layer (x1, x2) → Hidden Layer (h1, h2) → Output Layer (o);
each neuron has an input value (in) and an activation output (out)]
Activation Functions
• "How do we get from a linear weighted sum of the inputs to a non-linear output?"
Name | Function | Description
Logistic (sigmoid) | σ(z) = 1 / (1 + e^(−z)) | The most common activation function. Squashes the input to (0, 1).
Hyperbolic tangent (tanh) | tanh(z) = (e^z − e^(−z)) / (e^z + e^(−z)) | Squashes the input to (−1, 1).
Rectified Linear Unit (ReLU) | ReLU(z) = max(0, z) | Popular activation function. Anything less than 0 results in zero activation.
Derivatives of these functions are also important (for gradient descent).
Output Activations/Functions
• "How do we output/predict a result?"
Problem | Description | Output activation
Binary classification | Output a probability for the positive class, in (0, 1); logistic regression on the output of the last layer | Sigmoid
Multi-class classification | Output a probability for each class, in (0, 1); the outputs sum to 1 (a probability distribution); training drives the target class value up and the others down | Softmax
Regression | Output a numeric value | Linear / ReLU
Build and Train a Neural Network
We build a neural network for a binary
classification task, with:
• no bias terms (for simplicity)
• 2 inputs: x1 = 0.5 and x2 = 0.1
• 1 hidden layer with 2 neurons (h1, h2)
• 1 output neuron (o) in the output layer
• All neurons have the sigmoid activation function: σ(z) = 1 / (1 + e^(−z))
[Diagram: Input Layer (x1, x2) → Hidden Layer (h1, h2) → Output Layer (o)]
Forward Pass
[Figure: forward pass through the network, with hidden-layer weights 0.15, 0.25, 0.2, 0.4
and output-layer weights 0.4, 0.45. For inputs x1 = 0.5 and x2 = 0.1, the hidden neurons
receive weighted sums of about 0.1 and 0.13, and similarly output h1_out ≈ 0.52 and
h2_out ≈ 0.53 after the sigmoid.]
Forward Pass
[Figure, continued: the output neuron receives 0.4·0.52 + 0.45·0.53 ≈ 0.44
and outputs σ(0.44) ≈ 0.61.]
For binary classification, we would
classify this (0.5, 0.1) input data point as
class 1 (since 0.61 > 0.5).
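A small NumPy sketch of this forward pass. The exact wiring of the hidden-layer weights is assumed from the figure, so the hidden values may differ slightly from the slide's rounding; the output step reproduces the 0.44 / 0.61 result:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, 0.1])              # inputs x1, x2
W_hidden = np.array([[0.15, 0.2],     # assumed weights from x1 into h1, h2
                     [0.25, 0.4]])    # assumed weights from x2 into h1, h2
w_out = np.array([0.4, 0.45])         # weights from h1, h2 into the output neuron

h_in = x @ W_hidden                   # weighted sums, ~[0.10, 0.14]
h_out = sigmoid(h_in)                 # ~[0.52, 0.53]
o_in = h_out @ w_out                  # ~0.45 (≈0.44 with the slide's rounding)
o_out = sigmoid(o_in)                 # ~0.61 -> class 1, since 0.61 > 0.5
print(h_out, o_out)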
Cost Functions
• "How do we compare the outputs with the truth?"
Problem | Cost function | Notes
Binary classification | Cross entropy for logistic (Log-Loss): −(1/m) Σ_i [ y_i·log(p_i) + (1 − y_i)·log(1 − p_i) ] | m = number of training examples; p_i = prediction (probability); y_i = true class (1/yes, 0/no)
Multi-class classification | Cross entropy for Softmax: −(1/m) Σ_i Σ_c y_(i,c)·log(p_(i,c)) | c = classes
Regression | Mean Squared Error: (1/m) Σ_i (y_i − ŷ_i)² | m = number of training examples; ŷ_i = prediction (numeric); y_i = true value
Training Neural Networks
• The cost function is selected according to the problem: binary classification, multi-class
classification, or regression.
• Update the network weights by applying the gradient descent method with
backpropagation.
• Weight update formula: w ← w − α·∂E/∂w
E: Cost
∂E/∂w: gradient of the cost with respect to w
Dropout
• A regularization technique to prevent overfitting.
• Randomly removes (drops) some nodes, with a fixed probability, during
training. A minimal sketch of the idea is shown below.
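A NumPy sketch of "inverted dropout" on a layer's activations (the keep probability and array shapes are illustrative assumptions; frameworks such as MXNet/Gluon provide dropout as a built-in layer):

import numpy as np

rng = np.random.default_rng(0)
activations = rng.normal(size=(4, 8))        # hidden-layer activations for a batch of 4

keep_prob = 0.5
mask = rng.random(activations.shape) < keep_prob
dropped = activations * mask / keep_prob     # scale so the expected activation is unchanged
# At inference time, dropout is turned off and the activations are used as-is.
print(dropped)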
Why Neural Networks?
• Automatically extract useful features
from input data.
• In recent years, deep learning has
achieved state-of-the-art results in
many machine learning areas.
• Three pillars of deep learning:
Data
Compute
Algorithms
Build and Train Neural Networks
• How to build and use these ML models?
• Can it be this simple?
Dive into Deep Learning
E-book on Deep Learning by Amazon Scientists, available here: https://d2l.ai
Related chapters:
Chapter 3: Linear Neural Networks: https://d2l.ai/chapter_linear-networks/index.html
Chapter 4: Multilayer Perceptrons: https://d2l.ai/chapter_multilayer-perceptrons/index.html
MXNet Hands-on
• Open source Deep Learning Library to train
and deploy neural networks.
• With the Gluon interface, we can define and
train neural networks easily.
MLA-TAB-Lecture3-MXNet.ipynb
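A hedged sketch of defining and training a small MLP with the Gluon interface (the layer sizes, data, and hyperparameters are illustrative, not the notebook's):

import mxnet as mx
from mxnet import autograd, gluon, nd

net = gluon.nn.Sequential()
net.add(gluon.nn.Dense(64, activation='relu'),
        gluon.nn.Dense(1))                            # one output for binary classification
net.initialize(mx.init.Xavier())

loss_fn = gluon.loss.SigmoidBinaryCrossEntropyLoss()  # Log-Loss on the raw output
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})

X = nd.random.normal(shape=(100, 10))                 # toy data
y = nd.random.randint(0, 2, shape=(100, 1)).astype('float32')

for epoch in range(10):
    with autograd.record():
        loss = loss_fn(net(X), y)
    loss.backward()
    trainer.step(batch_size=X.shape[0])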
Putting it all together: Lecture 3
• In this notebook, we continue to work with our review dataset to
predict the target field
• The notebook covers the following tasks:
Exploratory Data Analysis
Splitting dataset into training and test sets
Data balancing, categorical encoding, text vectorization
Train a Neural Network
Check the performance metrics on the test set
MLA-TAB-Lecture3-Neural-Networks.ipynb
AutoML
AutoML
AutoML helps automate some of the tasks related to ML model
development and training, such as:
• Preprocessing and cleaning data
• Feature selection
• ML model selection
• Hyperparameter optimization
AutoGluon: AutoML
• Open source AutoML Toolkit (AMLT) created by Amazon AI.
• Easy to use, with built-in applications.
AutoGluon: AutoML
With AutoGluon, state-of-the-art ML results can be achieved in a few
lines of Python code.
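A hedged sketch of that workflow; the TabularPredictor API shown is from recent AutoGluon versions, and the file names and label column are placeholders, so details may differ from the lecture notebook:

from autogluon.tabular import TabularPredictor
import pandas as pd

train_data = pd.read_csv("train.csv")        # placeholder file with a "label" column
test_data = pd.read_csv("test.csv")

predictor = TabularPredictor(label="label").fit(train_data)
predictions = predictor.predict(test_data)
print(predictor.leaderboard(silent=True))    # compare the models AutoGluon trained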
MLA-TAB-Lecture3-AutoGluon.ipynb
THANK YOU