
ELEC 4010N - Artificial Intelligence for Medical Image Analysis
Lecture 02: Basic Vision Models (Classification)
Feb 14, 2022

1
Assignments
• Assignment 1 is available now
• Submission deadline: Feb 28, 23:59:59

2
Assignments
• Import packages
• Define augmentations: transforms.Compose()
• Define your datasets
• Define the dataloader (sketched below)

3
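A minimal sketch of this pipeline, assuming an image-folder dataset; the path, transform choices, and normalization statistics below are placeholders, not the assignment's actual values.

```python
import torch
from torchvision import datasets, transforms

# Define augmentations
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),  # placeholder statistics
])

# Define your dataset (here: one sub-folder per class under data/train)
train_set = datasets.ImageFolder("data/train", transform=train_transform)

# Define the dataloader
train_loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)
```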
Assignments
• Build the model
• Train your model
• Test your model


4
Look at some examples
Step 1. Preparing the Dataset
Step 2. Building the Network
Step 3. Training the Model
Step 4. Evaluating the Model’s Performance
Step 5. Continued Training from Checkpoints

Step 1. Preparing the Dataset
• Define the dataset (path, augmentation, and so on)
• Randomly shuffle the training data
• Set the total training epochs and the batch sizes for training and test
• For the test loader, set shuffle to False and test one by one
• Fix the random seed to make your code reproducible (see the sketch below)

5
Examples are from https://nextjournal.com/gkoehler/pytorch-mnist
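A rough sketch of Step 1 in the style of the linked PyTorch MNIST tutorial; the hyperparameter values are illustrative assumptions, not the tutorial's exact ones.

```python
import torch
import torchvision

n_epochs = 3                 # total training epochs
batch_size_train = 64        # batch size in training
batch_size_test = 1000       # batch size in test
torch.manual_seed(1)         # fix the seed to make your code reproducible

train_loader = torch.utils.data.DataLoader(
    torchvision.datasets.MNIST("data/", train=True, download=True,
                               transform=torchvision.transforms.ToTensor()),
    batch_size=batch_size_train, shuffle=True)   # randomly shuffle training data

test_loader = torch.utils.data.DataLoader(
    torchvision.datasets.MNIST("data/", train=False, download=True,
                               transform=torchvision.transforms.ToTensor()),
    batch_size=batch_size_test, shuffle=False)   # no shuffling at test time
```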
Look at some examples
Step 1. Preparing the Dataset
Step 2. Building the Network
Step 3. Training the Model
Step 4. Evaluating the Model’s Performance
Step 5. Continued Training from Checkpoints

Step 2. Building the Network
• Network initialization and setting an optimizer
• Define the network parameters
• Do not know the details? Check the PyTorch website (a sketch follows below)

6
Examples are from https://nextjournal.com/gkoehler/pytorch-mnist
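A minimal sketch of Step 2: a small CNN for MNIST plus an SGD optimizer. The exact layer sizes and learning rate are illustrative assumptions, not prescribed values.

```python
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))   # 28x28 -> 24x24 -> 12x12
        x = F.relu(F.max_pool2d(self.conv2(x), 2))   # 12x12 -> 8x8 -> 4x4
        x = x.view(-1, 320)                          # flatten 20 * 4 * 4 = 320
        x = F.relu(self.fc1(x))
        return F.log_softmax(self.fc2(x), dim=1)     # log-probabilities for the NLL loss

network = Net()
optimizer = optim.SGD(network.parameters(), lr=0.01, momentum=0.5)
```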
Look at some examples
Step 1. Preparing the Dataset
Step 2. Building the Network
Step 3. Training the Model
Step 4. Evaluating the Model’s Performance
Step 5. Continued Training from Checkpoints

Step 3 & 4. Training and testing your model
• Training mode: parameters can be updated
• Set the gradients to zero, because PyTorch accumulates gradients on subsequent backward passes
• Use the negative log likelihood loss
• loss.backward() computes dloss/dx for every parameter x which has requires_grad=True
• Perform a single optimization step
• Test mode: parameters are frozen
• Disabling gradient calculation is useful for inference
• Take out the predictions (sketched below)

7
Examples are from https://nextjournal.com/gkoehler/pytorch-mnist
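A sketch of Steps 3 & 4 matching the annotations above; the names (network, optimizer, train_loader, test_loader) follow the earlier sketches and are assumptions, not fixed API.

```python
import torch
import torch.nn.functional as F

def train(epoch):
    network.train()                          # training mode: parameters can be updated
    for data, target in train_loader:
        optimizer.zero_grad()                # zero the gradients: PyTorch accumulates them
        output = network(data)
        loss = F.nll_loss(output, target)    # negative log likelihood loss
        loss.backward()                      # dloss/dx for every parameter with requires_grad=True
        optimizer.step()                     # performs a single optimization step

def test():
    network.eval()                           # test mode: parameters are frozen
    correct = 0
    with torch.no_grad():                    # disabling gradient calculation for inference
        for data, target in test_loader:
            output = network(data)
            pred = output.argmax(dim=1)      # take out the predictions
            correct += (pred == target).sum().item()
    print(f"Test accuracy: {correct / len(test_loader.dataset):.4f}")
```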
Look at some examples
Step 1. Preparing the Dataset
Step 2. Building the Network
Step 3. Training the Model
Step 4. Evaluating the Model’s Performance
Step 5. Continued Training from Checkpoints

Show predictions
Plot training curves

8
Examples are from https://nextjournal.com/gkoehler/pytorch-mnist
Look at some examples
Step 1. Preparing the Dataset
Step 2. Building the Network
Step 3. Training the Model
Step 4. Evaluating the Model’s Performance
Step 5. Continued Training from Checkpoints

Step 5. Continued training from checkpoints

9
Examples are from https://nextjournal.com/gkoehler/pytorch-mnist
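A minimal checkpointing sketch, continuing from the Net and optimizer sketched earlier: save the model and optimizer states, then reload them to resume training. The file names are placeholders.

```python
import torch
import torch.optim as optim

# Save after (or during) training
torch.save(network.state_dict(), "model.pth")
torch.save(optimizer.state_dict(), "optimizer.pth")

# ...later: rebuild the objects and restore their states to continue training
continued_network = Net()
continued_optimizer = optim.SGD(continued_network.parameters(), lr=0.01, momentum=0.5)
continued_network.load_state_dict(torch.load("model.pth"))
continued_optimizer.load_state_dict(torch.load("optimizer.pth"))
```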
Outline
• Review what we have learned last time
• Deep learning models for image classification
• Data considerations for image classification models
• Evaluating image classification models
• Case studies of CNNs for medical image classification

10
Outline
• Review what we have learned last time
• Deep learning models for image classification
• Data considerations for image classification models
• Evaluating image classification models
• Case studies of CNNs for medical image classification

11
1. Last time
• Deep learning (a type of machine learning)

Traditional machine learning approaches:
Input → Feature extractor (e.g., color and texture histograms) → Machine learning model (e.g., support vector machines and random forests) → Output (e.g., presence or not of disease)

Deep learning:
Input → Deep learning model (e.g., convolutional and recurrent neural networks) → Output (e.g., presence or not of disease)
Directly learns the useful (and better!) features from the training data.

12
1. Last time
• Defining a neural network and a loss function

Layer inputs → layer output(s)

Loss functions are quantitative measures of how satisfactory the model predictions are (i.e., how “good” the model parameters are).

Mean square error (MSE) loss, which is standard for regression:
MSE loss for a single example x_i, when the prediction is ŷ_i and the ground truth is y_i: L_i(W) = (ŷ_i − y_i)²
MSE loss over a set of examples i = 1, …, M: L(W) = (1/M) Σ_i L_i(W)

13
1. Last time
• Define a two-layer fully-connected neural network

Activation functions introduce non-linearity into the model, allowing it to represent highly complex functions.

The dimensions of the weight matrix for each fully-connected layer are [output dim. × input dim.]
The dimensions of the bias vector are [output dim. × 1]


14
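A sketch of such a two-layer fully-connected network; the input size (3072 = 32 × 32 × 3, as on the next slide), hidden size, and number of classes are illustrative assumptions.

```python
import torch
import torch.nn as nn

two_layer_net = nn.Sequential(
    nn.Linear(3072, 100),   # weight matrix [output dim. x input dim.] = [100 x 3072], bias [100 x 1]
    nn.ReLU(),              # activation introduces non-linearity
    nn.Linear(100, 10),     # [10 x 100] weights, [10 x 1] bias
)

x = torch.randn(1, 3072)    # one flattened 32x32x3 image
scores = two_layer_net(x)   # shape: [1, 10]
```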
1. Last time
Fully-connected layers: in graphical form

ŷ = Wx
Output (10 × 1) = Weights (10 × 3072) × Input (3072 × 1)

15
1. Last time
Fully-connected layers: in graphical form

16
1. Last time
Convolutional layer

The input now has spatial height and width dimensions!

In contrast to fully-connected layers, we want to preserve the spatial structure when processing with a convolutional layer.

17
1. Last time
Convolutional layer
Filters always extend the full depth of the input volume

18
1. Last time
Convolutional layer

convolution kernels (network weights)


19
1. Last time
Convolutional layer

convolution kernels (network weights)


20
1. Last time
Convolutional layer
Activation map

21
1. Last time
Convolutional layer
Activation map

22
1. Last time
Convolutional layer
Consider a second (green) filter

23
1. Last time

24
1. Last time

25
1. Last time
Common settings (output size: W2 = (W1 − F + 2P)/S + 1):
• F = 3, S = 1, P = 1: W2 = (W1 − 3 + 2)/1 + 1
• F = 5, S = 2: W2 = (W1 − 5 + 2P)/2 + 1
26
1. Last time
Padding options: ‘valid’ does not pad; use ‘same’ to pad such that the input and output spatial dimensions are the same size.

27
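A small helper illustrating the output-size formula and the ‘valid’ vs. ‘same’ padding options above; a sketch for intuition, not library code.

```python
def conv_output_size(w1, f, p, s):
    """W2 = (W1 - F + 2P)/S + 1 for a square input of width W1."""
    return (w1 - f + 2 * p) // s + 1

print(conv_output_size(32, f=3, p=0, s=1))  # 'valid' (no padding): 30
print(conv_output_size(32, f=3, p=1, s=1))  # 'same' (output size == input size): 32
print(conv_output_size(32, f=5, p=2, s=2))  # strided convolution: 16
```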
1. Last time

28
1. Last time

29
1. Last time

30
1. Last time

31
1. Last time

Common settings:
F = 2, S = 2
F = 3, S = 2

32
1. Last time
Model debugging with training and validation loss curves:
• Healthy loss curve
• A plateau may indicate bad weight initialization
• Loss decreasing but slowly -> try a higher learning rate
• Loss curve plateaus -> try further learning rate decay at the plateau point
• If you decay the learning rate too early, it may look like this -> inefficient learning vs. keeping a higher learning rate longer
• Final metric is still improving -> keep training!

33
1. Last time
Early stopping (monitor the training and validation loss): always do this

34
1. Last time

If trying to make the network bigger (when underfitting) or smaller (when overfitting), network depth and hidden layer size are best to adjust first. Don’t waste too much time early on fiddling with choices that only minorly change the architecture.

35
1. Last time

36
1. Last time

L2 regularization is the most popular: low loss when all weights are relatively small. It more strongly penalizes large weights vs. L1, and expresses a preference for simple models (large coefficients are needed to fit a function to extreme outlier values).

Next: implicit regularizers that do not add an explicit term; instead they do something implicit in the network to prevent it from fitting too well to the training data.
37
1. Last time

The probability of “dropping out” each neuron at a forward pass is a hyperparameter p; 0.5 and 0.9 are common (high!).

Intuition: dropout is equivalent to training a large ensemble of different models that share parameters.

38
1. Last time

39
1. Last time

Intuition: batch normalization allows keeping the weights in a healthy range. There is also some randomness at training time due to the different effect of each minibatch sampling -> regularization!

40
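A sketch of how dropout and batch normalization typically appear together in a small classifier head; the layer sizes and p = 0.5 are illustrative choices, not prescribed values.

```python
import torch.nn as nn

head = nn.Sequential(
    nn.Linear(512, 256),
    nn.BatchNorm1d(256),   # keeps activations in a healthy range; adds minibatch randomness
    nn.ReLU(),
    nn.Dropout(p=0.5),     # randomly zeroes each neuron with probability p at training time
    nn.Linear(256, 10),
)

head.train()  # dropout active, batch norm uses minibatch statistics
head.eval()   # dropout disabled, batch norm uses running statistics
```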
1. Last time

Think about the domain of your data: what makes sense as realistic augmentation operations?

41
1. Last time

42
1. Last time

43
1. Last time

44
1. Last time

Useful debugging / sanity check: restrict to a very small dataset first (e.g. 1 or 2 minibatches). You should be able to severely overfit and drive the loss to 0.

Common pitfall: making the grid too small. Sample a wide range of values to make sure you’ve explored the space (e.g. LRs from 1e0 to 1e-5).

Aside: for the LR, sample 10^x for x in Uniform[-5, 0]!
45
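A sketch of log-uniform learning-rate sampling over the range mentioned above (1e-5 to 1e0); the number of trials is an arbitrary choice.

```python
import random

def sample_learning_rate():
    exponent = random.uniform(-5, 0)   # x ~ Uniform[-5, 0]
    return 10 ** exponent              # LRs span 1e-5 ... 1e0, evenly on a log scale

candidate_lrs = [sample_learning_rate() for _ in range(10)]
print(sorted(candidate_lrs))
```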
1. Last time

46
1. Last time
Model inference

47
1. Last time
Model ensembles

1. Train multiple independent models
2. At test time, average their results

Enjoy 2% extra performance

48
1. Last time

49
1. Last time

50
1. Last time

Vanilla fully-connected neural networks (MLPs) are usually pretty shallow -- otherwise there are too many parameters! ~2-3 layers.

They can have a wide range of layer sizes (16, 64, 256, 1000, etc.) depending on how much data you have.

We will see different classes of neural networks that leverage structure in data to reduce parameters and increase network depth.

Input layer; “hidden” layers - we will see lots of diversity in size (# neurons), type (linear, convolutional, etc.), and activation function (sigmoid, ReLU, etc.); output layer - will differ for different types of tasks (e.g. regression) and should match the loss function.

51
1. Last time

52
1. Last time

Typical in modern CNNs and MLPs

53
1. Last time

Will see in recurrent neural networks. Also used in early MLPs and CNNs.

54
1. Last time

55
Outline
• Review what we have learned last time
• Deep learning models for image classification
• Data considerations for image classification models
• Evaluating image classification models
• Case studies of CNNs for medical image classification

56
2. Deep learning models for image classification

57–76
Slide credit: BIODS 220
2. Deep learning models for image classification
More recent CNN architectures for image classification
• Check the website for state-of-the-art CNN architectures
• EfficientNet ranks 13th on the leaderboard
• Worth exploring for class projects!

77
2. Deep learning models for image classification
More on loss functions

Common loss functions
• Regression: minimize the squared difference between the prediction output and the target; the label is a continuous value.
• Binary cross-entropy (BCE): equivalent to the negative log of the probability of the correct ground-truth class being predicted. Think about what the expression looks like when y_i = 1 vs. 0.
• Multi-class cross-entropy: the negative log of the probability of the true class y_i, as with the BCE loss.
• Multi-class hinge (SVM) loss: incurs the lowest loss of 0 (what we want) if the score for the true class y_i is greater than the score for each incorrect class j by a margin of 1.

78
Slide credit: BIODS 220
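A sketch of how these loss families appear in PyTorch; the tensors are toy placeholders chosen only to make the calls runnable.

```python
import torch
import torch.nn as nn

mse = nn.MSELoss()               # regression: squared difference to a continuous target
bce = nn.BCEWithLogitsLoss()     # binary cross-entropy on raw scores (sigmoid applied internally)
ce = nn.CrossEntropyLoss()       # multi-class: negative log-probability of the true class
hinge = nn.MultiMarginLoss(margin=1.0)  # multiclass hinge: zero loss once the true class wins by the margin

pred_value = torch.tensor([2.3])
target_value = torch.tensor([2.0])
print(mse(pred_value, target_value))

logit = torch.tensor([0.8])
binary_label = torch.tensor([1.0])
print(bce(logit, binary_label))

class_scores = torch.randn(1, 5)
class_label = torch.tensor([3])
print(ce(class_scores, class_label))
print(hinge(class_scores, class_label))
```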
2. Deep learning models for image classification
Common loss functions

79
Slide credit: BIODS 220
Outline
• Review what we have learned last time
• Deep learning models for image classification
• Data considerations for image classification models
• Evaluating image classification models
• Case studies of CNNs for medical image classification

80
3. Data considerations for image classification models

Training, validation, and test sets

Other splits, e.g., 60/20/20, are also popular.
Balance sufficient data for training vs. an informative performance estimate on validation / testing.

81
Slide credit: BIODS 220
3. Data considerations for image classification models

Maximizing training data for the final model

Once hyperparameters are selected using the validation set, it is common to merge the training and validation sets into a larger “trainval” set and train a final model using those hyperparameters. This is OK, since we can use non-test data however we want during model development!

82
Slide credit: BIODS 220
3. Data considerations for image classification models

K-fold cross validation: for small datasets

Sometimes we have small labeled datasets in healthcare… in this case K-fold cross validation (which is more computationally expensive) may be worthwhile.

Train the model K times with a different fold as the validation set each time; then average the validation set results. This allows more data to be used for each training run of the model, while still using enough data to get an accurate validation result. The same concept can also be applied to test-time evaluation.
83
Slide credit: BIODS 220
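A sketch of the K-fold split mechanics using scikit-learn's splitter; the tiny index array and placeholder score are only there so the loop runs end to end, and your own training and validation routines would go inside it.

```python
import numpy as np
from sklearn.model_selection import KFold

all_indices = np.arange(20)   # pretend these index a small labeled dataset
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

fold_scores = []
for fold, (train_idx, val_idx) in enumerate(kfold.split(all_indices)):
    # train a model on train_idx and evaluate it on val_idx here;
    # we record a placeholder score so the example is self-contained
    fold_scores.append(len(val_idx) / len(all_indices))
    print(f"fold {fold}: {len(train_idx)} train examples, {len(val_idx)} val examples")

print("Average validation score across folds:", np.mean(fold_scores))
```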
3. Data considerations for image classification models

Data preprocessing

84
3. Data considerations for image classification models

Data preprocessing

85
3. Data considerations for image classification models

Data preprocessing: for images

86
3. Data considerations for image classification models

How much data do you need for deep learning?

The premise of deep learning is to use many parameters (e.g. millions) to fit complex functions -> if the dataset is too small, the easiest solution the model ends up learning can be to overfit by memorizing the labels of the training examples.

The ImageNet dataset consists of 1M images: 1000 classes with 1000 images each.

87
Slide credit: BIODS 220
3. Data considerations for image classification models

Transfer learning: amplifying training data

88
3. Data considerations for image classification models

89–92
3. Data considerations for image classification models

Often a good idea to try this first; then try fine-tuning all layers of the network.

93
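A sketch of the usual transfer-learning recipe with torchvision: start from ImageNet weights, replace the classification head, and fine-tune. The two-class output and learning rate are illustrative assumptions, not the lecture's values.

```python
import torch.nn as nn
import torch.optim as optim
from torchvision import models

model = models.resnet18(pretrained=True)       # ImageNet backbone (newer torchvision prefers weights=...)
model.fc = nn.Linear(model.fc.in_features, 2)  # new head for a 2-class medical task

# Option A: freeze the backbone, train only the new head
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("fc")

# Option B (often better once data allows it): fine-tune all layers with a small LR
optimizer = optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
```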
3. Data considerations for image classification models

How much data do you need for deep learning?
Examples per class of your dataset, in addition to transfer learning (take this with a grain of salt, it really depends on the problem):

➢ Low dozens: generally too small to learn a meaningful model using standard supervised deep learning
➢ High dozens to low hundreds: may see models with some predictive ability, unlikely to really wow or be “superhuman” though
➢ High hundreds to thousands: the “happy regime” for deep learning

In general, deep learning is data hungry -- the more data the better. Almost always leverage transfer learning unless you have an extremely different or huge (e.g., ImageNet-scale) dataset.
94
Slide credit: BIODS 220
3. Data considerations for image classification models

What counts as a data example?

Guidelines for the amount of training data refer to the # of unique instances representative of the diversity expected during testing / deployment, e.g. the # of independent CT scans or surgery videos. Additional correlated data (e.g. different slices of the same tumor or different suturing instances within the same video) provide relatively less incremental value in comparison.

95
Slide credit: BIODS 220
3. Data considerations for image classification models
Preview: advanced approaches for handling limited labeled data
Semi-supervised learning
Weakly-supervised learning
Domain adaptation

Will talk more about these in later lectures...

96
Slide credit: BIODS 220
3. Data considerations for image classification models

What if there are multiple possible sources of data?
E.g., some with noisier / less accurate labels than others, from different hospital sites, etc.

✓ The expected diversity of data during deployment should be reflected in both the training and test sets
✓ Need to see these during training to learn how to handle them
✓ Need to see these during testing to accurately evaluate the model

97
Slide credit: BIODS 220
4. Evaluating image classification models
• Review what we have learned last time
• Deep learning models for image classification
• Data considerations for image classification models
• Evaluating image classification models
• Case studies of CNNs for medical image classification

98
4. Evaluating image classification models

Q: When might evaluating purely on accuracy be problematic?

A: Imbalanced datasets.

99
Figures from https://ihh300.github.io/banking/Techniques-Handling-Class-Imbalance-Credit-Card-Fraud/
4. Evaluating image classification models

Confusion matrix:
TN  FN
FP  TP

Accuracy = (85285 + 118) / (85285 + 118 + 29 + 9) ≈ 1

100
Figures from https://ihh300.github.io/banking/Techniques-Handling-Class-Imbalance-Credit-Card-Fraud/
4. Evaluating image classification models
We can trade off different values of these metrics as we vary our classifier’s score threshold for predicting a positive.

101
Slide credit: BIODS 220
4. Evaluating image classification models
Q: As the prediction threshold increases, how does that generally affect sensitivity? Specificity?

102
Slide credit: BIODS 220
4. Evaluating image classification models

103
Slide credit: BIODS 220
4. Evaluating image classification models

Confusion matrix: TN = 85285, FN = 29, FP = 9, TP = 118

Precision = TP / (TP + FP) = 118 / (118 + 9) = 0.929
Sensitivity (aka Recall) = TP / (TP + FN) = 118 / (118 + 29) = 0.803
F1 score = 2 × (Recall × Precision) / (Recall + Precision) = 0.861
Specificity = TN / (TN + FP) = 85285 / (85285 + 9) ≈ 1
104
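A small sketch reproducing the metric calculations above from the confusion-matrix counts.

```python
def classification_metrics(tp, fp, fn, tn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * recall * precision / (recall + precision)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, specificity, f1, accuracy

print(classification_metrics(tp=118, fp=9, fn=29, tn=85285))
# -> roughly (0.929, 0.803, 0.9999, 0.861, 0.9996)
```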
4. Evaluating image classification models

True Positive Rate (TPR)
False Positive Rate (FPR)

105


Slide credit: BIODS 220
4. Evaluating image classification models

106
Figures from https://zhuanlan.zhihu.com/p/58587448
4. Evaluating image classification models

107
Slide credit: BIODS 220
4. Evaluating image classification models
Also equal to the distance above the chance line for a balanced dataset: sensitivity − (1 − specificity) = sensitivity + specificity − 1

108
Slide credit: BIODS 220
4. Evaluating image classification models
Also equal to the distance above the chance line for a balanced dataset: sensitivity − (1 − specificity) = sensitivity + specificity − 1

But the selected trade-off point could also depend on the application.

109
Slide credit: BIODS 220
Outline
• Review what we have learned last time
• Deep learning models for image classification
• Data considerations for image classification models
• Evaluating image classification models
• Case studies of CNNs for medical image classification

110
5. Case studies
Joint Diabetic Retinopathy (DR) and Diabetic Macular Edema (DME) Grading

Diabetic Retinopathy (DR)
• a consequence of microvascular changes triggered by diabetes
• a leading cause of blindness

Diabetic Macular Edema (DME)
• a complication of DR
• retinal thickening from fluid
• can occur at any stage of DR

Early pathological signs of DR (fundus image): soft exudate, microaneurysms, hard exudate, hemorrhage; landmarks: macula, optic disc

Grading:
• DR: the severity
• DME: the shortest distance d between the macula and hard exudates (0: no risk; 1: d < 1; 2: d > 1)
Li, X., Hu, X., Yu, L., Zhu, L., Fu, C.W. and Heng, P.A., 2019. CANet: cross-disease attention network for joint diabetic retinopathy and diabetic macular edema grading. IEEE 111
transactions on medical imaging, 39(5), pp.1483-1493.
5. Case studies
Joint Diabetic Retinopathy (DR) and Diabetic Macular Edema (DME) Grading

Example fundus images from normal to severe: DR grades 0, 1, 2, 3, 4 with DME grades 0, 0, 1, 2, 2.

Grading
• DR: the severity.
• DME: the shortest distance between the macula and hard exudates (0: no risk; 1: d < 1; 2: d > 1).

Clinical importance of grading
• DR/DME patients can receive tailored treatments.

112
5. Case studies
Automatically learned features for DR and DME grading: Fundus Image → Neural Network → DR grading + DME grading (with a relationship between the two)

Multi-task learning
• the information among different tasks is shared
• promotes the performance of each individual task

It also requires
• an understanding of each disease
• the internal relationship between the two diseases

There were no published works for joint DR and DME grading.

[Gulshan et al. JAMA. 2016]; [Ren et al. Technology and Health Care. 2018]; [Krause et al. Ophthalmology. 2018]; [Liu et al. MICCAI 2018]
113
5. Case studies
Cross-disease Attention Network (CANet)
• disease-specific attention block (disease-specific features): deep understanding of each disease
• disease-dependent attention block (disease-dependent features): internal relationship between the diseases

Architecture: backbone features F ∈ R^(C×H×W) pass through a disease-specific attention block per task (F_i′, F_j′ ∈ R^(C×H×W)), are average-pooled and reduced by fc layers to G_i, G_j ∈ R^(C/r), then combined by the disease-dependent attention blocks into G_i′, G_j′ ∈ R^(C/r). The DR head predicts Normal / Mild NPDR / Moderate NPDR / Severe NPDR / PDR (losses ℒ_DR, ℒ′_DR) and the DME head predicts Normal / Mild / Severe (losses ℒ_DME, ℒ′_DME).

The overall architecture of the cross-disease attention network (CANet).

114
5. Case studies
Cross-disease Attention Network (CANet)
• disease-specific attention block (disease-specific features): deep understanding of each disease
• disease-dependent attention block (disease-dependent features): internal relationship between the diseases

The overall architecture of the cross-disease attention network (CANet).

115
5. Case studies
Cross-disease Attention Network (CANet)
• disease-specific attention block (disease-specific features): deep understanding of each disease
• disease-dependent attention block (disease-dependent features): internal relationship between the diseases

The overall architecture of the cross-disease attention network (CANet).

116
5. Case studies
Cross-disease Attention Network (CANet)

Channel-wise attention: F ∈ R^(C×H×W) is average- and max-pooled to F_avg^c, F_max^c ∈ R^C, passed through a shared MLP (fc layers) and a sigmoid to give the channel attention map A^c ∈ R^C, producing the re-weighted features F_i ∈ R^(C×H×W).

(a) Disease-specific attention module.
(b) Disease-dependent attention module: channel-wise attention applied to the pooled vectors G_i, G_j ∈ R^(C/r) via fc layers (MLP) and a sigmoid, producing G_j′ ∈ R^(C/r).

117
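A generic channel-attention sketch in the spirit of the module above (average- and max-pooled descriptors → shared MLP → sigmoid gate). This is an illustrative re-implementation under those assumptions, not the authors' released CANet code.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(                  # shared MLP (two fc layers)
            nn.Linear(channels, channels // reduction),
            nn.ReLU(),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, f):                          # f: [N, C, H, W]
        avg_desc = f.mean(dim=(2, 3))              # average-pooled descriptor in R^C (per sample)
        max_desc = f.amax(dim=(2, 3))              # max-pooled descriptor in R^C
        attn = torch.sigmoid(self.mlp(avg_desc) + self.mlp(max_desc))  # channel attention map A^c
        return f * attn[:, :, None, None]          # re-weight each channel

features = torch.randn(2, 64, 32, 32)
refined = ChannelAttention(64)(features)           # same shape: [2, 64, 32, 32]
```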
5. Case studies
Cross-disease Attention Network (CANet)

Spatial-wise attention: the channel-refined features F_i ∈ R^(C×H×W) are average- and max-pooled along the channel dimension to F_i,avg^s, F_i,max^s ∈ R^(H×W), passed through a convolution and a sigmoid to give the spatial attention map A^s ∈ R^(H×W), producing F_i′ ∈ R^(C×H×W).

(a) Disease-specific attention module.
(b) Disease-dependent attention module.

118
5. Case studies
Cross-disease Attention Network (CANet)

Channel-wise attention is also used inside the disease-dependent attention module (b), operating on G_i, G_j ∈ R^(C/r) through fc layers (MLP) and a sigmoid to produce G_j′ ∈ R^(C/r).

(a) Disease-specific attention module.
(b) Disease-dependent attention module.

119
5. Case studies
Cross-disease Attention Network (CANet)

The overall architecture of the cross-disease attention network (CANet).

Loss function: the DR and DME losses are combined with a weighting factor.
120
5. Case studies
Joint DR and DME grading results on the public Messidor dataset.

• Joint training outperforms individual training

121
5. Case studies
Joint DR and DME grading results on the public Messidor dataset.

• Our method outperforms joint training at the same level of model parameters (effectiveness of our network design)

122
5. Case studies
Joint DR and DME grading results on the public Messidor dataset.

• Both disease-specific and disease-dependent


attentions can contribute to the performance

123
5. Case studies
Joint DR and DME grading results on the public Messidor dataset.

• Our method achieves the best performance when λ = 0.25

124
5. Case studies
Comparisons with other multi-task learning methods, and with state-of-the-art methods on the Messidor dataset.

• 2% higher than other multi-task learning methods.
• Clearly outperforms the others.

125
5. Case studies
Results and ablation study on the IDRiD challenge leaderboard.

• Results are from the IDRiD 2018 challenge website; our method clearly outperforms the other results.
• Both disease-specific and disease-dependent attentions are useful.
• Results are consistent with those on the Messidor dataset.

126
5. Case studies
Joint DR and DME grading results on fundus photography on the Messidor dataset.

Each example shows the ground truth provided by doctors, the output probabilities for DR (grades 0–4), and the output probabilities for DME (grades 0–2):
DR 3: 0.00 0.00 0.06 0.73 0.20 | DME 2: 0.00 0.00 0.99
DR 2: 0.00 0.00 0.92 0.07 0.00 | DME 2: 0.00 0.00 1.00
DR 0: 0.69 0.23 0.07 0.00 0.00 | DME 0: 0.85 0.10 0.04
DR 2: 0.00 0.00 0.99 0.01 0.00 | DME 2: 0.00 0.08 0.92
DR 2: 0.07 0.00 0.86 0.00 0.07 | DME 1: 0.05 0.95 0.00
DR 3: 0.00 0.00 0.00 0.64 0.35 | DME 2: 0.00 0.00 0.99
DR 4: 0.00 0.00 0.00 0.05 0.95 | DME 2: 0.00 0.10 0.90
DR 2: 0.00 0.00 0.99 0.00 0.00 | DME 2: 0.00 0.14 0.86

127
5. Case studies

Q: How might we approach this problem?

128
Slide credit: BIODS 220
5. Case studies

129–137
Slide credit: BIODS 220
5. Case studies
Q: What could explain the difference in trends, on tuning-set performance, when reducing the # of grades per image on the training set vs. on the tuning set?

138
Slide credit: BIODS 220
5. Case studies

139–141
Slide credit: BIODS 220
5. Case studies

All training images were resized to 256×256 and underwent base data augmentation of random 227×227 cropping and mirroring. Additional data augmentation experiments are in the results table.

142
Slide credit: BIODS 220
5. Case studies

All training images were resized to 256×256 and underwent base data augmentation of random 227×227 cropping and mirroring. Additional data augmentation experiments are in the results table.

Often resize to match the input size of pre-trained networks. Also a fine approach to making a high-resolution dataset easier to work with!

143
Slide credit: BIODS 220
5. Case studies
Performed further analysis at the optimal threshold determined by the Youden index.

144
Slide credit: BIODS 220
5. Case studies

145–146
Slide credit: BIODS 220
Summary
Today we saw:
• Deep learning models for image classification
• Data considerations for image classification models
• Evaluating image classification models
• Case studies

Next time: Advanced Vision Models (Detection and Segmentation)

147
