
ELEC 4010N - Artificial Intelligence for Medical Image Analysis
Lecture 02: Basic Vision Models (Classification)
Feb 14, 2022

1
Assignments
• Assignment 1 is available now
• Submission deadline: Feb 28, 23:59:59

2
Assignments
• Import packages
• Define augmentations: transforms.Compose()
• Define your datasets
• Define the dataloader (sketched below)

3
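A minimal sketch of this pipeline, assuming an image-folder dataset; the path, transform choices, and normalization statistics below are placeholders, not the assignment's actual values.

```python
import torch
from torchvision import datasets, transforms

# Define augmentations
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),  # placeholder statistics
])

# Define your dataset (here: one sub-folder per class under data/train)
train_set = datasets.ImageFolder("data/train", transform=train_transform)

# Define the dataloader
train_loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)
```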
Assignments
• Build the model
• Train your model
• Test your model


4
Look at some examples
Step 1. Preparing the Dataset
Step 2. Building the Network
Step 3. Training the Model
Step 4. Evaluating the Model’s Performance
Step 5. Continued Training from Checkpoints

Step 1. Preparing the Dataset
• Define the dataset (path, augmentation, and so on)
• Randomly shuffle the training data
• Set the total training epochs and the batch sizes for training and test
• For the test loader, set shuffle to False and test one by one
• Fix the random seed to make your code reproducible (see the sketch below)

5
Examples are from https://nextjournal.com/gkoehler/pytorch-mnist
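A rough sketch of Step 1 in the style of the linked PyTorch MNIST tutorial; the hyperparameter values are illustrative assumptions, not the tutorial's exact ones.

```python
import torch
import torchvision

n_epochs = 3                 # total training epochs
batch_size_train = 64        # batch size in training
batch_size_test = 1000       # batch size in test
torch.manual_seed(1)         # fix the seed to make your code reproducible

train_loader = torch.utils.data.DataLoader(
    torchvision.datasets.MNIST("data/", train=True, download=True,
                               transform=torchvision.transforms.ToTensor()),
    batch_size=batch_size_train, shuffle=True)   # randomly shuffle training data

test_loader = torch.utils.data.DataLoader(
    torchvision.datasets.MNIST("data/", train=False, download=True,
                               transform=torchvision.transforms.ToTensor()),
    batch_size=batch_size_test, shuffle=False)   # no shuffling at test time
```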
Look at some examples
Step 1. Preparing the Dataset
Step 2. Building the Network
Step 3. Training the Model
Step 4. Evaluating the Model’s Performance
Step 5. Continued Training from Checkpoints

Step 2. Building the Network
• Network initialization and setting an optimizer
• Define the network parameters
• Do not know the details? Check the PyTorch website (a sketch follows below)

6
Examples are from https://nextjournal.com/gkoehler/pytorch-mnist
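A minimal sketch of Step 2: a small CNN for MNIST plus an SGD optimizer. The exact layer sizes and learning rate are illustrative assumptions, not prescribed values.

```python
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))   # 28x28 -> 24x24 -> 12x12
        x = F.relu(F.max_pool2d(self.conv2(x), 2))   # 12x12 -> 8x8 -> 4x4
        x = x.view(-1, 320)                          # flatten 20 * 4 * 4 = 320
        x = F.relu(self.fc1(x))
        return F.log_softmax(self.fc2(x), dim=1)     # log-probabilities for the NLL loss

network = Net()
optimizer = optim.SGD(network.parameters(), lr=0.01, momentum=0.5)
```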
Look at some examples
Step 1. Preparing the Dataset
Step 2. Building the Network
Step 3. Training the Model
Step 4. Evaluating the Model’s Performance
Step 5. Continued Training from Checkpoints

Step 3 & 4. Training and testing your model
• Training mode: parameters can be updated
• Set the gradients to zero, because PyTorch accumulates gradients on subsequent backward passes
• Use the negative log likelihood loss
• loss.backward() computes dloss/dx for every parameter x which has requires_grad=True
• Perform a single optimization step
• Test mode: parameters are frozen
• Disabling gradient calculation is useful for inference
• Take out the predictions (sketched below)

7
Examples are from https://nextjournal.com/gkoehler/pytorch-mnist
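A sketch of Steps 3 & 4 matching the annotations above; the names (network, optimizer, train_loader, test_loader) follow the earlier sketches and are assumptions, not fixed API.

```python
import torch
import torch.nn.functional as F

def train(epoch):
    network.train()                          # training mode: parameters can be updated
    for data, target in train_loader:
        optimizer.zero_grad()                # zero the gradients: PyTorch accumulates them
        output = network(data)
        loss = F.nll_loss(output, target)    # negative log likelihood loss
        loss.backward()                      # dloss/dx for every parameter with requires_grad=True
        optimizer.step()                     # performs a single optimization step

def test():
    network.eval()                           # test mode: parameters are frozen
    correct = 0
    with torch.no_grad():                    # disabling gradient calculation for inference
        for data, target in test_loader:
            output = network(data)
            pred = output.argmax(dim=1)      # take out the predictions
            correct += (pred == target).sum().item()
    print(f"Test accuracy: {correct / len(test_loader.dataset):.4f}")
```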
Look at some examples
Step 1. Preparing the Dataset
Step 2. Building the Network
Step 3. Training the Model
Step 4. Evaluating the Model’s Performance
Step 5. Continued Training from Checkpoints

Show predictions
Plot training curves

8
Examples are from https://nextjournal.com/gkoehler/pytorch-mnist
Look at some examples
Step 1. Preparing the Dataset
Step 2. Building the Network
Step 3. Training the Model
Step 4. Evaluating the Model’s Performance
Step 5. Continued Training from Checkpoints

Step 5. Continued training from checkpoints

9
Examples are from https://nextjournal.com/gkoehler/pytorch-mnist
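A minimal checkpointing sketch, continuing from the Net and optimizer sketched earlier: save the model and optimizer states, then reload them to resume training. The file names are placeholders.

```python
import torch
import torch.optim as optim

# Save after (or during) training
torch.save(network.state_dict(), "model.pth")
torch.save(optimizer.state_dict(), "optimizer.pth")

# ...later: rebuild the objects and restore their states to continue training
continued_network = Net()
continued_optimizer = optim.SGD(continued_network.parameters(), lr=0.01, momentum=0.5)
continued_network.load_state_dict(torch.load("model.pth"))
continued_optimizer.load_state_dict(torch.load("optimizer.pth"))
```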
Outline
• Review what we have learned last time
• Deep learning models for image classification
• Data considerations for image classification models
• Evaluating image classification models
• Case studies of CNNs for medical image classification

10
Outline
• Review what we have learned last time
• Deep learning models for image classification
• Data considerations for image classification models
• Evaluating image classification models
• Case studies of CNNs for medical image classification

11
1. Last time
• Deep learning (a type of machine learning)

Traditional machine learning approaches:
Input → Feature extractor (e.g., color and texture histograms) → Machine learning model (e.g., support vector machines and random forests) → Output (e.g., presence or not of disease)

Deep learning:
Input → Deep learning model (e.g., convolutional and recurrent neural networks) → Output (e.g., presence or not of disease)
Directly learns the useful (and better!) features from the training data.

12
1. Last time
• Defining a neural network and a loss function

Layer inputs → layer output(s)

Loss functions are quantitative measures of how satisfactory the model predictions are (i.e., how “good” the model parameters are).

Mean square error (MSE) loss, which is standard for regression:
MSE loss for a single example x_i, when the prediction is ŷ_i and the ground truth is y_i: L_i(W) = (ŷ_i − y_i)²
MSE loss over a set of examples i = 1, …, M: L(W) = (1/M) Σ_i L_i(W)

13
1. Last time
• Define a two-layer fully-connected neural network

Activation functions introduce non-linearity into the model, allowing it to represent highly complex functions.

The dimensions of the weight matrix for each fully-connected layer are [output dim. × input dim.]
The dimensions of the bias vector are [output dim. × 1]


14
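A sketch of such a two-layer fully-connected network; the input size (3072 = 32 × 32 × 3, as on the next slide), hidden size, and number of classes are illustrative assumptions.

```python
import torch
import torch.nn as nn

two_layer_net = nn.Sequential(
    nn.Linear(3072, 100),   # weight matrix [output dim. x input dim.] = [100 x 3072], bias [100 x 1]
    nn.ReLU(),              # activation introduces non-linearity
    nn.Linear(100, 10),     # [10 x 100] weights, [10 x 1] bias
)

x = torch.randn(1, 3072)    # one flattened 32x32x3 image
scores = two_layer_net(x)   # shape: [1, 10]
```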
1. Last time
Fully-connected layers: in graphical form

ŷ = Wx
Output (10 × 1) = Weights (10 × 3072) × Input (3072 × 1)

15
1. Last time
Fully-connected layers: in graphical form

16
1. Last time
Convolutional layer

The input now has spatial height and width dimensions!

In contrast to fully-connected layers, we want to preserve the spatial structure when processing with a convolutional layer.

17
1. Last time
Convolutional layer
Filters always extend the full depth of the input volume

18
1. Last time
Convolutional layer

convolution kernels (network weights)


19
1. Last time
Convolutional layer

convolution kernels (network weights)


20
1. Last time
Convolutional layer
Activation map

21
1. Last time
Convolutional layer
Activation map

22
1. Last time
Convolutional layer
Consider a second (green) filter

23
1. Last time

24
1. Last time

25
1. Last time
Common settings (output size: W2 = (W1 − F + 2P)/S + 1):
• F = 3, S = 1, P = 1: W2 = (W1 − 3 + 2)/1 + 1
• F = 5, S = 2: W2 = (W1 − 5 + 2P)/2 + 1
26
1. Last time
Padding options: ‘valid’ does not pad; use ‘same’ to pad such that the input and output spatial dimensions are the same size.

27
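A small helper illustrating the output-size formula and the ‘valid’ vs. ‘same’ padding options above; a sketch for intuition, not library code.

```python
def conv_output_size(w1, f, p, s):
    """W2 = (W1 - F + 2P)/S + 1 for a square input of width W1."""
    return (w1 - f + 2 * p) // s + 1

print(conv_output_size(32, f=3, p=0, s=1))  # 'valid' (no padding): 30
print(conv_output_size(32, f=3, p=1, s=1))  # 'same' (output size == input size): 32
print(conv_output_size(32, f=5, p=2, s=2))  # strided convolution: 16
```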
1. Last time

28
1. Last time

29
1. Last time

30
1. Last time

31
1. Last time

Common settings:
F = 2, S = 2
F = 3, S = 2

32
1. Last time
Model debugging with training and validation loss curves:
• Healthy loss curve
• A plateau may indicate bad weight initialization
• Loss decreasing but slowly -> try a higher learning rate
• Loss curve plateaus -> try further learning rate decay at the plateau point
• If you decay the learning rate too early, it may look like this -> inefficient learning vs. keeping a higher learning rate longer
• Final metric is still improving -> keep training!

33
1. Last time
Early stopping (monitor the training and validation loss): always do this

34
1. Last time

If trying to make the network bigger (when underfitting) or smaller (when overfitting), network depth and hidden layer size are best to adjust first. Don’t waste too much time early on fiddling with choices that only minorly change the architecture.

35
1. Last time

36
1. Last time

L2 regularization is the most popular: low loss when all weights are relatively small. It more strongly penalizes large weights vs. L1, and expresses a preference for simple models (large coefficients are needed to fit a function to extreme outlier values).

Next: implicit regularizers that do not add an explicit term; instead they do something implicit in the network to prevent it from fitting too well to the training data.
37
1. Last time

The probability of “dropping out” each neuron at a forward pass is a hyperparameter p; 0.5 and 0.9 are common (high!).

Intuition: dropout is equivalent to training a large ensemble of different models that share parameters.

38
1. Last time

39
1. Last time

Intuition: batch normalization allows keeping the weights in a healthy range. There is also some randomness at training time due to the different effect of each minibatch sampling -> regularization!

40
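A sketch of how dropout and batch normalization typically appear together in a small classifier head; the layer sizes and p = 0.5 are illustrative choices, not prescribed values.

```python
import torch.nn as nn

head = nn.Sequential(
    nn.Linear(512, 256),
    nn.BatchNorm1d(256),   # keeps activations in a healthy range; adds minibatch randomness
    nn.ReLU(),
    nn.Dropout(p=0.5),     # randomly zeroes each neuron with probability p at training time
    nn.Linear(256, 10),
)

head.train()  # dropout active, batch norm uses minibatch statistics
head.eval()   # dropout disabled, batch norm uses running statistics
```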
1. Last time

Think about the domain of your data: what makes sense as realistic augmentation operations?

41
1. Last time

42
1. Last time

43
1. Last time

44
1. Last time

Useful debugging / sanity check: restrict to a very small dataset first (e.g. 1 or 2 minibatches). You should be able to severely overfit and drive the loss to 0.

Common pitfall: making the grid too small. Sample a wide range of values to make sure you’ve explored the space (e.g. LRs from 1e0 to 1e-5).

Aside: for the LR, sample 10^x for x in Uniform[-5, 0]!
45
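A sketch of log-uniform learning-rate sampling over the range mentioned above (1e-5 to 1e0); the number of trials is an arbitrary choice.

```python
import random

def sample_learning_rate():
    exponent = random.uniform(-5, 0)   # x ~ Uniform[-5, 0]
    return 10 ** exponent              # LRs span 1e-5 ... 1e0, evenly on a log scale

candidate_lrs = [sample_learning_rate() for _ in range(10)]
print(sorted(candidate_lrs))
```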
1. Last time

46
1. Last time
Model inference

47
1. Last time
Model ensembles

1. Train multiple independent models
2. At test time, average their results

Enjoy 2% extra performance

48
1. Last time

49
1. Last time

50
1. Last time

Vanilla fully-connected neural networks (MLPs) are usually pretty shallow -- otherwise there are too many parameters! ~2-3 layers.

They can have a wide range of layer sizes (16, 64, 256, 1000, etc.) depending on how much data you have.

We will see different classes of neural networks that leverage structure in data to reduce parameters and increase network depth.

Input layer; “hidden” layers - we will see lots of diversity in size (# neurons), type (linear, convolutional, etc.), and activation function (sigmoid, ReLU, etc.); output layer - will differ for different types of tasks (e.g. regression) and should match the loss function.

51
1. Last time

52
1. Last time

Typical in modern CNNs and MLPs

53
1. Last time

Will see in recurrent neural networks. Also used in early MLPs and CNNs.

54
1. Last time

55
Outline
• Review what we have learned last time
• Deep learning models for image classification
• Data considerations for image classification models
• Evaluating image classification models
• Case studies of CNNs for medical image classification

56
2. Deep learning models for image classification

57–76
Slide credit: BIODS 220
2. Deep learning models for image classification
More recent CNN architectures for image classification
• Check the website for state-of-the-art CNN architectures
• EfficientNet ranks 13th on the leaderboard
• Worth exploring for class projects!

77
2. Deep learning models for image classification
More on loss functions

Common loss functions
• Regression: minimize the squared difference between the prediction output and the target; the label is a continuous value.
• Binary cross-entropy (BCE): equivalent to the negative log of the probability of the correct ground-truth class being predicted. Think about what the expression looks like when y_i = 1 vs. 0.
• Multi-class cross-entropy: the negative log of the probability of the true class y_i, as with the BCE loss.
• Multi-class hinge (SVM) loss: incurs the lowest loss of 0 (what we want) if the score for the true class y_i is greater than the score for each incorrect class j by a margin of 1.

78
Slide credit: BIODS 220
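A sketch of how these loss families appear in PyTorch; the tensors are toy placeholders chosen only to make the calls runnable.

```python
import torch
import torch.nn as nn

mse = nn.MSELoss()               # regression: squared difference to a continuous target
bce = nn.BCEWithLogitsLoss()     # binary cross-entropy on raw scores (sigmoid applied internally)
ce = nn.CrossEntropyLoss()       # multi-class: negative log-probability of the true class
hinge = nn.MultiMarginLoss(margin=1.0)  # multiclass hinge: zero loss once the true class wins by the margin

pred_value = torch.tensor([2.3])
target_value = torch.tensor([2.0])
print(mse(pred_value, target_value))

logit = torch.tensor([0.8])
binary_label = torch.tensor([1.0])
print(bce(logit, binary_label))

class_scores = torch.randn(1, 5)
class_label = torch.tensor([3])
print(ce(class_scores, class_label))
print(hinge(class_scores, class_label))
```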
2. Deep learning models for image classification
Common loss functions

79
Slide credit: BIODS 220
Outline
• Review what we have learned last time
• Deep learning models for image classification
• Data considerations for image classification models
• Evaluating image classification models
• Case studies of CNNs for medical image classification

80
3. Data considerations for image classification models

Training, validation, and test sets

Other splits, e.g., 60/20/20, are also popular.
Balance sufficient data for training vs. an informative performance estimate on validation / testing.

81
Slide credit: BIODS 220
3. Data considerations for image classification models

Maximizing training data for the final model

Once hyperparameters are selected using the validation set, it is common to merge the training and validation sets into a larger “trainval” set and train a final model using those hyperparameters. This is OK, since we can use non-test data however we want during model development!

82
Slide credit: BIODS 220
3. Data considerations for image classification models

K-fold cross validation: for small datasets

Sometimes we have small labeled datasets in healthcare… in this case K-fold cross validation (which is more computationally expensive) may be worthwhile.

Train the model K times with a different fold as the validation set each time; then average the validation set results. This allows more data to be used for each training run of the model, while still using enough data to get an accurate validation result. The same concept can also be applied to test-time evaluation.
83
Slide credit: BIODS 220
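A sketch of the K-fold split mechanics using scikit-learn's splitter; the tiny index array and placeholder score are only there so the loop runs end to end, and your own training and validation routines would go inside it.

```python
import numpy as np
from sklearn.model_selection import KFold

all_indices = np.arange(20)   # pretend these index a small labeled dataset
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

fold_scores = []
for fold, (train_idx, val_idx) in enumerate(kfold.split(all_indices)):
    # train a model on train_idx and evaluate it on val_idx here;
    # we record a placeholder score so the example is self-contained
    fold_scores.append(len(val_idx) / len(all_indices))
    print(f"fold {fold}: {len(train_idx)} train examples, {len(val_idx)} val examples")

print("Average validation score across folds:", np.mean(fold_scores))
```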
3. Data considerations for image classification models

Data preprocessing

84
3. Data considerations for image classification models

Data preprocessing

85
3. Data considerations for image classification models

Data preprocessing: for images

86
3. Data considerations for image classification models

How much data do you need for deep learning?

The premise of deep learning is to use many parameters (e.g. millions) to fit complex functions -> if the dataset is too small, the easiest solution the model ends up learning can be to overfit by memorizing the labels of the training examples.

The ImageNet dataset consists of 1M images: 1000 classes with 1000 images each.

87
Slide credit: BIODS 220
3. Data considerations for image classification models

Transfer learning: amplifying training data

88
3. Data considerations for image classification models

89–92
3. Data considerations for image classification models

Often a good idea to try this first; then try fine-tuning all layers of the network.

93
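A sketch of the usual transfer-learning recipe with torchvision: start from ImageNet weights, replace the classification head, and fine-tune. The two-class output and learning rate are illustrative assumptions, not the lecture's values.

```python
import torch.nn as nn
import torch.optim as optim
from torchvision import models

model = models.resnet18(pretrained=True)       # ImageNet backbone (newer torchvision prefers weights=...)
model.fc = nn.Linear(model.fc.in_features, 2)  # new head for a 2-class medical task

# Option A: freeze the backbone, train only the new head
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("fc")

# Option B (often better once data allows it): fine-tune all layers with a small LR
optimizer = optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
```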
3. Data considerations for image classification models

How much data do you need for deep learning?
Examples per class of your dataset, in addition to transfer learning (take this with a grain of salt, it really depends on the problem):

➢ Low dozens: generally too small to learn a meaningful model using standard supervised deep learning
➢ High dozens to low hundreds: may see models with some predictive ability, unlikely to really wow or be “superhuman” though
➢ High hundreds to thousands: the “happy regime” for deep learning

In general, deep learning is data hungry -- the more data the better. Almost always leverage transfer learning unless you have an extremely different or huge (e.g., ImageNet-scale) dataset.
94
Slide credit: BIODS 220
3. Data considerations for image classification models

What counts as a data example?

Guidelines for the amount of training data refer to the # of unique instances representative of the diversity expected during testing / deployment, e.g. the # of independent CT scans or surgery videos. Additional correlated data (e.g. different slices of the same tumor or different suturing instances within the same video) provide relatively less incremental value in comparison.

95
Slide credit: BIODS 220
3. Data considerations for image classification models
Preview: advanced approaches for handling limited labeled data
Semi-supervised learning
Weakly-supervised learning
Domain adaptation

Will talk more about these in later lectures...

96
Slide credit: BIODS 220
3. Data considerations for image classification models

What if there are multiple possible sources of data?
E.g., some with noisier / less accurate labels than others, from different hospital sites, etc.

✓ The expected diversity of data during deployment should be reflected in both the training and test sets
✓ Need to see these during training to learn how to handle them
✓ Need to see these during testing to accurately evaluate the model

97
Slide credit: BIODS 220
4. Evaluating image classification models
• Review what we have learned last time
• Deep learning models for image classification
• Data considerations for image classification models
• Evaluating image classification models
• Case studies of CNNs for medical image classification

98
4. Evaluating image classification models

Q: When might evaluating purely on accuracy be problematic?

A: Imbalanced datasets.

99
Figures from https://ihh300.github.io/banking/Techniques-Handling-Class-Imbalance-Credit-Card-Fraud/
4. Evaluating image classification models

Confusion matrix:
TN  FN
FP  TP

Accuracy = (85285 + 118) / (85285 + 118 + 29 + 9) ≈ 1

100
Figures from https://ihh300.github.io/banking/Techniques-Handling-Class-Imbalance-Credit-Card-Fraud/
4. Evaluating image classification models
We can trade off different values of these metrics as we vary our classifier’s score threshold for predicting a positive.

101
Slide credit: BIODS 220
4. Evaluating image classification models
Q: As the prediction threshold increases, how does that generally affect sensitivity? Specificity?

102
Slide credit: BIODS 220
4. Evaluating image classification models

103
Slide credit: BIODS 220
4. Evaluating image classification models

Confusion matrix: TN = 85285, FN = 29, FP = 9, TP = 118

Precision = TP / (TP + FP) = 118 / (118 + 9) = 0.929
Sensitivity (aka Recall) = TP / (TP + FN) = 118 / (118 + 29) = 0.803
F1 score = 2 × (Recall × Precision) / (Recall + Precision) = 0.861
Specificity = TN / (TN + FP) = 85285 / (85285 + 9) ≈ 1
104
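A small sketch reproducing the metric calculations above from the confusion-matrix counts.

```python
def classification_metrics(tp, fp, fn, tn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * recall * precision / (recall + precision)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, specificity, f1, accuracy

print(classification_metrics(tp=118, fp=9, fn=29, tn=85285))
# -> roughly (0.929, 0.803, 0.9999, 0.861, 0.9996)
```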
4. Evaluating image classification models

True Positive Rate (TPR)
False Positive Rate (FPR)

105


Slide credit: BIODS 220
4. Evaluating image classification models

106
Figures from https://zhuanlan.zhihu.com/p/58587448
4. Evaluating image classification models

107
Slide credit: BIODS 220
4. Evaluating image classification models
Also equal to the distance above the chance line for a balanced dataset: sensitivity − (1 − specificity) = sensitivity + specificity − 1

108
Slide credit: BIODS 220
4. Evaluating image classification models
Also equal to the distance above the chance line for a balanced dataset: sensitivity − (1 − specificity) = sensitivity + specificity − 1

But the selected trade-off point could also depend on the application.

109
Slide credit: BIODS 220
Outline
• Review what we have learned last time
• Deep learning models for image classification
• Data considerations for image classification models
• Evaluating image classification models
• Case studies of CNNs for medical image classification

110
5. Case studies
Joint Diabetic Retinopathy (DR) and Diabetic Macular Edema (DME) Grading

Diabetic Retinopathy (DR)
• a consequence of microvascular changes triggered by diabetes
• a leading cause of blindness

Diabetic Macular Edema (DME)
• a complication of DR
• retinal thickening from fluid
• can occur at any stage of DR

Early pathological signs of DR (fundus image): soft exudate, microaneurysms, hard exudate, hemorrhage; landmarks: macula, optic disc

Grading:
• DR: the severity
• DME: the shortest distance d between the macula and hard exudates (0: no risk; 1: d < 1; 2: d > 1)
Li, X., Hu, X., Yu, L., Zhu, L., Fu, C.W. and Heng, P.A., 2019. CANet: cross-disease attention network for joint diabetic retinopathy and diabetic macular edema grading. IEEE 111
transactions on medical imaging, 39(5), pp.1483-1493.
5. Case studies
Joint Diabetic Retinopathy (DR) and Diabetic Macular Edema (DME) Grading

Example fundus images from normal to severe: DR grades 0, 1, 2, 3, 4 with DME grades 0, 0, 1, 2, 2.

Grading
• DR: the severity.
• DME: the shortest distance between the macula and hard exudates (0: no risk; 1: d < 1; 2: d > 1).

Clinical importance of grading
• DR/DME patients can receive tailored treatments.

112
5. Case studies
Automatically learned features for DR and DME grading: Fundus Image → Neural Network → DR grading + DME grading (with a relationship between the two)

Multi-task learning
• the information among different tasks is shared
• promotes the performance of each individual task

It also requires
• an understanding of each disease
• the internal relationship between the two diseases

There were no published works for joint DR and DME grading.

[Gulshan et al. JAMA. 2016]; [Ren et al. Technology and Health Care. 2018]; [Krause et al. Ophthalmology. 2018]; [Liu et al. MICCAI 2018]
113
5. Case studies
Cross-disease Attention Network (CANet)
• disease-specific attention block (disease-specific features): deep understanding of each disease
• disease-dependent attention block (disease-dependent features): internal relationship between the diseases

Architecture: backbone features F ∈ R^(C×H×W) pass through a disease-specific attention block per task (F_i′, F_j′ ∈ R^(C×H×W)), are average-pooled and reduced by fc layers to G_i, G_j ∈ R^(C/r), then combined by the disease-dependent attention blocks into G_i′, G_j′ ∈ R^(C/r). The DR head predicts Normal / Mild NPDR / Moderate NPDR / Severe NPDR / PDR (losses ℒ_DR, ℒ′_DR) and the DME head predicts Normal / Mild / Severe (losses ℒ_DME, ℒ′_DME).

The overall architecture of the cross-disease attention network (CANet).

114
5. Case studies
Cross-disease Attention Network (CANet)
• disease-specific attention block (disease-specific features): deep understanding of each disease
• disease-dependent attention block (disease-dependent features): internal relationship between the diseases

The overall architecture of the cross-disease attention network (CANet).

115
5. Case studies
Cross-disease Attention Network (CANet)
• disease-specific attention block (disease-specific features): deep understanding of each disease
• disease-dependent attention block (disease-dependent features): internal relationship between the diseases

The overall architecture of the cross-disease attention network (CANet).

116
5. Case studies
Cross-disease Attention Network (CANet)

Channel-wise attention: F ∈ R^(C×H×W) is average- and max-pooled to F_avg^c, F_max^c ∈ R^C, passed through a shared MLP (fc layers) and a sigmoid to give the channel attention map A^c ∈ R^C, producing the re-weighted features F_i ∈ R^(C×H×W).

(a) Disease-specific attention module.
(b) Disease-dependent attention module: channel-wise attention applied to the pooled vectors G_i, G_j ∈ R^(C/r) via fc layers (MLP) and a sigmoid, producing G_j′ ∈ R^(C/r).

117
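A generic channel-attention sketch in the spirit of the module above (average- and max-pooled descriptors → shared MLP → sigmoid gate). This is an illustrative re-implementation under those assumptions, not the authors' released CANet code.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(                  # shared MLP (two fc layers)
            nn.Linear(channels, channels // reduction),
            nn.ReLU(),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, f):                          # f: [N, C, H, W]
        avg_desc = f.mean(dim=(2, 3))              # average-pooled descriptor in R^C (per sample)
        max_desc = f.amax(dim=(2, 3))              # max-pooled descriptor in R^C
        attn = torch.sigmoid(self.mlp(avg_desc) + self.mlp(max_desc))  # channel attention map A^c
        return f * attn[:, :, None, None]          # re-weight each channel

features = torch.randn(2, 64, 32, 32)
refined = ChannelAttention(64)(features)           # same shape: [2, 64, 32, 32]
```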
5. Case studies
Cross-disease Attention Network (CANet)

Spatial-wise attention: the channel-refined features F_i ∈ R^(C×H×W) are average- and max-pooled along the channel dimension to F_i,avg^s, F_i,max^s ∈ R^(H×W), passed through a convolution and a sigmoid to give the spatial attention map A^s ∈ R^(H×W), producing F_i′ ∈ R^(C×H×W).

(a) Disease-specific attention module.
(b) Disease-dependent attention module.

118
5. Case studies
Cross-disease Attention Network (CANet)

Channel-wise attention is also used inside the disease-dependent attention module (b), operating on G_i, G_j ∈ R^(C/r) through fc layers (MLP) and a sigmoid to produce G_j′ ∈ R^(C/r).

(a) Disease-specific attention module.
(b) Disease-dependent attention module.

119
5. Case studies
Cross-disease Attention Network (CANet)

The overall architecture of the cross-disease attention network (CANet).

Loss function: the DR and DME losses are combined with a weighting factor.
120
5. Case studies
Joint DR and DME grading results on the public Messidor dataset.

• Joint training outperforms individual training

121
5. Case studies
Joint DR and DME grading results on the public Messidor dataset.

• Our method outperforms joint training at the same level of model parameters (effectiveness of our network design)

122
5. Case studies
Joint DR and DME grading results on the public Messidor dataset.

• Both disease-specific and disease-dependent


attentions can contribute to the performance

123
5. Case studies
Joint DR and DME grading results on the public Messidor dataset.

• Our method achieves the best performance when λ = 0.25

124
5. Case studies
Comparisons with other multi-task learning methods, and with state-of-the-art methods on the Messidor dataset.

• 2% higher than other multi-task learning methods.
• Clearly outperforms the others.

125
5. Case studies
Results and ablation study on the IDRiD challenge leaderboard.

• Results are from the IDRiD 2018 challenge website; our method clearly outperforms the other results.
• Both disease-specific and disease-dependent attentions are useful.
• Results are consistent with those on the Messidor dataset.

126
5. Case studies
Joint DR and DME grading results on fundus photography on the Messidor dataset.

Each example shows the ground truth provided by doctors, the output probabilities for DR (grades 0–4), and the output probabilities for DME (grades 0–2):
DR 3: 0.00 0.00 0.06 0.73 0.20 | DME 2: 0.00 0.00 0.99
DR 2: 0.00 0.00 0.92 0.07 0.00 | DME 2: 0.00 0.00 1.00
DR 0: 0.69 0.23 0.07 0.00 0.00 | DME 0: 0.85 0.10 0.04
DR 2: 0.00 0.00 0.99 0.01 0.00 | DME 2: 0.00 0.08 0.92
DR 2: 0.07 0.00 0.86 0.00 0.07 | DME 1: 0.05 0.95 0.00
DR 3: 0.00 0.00 0.00 0.64 0.35 | DME 2: 0.00 0.00 0.99
DR 4: 0.00 0.00 0.00 0.05 0.95 | DME 2: 0.00 0.10 0.90
DR 2: 0.00 0.00 0.99 0.00 0.00 | DME 2: 0.00 0.14 0.86

127
5. Case studies

Q: How might we approach this problem?

128
Slide credit: BIODS 220
5. Case studies

129–137
Slide credit: BIODS 220
5. Case studies
Q: What could explain the difference in trends, on tuning-set performance, when reducing the # of grades per image on the training set vs. on the tuning set?

138
Slide credit: BIODS 220
5. Case studies

139–141
Slide credit: BIODS 220
5. Case studies

All training images were resized to 256×256 and underwent base data augmentation of random 227×227 cropping and mirroring. Additional data augmentation experiments are in the results table.

142
Slide credit: BIODS 220
5. Case studies

All training images were resized to 256×256 and underwent base data augmentation of random 227×227 cropping and mirroring. Additional data augmentation experiments are in the results table.

Often resize to match the input size of pre-trained networks. Also a fine approach to making a high-resolution dataset easier to work with!

143
Slide credit: BIODS 220
5. Case studies
Performed further analysis at the optimal threshold determined by the Youden index.

144
Slide credit: BIODS 220
5. Case studies

145–146
Slide credit: BIODS 220
Summary
Today we saw:
• Deep learning models for image classification
• Data considerations for image classification models
• Evaluating image classification models
• Case studies

Next time: Advanced Vision Models (Detection and Segmentation)

147
