Lecture 02
Assignments
Import packages
Define augmentations
transforms.Compose()
Define dataloader
Build model
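As a rough illustration of what the augmentation and dataloader steps do, here is a minimal pure-Python sketch. It is not the torchvision or PyTorch API: `compose`, `horizontal_flip`, `scale_to_unit` and `batches` are hypothetical stand-ins that only mimic the idea of chaining transforms (as `transforms.Compose()` does) and of serving shuffled mini-batches.

```python
import random

def compose(transforms):
    """Chain transforms so each one receives the previous one's output."""
    def apply(image):
        for transform in transforms:
            image = transform(image)
        return image
    return apply

def horizontal_flip(image):
    """Mirror each row of a 2D image stored as a list of lists."""
    return [row[::-1] for row in image]

def scale_to_unit(image):
    """Map 8-bit pixel values from [0, 255] to [0.0, 1.0]."""
    return [[pixel / 255.0 for pixel in row] for row in image]

def batches(dataset, batch_size, shuffle=True, seed=0):
    """Yield lists of samples, mimicking what a dataloader does."""
    order = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [dataset[i] for i in order[start:start + batch_size]]

# Hypothetical toy data: one 2x2 image reused five times.
augment = compose([horizontal_flip, scale_to_unit])
image = [[0, 255], [51, 102]]
augmented = augment(image)
dataset = [augment(image) for _ in range(5)]
batch_sizes = [len(batch) for batch in batches(dataset, batch_size=2)]
```

In a real assignment the same roles would be played by the library's own transform and dataloader classes; the sketch only shows the order of operations.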
Look at some examples
Examples are from https://nextjournal.com/gkoehler/pytorch-mnist
Step 1. Preparing the Dataset
Step 2. Building the Network
Step 3. Training the Model
Step 4. Evaluating the Model’s Performance
Step 5. Continued Training from Checkpoints
Notes from the example slides:
• Network initialization and setting an optimizer
• Network parameters
• Test mode: parameters are frozen
• Disabling gradient calculation is useful for inference
• Take out the predictions
• Show predictions
Outline
• Review what we have learned last time
• Deep learning models for image classification
• Data considerations for image classification models
• Evaluating image classification models
• Case studies of CNNs for medical image classification
1. Last time
• Deep learning (a type of machine learning)
Traditional machine learning approaches
Layer output(s)
Layer inputs
Loss functions are quantitative measures of how satisfactory the model predictions are (i.e., how “good” the model parameters are).
Mean squared error (MSE) loss, which is standard for regression:
MSE loss for a single example $x_i$, when the prediction is $\hat{y}_i$ and the ground truth is $y_i$: $L_i(W) = (\hat{y}_i - y_i)^2$
MSE loss over a set of examples $i = 1, \dots, M$: $L = \frac{1}{M} \sum_i L_i(W)$
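A minimal sketch of the MSE loss described above, averaging the per-example squared errors over M examples. The function name `mse_loss` and the toy numbers are illustrative assumptions.

```python
def mse_loss(predictions, targets):
    """Mean of per-example squared errors, L = (1/M) * sum_i L_i."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    per_example = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return sum(per_example) / len(per_example)

# Toy values: squared errors are 1, 0 and 4, so the mean is 5/3.
loss = mse_loss([2.0, 0.0, 3.0], [1.0, 0.0, 5.0])
```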
• Define a two-layer fully-connected neural network
Activation functions introduce non-linearity into the model, allowing it to represent highly complex functions.
$\hat{y} = W x$
Output $\hat{y}$: (10 × 1); weights $W$: (10 × 3072); input $x$: (3072 × 1)
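The shapes can be checked with a small sketch of a two-layer fully-connected network, treating the input as a 3072-element column vector and producing 10 output scores. The hidden size of 100, the ReLU activation and the random weights are assumptions for illustration, not values from the slides.

```python
import random

def matvec(W, x):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    """Element-wise non-linearity applied between the two layers."""
    return [max(0.0, a) for a in v]

def random_matrix(rows, cols, rng, scale=0.01):
    """Small Gaussian weights; the scale is an arbitrary choice."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
x = [rng.random() for _ in range(3072)]   # input, 3072 x 1
W1 = random_matrix(100, 3072, rng)        # first layer, 100 x 3072 (assumed size)
W2 = random_matrix(10, 100, rng)          # second layer, 10 x 100

hidden = relu(matvec(W1, x))              # 100 x 1, non-negative after ReLU
scores = matvec(W2, hidden)               # 10 x 1 output scores
```

Without the activation between the layers, the two matrix products would collapse into a single linear map, which is why the non-linearity matters.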
Fully-connected layers: in graphical form
Convolutional layer
Filters always extend the full depth of the input volume.
Activation map
Consider a second, green filter.
Common settings:
Padding options: ‘valid’ does not pad; use ‘same’ to pad such that input and output spatial dimensions are the same size.
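The effect of the two padding options can be checked with the standard output-size formula for a convolution, (W - F + 2P) / S + 1, where W is the input size, F the filter size, P the padding and S the stride. The sketch below is illustrative: the function names are hypothetical, and raising an error when the filter does not tile the input evenly is a design choice made here (libraries often round down instead).

```python
def conv_output_size(input_size, filter_size, stride=1, padding=0):
    """Spatial output size of a convolution: (W - F + 2P) / S + 1."""
    span = input_size - filter_size + 2 * padding
    if span < 0 or span % stride != 0:
        raise ValueError("filter does not tile the padded input evenly")
    return span // stride + 1

def same_padding(filter_size):
    """Padding that preserves the size for stride 1 and an odd filter size."""
    if filter_size % 2 == 0:
        raise ValueError("this sketch only handles odd filter sizes")
    return (filter_size - 1) // 2

valid_out = conv_output_size(32, 5)                           # no padding
same_out = conv_output_size(32, 5, padding=same_padding(5))   # padded
```

For a 32-wide input and a 5-wide filter at stride 1, no padding shrinks the output to 28, while a padding of 2 keeps it at 32.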
Common settings:
F = 2, S = 2
F = 3, S = 2
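Assuming these settings refer to pooling layers (the slide title did not survive extraction), F is the window size and S the stride. A minimal max-pooling sketch on a 2D list, with hypothetical toy values:

```python
def max_pool2d(image, size=2, stride=2):
    """Max pooling over a 2D list of lists with a square window."""
    rows = (len(image) - size) // stride + 1
    cols = (len(image[0]) - size) // stride + 1
    return [
        [
            max(
                image[r * stride + i][c * stride + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]

image = [
    [1, 3, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]
pooled = max_pool2d(image, size=2, stride=2)       # non-overlapping windows
overlapping = max_pool2d(image, size=3, stride=2)  # only one 3x3 window fits in 4x4
```

With F = 2, S = 2 the windows do not overlap and each spatial dimension is halved; with F = 3, S = 2 neighbouring windows share a row or column.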
Model debugging
Figure: training loss and validation loss curves
• Plateau may be bad weight initialization
• Loss decreasing but slowly -> try higher learning rate
• Healthy loss curve plateaus -> try further learning rate decay at plateau point
• Final metric is still improving -> keep training!
Early stopping: always do this
Figure: training loss and validation loss curves
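One common way to implement early stopping is to track the best validation loss and stop once it has not improved for a fixed number of epochs. The sketch below assumes that rule; the `patience` parameter and the loss values are illustrative, not taken from the slides.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (best_epoch, stop_epoch) for a sequence of validation losses.

    Training stops at the first epoch that is `patience` epochs past the
    best validation loss seen so far; the best epoch's weights are kept.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, len(val_losses) - 1

# Validation loss falls, then rises as the model starts to overfit.
val_losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.70]
best_epoch, stop_epoch = early_stopping_epoch(val_losses, patience=2)
```

In practice a checkpoint is saved whenever the validation loss improves, so the model from `best_epoch` can be restored after stopping.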
L2 is the most popular: low loss when all weights are relatively small.
Penalizes large weights more strongly than L1.
Expresses a preference for simple models (large coefficients are needed to fit a function to extreme outlier values).
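A small numeric sketch of why L2 penalizes large weights more strongly than L1. The two weight vectors are hypothetical and chosen to have the same L1 norm.

```python
def l1_penalty(weights):
    """Sum of absolute weights."""
    return sum(abs(w) for w in weights)

def l2_penalty(weights):
    """Sum of squared weights."""
    return sum(w * w for w in weights)

def regularized_loss(data_loss, weights, lam, penalty):
    """Total loss = data loss + lambda * regularization penalty."""
    return data_loss + lam * penalty(weights)

spread = [0.5, 0.5, 0.5, 0.5]   # many small weights
peaked = [2.0, 0.0, 0.0, 0.0]   # one large weight

l1_spread, l1_peaked = l1_penalty(spread), l1_penalty(peaked)
l2_spread, l2_peaked = l2_penalty(spread), l2_penalty(peaked)
```

L1 cannot tell the two vectors apart (both give 2.0), while L2 charges the peaked vector four times as much (4.0 vs. 1.0), so it favors solutions where all weights stay small.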
Useful debugging / sanity check: restrict to a very small dataset first (e.g., 1 or 2 minibatches). You should be able to severely overfit and drive the loss to 0.
Aside: for the learning rate (LR), sample e^x with x drawn from Uniform[-5, 0]!
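A minimal sketch of the sampling rule in the aside: draw x uniformly from [-5, 0] and use e^x as the learning rate, so that candidates are spread evenly across orders of magnitude rather than bunched near the upper end. The function name and the sample count are illustrative.

```python
import math
import random

def sample_learning_rate(rng, low_exp=-5.0, high_exp=0.0):
    """Sample e^x with x ~ Uniform[low_exp, high_exp]."""
    return math.exp(rng.uniform(low_exp, high_exp))

rng = random.Random(0)
rates = [sample_learning_rate(rng) for _ in range(1000)]

# About half of the samples should fall below e^-2.5, the midpoint on a log scale.
fraction_small = sum(r < math.exp(-2.5) for r in rates) / len(rates)
```

Sampling the learning rate uniformly on a linear scale would instead place almost all candidates in the top order of magnitude.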
Model inference
Model ensembles
Vanilla fully-connected neural networks (MLPs) are usually pretty shallow (~2-3 layers); otherwise there are too many parameters!
Typical in modern
CNNs and MLPs
2. Deep learning models for image classification
Slide credit: BIODS 220
More recent CNN architectures for image classification
Check website for state-of-the-art CNN architectures. Worth exploring for class projects!
More on loss functions
Common loss functions
• Regression: minimize the squared difference between prediction output and target.
• Binary cross-entropy: $L_i = -\left[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \right]$. Equivalent to the negative log of the probability of the correct ground-truth class being predicted. Think about what the expression looks like when y_i = 1 vs. 0.
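To make the two cases of binary cross-entropy concrete, here is a minimal sketch for a single example. The clipping constant `eps` is an assumption added to avoid taking log(0); it is not part of the definition.

```python
import math

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Negative log-probability assigned to the correct class.

    y_true is 0 or 1; p_pred is the predicted probability of class 1.
    """
    p = min(max(p_pred, eps), 1.0 - eps)  # clip to keep the logs finite
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))

# Same prediction (0.9 for the positive class), two possible ground truths.
loss_when_positive = binary_cross_entropy(1, 0.9)  # only the log(p) term survives
loss_when_negative = binary_cross_entropy(0, 0.9)  # only the log(1 - p) term survives
```

When y_i = 1 the loss reduces to -log(p), and when y_i = 0 it reduces to -log(1 - p), so a confident wrong prediction is penalized far more than a confident correct one.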
3. Data considerations for image classification models
Data preprocessing
Often a good idea to try this first. Try fine-tuning all layers of the network.
How much data do you need for deep learning?
In general, deep learning is data hungry: the more data the better.
Almost always leverage transfer learning unless you have an extremely different or huge (e.g., ImageNet-scale) dataset.
Guideline figures are given as examples per class of your dataset, in addition to transfer learning (take this with a grain of salt; it really depends on the problem).
What counts as a data example?
Guidelines for the amount of training data refer to the number of unique instances representative of the diversity expected during testing / deployment, e.g., the number of independent CT scans or surgery videos. Additional correlated data (e.g., different slices of the same tumor or different suturing instances within the same video) provide relatively less incremental value in comparison.
Preview: advanced approaches for handling limited labeled data
Semi-supervised learning
Weakly-supervised learning
Domain adaptation
What if there are multiple possible sources of data?
E.g., some with noisier / less accurate labels than others, from different hospital sites, etc.
4. Evaluating image classification models
A: Imbalanced datasets.
Figures from https://ihh300.github.io/banking/Techniques-Handling-Class-Imbalance-Credit-Card-Fraud/
Confusion matrix (rows: predicted negative, predicted positive; columns: actual negative, actual positive):
TN FN
FP TP
We can trade off different values of these metrics as we vary our classifier’s score threshold for predicting a positive.
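The trade-off can be seen by counting confusion-matrix entries at two different thresholds. This is a minimal sketch with hypothetical labels and scores; a score at or above the threshold is treated as a positive prediction.

```python
def confusion_counts(labels, scores, threshold):
    """Return (tp, fp, tn, fn) when score >= threshold predicts positive."""
    tp = fp = tn = fn = 0
    for label, score in zip(labels, scores):
        predicted_positive = score >= threshold
        if label == 1 and predicted_positive:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, fp, tn, fn = confusion_counts(labels, scores, threshold)
    return tp / (tp + fn), tn / (tn + fp)

labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.4, 0.6, 0.3, 0.5, 0.7, 0.9]

low_threshold = sensitivity_specificity(labels, scores, 0.25)
high_threshold = sensitivity_specificity(labels, scores, 0.65)
```

On this toy data the low threshold catches every positive at the cost of false alarms, and the high threshold does the reverse.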
Q: As prediction threshold increases, how does that generally
affect sensitivity? Specificity?
Figures from https://zhuanlan.zhihu.com/p/58587448
Also equal to distance above chance line for a balanced
dataset: sensitivity - (1 - specificity) = sensitivity + specificity - 1
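The quantity sensitivity + specificity - 1 can be used to pick an operating threshold by scanning candidate thresholds and keeping the one that maximizes it. The sketch below assumes candidate thresholds are the observed scores and that a score at or above the threshold predicts positive; the data are hypothetical.

```python
def youden_index(labels, scores, threshold):
    """Sensitivity + specificity - 1 at a given score threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0

def best_threshold(labels, scores):
    """Scan the observed scores and return the threshold with the largest index."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: youden_index(labels, scores, t))

labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.6, 0.4, 0.5, 0.7, 0.9]

threshold = best_threshold(labels, scores)
index_at_best = youden_index(labels, scores, threshold)
```

A value of 0 corresponds to the chance line and 1 to a perfect classifier.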
5. Case studies
Joint Diabetic Retinopathy (DR) and Diabetic Macular Edema (DME) Grading
Diabetic Retinopathy (DR)
• a consequence of microvascular changes triggered by diabetes
• leading cause of blindness
Diabetic Macular Edema (DME)
• a complication of DR
• retinal thickening of fluid
Grading:
• DR: the severity
• DME: shortest distance d between macula and hard exudates (0: no risk; 1: d < 1; 2: d > 1)
Fundus image labels: soft exudate, microaneurysms, macula, optic disc, hard exudate, hemorrhage
Li, X., Hu, X., Yu, L., Zhu, L., Fu, C.W. and Heng, P.A., 2019. CANet: cross-disease attention network for joint diabetic retinopathy and diabetic macular edema grading. IEEE Transactions on Medical Imaging, 39(5), pp.1483-1493.
Example fundus images, ordered from normal to severe:
DR: 0 DR: 1 DR: 2 DR: 3 DR: 4
DME: 0 DME: 0 DME: 1 DME: 2 DME: 2
Automatically learned features for DR and DME grading
Figure: Fundus Image -> Neural Network -> DR grading and DME grading, with a relationship between the two
Multi-task learning
• the information among different tasks is shared
• promote the performance of each individual task
It also requires
• an understanding of each disease
• the internal relationship between the two diseases.
[Gulshan et al. JAMA. 2016]; [Ren et al. Technology and Health Care. 2018]; [Krause et al. Ophthalmology. 2018]; [Liu et al. MICCAI 2018]
Cross-disease Attention Network (CANet)
• disease-specific attention block (disease-specific features): deep understanding of each disease
• disease-dependent attention block (disease-dependent features): internal relationship between diseases
Channel-wise attention: $\mathbf{F} \in \mathbb{R}^{C \times H \times W}$ is pooled into $\mathbf{F}^{c}_{avg} \in \mathbb{R}^{C}$ and $\mathbf{F}^{c}_{max} \in \mathbb{R}^{C}$, which pass through an MLP (fc, fc) and a sigmoid to give $\mathbf{A}^{c} \in \mathbb{R}^{C}$ and the feature $\mathbf{F}_i \in \mathbb{R}^{C \times H \times W}$.
Spatial-wise attention: $\mathbf{F}_i$ is pooled into $\mathbf{F}^{s}_{i,avg} \in \mathbb{R}^{H \times W}$ and $\mathbf{F}^{s}_{i,max} \in \mathbb{R}^{H \times W}$, which pass through a convolution and a sigmoid to give $\mathbf{A}^{s} \in \mathbb{R}^{H \times W}$ and the output $\mathbf{F}_i' \in \mathbb{R}^{C \times H \times W}$.
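The channel-wise and spatial-wise attention steps can be sketched in plain Python on a tiny feature map. This is only an illustration of the tensor shapes in the diagram, not the CANet implementation: it assumes the average- and max-pooled descriptors go through a shared two-layer MLP with a ReLU and are summed before the sigmoid, and it replaces the spatial convolution with a per-pixel weighted sum (a 1x1 convolution). All weights and inputs are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def shared_mlp(v, W1, W2):
    """Two fully-connected layers with a ReLU in between (assumed)."""
    hidden = [max(0.0, h) for h in matvec(W1, v)]
    return matvec(W2, hidden)

def channel_attention(F, W1, W2):
    """F is a list of C channels, each an H x W list of lists."""
    avg = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in F]
    mx = [max(max(row) for row in ch) for ch in F]
    a_c = [sigmoid(p + q)
           for p, q in zip(shared_mlp(avg, W1, W2), shared_mlp(mx, W1, W2))]
    refined = [[[a_c[c] * v for v in row] for row in F[c]] for c in range(len(F))]
    return a_c, refined

def spatial_attention(F, w_avg, w_max, bias=0.0):
    """Pool across channels, mix with a 1x1 'convolution', apply a sigmoid."""
    C, H, W = len(F), len(F[0]), len(F[0][0])
    a_s = [[sigmoid(w_avg * sum(F[c][i][j] for c in range(C)) / C
                    + w_max * max(F[c][i][j] for c in range(C)) + bias)
            for j in range(W)] for i in range(H)]
    out = [[[a_s[i][j] * F[c][i][j] for j in range(W)] for i in range(H)]
           for c in range(C)]
    return a_s, out

# Toy feature map with C = 2, H = 2, W = 2 and hypothetical weights.
F = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.0], [1.5, 1.0]]]
W1 = [[0.5, -0.5]]       # C -> 1 hidden unit
W2 = [[1.0], [-1.0]]     # 1 hidden unit -> C
a_c, F_i = channel_attention(F, W1, W2)
a_s, F_out = spatial_attention(F_i, w_avg=1.0, w_max=1.0)
```

The channel map has one weight per channel and the spatial map one weight per pixel, so the output keeps the input's C x H x W shape.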
Loss function
Weighting factor
Joint DR and DME grading results on the public Messidor dataset.
Comparisons with other multi-task learning methods.
Results on the IDRiD challenge leaderboard. Ablation Study on the IDRiD challenge leaderboard.
Joint DR and DME grading results on fundus photography on the Messidor dataset.
Per-image outputs (grade label, then one score per grade: five for DR, three for DME):
Image 1: DR 3: 0.00 0.00 0.06 0.73 0.20; DME 2: 0.00 0.00 0.99
Image 2: DR 2: 0.00 0.00 0.92 0.07 0.00; DME 2: 0.00 0.00 1.00
Image 3: DR 0: 0.69 0.23 0.07 0.00 0.00; DME 0: 0.85 0.10 0.04
Image 4: DR 2: 0.00 0.00 0.99 0.01 0.00; DME 2: 0.00 0.08 0.92
Image 5: DR 2: 0.07 0.00 0.86 0.00 0.07; DME 1: 0.05 0.95 0.00
Image 6: DR 3: 0.00 0.00 0.00 0.64 0.35; DME 2: 0.00 0.00 0.99
Image 7: DR 4: 0.00 0.00 0.00 0.05 0.95; DME 2: 0.00 0.10 0.90
Image 8: DR 2: 0.00 0.00 0.99 0.00 0.00; DME 2: 0.00 0.14 0.86
Q: What could explain the difference in trends for reducing #
grades / image on training set vs. tuning set, on tuning set
performance?
All training images were resized to 256x256 and underwent base data
augmentation of random 227x227 cropping and mirror images. Additional
data augmentation experiments in results table.
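The base augmentation described above (a random 227x227 crop from a 256x256 image, plus mirroring) can be sketched on a plain 2D list. The resize step is assumed to have already happened, and the 0.5 mirror probability is an assumption rather than a value given on the slide.

```python
import random

def random_crop(image, crop_size, rng):
    """Cut a crop_size x crop_size window at a random position."""
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - crop_size)
    left = rng.randint(0, width - crop_size)
    return [row[left:left + crop_size] for row in image[top:top + crop_size]]

def random_mirror(image, rng, p=0.5):
    """Flip the image left-right with probability p."""
    if rng.random() < p:
        return [row[::-1] for row in image]
    return image

rng = random.Random(0)
# Stand-in for an image already resized to 256x256; each pixel encodes its position.
image = [[r * 256 + c for c in range(256)] for r in range(256)]
augmented = random_mirror(random_crop(image, 227, rng), rng)
```

Because the crop position and the flip are re-drawn every time, the network sees a slightly different version of each training image in each epoch.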
Often resize to match the input size of pre-trained networks. Also a fine approach to making a high-res dataset easier to work with!
Performed further analysis at the optimal threshold determined by the Youden Index.
Summary
Today we saw:
• Deep learning models for image classification
• Data considerations for image classification models
• Evaluating image classification models
• Case studies