README.md (+5 −5 lines changed)
@@ -249,9 +249,9 @@ In the ImageNet VGG-16 [shown previously](https://github.com/sgrvinod/a-PyTorch-
 `fc7` with an input size of `4096` (i.e. the output size of `fc6`) and an output size `4096` has parameters of dimensions `4096, 4096`. The input could be considered as a `1, 1` image with `4096` input channels. **The equivalent convolutional layer `conv7` has a `1, 1` kernel size and `4096` output channels, with reshaped parameters of dimensions `4096, 1, 1, 4096`.**

-We can see now that `conv6` has `4096` filters, each with dimensions `7, 7, 512`, and `conv7` has `4096` filters, each with dimensions `1, 1, 4096`.
+We can see that `conv6` has `4096` filters, each with dimensions `7, 7, 512`, and `conv7` has `4096` filters, each with dimensions `1, 1, 4096`.

-These filters are numerous and large – and computationally expensive.
+The filters are numerous and large – and computationally expensive.

 To remedy this, the authors opt to **reduce both their number and the size of each filter by subsampling parameters** from the converted convolutional layers.
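As a concrete illustration, here is a minimal PyTorch sketch of the conversion and subsampling. The `decimate` helper, the stand-in random weights, and the exact rates (every 4th filter, every 3rd kernel element) are illustrative assumptions, not a verbatim excerpt of the tutorial's code:

```python
import torch

def decimate(tensor, m):
    """Keep every m[d]-th value along each dimension d where m[d] is not None,
    e.g. m=[4, None, 3, 3] on a (4096, 512, 7, 7) tensor gives (1024, 512, 3, 3)."""
    for d in range(tensor.dim()):
        if m[d] is not None:
            index = torch.arange(0, tensor.size(d), step=m[d]).long()
            tensor = tensor.index_select(dim=d, index=index)
    return tensor

# fc7 weights (out, in) = (4096, 4096) reshape directly into a 1x1 convolution.
# Note that PyTorch stores conv weights as (out_channels, in_channels, kH, kW).
fc7_weight = torch.randn(4096, 4096)              # stand-in for pretrained weights
conv7_weight = fc7_weight.view(4096, 4096, 1, 1)

# fc6 weights (4096, 7 * 7 * 512) reshape into a 7x7 convolution over 512 channels.
fc6_weight = torch.randn(4096, 7 * 7 * 512)       # stand-in for pretrained weights
conv6_weight = fc6_weight.view(4096, 512, 7, 7)

# Subsample: keep every 4th filter (4096 -> 1024) and every 3rd kernel
# element in each spatial dimension (7x7 -> 3x3).
conv6_weight = decimate(conv6_weight, m=[4, None, 3, 3])
print(conv6_weight.shape)  # torch.Size([1024, 512, 3, 3])
```

Subsampled this way, `conv6` ends up with `1024` much smaller filters; the reduced spatial coverage of the 3x3 kernel is typically compensated for with a dilated (atrous) convolution.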
@@ -451,7 +451,7 @@ Remember, the nub of any supervised learning algorithm is that **we need to be a
 For the model to learn _anything_, we'd need to structure the problem in a way that allows for comparisons between our predictions and the objects actually present in the image.

-Priors enable us to do exactly this.
+Priors enable us to do exactly this –

 - **Find the Jaccard overlaps** between the 8732 priors and `N` ground truth objects. This will be a tensor of size `8732, N`.
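A sketch of how such pairwise overlaps can be computed with broadcasting; the function name `find_jaccard_overlap` and the boundary-coordinate convention `(x_min, y_min, x_max, y_max)` are assumptions for illustration:

```python
import torch

def find_jaccard_overlap(set_1, set_2):
    """Jaccard overlap (IoU) between every pair of boxes in two sets.

    set_1: (n1, 4), set_2: (n2, 4), boxes in boundary coordinates
    (x_min, y_min, x_max, y_max). Returns a (n1, n2) tensor.
    """
    # Intersection: max of the lower bounds, min of the upper bounds
    lower = torch.max(set_1[:, None, :2], set_2[None, :, :2])  # (n1, n2, 2)
    upper = torch.min(set_1[:, None, 2:], set_2[None, :, 2:])  # (n1, n2, 2)
    wh = (upper - lower).clamp(min=0)                          # (n1, n2, 2)
    intersection = wh[:, :, 0] * wh[:, :, 1]                   # (n1, n2)

    areas_1 = (set_1[:, 2] - set_1[:, 0]) * (set_1[:, 3] - set_1[:, 1])  # (n1,)
    areas_2 = (set_2[:, 2] - set_2[:, 0]) * (set_2[:, 3] - set_2[:, 1])  # (n2,)
    union = areas_1[:, None] + areas_2[None, :] - intersection           # (n1, n2)

    return intersection / union
```

Passing the 8732 priors (converted to boundary coordinates) as `set_1` and the `N` ground-truth boxes as `set_2` yields exactly the `8732, N` overlap tensor described above.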
@@ -523,7 +523,7 @@ For the SSD, however, the authors simply use `α = 1`, i.e. add the two losses.
 After the model is trained, we can apply it to images. However, the predictions are still in their raw form – two tensors containing the offsets and class scores for 8732 priors. These would need to be processed to **obtain final, human-interpretable bounding boxes with labels.**

-This entails the following.
+This entails the following –

 - We have 8732 predicted boxes represented as offsets `(g_c_x, g_c_y, g_w, g_h)` from their respective priors. Decode them to boundary coordinates, which are actually directly interpretable.
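A sketch of this decoding step, assuming the standard SSD encoding (center offsets scaled by the prior's size, log-scaled width and height). Implementations often also rescale the offsets by empirical "variances", which are omitted here, and the function name is illustrative:

```python
import torch

def decode_offsets(gcxgcy, priors_cxcy):
    """Decode predicted offsets (g_c_x, g_c_y, g_w, g_h) w.r.t. priors given in
    center-size form (c_x, c_y, w, h), then convert the result to boundary
    coordinates (x_min, y_min, x_max, y_max). Both inputs: (8732, 4)."""
    # Invert the encoding: c = g_c * p_wh + p_c ;  wh = p_wh * exp(g_wh)
    cxcy = gcxgcy[:, :2] * priors_cxcy[:, 2:] + priors_cxcy[:, :2]
    wh = priors_cxcy[:, 2:] * torch.exp(gcxgcy[:, 2:])

    # Center-size form -> boundary coordinates
    return torch.cat([cxcy - wh / 2, cxcy + wh / 2], dim=1)  # (8732, 4)
```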
@@ -565,7 +565,7 @@ Thus, we've eliminated the rogue candidates – one of each animal.
 This process is called __Non-Maximum Suppression (NMS)__ because when multiple candidates are found to overlap significantly with each other such that they could be referencing the same object, **we suppress all but the one with the maximum score**.

-Algorithmically, it is carried out as follows.
+Algorithmically, it is carried out as follows –

 - Upon selecting candidates for each _non-background_ class,
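Under the same assumptions as the earlier sketches (boxes in boundary coordinates, the illustrative `find_jaccard_overlap` helper) and a hypothetical suppression threshold `max_overlap`, per-class greedy NMS might look like this:

```python
import torch

def non_max_suppression(boxes, scores, max_overlap=0.5):
    """Greedy NMS for one class. boxes: (n, 4) in boundary coordinates,
    scores: (n,). Returns the indices of the boxes to keep."""
    # Consider candidates in decreasing order of score
    order = scores.argsort(descending=True)
    keep = []
    while order.numel() > 0:
        best = order[0]
        keep.append(best.item())
        if order.numel() == 1:
            break
        # Suppress remaining candidates that overlap the best box too much
        overlaps = find_jaccard_overlap(boxes[best].unsqueeze(0),
                                        boxes[order[1:]])       # (1, n-1)
        order = order[1:][overlaps.squeeze(0) <= max_overlap]
    return keep
```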