Commit 7dcdb79

added tutorial content

1 parent f51468a

1 file changed (+5, -5 lines)

README.md

Lines changed: 5 additions & 5 deletions
```diff
@@ -249,9 +249,9 @@ In the ImageNet VGG-16 [shown previously](https://github.com/sgrvinod/a-PyTorch-
 
 - `fc7` with an input size of `4096` (i.e. the output size of `fc6`) and an output size `4096` has parameters of dimensions `4096, 4096`. The input could be considered as a `1, 1` image with `4096` input channels. **The equivalent convolutional layer `conv7` has a `1, 1` kernel size and `4096` output channels, with reshaped parameters of dimensions `4096, 1, 1, 4096`.**
 
-We can see now that `conv6` has `4096` filters, each with dimensions `7, 7, 512`, and `conv7` has `4096` filters, each with dimensions `1, 1, 4096`.
+We can see that `conv6` has `4096` filters, each with dimensions `7, 7, 512`, and `conv7` has `4096` filters, each with dimensions `1, 1, 4096`.
 
-These filters are numerous and large – and computationally expensive.
+The filters are numerous and large – and computationally expensive.
 
 To remedy this, the authors opt to **reduce both their number and the size of each filter by subsampling parameters** from the converted convolutional layers.
```
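The conversion and subsampling described in this hunk can be sketched as follows. This is a minimal illustration with randomly initialized tensors, using PyTorch's `(out_channels, in_channels, kH, kW)` weight layout; the `decimate` helper here is my own sketch of the subsampling step, not necessarily the tutorial's exact utility, and the subsampling factors are chosen to match the reduction to `1024` filters with `3, 3` kernels.

```python
import torch

def decimate(tensor, m):
    """Keep every m[d]-th value along dimension d (None = keep that dimension whole)."""
    for d in range(tensor.dim()):
        if m[d] is not None:
            tensor = tensor.index_select(
                dim=d, index=torch.arange(0, tensor.size(d), m[d]).long())
    return tensor

# fc6 in VGG-16 has weights of size (4096, 25088); viewed as a convolution
# over its 7x7x512 input feature map, that is (4096, 512, 7, 7)
fc6_weight = torch.randn(4096, 512, 7, 7)

# Subsample to 1024 filters with 3x3 kernels
conv6_weight = decimate(fc6_weight, m=[4, None, 3, 3])
assert conv6_weight.shape == (1024, 512, 3, 3)

# fc7: (4096, 4096) reshaped into an equivalent 1x1 convolution,
# then subsampled to 1024 filters over 1024 input channels
fc7_weight = torch.randn(4096, 4096).view(4096, 4096, 1, 1)
conv7_weight = decimate(fc7_weight, m=[4, 4, None, None])
assert conv7_weight.shape == (1024, 1024, 1, 1)
```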

```diff
@@ -451,7 +451,7 @@ Remember, the nub of any supervised learning algorithm is that **we need to be a
 
 For the model to learn _anything_, we'd need to structure the problem in a way that allows for comparisons between our predictions and the objects actually present in the image.
 
-Priors enable us to do exactly this.
+Priors enable us to do exactly this
 
 - **Find the Jaccard overlaps** between the 8732 priors and `N` ground truth objects. This will be a tensor of size `8732, N`.
```
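The `8732, N` Jaccard-overlap tensor from the bullet above can be computed with broadcasting. A self-contained sketch, with made-up boxes in boundary coordinates (`x_min, y_min, x_max, y_max`); the function name mirrors the tutorial's but the body is illustrative:

```python
import torch

def find_jaccard_overlap(set_1, set_2):
    """IoU between every box in set_1 (n1, 4) and every box in set_2 (n2, 4),
    both in boundary coordinates (x_min, y_min, x_max, y_max)."""
    # Pairwise intersection: max of lower-left corners, min of upper-right corners
    lower = torch.max(set_1[:, None, :2], set_2[None, :, :2])  # (n1, n2, 2)
    upper = torch.min(set_1[:, None, 2:], set_2[None, :, 2:])  # (n1, n2, 2)
    wh = (upper - lower).clamp(min=0)                          # clip non-overlapping to 0
    intersection = wh[:, :, 0] * wh[:, :, 1]                   # (n1, n2)

    areas_1 = (set_1[:, 2] - set_1[:, 0]) * (set_1[:, 3] - set_1[:, 1])
    areas_2 = (set_2[:, 2] - set_2[:, 0]) * (set_2[:, 3] - set_2[:, 1])
    union = areas_1[:, None] + areas_2[None, :] - intersection
    return intersection / union

# Fake priors and N=2 ground-truth objects, just to show the shapes
mins = torch.rand(8732, 2)
priors = torch.cat([mins, mins + torch.rand(8732, 2) + 0.01], dim=1)
objects = torch.tensor([[0.1, 0.1, 0.5, 0.5], [0.4, 0.4, 0.9, 0.8]])
overlap = find_jaccard_overlap(priors, objects)
assert overlap.shape == (8732, 2)
```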

```diff
@@ -523,7 +523,7 @@ For the SSD, however, the authors simply use `α = 1`, i.e. add the two losses.
 
 After the model is trained, we can apply it to images. However, the predictions are still in their raw form – two tensors containing the offsets and class scores for 8732 priors. These would need to be processed to **obtain final, human-interpretable bounding boxes with labels.**
 
-This entails the following.
+This entails the following
 
 - We have 8732 predicted boxes represented as offsets `(g_c_x, g_c_y, g_w, g_h)` from their respective priors. Decode them to boundary coordinates, which are actually directly interpretable.
```
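The decoding step in the bullet above can be sketched in two stages: undo the offset encoding to recover center-size coordinates, then convert those to boundary coordinates. The scaling factors `10` and `5` are an assumption on my part, matching the common SSD variance convention; fake priors and offsets stand in for real model output:

```python
import torch

def gcxgcy_to_cxcy(gcxgcy, priors_cxcy):
    """Decode offsets (g_c_x, g_c_y, g_w, g_h) predicted w.r.t. priors
    back into center-size coordinates (c_x, c_y, w, h)."""
    return torch.cat([
        gcxgcy[:, :2] * priors_cxcy[:, 2:] / 10 + priors_cxcy[:, :2],  # c_x, c_y
        torch.exp(gcxgcy[:, 2:] / 5) * priors_cxcy[:, 2:]], dim=1)     # w, h

def cxcy_to_xy(cxcy):
    """Center-size (c_x, c_y, w, h) to boundary coordinates (x_min, y_min, x_max, y_max)."""
    return torch.cat([cxcy[:, :2] - cxcy[:, 2:] / 2,
                      cxcy[:, :2] + cxcy[:, 2:] / 2], dim=1)

priors_cxcy = torch.rand(8732, 4).clamp(min=0.05)  # fake priors in center-size form
predicted_offsets = torch.randn(8732, 4) * 0.1     # fake raw network output
decoded = cxcy_to_xy(gcxgcy_to_cxcy(predicted_offsets, priors_cxcy))
assert decoded.shape == (8732, 4)
```

Note that zero predicted offsets decode to the priors themselves, which is the sanity check for any offset encoding.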

```diff
@@ -565,7 +565,7 @@ Thus, we've eliminated the rogue candidates – one of each animal.
 
 This process is called __Non-Maximum Suppression (NMS)__ because when multiple candidates are found to overlap significantly with each other such that they could be referencing the same object, **we suppress all but the one with the maximum score**.
 
-Algorithmically, it is carried out as follows.
+Algorithmically, it is carried out as follows
 
 - Upon selecting candidates for each _non-background_ class,
```
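The greedy suppression described here can be sketched as follows, run once per non-background class. This is a minimal self-contained version (with its own small IoU helper and made-up boxes), not the tutorial's exact implementation:

```python
import torch

def iou(box, boxes):
    """Jaccard overlap between one box (4,) and a set of boxes (n, 4),
    all in boundary coordinates."""
    lower = torch.max(box[:2], boxes[:, :2])
    upper = torch.min(box[2:], boxes[:, 2:])
    wh = (upper - lower).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter)

def nms(boxes, scores, max_overlap=0.5):
    """Greedy NMS for a single class: keep the highest-scoring candidate,
    suppress every remaining candidate overlapping it by more than
    max_overlap, and repeat. Returns indices of the kept boxes."""
    order = scores.argsort(descending=True)
    keep = []
    while order.numel() > 0:
        top = order[0]
        keep.append(top.item())
        rest = order[1:]
        order = rest[iou(boxes[top], boxes[rest]) <= max_overlap]
    return keep

boxes = torch.tensor([[0.0, 0.0, 0.5, 0.5],    # dog, candidate A
                      [0.05, 0.0, 0.55, 0.5],  # dog, near-duplicate of A
                      [0.6, 0.6, 0.9, 0.9]])   # cat
scores = torch.tensor([0.9, 0.8, 0.7])
# The near-duplicate dog box is suppressed; one box per animal survives
assert nms(boxes, scores) == [0, 2]
```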
