
Commit aecda28

Ofir Press authored and soumith committed
update attribution of weight tying in word_language_model (#109)
* Update model.py: updated attribution of weight tying
* Update README.md: updated attribution of weight tying
1 parent bcea1f5 commit aecda28

2 files changed: +4 −3 lines changed


word_language_model/README.md

Lines changed: 1 addition & 3 deletions
@@ -53,6 +53,4 @@ python main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --epochs 40 --tie
 
 These perplexities are equal or better than
 [Recurrent Neural Network Regularization (Zaremba et al. 2014)](https://arxiv.org/pdf/1409.2329.pdf)
-and are similar to
-[Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling (Inan et al. 2016)](https://arxiv.org/pdf/1611.01462.pdf),
-though Inan et al. have improved perplexities by using a form of recurrent dropout (variational dropout).
+and are similar to [Using the Output Embedding to Improve Language Models (Press & Wolf 2016)](https://arxiv.org/abs/1608.05859) and [Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling (Inan et al. 2016)](https://arxiv.org/pdf/1611.01462.pdf), though both of these papers have improved perplexities by using a form of recurrent dropout [(variational dropout)](http://papers.nips.cc/paper/6241-a-theoretically-grounded-application-of-dropout-in-recurrent-neural-networks).

word_language_model/model.py

Lines changed: 3 additions & 0 deletions
@@ -20,6 +20,9 @@ def __init__(self, rnn_type, ntoken, ninp, nhid, nlayers, dropout=0.5, tie_weigh
         self.decoder = nn.Linear(nhid, ntoken)
 
         # Optionally tie weights as in:
+        # "Using the Output Embedding to Improve Language Models" (Press & Wolf 2016)
+        # https://arxiv.org/abs/1608.05859
+        # and
         # "Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling" (Inan et al. 2016)
         # https://arxiv.org/abs/1611.01462
         if tie_weights:

0 commit comments