
Commit aecda28

Ofir Press authored and soumith committed
update attribution of weight tying in word_language_model (#109)
* Update model.py: updated attribution of weight tying
* Update README.md: updated attribution of weight tying
1 parent bcea1f5 commit aecda28

2 files changed: +4 −3 lines changed


word_language_model/README.md

Lines changed: 1 addition & 3 deletions
@@ -53,6 +53,4 @@ python main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --epochs 40 --tie
 
 These perplexities are equal or better than
 [Recurrent Neural Network Regularization (Zaremba et al. 2014)](https://arxiv.org/pdf/1409.2329.pdf)
-and are similar to
-[Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling (Inan et al. 2016)](https://arxiv.org/pdf/1611.01462.pdf),
-though Inan et al. have improved perplexities by using a form of recurrent dropout (variational dropout).
+and are similar to [Using the Output Embedding to Improve Language Models (Press & Wolf 2016)](https://arxiv.org/abs/1608.05859) and [Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling (Inan et al. 2016)](https://arxiv.org/pdf/1611.01462.pdf), though both of these papers have improved perplexities by using a form of recurrent dropout [(variational dropout)](http://papers.nips.cc/paper/6241-a-theoretically-grounded-application-of-dropout-in-recurrent-neural-networks).

word_language_model/model.py

Lines changed: 3 additions & 0 deletions
@@ -20,6 +20,9 @@ def __init__(self, rnn_type, ntoken, ninp, nhid, nlayers, dropout=0.5, tie_weigh
         self.decoder = nn.Linear(nhid, ntoken)
 
         # Optionally tie weights as in:
+        # "Using the Output Embedding to Improve Language Models" (Press & Wolf 2016)
+        # https://arxiv.org/abs/1608.05859
+        # and
         # "Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling" (Inan et al. 2016)
         # https://arxiv.org/abs/1611.01462
         if tie_weights:

0 commit comments