Backstitch rnnlm #2096

Conversation
fi

if [ $stage -le 3 ]; then
  backstitch_opt="--backstitch-scale $alpha --backstitch-interval $back_interval"
There should probably be a way to easily turn off the backstitch, e.g. a --use-backstitch false option for the script which, if set to false, gives backstitch_opt="".
Or maybe it's automatically turned off if alpha=0? In that case, maybe you can state this clearly in the code.
Regarding turning backstitch off: I prefer to have this level of script not be too configurable. But I think it would be a good idea to rename the script to run_lstm_tdnn_bs_1a.sh and have the link from run_lstm_tdnn_bs.sh, to encode that it's a script with backstitch.
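For reference, the alpha=0 convention can be made explicit where the trainer decides which update to run. Below is only a minimal sketch of that dispatch, reusing the option names from this PR (backstitch_training_scale, backstitch_training_interval); the struct, the helper, and the minibatch counter are hypothetical, not the PR's actual code:

```cpp
#include <cstdint>

struct BackstitchOptions {
  float backstitch_training_scale = 0.0f;    // 0.0 disables backstitch
  int32_t backstitch_training_interval = 1;  // apply every N minibatches
};

// Hypothetical helper: returns true if this minibatch should get the
// 2-step backstitch update; with scale 0.0 the trainer always falls
// through to the normal single-step update.
bool UseBackstitch(const BackstitchOptions &opts,
                   int64_t num_minibatches_processed) {
  return opts.backstitch_training_scale != 0.0f &&
         num_minibatches_processed % opts.backstitch_training_interval == 0;
}
```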
valid_prob_strings = common_lib.get_command_stdout(
    'grep -e {0} {1}'.format(key, valid_prob_files))

# LOG
what is this comment for?
They are example strings to be matched by the regular expression.
scripts/rnnlm/train_rnnlm.sh
embedding_l2=0.005
embedding_lrate_factor=0.1 # the embedding learning rate is the
                           # nnet learning rate times this factor.
backstitch_scale=0.0 # backstitch training scale
Similar to the comment above: does 0.0 mean no backstitch?
Yes.
src/rnnlm/rnnlm-core-training.cc
  computer.Run(); // This is the forward pass.

- ProcessOutput(minibatch, derived, word_embedding,
+ ProcessOutput(true, minibatch, derived, word_embedding,
It's probably easier to read if you write:
bool is_backstitch_step = true; ProcessOutput(is_backstitch_step, ...)
Done
}

void RnnlmCoreTrainer::TrainBackstitch(
    bool is_backstitch_step1,
why 1?
The 1 means it's the 1st step of the backstitch training.
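For context, backstitch performs two updates on each minibatch: step 1 moves against the usual gradient direction with scale alpha, and step 2 then moves in the usual direction with scale 1+alpha. A minimal sketch of the idea on a plain parameter vector (illustrative only, not the PR's code; in particular, the real trainer recomputes the gradient between the two steps):

```cpp
#include <cstddef>
#include <vector>

// Illustrative two-step backstitch update for one minibatch: alpha is the
// backstitch scale, lr the learning rate, grad the minibatch gradient of
// the loss with respect to params.
void BackstitchUpdate(std::vector<float> *params,
                      const std::vector<float> &grad,
                      float alpha, float lr) {
  // Step 1 (the "is_backstitch_step1" pass): step *up* the gradient,
  // temporarily making the model worse on this minibatch.
  for (size_t i = 0; i < params->size(); i++)
    (*params)[i] += alpha * lr * grad[i];
  // In the real trainer a second forward/backward pass happens here, so
  // step 2 uses a fresh gradient; this sketch reuses grad for brevity.
  // Step 2: step down the gradient with scale (1 + alpha), so the net
  // movement over both steps is roughly -lr * grad plus a correction
  // term that reduces overfitting to the minibatch.
  for (size_t i = 0; i < params->size(); i++)
    (*params)[i] -= (1.0f + alpha) * lr * grad[i];
}
```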
src/rnnlm/rnnlm-core-training.h
| "related to parallelization by model averaging."); | ||
| opts->Register("backstitch-training-scale", &backstitch_training_scale, | ||
| "backstitch training factor. " | ||
| "if 0 then in the normal training mode. It is referred as " |
"referred as" should be "referred to as".
Done
rnnlm-train \
  --rnnlm.max-param-change=$rnnlm_max_change \
  --rnnlm.l2_regularize_factor=$rnnlm_l2_factor \
  --embedding.max-param-change=$embedding_max_change \
did you forget to add the option for embedding training?
src/rnnlmbin/rnnlm-train.cc
core_config.backstitch_training_scale = backstitch_training_scale;
core_config.backstitch_training_interval = backstitch_training_interval;
embedding_config.backstitch_training_scale = backstitch_training_scale;
why do you make them the same if you have separate options for core_config and embedding_config?
Actually, I see you declared each option 3 times, but only the one defined in this .cc file takes effect. This is very weird.
I agree that it's weird. I think it might be clearer if you just have two separate versions of the options that are both set from the command line, and just set them to the same values.
Have made them two separate options in the top-level shell script.
hainan-xv left a comment
I noticed a couple of small issues.
Force-pushed from 89e17d0 to 9d0e700.

- objf_info_.AddStats(weight, objf_num, objf_den, objf_den_exact);
+ if (is_backstitch_step1)
+   objf_info_.AddStats(weight, objf_num, objf_den, objf_den_exact);
Is there a reason why we are only doing this for the back step?
Doing this only for the back step means the stats are computed from parameters that have already received the whole 2-step update on the previous minibatch, so the objective is measured on the fully updated model, once per minibatch.
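To make the bookkeeping concrete, here is a toy timeline (illustrative only, not the PR's code) of when stats are accumulated under this convention:

```cpp
#include <cstdio>

int main() {
  for (int n = 0; n < 3; n++) {
    // At step 1, parameters still reflect minibatch n-1's completed
    // 2-step update, so this is where objf stats are accumulated.
    std::printf("minibatch %d, step 1: accumulate objf stats\n", n);
    // At step 2, parameters are mid-update (after the backwards step),
    // so stats are skipped to avoid counting a half-updated model.
    std::printf("minibatch %d, step 2: skip stats\n", n);
  }
  return 0;
}
```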
mic=sdm1
stage=-10
train_stage=0
alpha=0.8
add some comments on what the 2 variables are?
fi

if [ $stage -le 3 ]; then
  backstitch_opt="--rnnlm.backstitch-scale $alpha --rnnlm.backstitch-interval $back_interval --embedding.backstitch-scale $alpha --embedding.backstitch-interval $back_interval"
this line might be too long
                       # nnet learning rate times this factor.
backstitch_scale=0.0 # backstitch training scale
backstitch_interval=1 # backstitch training interval
cmd=run.pl # you might want to set this to queue.pl
I just noticed this here. @danpovey, should we just change the default to queue.pl then?
I prefer to leave the default of cmd at run.pl and have it always passed in from the command line.
backstitch_opt="--rnnlm.backstitch-scale $alpha --rnnlm.backstitch-interval $back_interval --embedding.backstitch-scale $alpha --embedding.backstitch-interval $back_interval"
rnnlm/train_rnnlm.sh --embedding_l2 $embedding_l2 \
  --stage $train_stage \
  --num-epochs $epochs --cmd "queue.pl" $backstitch_opt $dir
Does this really work? It doesn't seem to me that rnnlm/train_rnnlm.sh has the 4 variables defined in this string. Shouldn't you use --backstitch-scale and --backstitch-interval here?
Sorry. Should be OK now.
hainan-xv left a comment
2nd pass review. The most important issue probably has to do with the options to rnnlm/train_rnnlm.sh.
hainan-xv left a comment
LGTM now. Though I would suggest running the script for at least one iteration, just to make sure it still runs after all these changes.
Looks like I may have overlooked merging this. I assume this is still good to merge?

It needs a little more testing. I will let you know when it is ready.
Force-pushed from e346b9e to fa9e7ef.

@danpovey I am done with this PR.
Force-pushed from fa9e7ef to 8047fc8, then from 8047fc8 to 8111d5d.
@hainan-xv Please take a look.