- Notifications
You must be signed in to change notification settings - Fork 5.4k
End-to-end chain training #2072
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| Thanks! It may be a few days before I get to this. |
| That looks really interesting. |
| Thanks! |
danpovey left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few minor comments, mostly requests for clarification on the code.
I haven't 100% decided about whether to merge though.
| #include "chain/chain-kernels-ansi.h" | ||
| | ||
| namespace kaldi { | ||
| namespace chain { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to have a comment here explaining what this is all about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a short comment in the .h file but I will add more.
This class (i.e. GenericNumeratorComputation) is responsible for Forward-Backward (to get the derivatives and total logprob) on a generic numerator graph (i.e. a training fst which can e.g. have self-loops).
src/chain/chain-generic-numerator.cc Outdated
| GenericNumeratorComputation::GenericNumeratorComputation( | ||
| const Supervision &supervision, | ||
| const CuMatrixBase<BaseFloat> &nnet_output): | ||
| // opts_(opts), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove comment
src/chain/chain-generic-numerator.cc Outdated
| const int32 num_sequences = supervision_.num_sequences, | ||
| num_states = max_num_hmm_states_; | ||
| // set alpha_0(0) for all sequences to 1.0 and leave the rest to be 0.0. | ||
| BFloat *first_frame_alpha = alpha_.RowData(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you mean BaseFloat... not sure where BFloat came from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I defined BFloat (in the header file) so I could optionally change it to double to make some parts of computation double-precision. This was mostly meant for the initial experiments. I guess I can change it to BaseFloat now.
src/chain/chain-generic-numerator.h Outdated
| | ||
| private: | ||
| | ||
| enum { kMaxDerivTimeSteps = 4 }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need explanation of what this is.
src/chain/chain-generic-numerator.h Outdated
| // the derivs w.r.t. the nnet outputs (transposed) | ||
| Matrix<BaseFloat> nnet_output_deriv_transposed_; | ||
| | ||
| // forward and backward probs matrices |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are these indexed and what is the dimension?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost all the variables with the same name in this class and in DenominatorComputation (like alpha, beta, kMaxDerivTimeSteps) have the exact same function. I will add some comments to make it clear.
src/chain/chain-supervision.h Outdated
| // label_dim). This should equal the NumPdfs() in the TransitionModel object. | ||
| // Included to avoid training on mismatched egs. | ||
| int32 label_dim; | ||
| bool e2e; // end to end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need explanation- not just of what this means, but of what's going on more generally. Also by declaration of 'e2e_fsts' below. E.g. what do they contain, what e2e training is or where we can find out, etc.
src/chain/chain-supervision.h Outdated
| const ProtoSupervision &proto_supervision, | ||
| Supervision *supervision); | ||
| | ||
| bool TrainingGraphToSupervision( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need documentation.
| leaky_hmm_coeff=0.1 | ||
| num_scale_opts="--transition-scale=1.0 --self-loop-scale=1.0" | ||
| shared_phones=true | ||
| train_set=train_si284_spEx_hires |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like capitals in these naems.
| common_lib.execute_command( | ||
| """copy-transition-model {tree_dir}/final.mdl \ | ||
| {dir}/0.trans_mdl""".format(dir=dir, tree_dir=tree_dir)) | ||
| if not os.path.exists('{dir}/0.trans_mdl'.format(dir=dir)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do this check? Isn't it okay to overwrite the transition model?
I don't think the transition model will ever change for a chain model (since the probabilities on the transitions are fixed), but in case it did (e.g., because the phone set changed but someone for some reason didn't change their experimental directory), shouldn't we prefer to always overwrite?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this change was for the initial experiments where I needed to copy my own transition model w/o being overwritten. I will probably need to revert this.
| '{0}/final.mdl'.format(tree_dir), '{0}/tree'.format(tree_dir), | ||
| '{0}/lat.1.gz'.format(lat_dir), '{0}/final.mdl'.format(lat_dir), | ||
| '{0}/num_jobs'.format(lat_dir), '{0}/splice_opts'.format(lat_dir)] | ||
| '{0}/num_jobs'.format(lat_dir)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is splice_opts no longer required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it is not even required in regular chain training.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer not to remove this, though, for the sake of backwards compatibility.
src/gmmbin/Makefile Outdated
| BINFILES = gmm-init-mono gmm-est gmm-acc-stats-ali gmm-align \ | ||
| gmm-decode-faster gmm-decode-simple gmm-align-compiled \ | ||
| gmm-sum-accs gmm-est-regtree-fmllr gmm-acc-stats-twofeats \ | ||
| gmm-decode-faster gmm-decode-simple gmm-decode-nbest gmm-align-compiled \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you forget to add gmm-init-trans.cc and gmm-decode-nbest.cc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have made only one change in this file (i.e adding gmm-init-biphone)
The other changes must be due to a conflict.
src/gmmbin/gmm-init-biphone.cc Outdated
| | ||
| const std::vector<int32> &phones = topo.GetPhones(); | ||
| | ||
| std::vector<int32> phone2num_pdf_classes (1 + phones.back()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove whitespace here: s (1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this constructor doing? I don't understand it. Is the last element of phones guaranteed to be the largest phoneme id? Why are you adding plus 1? Is it because phones are indexed from 1 in Kaldi (not sure if that's true right this moment, but I recall something like that being true before).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes (to both questions).
BTW, such constructors are used in a couple of other places in Kaldi.
| ContextDependency *ctx_dep = NULL; | ||
| std::vector<std::vector<int32> > shared_phones; | ||
| if (shared_phones_rxfilename == "") { | ||
| for (size_t i = 0; i < phones.size(); i++) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add braces to this for loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a single-line statement, why are suggesting "braces"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a common best practice to always use curly braces, especially since your single-line statement (in my opinion) is rather long.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually the Google style guide doesn't mandate them for one-line statements, and in cases like this I prefer not to have them as they make the code longer.
| | ||
| std::string wav_rspecifier = po.GetArg(1); | ||
| std::string wav_wspecifier = po.GetArg(2); | ||
| if (ClassifyRspecifier(po.GetArg(1), NULL, NULL) != kNoRspecifier) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not very clear to me what you're doing here since it's so large. Since it looks like you made a new set of input options, can you change the usage string of this binary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am adding support for rxfilename and wxfilename so it can be used in a piped command.
I'll update the usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks!
egs/wsj/s5/local/run_end2end_char.sh Outdated
| | ||
| if [ $stage -le 5 ]; then | ||
| echo "$0: calling the flat-start chain recipe..." | ||
| loacl/chain/run_tdnn_lstm_flatstart.sh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
loacl -> local
| | ||
| if [ $stage -le 5 ]; then | ||
| echo "$0: calling the flat-start chain recipe..." | ||
| loacl/chain/run_tdnn_flatstart.sh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
loacl -> local
44d01d9 to 61d9cb6 Compare
danpovey left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I would be OK to commit this stuff, as it doesn't confuse or break anything, but I'd prefer it to be more cleanly separated directory-wise from the regular chain scripts, e.g. move from steps/nnet3/chain/ to steps/nnet3/chain/e2e, and from local/ to local/e2e/. You can also put README files in those e2e directories if you want, to explain what it's about.
| dest='left_deriv_truncate', | ||
| default=None, | ||
| help="Deprecated. Kept for back compatibility") | ||
| parser.add_argument("--chain.apply-deriv-weights", type=str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a couple of args here seem to be repeated.
| @vimalmanohar, do you have time to give this a quick check before I merge? |
| Sure. I will go through it. …On Fri, Jan 19, 2018 at 6:42 PM Daniel Povey ***@***.***> wrote: @vimalmanohar <https://github.com/vimalmanohar>, do you have time to give this a quick check before I merge? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#2072 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AEATV3Q91CH8O_sx2V3POsO8Qv3DjK5Pks5tMShrgaJpZM4Q7rgT> . -- Vimal Manohar PhD Student Electrical & Computer Engineering Johns Hopkins University |
| | ||
| # training options | ||
| num_epochs=4.5 | ||
| initial_effective_lrate=0.001 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a lot of options are presented here. Some of them can be fixed and removed if they are not too important for tuning.
| for lmtype in tgpr bd_tgpr; do | ||
| steps/nnet3/decode.sh \ | ||
| --acwt 1.0 --post-decode-acwt 10.0 \ | ||
| --extra-left-context $chunk_left_context \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left and right contexts can be removed for tdnn scripts.
| shared_phones=true | ||
| train_set=train_si284_spe2e_hires | ||
| test_sets="test_dev93 test_eval92" | ||
| first_layer_splice=-1,0,1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think such things can be fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree.
| if [ $stage -le 4 ]; then | ||
| echo "$0: estimating character language model for the denominator graph" | ||
| mkdir -p exp/chain/e2e_base | ||
| cat data/$trainset/text | \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be run using $train_cmd as systems might not be configured to run such things locally.
| @@ -0,0 +1,93 @@ | |||
| #!/bin/sh | |||
| # Copyright 2017 Hossein Hadian | |||
| | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mention what this script is and also maybe mention similar scripts like _char.sh etc.
| set -o pipefail | ||
| | ||
| extract-segments scp:$srcdir/wav.scp $srcdir/segments ark,scp:$dir/data/segments.ark,$dir/segments.scp | ||
| cat $dir/segments.scp | awk '{ print $1 " wav-copy " $2 " - |" }' >$dir/wav.scp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need wav-copy here. Programs can directly read from ark files.
egs/wsj/s5/utils/text_to_phones.py Outdated
| import math | ||
| import random | ||
| | ||
| parser = argparse.ArgumentParser(description="""This script reads |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can also be put into functions, but since this is a small script it may not be essential.
| #include "matrix/kaldi-matrix.h" | ||
| #include "hmm/transition-model.h" | ||
| #include "chain/chain-supervision.h" | ||
| #include "cudamatrix/cu-matrix.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are including files that are not required or can be included in cc files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I guess these were for some unusual experiments.
src/chain/chain-training.cc Outdated
| } else { | ||
| GenericNumeratorComputation numerator(supervision, nnet_output); | ||
| num_logprob_weighted = numerator.Forward(); | ||
| KALDI_LOG << "Numerator logprob per frame: " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this was for debugging. You can probably remove it
src/chain/chain-training.cc Outdated
| KALDI_LOG << "Numerator logprob per frame: " | ||
| << num_logprob_weighted / (*weight); | ||
| num_ok = (num_logprob_weighted - num_logprob_weighted == 0); | ||
| KALDI_LOG << "Numerator Forward " << (num_ok ? "succeeded" : "failed"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also remove this. You can probably keep track of how many failed and print somewhere else. But logging all this would be very slow. At least use like KALDI_VLOG(2) or something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't worry about speed [it happens only once for each minibatch], but it might be unnecessary.
I could use VLOG.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that there should be a count maintained. If you expect it to normally succeed, maybe you could print something out if it failed?
Also I don't like the name num_ok, it's ambiguous. I prefer numerator_ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It never fails for WSJ. It could fail a few times (for long utterances, only during the first 10 or so iterations) on harder databases (like SWBD). I guess printing something if it fails is enough if you agree?
I guess maintaining a count involves changing a few different functions/classes (like ObjectiveFunctionInfo, UpdateStats, and even ComputeChainObjfAndDeriv). Should I do it?
I will fix the name num_ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, don't change those classes, since they interact with other code.
But there's something else that I'd like you to change, that I just noticed.
Your changes are making the function ComputeChainObjfAndDeriv quite complicated.
I'd like you to, in chain-training.cc but not the header, introduce a new function
ComputeChainObjfAndDerivE2e, and if supervision.e2e is true, call that function.
That will keep the main current code path simple and free of your changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
| print "Exporting to ", outdir, "..." | ||
| spks = {} | ||
| | ||
| f = open(os.path.join(outdir, 'text'), 'w') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While changing this file, also try to use with open(***, 'w') as f: approach.
| | ||
| durs = [] | ||
| for u in utts: | ||
| durs += [u.dur] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use durs.append(u.dur). It is easier to understand. += has ambiguous behavior and is hard to read. Same thing for list concatenations elsewhere.
| | ||
| durs = [] | ||
| d = start_dur | ||
| f = open(os.path.join(args.dir, 'allowed_durs.txt'), 'wb') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use with construct here too. You can give both handles in the same statement.
| utt2spk = read_kaldi_mapfile(os.path.join(dir, 'utt2spk')) | ||
| for utt in wav_scp: | ||
| if utt in text and utt in utt2dur and utt in utt2spk: | ||
| utts += [Utterance(utt, wav_scp[utt], utt2spk[utt], text[utt], utt2dur[utt])] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utts.append(Utterance(***)) is the usual way to do it.
| f = open(os.path.join(args.dir, 'allowed_durs.txt'), 'wb') | ||
| f2 = open(os.path.join(args.dir, 'allowed_lengths.txt'), 'wb') | ||
| while d < end_dur: | ||
| length = int(d*1000 - 25) / 10 + 1 # for the most common length of frames and overlap which is 25ms and 10ms |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this script only work for 25ms and 10ms case? If not, you can comment that it should work for other reasonable cases. Alternatively, you can get these values from args.
| . ./path.sh | ||
| | ||
| if [ $# != 2 ]; then | ||
| echo "Usage: " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe extract_wav_segments_data_dir.sh to be clear you are extracting wav data.
egs/wsj/s5/utils/text_to_phones.py Outdated
| parser.add_argument('langdir', type=str) | ||
| parser.add_argument('lex', type=str, | ||
| help='lexicon.txt') | ||
| parser.add_argument('--be-silprob', type=float, default=0.8, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe needs different names like edge-silprob and between-silprob
| echo "$0: estimating phone language model for the denominator graph" | ||
| mkdir -p exp/chain/e2e_base | ||
| cat data/$trainset/text | \ | ||
| utils/text_to_phones.py data/lang_nosp data/local/dict_nosp/lexicon.txt | \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe you can use the lexicon in data/lang_nosp/phones/align_lexicon.txt. When you do this, you also don't need position-independent option because align_lexicon.txt already has that accounted for.
egs/wsj/s5/utils/text_to_phones.py Outdated
| and end.""") | ||
| parser.add_argument('--silprob', type=float, default=0.2, | ||
| help="Probability of optional silence between the words.") | ||
| parser.add_argument('--pos-independent', action='store_true') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my other comment about removing this option.
egs/wsj/s5/utils/text_to_phones.py Outdated
| sil = open(args.langdir + "/phones/optional_silence.txt").readline().strip() | ||
| | ||
| # unk | ||
| unk = open(args.langdir + "/oov.txt").readline().strip() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps you should name this as oov_word to be clear that it is the word and not the unk phone.
| Hi, I address some bugs in the following files: utils/data/perturb_speed_to_allowed_lengths.py line 212: line 230: line 241L In steps/libs/nnet3/train/chain_objf/acoustic_model.py line 138: I also found that there is no --add-deltas option for --egs.opts "--normalize-egs true --add-deltas false" in local/chain/e2e/run_tdnn_flatstart.sh Thanks |
| Thanks, I've already fixed all of them (and other issues) and I'll push them soon. |
0edcafc to 5ac5448 Compare | I addressed the issues. I guess this is ready. |
| Thanks, will try to look at this today or to-morrow. …On Sat, Jan 27, 2018 at 3:53 PM, Hossein Hadian ***@***.***> wrote: I addressed the issues. I guess this is ready. @danpovey <https://github.com/danpovey>, all the scripts directly dealing with e2e are in separate directories but there are 3 other shared scripts ( text_to_phones.py, perturb_speed_to_allowed_lengths.py, and extract_wav_segments_data_dir.sh) which are in utils and utils/data. Should I move them to e.g. utils/e2e too? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#2072 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ADJVu7xBs4A-f60nSd8D0ts6uKTgfiBgks5tO4zLgaJpZM4Q7rgT> . |
danpovey left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bunch of comments, none of them too major.
(But as we discussed offline, you'll make this much faster later on, after I merge.)
| frames_per_eg=$(cat $data/allowed_lengths.txt | tr '\n' , | sed 's/,$//') | ||
| | ||
| [ ! -f "$data/utt2len" ] && feat-to-len scp:$data/feats.scp ark,t:$data/utt2len | ||
| # TODO(222 --> most frequent len) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get rid of TODO.. see the comment a few lines below also.
src/chain/chain-generic-numerator.cc Outdated
| // the extra B is for storing alpha sums | ||
| beta_.Resize(2, max_num_hmm_states_ * B, kSetZero); | ||
| | ||
| // init incmoing transitions for easy access |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo. And all comments should start with a capital. According to Google guidelines they should be proper sentences.
src/chain/chain-generic-numerator.cc Outdated
| KALDI_ASSERT(supervision.num_sequences * | ||
| supervision.frames_per_sequence == nnet_output.NumRows() && | ||
| supervision.label_dim == nnet_output.NumCols()); | ||
| exp_nnet_output_transposed_.ApplyExp(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would probably be more efficient to apply the exp on the GPU, to a copy of the matrix, and then copy to CPU.
src/chain/chain-supervision.cc Outdated
| ReadBasicType(is, binary, &label_dim); | ||
| if (!binary) { | ||
| ReadFstKaldi(is, binary, &fst); | ||
| ExpectToken(is, binary, "<End2End>"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On-disk formats of things like egs need to back-compatible. You can use PeekToken(); look in nnet-simple-compoent.cc for examples.
| e2e(other.e2e), e2e_fsts(other.e2e_fsts) { } | ||
| | ||
| void AppendSupervision(const std::vector<const Supervision*> &input, | ||
| bool compactify, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These functions AppendSupervision() and AddWeightToSupervisionFst() have become very long due to this e2e stuff. We try to avoid having functions longer than about 40 lines. I think the best way to keep the two parts of the code separate would be to have a statement near the beginning of those functions that checks if the 'e2e' flag is set, and if it is, recurses to a static function with a name like AppendSupervisionE2e().
| po.PrintUsage(); | ||
| exit(1); | ||
| } | ||
| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this file seems to have a bunch of whitespace lines that are probably excessive.
src/gmmbin/gmm-init-biphone.cc Outdated
| | ||
| const std::vector<int32> &phones = topo.GetPhones(); | ||
| | ||
| std::vector<int32> phone2num_pdf_classes (1 + phones.back()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove space before (.
| u3 = copy.deepcopy(u) | ||
| u3.id = 'pv3-' + u.id | ||
| u3.speaker = 'pv3-' + u.speaker | ||
| u3.wavefile = '{} extend-wav-with-silence --extra-silence-length={} - - | '.format(u.wavefile, delta) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason you can't just use 'sox pad'?
| @@ -0,0 +1,73 @@ | |||
| #!/usr/bin/env python | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this script is kind of specialized so I'd rather not put it here-- please move it to one of the 'e2e' directories in steps/.
src/chain/chain-supervision.cc Outdated
| delete compact_fst; | ||
| } | ||
| } | ||
| ExpectToken(is, binary, "</FSTs>"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer these tokens to be called Fsts rather than FSTs.
src/gmmbin/gmm-init-biphone.cc Outdated
| | ||
| | ||
| std::string topo_filename = po.GetArg(1); | ||
| int dim = atoi(po.GetArg(2).c_str()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And please use ConvertStringToInteger, e.g. if (!ConvertStringToInteger(...) || dim <= 0 || dim > 10000) KALDI_ERR...
| if utt in text and utt in utt2dur and utt in utt2spk: | ||
| utterances.append(Utterance(utt, wav_scp[utt], utt2spk[utt], | ||
| text[utt], utt2dur[utt])) | ||
| else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try to add some count like num_success, num_fail and print them. Don't print warning for every utterance. There should also be a special check that the input data directory has been passed through extract_wav_segments script. Otherwise it would generate a lot of warnings without any information on what actually is the issue.
Also check how many utterances you got compared to the ones in wav_scp, and exit if its zero or less than some percentage.
| | ||
| if [ $# != 2 ]; then | ||
| echo "Usage: $0 <srcdir> <destdir>" | ||
| echo " This script copies data directory <srcdir> to <destdir> and gets" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This script will be very slow. Split the data and run the command in parallel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you do ad --nj and --cmd options to this script, don't set the default of --nj to more than 4, for I/O reasons.
| set -e -o pipefail | ||
| utils/copy_data_dir.sh $srcdir $dir | ||
| | ||
| extract-segments scp:$srcdir/wav.scp $srcdir/segments ark,scp:$dir/data/wav_segments.ark,$dir/segments.scp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to not put the temporary files like $dir/segments.scp in $dir. Keep it in $dir/data.
Better to rename it to wav_segments.scp to avoid confusion with segments file.
| help="""Chain frame subsampling factor. | ||
| See steps/nnet3/chain/train.py""") | ||
| parser.add_argument('--speed-perturb', type=bool, default=True, | ||
| help="""If false, no speed perturbation will occur, i.e. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Type casting to bool will change "false" to True. You can instead type=str, choices=['true','false'] and then convert the string to bool yourself.
| Yes nj=4 and cmd=run.pl can be the default. On Fri, Feb 9, 2018 at 1:44 PM Daniel Povey ***@***.***> wrote: ***@***.**** commented on this pull request. ------------------------------ In egs/wsj/s5/utils/data/extract_wav_segments_data_dir.sh <#2072 (comment)>: > @@ -0,0 +1,37 @@ +#!/bin/bash + +# Copyright 2017 Hossein Hadian +# Apache 2.0 + +# This script copies a data directory (which has a 'segments' file), extracting +# wav segments (according to the 'segments' file) +# so that the resulting data directory does not have a 'segments' file anymore. + +. utils/parse_options.sh +. ./path.sh + +if [ $# != 2 ]; then + echo "Usage: $0 <srcdir> <destdir>" + echo " This script copies data directory <srcdir> to <destdir> and gets" If you do ad --nj and --cmd options to this script, don't set the default of --nj to more than 4, for I/O reasons. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#2072 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AEATV6mXxXAf8LLBVT9cJCDwR6bry535ks5tTJIfgaJpZM4Q7rgT> . -- Vimal Manohar PhD Student Electrical & Computer Engineering Johns Hopkins University |
| ## Add two versions for the second allowed_duration | ||
| ## one version is by using speed modification using sox | ||
| ## the other is by extending by silence | ||
| if not args.speed_perturb: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be no not here.
| Hm, I thought this had been merged already. It seems not. I think there may be un-handled comments still though, and I see a conflict now. |
| This is almost ready (have not pushed yet). Two notes: About |
| Regarding extract_wav_segments.sh: Using "sox trim" will be extremely wasteful of disk I/O when the time comes to actually use the files, because it will have to read the entire file from disk for each segment. …On Sat, Mar 3, 2018 at 1:05 PM, Hossein Hadian ***@***.***> wrote: This is almost ready (have not pushed yet). Two notes: I didn't know there was a sox pad. I checked it now, but it seems that - unlike extend-wav-with-silence - it does not use the silence at the begin/end to make the padding (i.e. it just inserts some samples with 0 amplitude). About extract_wav_segments.sh, I guess a better way than make it use parallel jobs is to change it so that it does not do any actual wave processing and instead writes sox trim commands in wav.scp with the segment start/end times taken from the 'segments' file. It will be faster and it will save disk space. What would you think? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#2072 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ADJVu1Rh8jfvs_J4XNIBN2PX7egdKoPSks5tatttgaJpZM4Q7rgT> . |
| All issues are addressed. |
This is a version of LF-MMI that doesn't require bootstrapping from a baseline GMM system. Conflicts: src/chain/chain-training.cc
This is a version of LF-MMI that doesn't require bootstrapping from a baseline GMM system.
This PR is for review to decide whether to merge or not.
Currently, I have added phone-based and lexicon-free (character-based) recipes for WSJ only.