End-to-end chain training #2072

hhadian · 2017-12-08T21:39:36Z

This PR is for review to decide whether to merge or not.
Currently, I have added phone-based and lexicon-free (character-based) recipes for WSJ only.

danpovey · 2017-12-08T21:41:34Z

Thanks! It may be a few days before I get to this.

vince62s · 2017-12-12T16:46:40Z

That looks really interesting.
Do you think that in this simplified training recipe, the ngram order of the phone LM would have an impact ? btw did you keep 3 as the default ngram order ?

hhadian · 2017-12-12T17:15:35Z

Thanks!
I am not sure about that because I didn't do any experiments regarding the denominator LM. But I plan to do a few basic experiments and I can share the results with you.
The default ngram order is 4 and the default no-prune ngram-order is 3 (similar to regular chain recipes).

danpovey

A few minor comments, mostly requests for clarification on the code.
I haven't 100% decided about whether to merge though.

danpovey · 2017-12-12T21:40:36Z

src/chain/chain-generic-numerator.cc

+#include "chain/chain-kernels-ansi.h"
+
+namespace kaldi {
+namespace chain {


It would be nice to have a comment here explaining what this is all about.

There is a short comment in the .h file but I will add more.
This class (i.e. GenericNumeratorComputation) is responsible for Forward-Backward (to get the derivatives and total logprob) on a generic numerator graph (i.e. a training fst which can e.g. have self-loops).

danpovey · 2017-12-12T21:40:43Z

src/chain/chain-generic-numerator.cc

+GenericNumeratorComputation::GenericNumeratorComputation(
+ const Supervision &supervision,
+ const CuMatrixBase<BaseFloat> &nnet_output):
+ // opts_(opts),


remove comment

danpovey · 2017-12-12T21:42:10Z

src/chain/chain-generic-numerator.cc

+ const int32 num_sequences = supervision_.num_sequences,
+ num_states = max_num_hmm_states_;
+ // set alpha_0(0) for all sequences to 1.0 and leave the rest to be 0.0.
+ BFloat *first_frame_alpha = alpha_.RowData(0);


I guess you mean BaseFloat... not sure where BFloat came from.

I defined BFloat (in the header file) so I could optionally change it to double to make some parts of computation double-precision. This was mostly meant for the initial experiments. I guess I can change it to BaseFloat now.

danpovey · 2017-12-12T21:43:20Z

src/chain/chain-generic-numerator.h

+
+ private:
+
+ enum { kMaxDerivTimeSteps = 4 };


need explanation of what this is.

danpovey · 2017-12-12T21:43:41Z

src/chain/chain-generic-numerator.h

+ // the derivs w.r.t. the nnet outputs (transposed)
+ Matrix<BaseFloat> nnet_output_deriv_transposed_;
+
+ // forward and backward probs matrices


How are these indexed and what is the dimension?

Almost all the variables with the same name in this class and in DenominatorComputation (like alpha, beta, kMaxDerivTimeSteps) have the exact same function. I will add some comments to make it clear.

danpovey · 2017-12-12T21:45:34Z

src/chain/chain-supervision.h

 // label_dim). This should equal the NumPdfs() in the TransitionModel object.
 // Included to avoid training on mismatched egs.
 int32 label_dim;
+ bool e2e; // end to end


Need explanation- not just of what this means, but of what's going on more generally. Also by declaration of 'e2e_fsts' below. E.g. what do they contain, what e2e training is or where we can find out, etc.

danpovey · 2017-12-12T21:45:45Z

src/chain/chain-supervision.h

 const ProtoSupervision &proto_supervision,
 Supervision *supervision);

+bool TrainingGraphToSupervision(


Need documentation.

danpovey · 2017-12-12T21:46:41Z

egs/wsj/s5/local/chain/run_tdnn_flatstart.sh

+leaky_hmm_coeff=0.1
+num_scale_opts="--transition-scale=1.0 --self-loop-scale=1.0"
+shared_phones=true
+train_set=train_si284_spEx_hires


I don't like capitals in these naems.

galv · 2017-12-12T22:59:33Z

egs/wsj/s5/steps/libs/nnet3/train/chain_objf/acoustic_model.py

- common_lib.execute_command(
- """copy-transition-model {tree_dir}/final.mdl \
- {dir}/0.trans_mdl""".format(dir=dir, tree_dir=tree_dir))
+ if not os.path.exists('{dir}/0.trans_mdl'.format(dir=dir)):


Why do this check? Isn't it okay to overwrite the transition model?

I don't think the transition model will ever change for a chain model (since the probabilities on the transitions are fixed), but in case it did (e.g., because the phone set changed but someone for some reason didn't change their experimental directory), shouldn't we prefer to always overwrite?

I guess this change was for the initial experiments where I needed to copy my own transition model w/o being overwritten. I will probably need to revert this.

galv · 2017-12-12T23:10:46Z

egs/wsj/s5/steps/libs/nnet3/train/chain_objf/acoustic_model.py

 '{0}/final.mdl'.format(tree_dir), '{0}/tree'.format(tree_dir),
 '{0}/lat.1.gz'.format(lat_dir), '{0}/final.mdl'.format(lat_dir),
- '{0}/num_jobs'.format(lat_dir), '{0}/splice_opts'.format(lat_dir)]
+ '{0}/num_jobs'.format(lat_dir)]


Why is splice_opts no longer required?

I guess it is not even required in regular chain training.

I prefer not to remove this, though, for the sake of backwards compatibility.

galv · 2017-12-12T23:14:23Z

src/gmmbin/Makefile

 BINFILES = gmm-init-mono gmm-est gmm-acc-stats-ali gmm-align \
- gmm-decode-faster gmm-decode-simple gmm-align-compiled \
- gmm-sum-accs gmm-est-regtree-fmllr gmm-acc-stats-twofeats \
+ gmm-decode-faster gmm-decode-simple gmm-decode-nbest gmm-align-compiled \


Did you forget to add gmm-init-trans.cc and gmm-decode-nbest.cc?

I have made only one change in this file (i.e adding gmm-init-biphone)
The other changes must be due to a conflict.

galv · 2017-12-12T23:17:10Z

src/gmmbin/gmm-init-biphone.cc

+
+ const std::vector<int32> &phones = topo.GetPhones();
+
+ std::vector<int32> phone2num_pdf_classes (1 + phones.back());


Remove whitespace here: s (1

What is this constructor doing? I don't understand it. Is the last element of phones guaranteed to be the largest phoneme id? Why are you adding plus 1? Is it because phones are indexed from 1 in Kaldi (not sure if that's true right this moment, but I recall something like that being true before).

Yes (to both questions).
BTW, such constructors are used in a couple of other places in Kaldi.

galv · 2017-12-12T23:45:32Z

src/gmmbin/gmm-init-biphone.cc

+ ContextDependency *ctx_dep = NULL;
+ std::vector<std::vector<int32> > shared_phones;
+ if (shared_phones_rxfilename == "") {
+ for (size_t i = 0; i < phones.size(); i++)


Can you add braces to this for loop?

It's a single-line statement, why are suggesting "braces"?

Just a common best practice to always use curly braces, especially since your single-line statement (in my opinion) is rather long.

Actually the Google style guide doesn't mandate them for one-line statements, and in cases like this I prefer not to have them as they make the code longer.

galv · 2017-12-12T23:49:01Z

src/online2bin/extend-wav-with-silence.cc


- std::string wav_rspecifier = po.GetArg(1);
- std::string wav_wspecifier = po.GetArg(2);
+ if (ClassifyRspecifier(po.GetArg(1), NULL, NULL) != kNoRspecifier) {


It's not very clear to me what you're doing here since it's so large. Since it looks like you made a new set of input options, can you change the usage string of this binary?

I am adding support for rxfilename and wxfilename so it can be used in a piped command.
I'll update the usage.

Great, thanks!

entn-at · 2017-12-12T23:55:26Z

egs/wsj/s5/local/run_end2end_char.sh

+
+if [ $stage -le 5 ]; then
+ echo "$0: calling the flat-start chain recipe..."
+ loacl/chain/run_tdnn_lstm_flatstart.sh


loacl -> local

entn-at · 2017-12-12T23:55:34Z

egs/wsj/s5/local/run_end2end_phone.sh

+
+if [ $stage -le 5 ]; then
+ echo "$0: calling the flat-start chain recipe..."
+ loacl/chain/run_tdnn_flatstart.sh


loacl -> local

danpovey

I think I would be OK to commit this stuff, as it doesn't confuse or break anything, but I'd prefer it to be more cleanly separated directory-wise from the regular chain scripts, e.g. move from steps/nnet3/chain/ to steps/nnet3/chain/e2e, and from local/ to local/e2e/. You can also put README files in those e2e directories if you want, to explain what it's about.

danpovey · 2017-12-31T00:02:32Z

egs/wsj/s5/steps/nnet3/chain/train_e2e.py

+ dest='left_deriv_truncate',
+ default=None,
+ help="Deprecated. Kept for back compatibility")
+ parser.add_argument("--chain.apply-deriv-weights", type=str,


a couple of args here seem to be repeated.

danpovey · 2018-01-19T23:42:26Z

@vimalmanohar, do you have time to give this a quick check before I merge?

vimalmanohar · 2018-01-19T23:59:42Z

Sure. I will go through it.

…

On Fri, Jan 19, 2018 at 6:42 PM Daniel Povey ***@***.***> wrote: @vimalmanohar <https://github.com/vimalmanohar>, do you have time to give this a quick check before I merge? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#2072 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AEATV3Q91CH8O_sx2V3POsO8Qv3DjK5Pks5tMShrgaJpZM4Q7rgT> .

-- Vimal Manohar PhD Student Electrical & Computer Engineering Johns Hopkins University

vimalmanohar · 2018-01-20T00:02:44Z

egs/wsj/s5/local/chain/e2e/run_tdnn_flatstart.sh

+
+# training options
+num_epochs=4.5
+initial_effective_lrate=0.001


I think a lot of options are presented here. Some of them can be fixed and removed if they are not too important for tuning.

vimalmanohar · 2018-01-20T00:04:36Z

egs/wsj/s5/local/chain/e2e/run_tdnn_flatstart.sh

+ for lmtype in tgpr bd_tgpr; do
+ steps/nnet3/decode.sh \
+ --acwt 1.0 --post-decode-acwt 10.0 \
+ --extra-left-context $chunk_left_context \


left and right contexts can be removed for tdnn scripts.

vimalmanohar · 2018-01-20T00:05:25Z

egs/wsj/s5/local/chain/e2e/run_tdnn_flatstart.sh

+shared_phones=true
+train_set=train_si284_spe2e_hires
+test_sets="test_dev93 test_eval92"
+first_layer_splice=-1,0,1


I think such things can be fixed.

vimalmanohar · 2018-01-20T00:07:28Z

egs/wsj/s5/local/e2e/run_end2end_char.sh

+if [ $stage -le 4 ]; then
+ echo "$0: estimating character language model for the denominator graph"
+ mkdir -p exp/chain/e2e_base
+ cat data/$trainset/text | \


This should probably be run using $train_cmd as systems might not be configured to run such things locally.

vimalmanohar · 2018-01-20T00:08:21Z

egs/wsj/s5/local/e2e/run_end2end_phone.sh

@@ -0,0 +1,93 @@
+#!/bin/sh
+# Copyright 2017 Hossein Hadian
+


Mention what this script is and also maybe mention similar scripts like _char.sh etc.

vimalmanohar · 2018-01-20T00:26:11Z

egs/wsj/s5/utils/data/segment_data.sh

+set -o pipefail
+
+extract-segments scp:$srcdir/wav.scp $srcdir/segments ark,scp:$dir/data/segments.ark,$dir/segments.scp
+cat $dir/segments.scp | awk '{ print $1 " wav-copy " $2 " - |" }' >$dir/wav.scp


I don't think you need wav-copy here. Programs can directly read from ark files.

vimalmanohar · 2018-01-20T00:27:57Z

egs/wsj/s5/utils/text_to_phones.py

+import math
+import random
+
+parser = argparse.ArgumentParser(description="""This script reads


This can also be put into functions, but since this is a small script it may not be essential.

vimalmanohar · 2018-01-20T00:29:57Z

src/chain/chain-generic-numerator.h

+#include "matrix/kaldi-matrix.h"
+#include "hmm/transition-model.h"
+#include "chain/chain-supervision.h"
+#include "cudamatrix/cu-matrix.h"


I think you are including files that are not required or can be included in cc files.

Yes, I guess these were for some unusual experiments.

vimalmanohar · 2018-01-20T00:34:04Z

src/chain/chain-training.cc

+ } else {
+ GenericNumeratorComputation numerator(supervision, nnet_output);
+ num_logprob_weighted = numerator.Forward();
+ KALDI_LOG << "Numerator logprob per frame: "


I guess this was for debugging. You can probably remove it

vimalmanohar · 2018-01-20T00:34:52Z

src/chain/chain-training.cc

+ KALDI_LOG << "Numerator logprob per frame: "
+ << num_logprob_weighted / (*weight);
+ num_ok = (num_logprob_weighted - num_logprob_weighted == 0);
+ KALDI_LOG << "Numerator Forward " << (num_ok ? "succeeded" : "failed");


Also remove this. You can probably keep track of how many failed and print somewhere else. But logging all this would be very slow. At least use like KALDI_VLOG(2) or something.

I wouldn't worry about speed [it happens only once for each minibatch], but it might be unnecessary.
I could use VLOG.

I agree that there should be a count maintained. If you expect it to normally succeed, maybe you could print something out if it failed?
Also I don't like the name num_ok, it's ambiguous. I prefer numerator_ok.

It never fails for WSJ. It could fail a few times (for long utterances, only during the first 10 or so iterations) on harder databases (like SWBD). I guess printing something if it fails is enough if you agree?
I guess maintaining a count involves changing a few different functions/classes (like ObjectiveFunctionInfo, UpdateStats, and even ComputeChainObjfAndDeriv). Should I do it?

I will fix the name num_ok.

No, don't change those classes, since they interact with other code.
But there's something else that I'd like you to change, that I just noticed.
Your changes are making the function ComputeChainObjfAndDeriv quite complicated.
I'd like you to, in chain-training.cc but not the header, introduce a new function
ComputeChainObjfAndDerivE2e, and if supervision.e2e is true, call that function.
That will keep the main current code path simple and free of your changes.

vimalmanohar · 2018-01-22T17:11:56Z

egs/wsj/s5/utils/data/perturb_speed_to_allowed_lengths.py

+ print "Exporting to ", outdir, "..."
+ spks = {}
+
+ f = open(os.path.join(outdir, 'text'), 'w')


While changing this file, also try to use with open(***, 'w') as f: approach.

vimalmanohar · 2018-01-22T17:21:49Z

egs/wsj/s5/utils/data/perturb_speed_to_allowed_lengths.py

+
+durs = []
+for u in utts:
+ durs += [u.dur]


Use durs.append(u.dur). It is easier to understand. += has ambiguous behavior and is hard to read. Same thing for list concatenations elsewhere.

vimalmanohar · 2018-01-22T17:23:26Z

egs/wsj/s5/utils/data/perturb_speed_to_allowed_lengths.py

+
+durs = []
+d = start_dur
+f = open(os.path.join(args.dir, 'allowed_durs.txt'), 'wb')


Use with construct here too. You can give both handles in the same statement.

vimalmanohar · 2018-01-22T17:29:11Z

egs/wsj/s5/utils/data/perturb_speed_to_allowed_lengths.py

+ utt2spk = read_kaldi_mapfile(os.path.join(dir, 'utt2spk'))
+ for utt in wav_scp:
+ if utt in text and utt in utt2dur and utt in utt2spk:
+ utts += [Utterance(utt, wav_scp[utt], utt2spk[utt], text[utt], utt2dur[utt])]


utts.append(Utterance(***)) is the usual way to do it.

vimalmanohar · 2018-01-22T17:30:42Z

egs/wsj/s5/utils/data/perturb_speed_to_allowed_lengths.py

+f = open(os.path.join(args.dir, 'allowed_durs.txt'), 'wb')
+f2 = open(os.path.join(args.dir, 'allowed_lengths.txt'), 'wb')
+while d < end_dur:
+ length = int(d*1000 - 25) / 10 + 1 # for the most common length of frames and overlap which is 25ms and 10ms


Does this script only work for 25ms and 10ms case? If not, you can comment that it should work for other reasonable cases. Alternatively, you can get these values from args.

vimalmanohar · 2018-01-22T17:44:07Z

egs/wsj/s5/utils/data/segment_data.sh

+. ./path.sh
+
+if [ $# != 2 ]; then
+ echo "Usage: "


Maybe extract_wav_segments_data_dir.sh to be clear you are extracting wav data.

vimalmanohar · 2018-01-22T17:46:44Z

egs/wsj/s5/utils/text_to_phones.py

+parser.add_argument('langdir', type=str)
+parser.add_argument('lex', type=str,
+ help='lexicon.txt')
+parser.add_argument('--be-silprob', type=float, default=0.8,


Maybe needs different names like edge-silprob and between-silprob

vimalmanohar · 2018-01-22T18:05:19Z

egs/wsj/s5/local/e2e/run_end2end_phone.sh

+ echo "$0: estimating phone language model for the denominator graph"
+ mkdir -p exp/chain/e2e_base
+ cat data/$trainset/text | \
+ utils/text_to_phones.py data/lang_nosp data/local/dict_nosp/lexicon.txt | \


I believe you can use the lexicon in data/lang_nosp/phones/align_lexicon.txt. When you do this, you also don't need position-independent option because align_lexicon.txt already has that accounted for.

vimalmanohar · 2018-01-22T18:07:31Z

egs/wsj/s5/utils/text_to_phones.py

+ and end.""")
+parser.add_argument('--silprob', type=float, default=0.2,
+ help="Probability of optional silence between the words.")
+parser.add_argument('--pos-independent', action='store_true')


See my other comment about removing this option.

vimalmanohar · 2018-01-22T18:09:08Z

egs/wsj/s5/utils/text_to_phones.py

+sil = open(args.langdir + "/phones/optional_silence.txt").readline().strip()
+
+# unk
+unk = open(args.langdir + "/oov.txt").readline().strip()


Perhaps you should name this as oov_word to be clear that it is the word and not the unk phone.

misbullah · 2018-01-26T10:54:34Z

Hi,

I address some bugs in the following files:

utils/data/perturb_speed_to_allowed_lengths.py

line 212:
original: u1.wavefile = '{} sox -t wav - -t wav - speed {} | '.format(u.wavefile, speed)
need to change to be: u1.wavefile = 'sox -t wav {} -t wav - speed {} | '.format(u.wavefile, speed)

line 230:
original: u2.wavefile = '{} sox -t wav -t wav - speed {} | '.format(u.wavefile, speed)
need to change to be: u2.wavefile = 'sox -t wav {} -t wav - speed {} | '.format(u.wavefile, speed)

line 241L
original: u3.wavefile = '{} extend-wav-with-silence --extra-silence-length={} - - | '.format(u.wavefile, delta)
need to change to be: u3.wavefile = 'extend-wav-with-silence --extra-silence-length={} {} - | '.format(delta, u.wavefile)

In steps/libs/nnet3/train/chain_objf/acoustic_model.py

line 138:
original: steps/nnet3/chain/get_egs_e2e.sh
need to change to be: steps/nnet3/chain/e2e/get_egs_e2e.sh

I also found that there is no --add-deltas option for --egs.opts "--normalize-egs true --add-deltas false" in local/chain/e2e/run_tdnn_flatstart.sh

Thanks
Alim

hhadian · 2018-01-26T10:57:11Z

Thanks, I've already fixed all of them (and other issues) and I'll push them soon.

hhadian · 2018-01-27T20:53:27Z

I addressed the issues. I guess this is ready.
@danpovey, all the scripts directly dealing with e2e are in separate directories but there are 3 other shared scripts (text_to_phones.py, perturb_speed_to_allowed_lengths.py, and extract_wav_segments_data_dir.sh) which are in utils and utils/data. Should I move them to e.g. utils/e2e too?

danpovey · 2018-01-27T20:59:09Z

Thanks, will try to look at this today or to-morrow.

…

On Sat, Jan 27, 2018 at 3:53 PM, Hossein Hadian ***@***.***> wrote: I addressed the issues. I guess this is ready. @danpovey <https://github.com/danpovey>, all the scripts directly dealing with e2e are in separate directories but there are 3 other shared scripts ( text_to_phones.py, perturb_speed_to_allowed_lengths.py, and extract_wav_segments_data_dir.sh) which are in utils and utils/data. Should I move them to e.g. utils/e2e too? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#2072 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ADJVu7xBs4A-f60nSd8D0ts6uKTgfiBgks5tO4zLgaJpZM4Q7rgT> .

danpovey

A bunch of comments, none of them too major.
(But as we discussed offline, you'll make this much faster later on, after I merge.)

danpovey · 2018-02-01T05:37:59Z

egs/wsj/s5/steps/nnet3/chain/e2e/get_egs_e2e.sh

+frames_per_eg=$(cat $data/allowed_lengths.txt | tr '\n' , | sed 's/,$//')
+
+[ ! -f "$data/utt2len" ] && feat-to-len scp:$data/feats.scp ark,t:$data/utt2len
+# TODO(222 --> most frequent len)


get rid of TODO.. see the comment a few lines below also.

danpovey · 2018-02-01T05:44:02Z

src/chain/chain-generic-numerator.cc

+ // the extra B is for storing alpha sums
+ beta_.Resize(2, max_num_hmm_states_ * B, kSetZero);
+
+ // init incmoing transitions for easy access


typo. And all comments should start with a capital. According to Google guidelines they should be proper sentences.

danpovey · 2018-02-01T21:54:27Z

src/chain/chain-generic-numerator.cc

+ KALDI_ASSERT(supervision.num_sequences *
+ supervision.frames_per_sequence == nnet_output.NumRows() &&
+ supervision.label_dim == nnet_output.NumCols());
+ exp_nnet_output_transposed_.ApplyExp();


It would probably be more efficient to apply the exp on the GPU, to a copy of the matrix, and then copy to CPU.

danpovey · 2018-02-01T21:56:28Z

src/chain/chain-supervision.cc

 ReadBasicType(is, binary, &label_dim);
- if (!binary) {
- ReadFstKaldi(is, binary, &fst);
+ ExpectToken(is, binary, "<End2End>");


On-disk formats of things like egs need to back-compatible. You can use PeekToken(); look in nnet-simple-compoent.cc for examples.

danpovey · 2018-02-01T22:01:05Z

src/chain/chain-supervision.cc

+ e2e(other.e2e), e2e_fsts(other.e2e_fsts) { }

 void AppendSupervision(const std::vector<const Supervision*> &input,
 bool compactify,


These functions AppendSupervision() and AddWeightToSupervisionFst() have become very long due to this e2e stuff. We try to avoid having functions longer than about 40 lines. I think the best way to keep the two parts of the code separate would be to have a statement near the beginning of those functions that checks if the 'e2e' flag is set, and if it is, recurses to a static function with a name like AppendSupervisionE2e().

danpovey · 2018-02-01T22:07:45Z

src/gmmbin/gmm-init-biphone.cc

+ po.PrintUsage();
+ exit(1);
+ }
+


this file seems to have a bunch of whitespace lines that are probably excessive.

danpovey · 2018-02-01T22:09:35Z

src/gmmbin/gmm-init-biphone.cc

+
+ const std::vector<int32> &phones = topo.GetPhones();
+
+ std::vector<int32> phone2num_pdf_classes (1 + phones.back());


remove space before (.

danpovey · 2018-02-01T22:12:02Z

egs/wsj/s5/utils/data/perturb_speed_to_allowed_lengths.py

+ u3 = copy.deepcopy(u)
+ u3.id = 'pv3-' + u.id
+ u3.speaker = 'pv3-' + u.speaker
+ u3.wavefile = '{} extend-wav-with-silence --extra-silence-length={} - - | '.format(u.wavefile, delta)


is there a reason you can't just use 'sox pad'?

danpovey · 2018-02-01T22:13:31Z

egs/wsj/s5/utils/text_to_phones.py

@@ -0,0 +1,73 @@
+#!/usr/bin/env python


this script is kind of specialized so I'd rather not put it here-- please move it to one of the 'e2e' directories in steps/.

danpovey · 2018-02-01T22:16:28Z

src/chain/chain-supervision.cc

+ delete compact_fst;
+ }
+ }
+ ExpectToken(is, binary, "</FSTs>");


I'd prefer these tokens to be called Fsts rather than FSTs.

danpovey · 2018-02-01T22:55:13Z

src/gmmbin/gmm-init-biphone.cc

+
+
+ std::string topo_filename = po.GetArg(1);
+ int dim = atoi(po.GetArg(2).c_str());


And please use ConvertStringToInteger, e.g. if (!ConvertStringToInteger(...) || dim <= 0 || dim > 10000) KALDI_ERR...

vimalmanohar · 2018-02-09T17:42:23Z

egs/wsj/s5/utils/data/perturb_speed_to_allowed_lengths.py

+ if utt in text and utt in utt2dur and utt in utt2spk:
+ utterances.append(Utterance(utt, wav_scp[utt], utt2spk[utt],
+ text[utt], utt2dur[utt]))
+ else:


Try to add some count like num_success, num_fail and print them. Don't print warning for every utterance. There should also be a special check that the input data directory has been passed through extract_wav_segments script. Otherwise it would generate a lot of warnings without any information on what actually is the issue.
Also check how many utterances you got compared to the ones in wav_scp, and exit if its zero or less than some percentage.

vimalmanohar · 2018-02-09T17:48:02Z

egs/wsj/s5/utils/data/extract_wav_segments_data_dir.sh

+
+if [ $# != 2 ]; then
+ echo "Usage: $0 <srcdir> <destdir>"
+ echo " This script copies data directory <srcdir> to <destdir> and gets"


This script will be very slow. Split the data and run the command in parallel.

If you do ad --nj and --cmd options to this script, don't set the default of --nj to more than 4, for I/O reasons.

vimalmanohar · 2018-02-09T17:57:11Z

egs/wsj/s5/utils/data/extract_wav_segments_data_dir.sh

+set -e -o pipefail
+utils/copy_data_dir.sh $srcdir $dir
+
+extract-segments scp:$srcdir/wav.scp $srcdir/segments ark,scp:$dir/data/wav_segments.ark,$dir/segments.scp


Better to not put the temporary files like $dir/segments.scp in $dir. Keep it in $dir/data.
Better to rename it to wav_segments.scp to avoid confusion with segments file.

vimalmanohar · 2018-02-09T19:01:33Z

egs/wsj/s5/utils/data/perturb_speed_to_allowed_lengths.py

+ help="""Chain frame subsampling factor.
+ See steps/nnet3/chain/train.py""")
+ parser.add_argument('--speed-perturb', type=bool, default=True,
+ help="""If false, no speed perturbation will occur, i.e.


Type casting to bool will change "false" to True. You can instead type=str, choices=['true','false'] and then convert the string to bool yourself.

vimalmanohar · 2018-02-09T19:04:39Z

Yes nj=4 and cmd=run.pl can be the default.

On Fri, Feb 9, 2018 at 1:44 PM Daniel Povey ***@***.***> wrote: ***@***.**** commented on this pull request. ------------------------------ In egs/wsj/s5/utils/data/extract_wav_segments_data_dir.sh <#2072 (comment)>: > @@ -0,0 +1,37 @@ +#!/bin/bash + +# Copyright 2017 Hossein Hadian +# Apache 2.0 + +# This script copies a data directory (which has a 'segments' file), extracting +# wav segments (according to the 'segments' file) +# so that the resulting data directory does not have a 'segments' file anymore. + +. utils/parse_options.sh +. ./path.sh + +if [ $# != 2 ]; then + echo "Usage: $0 <srcdir> <destdir>" + echo " This script copies data directory <srcdir> to <destdir> and gets" If you do ad --nj and --cmd options to this script, don't set the default of --nj to more than 4, for I/O reasons. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#2072 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AEATV6mXxXAf8LLBVT9cJCDwR6bry535ks5tTJIfgaJpZM4Q7rgT> .

-- Vimal Manohar PhD Student Electrical & Computer Engineering Johns Hopkins University

vimalmanohar · 2018-02-09T19:28:54Z

egs/wsj/s5/utils/data/perturb_speed_to_allowed_lengths.py

+ ## Add two versions for the second allowed_duration
+ ## one version is by using speed modification using sox
+ ## the other is by extending by silence
+ if not args.speed_perturb:


There should be no not here.

danpovey · 2018-02-28T23:36:25Z

Hm, I thought this had been merged already. It seems not. I think there may be un-handled comments still though, and I see a conflict now.

hhadian · 2018-03-03T18:05:29Z

This is almost ready (have not pushed yet). Two notes:
I didn't know there was a sox pad. I checked it now, but it seems that - unlike extend-wav-with-silence - it does not use the silence at the begin/end to make the padding (i.e. it just inserts some samples with 0 amplitude).

About extract_wav_segments.sh, I guess a better way than make it use parallel jobs is to change it so that it does not do any actual wave processing and instead writes sox trim commands in wav.scp with the segment start/end times taken from the 'segments' file. It will be faster and it will save disk space. What would you think?

danpovey · 2018-03-03T18:43:51Z

Regarding extract_wav_segments.sh: Using "sox trim" will be extremely wasteful of disk I/O when the time comes to actually use the files, because it will have to read the entire file from disk for each segment.

…

On Sat, Mar 3, 2018 at 1:05 PM, Hossein Hadian ***@***.***> wrote: This is almost ready (have not pushed yet). Two notes: I didn't know there was a sox pad. I checked it now, but it seems that - unlike extend-wav-with-silence - it does not use the silence at the begin/end to make the padding (i.e. it just inserts some samples with 0 amplitude). About extract_wav_segments.sh, I guess a better way than make it use parallel jobs is to change it so that it does not do any actual wave processing and instead writes sox trim commands in wav.scp with the segment start/end times taken from the 'segments' file. It will be faster and it will save disk space. What would you think? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#2072 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ADJVu1Rh8jfvs_J4XNIBN2PX7egdKoPSks5tatttgaJpZM4Q7rgT> .

hhadian · 2018-03-05T15:20:41Z

All issues are addressed.

This is a version of LF-MMI that doesn't require bootstrapping from a baseline GMM system. Conflicts: src/chain/chain-training.cc

This is a version of LF-MMI that doesn't require bootstrapping from a baseline GMM system.

danpovey reviewed Dec 12, 2017

View reviewed changes

galv reviewed Dec 12, 2017

View reviewed changes

entn-at reviewed Dec 12, 2017

View reviewed changes

hhadian force-pushed the clean_end2end branch from 44d01d9 to 61d9cb6 Compare December 16, 2017 20:28

danpovey reviewed Dec 31, 2017

View reviewed changes

vimalmanohar reviewed Jan 20, 2018

View reviewed changes

vimalmanohar reviewed Jan 22, 2018

View reviewed changes

hhadian force-pushed the clean_end2end branch from 0edcafc to 5ac5448 Compare January 27, 2018 00:53

danpovey reviewed Feb 1, 2018

View reviewed changes

vimalmanohar reviewed Feb 9, 2018

View reviewed changes

hhadian added 3 commits March 2, 2018 11:55

Add end-to-end chain codes + WSJ example - not fully tested yet

dac9d94

Add a character-based (lexicon-free) recipe

2b9287b

Add some fixes + results

f020f51

hhadian added 13 commits March 2, 2018 11:55

Some minor fixes

b11c65a

Add more docs; minor fixes

926f45b

fix gmmbin Makefile

a025705

More minor fixes

9ae723c

Remove unnecessary binary from Makefile

7490063

minor indent fix

157475f

Add more docs + isolate the e2e scripts

a3889f6

Minor fix

1228749

Add various minor fixes + some code cleaning

109853a

Revert any changes to chain-lib (i.e. acoustic_model.py)

c37efd0

More minor fixes

0ca5433

Update results

7534076

More minor fixes

73bf8fa

Address different minor issues in the PR review

b349bf7

hhadian force-pushed the clean_end2end branch from ee0e1cf to b349bf7 Compare March 3, 2018 20:43

danpovey merged commit e74b918 into kaldi-asr:master Mar 6, 2018

LvHang pushed a commit to LvHang/kaldi that referenced this pull request Apr 14, 2018

[src,scripts,egs] End-to-end chain training (kaldi-asr#2072)

d0fe434

This is a version of LF-MMI that doesn't require bootstrapping from a baseline GMM system. Conflicts: src/chain/chain-training.cc

Skaiste pushed a commit to Skaiste/idlak that referenced this pull request Sep 26, 2018

[src,scripts,egs] End-to-end chain training (kaldi-asr#2072)

329b128

This is a version of LF-MMI that doesn't require bootstrapping from a baseline GMM system.


		const std::vector<int32> &phones = topo.GetPhones();

		std::vector<int32> phone2num_pdf_classes (1 + phones.back());



		std::string topo_filename = po.GetArg(1);
		int dim = atoi(po.GetArg(2).c_str());

End-to-end chain training #2072

End-to-end chain training #2072

Uh oh!

Conversation

hhadian commented Dec 8, 2017

danpovey commented Dec 8, 2017

vince62s commented Dec 12, 2017

hhadian commented Dec 12, 2017

danpovey left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hhadian Dec 13, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

danpovey left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

danpovey commented Jan 19, 2018

vimalmanohar commented Jan 19, 2018 via email

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

misbullah commented Jan 26, 2018

hhadian commented Jan 26, 2018

hhadian commented Jan 27, 2018

danpovey commented Jan 27, 2018 via email

danpovey left a comment

Choose a reason for hiding this comment

hhadian Dec 13, 2017 •

edited

Loading