Skip to content

Conversation

@hhadian
Copy link
Contributor

@hhadian hhadian commented Dec 8, 2017

This PR is for review to decide whether to merge or not.
Currently, I have added phone-based and lexicon-free (character-based) recipes for WSJ only.

@danpovey
Copy link
Contributor

danpovey commented Dec 8, 2017

Thanks! It may be a few days before I get to this.

@vince62s
Copy link
Contributor

That looks really interesting.
Do you think that in this simplified training recipe, the ngram order of the phone LM would have an impact ? btw did you keep 3 as the default ngram order ?

@hhadian
Copy link
Contributor Author

hhadian commented Dec 12, 2017

Thanks!
I am not sure about that because I didn't do any experiments regarding the denominator LM. But I plan to do a few basic experiments and I can share the results with you.
The default ngram order is 4 and the default no-prune ngram-order is 3 (similar to regular chain recipes).

Copy link
Contributor

@danpovey danpovey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor comments, mostly requests for clarification on the code.
I haven't 100% decided about whether to merge though.

#include "chain/chain-kernels-ansi.h"

namespace kaldi {
namespace chain {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to have a comment here explaining what this is all about.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a short comment in the .h file but I will add more.
This class (i.e. GenericNumeratorComputation) is responsible for Forward-Backward (to get the derivatives and total logprob) on a generic numerator graph (i.e. a training fst which can e.g. have self-loops).

GenericNumeratorComputation::GenericNumeratorComputation(
const Supervision &supervision,
const CuMatrixBase<BaseFloat> &nnet_output):
// opts_(opts),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove comment

const int32 num_sequences = supervision_.num_sequences,
num_states = max_num_hmm_states_;
// set alpha_0(0) for all sequences to 1.0 and leave the rest to be 0.0.
BFloat *first_frame_alpha = alpha_.RowData(0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you mean BaseFloat... not sure where BFloat came from.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I defined BFloat (in the header file) so I could optionally change it to double to make some parts of computation double-precision. This was mostly meant for the initial experiments. I guess I can change it to BaseFloat now.


private:

enum { kMaxDerivTimeSteps = 4 };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need explanation of what this is.

// the derivs w.r.t. the nnet outputs (transposed)
Matrix<BaseFloat> nnet_output_deriv_transposed_;

// forward and backward probs matrices
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are these indexed and what is the dimension?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost all the variables with the same name in this class and in DenominatorComputation (like alpha, beta, kMaxDerivTimeSteps) have the exact same function. I will add some comments to make it clear.

// label_dim). This should equal the NumPdfs() in the TransitionModel object.
// Included to avoid training on mismatched egs.
int32 label_dim;
bool e2e; // end to end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need explanation- not just of what this means, but of what's going on more generally. Also by declaration of 'e2e_fsts' below. E.g. what do they contain, what e2e training is or where we can find out, etc.

const ProtoSupervision &proto_supervision,
Supervision *supervision);

bool TrainingGraphToSupervision(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need documentation.

leaky_hmm_coeff=0.1
num_scale_opts="--transition-scale=1.0 --self-loop-scale=1.0"
shared_phones=true
train_set=train_si284_spEx_hires
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like capitals in these naems.

common_lib.execute_command(
"""copy-transition-model {tree_dir}/final.mdl \
{dir}/0.trans_mdl""".format(dir=dir, tree_dir=tree_dir))
if not os.path.exists('{dir}/0.trans_mdl'.format(dir=dir)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do this check? Isn't it okay to overwrite the transition model?

I don't think the transition model will ever change for a chain model (since the probabilities on the transitions are fixed), but in case it did (e.g., because the phone set changed but someone for some reason didn't change their experimental directory), shouldn't we prefer to always overwrite?

Copy link
Contributor Author

@hhadian hhadian Dec 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this change was for the initial experiments where I needed to copy my own transition model w/o being overwritten. I will probably need to revert this.

'{0}/final.mdl'.format(tree_dir), '{0}/tree'.format(tree_dir),
'{0}/lat.1.gz'.format(lat_dir), '{0}/final.mdl'.format(lat_dir),
'{0}/num_jobs'.format(lat_dir), '{0}/splice_opts'.format(lat_dir)]
'{0}/num_jobs'.format(lat_dir)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is splice_opts no longer required?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it is not even required in regular chain training.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer not to remove this, though, for the sake of backwards compatibility.

BINFILES = gmm-init-mono gmm-est gmm-acc-stats-ali gmm-align \
gmm-decode-faster gmm-decode-simple gmm-align-compiled \
gmm-sum-accs gmm-est-regtree-fmllr gmm-acc-stats-twofeats \
gmm-decode-faster gmm-decode-simple gmm-decode-nbest gmm-align-compiled \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you forget to add gmm-init-trans.cc and gmm-decode-nbest.cc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have made only one change in this file (i.e adding gmm-init-biphone)
The other changes must be due to a conflict.


const std::vector<int32> &phones = topo.GetPhones();

std::vector<int32> phone2num_pdf_classes (1 + phones.back());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove whitespace here: s (1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this constructor doing? I don't understand it. Is the last element of phones guaranteed to be the largest phoneme id? Why are you adding plus 1? Is it because phones are indexed from 1 in Kaldi (not sure if that's true right this moment, but I recall something like that being true before).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes (to both questions).
BTW, such constructors are used in a couple of other places in Kaldi.

ContextDependency *ctx_dep = NULL;
std::vector<std::vector<int32> > shared_phones;
if (shared_phones_rxfilename == "") {
for (size_t i = 0; i < phones.size(); i++)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add braces to this for loop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a single-line statement, why are suggesting "braces"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a common best practice to always use curly braces, especially since your single-line statement (in my opinion) is rather long.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually the Google style guide doesn't mandate them for one-line statements, and in cases like this I prefer not to have them as they make the code longer.


std::string wav_rspecifier = po.GetArg(1);
std::string wav_wspecifier = po.GetArg(2);
if (ClassifyRspecifier(po.GetArg(1), NULL, NULL) != kNoRspecifier) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not very clear to me what you're doing here since it's so large. Since it looks like you made a new set of input options, can you change the usage string of this binary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am adding support for rxfilename and wxfilename so it can be used in a piped command.
I'll update the usage.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks!


if [ $stage -le 5 ]; then
echo "$0: calling the flat-start chain recipe..."
loacl/chain/run_tdnn_lstm_flatstart.sh
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

loacl -> local


if [ $stage -le 5 ]; then
echo "$0: calling the flat-start chain recipe..."
loacl/chain/run_tdnn_flatstart.sh
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

loacl -> local

Copy link
Contributor

@danpovey danpovey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would be OK to commit this stuff, as it doesn't confuse or break anything, but I'd prefer it to be more cleanly separated directory-wise from the regular chain scripts, e.g. move from steps/nnet3/chain/ to steps/nnet3/chain/e2e, and from local/ to local/e2e/. You can also put README files in those e2e directories if you want, to explain what it's about.

dest='left_deriv_truncate',
default=None,
help="Deprecated. Kept for back compatibility")
parser.add_argument("--chain.apply-deriv-weights", type=str,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a couple of args here seem to be repeated.

@danpovey
Copy link
Contributor

@vimalmanohar, do you have time to give this a quick check before I merge?

@vimalmanohar
Copy link
Contributor

vimalmanohar commented Jan 19, 2018 via email


# training options
num_epochs=4.5
initial_effective_lrate=0.001
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a lot of options are presented here. Some of them can be fixed and removed if they are not too important for tuning.

for lmtype in tgpr bd_tgpr; do
steps/nnet3/decode.sh \
--acwt 1.0 --post-decode-acwt 10.0 \
--extra-left-context $chunk_left_context \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left and right contexts can be removed for tdnn scripts.

shared_phones=true
train_set=train_si284_spe2e_hires
test_sets="test_dev93 test_eval92"
first_layer_splice=-1,0,1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think such things can be fixed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree.

if [ $stage -le 4 ]; then
echo "$0: estimating character language model for the denominator graph"
mkdir -p exp/chain/e2e_base
cat data/$trainset/text | \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be run using $train_cmd as systems might not be configured to run such things locally.

@@ -0,0 +1,93 @@
#!/bin/sh
# Copyright 2017 Hossein Hadian

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mention what this script is and also maybe mention similar scripts like _char.sh etc.

set -o pipefail

extract-segments scp:$srcdir/wav.scp $srcdir/segments ark,scp:$dir/data/segments.ark,$dir/segments.scp
cat $dir/segments.scp | awk '{ print $1 " wav-copy " $2 " - |" }' >$dir/wav.scp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need wav-copy here. Programs can directly read from ark files.

import math
import random

parser = argparse.ArgumentParser(description="""This script reads
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can also be put into functions, but since this is a small script it may not be essential.

#include "matrix/kaldi-matrix.h"
#include "hmm/transition-model.h"
#include "chain/chain-supervision.h"
#include "cudamatrix/cu-matrix.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are including files that are not required or can be included in cc files.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I guess these were for some unusual experiments.

} else {
GenericNumeratorComputation numerator(supervision, nnet_output);
num_logprob_weighted = numerator.Forward();
KALDI_LOG << "Numerator logprob per frame: "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this was for debugging. You can probably remove it

KALDI_LOG << "Numerator logprob per frame: "
<< num_logprob_weighted / (*weight);
num_ok = (num_logprob_weighted - num_logprob_weighted == 0);
KALDI_LOG << "Numerator Forward " << (num_ok ? "succeeded" : "failed");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also remove this. You can probably keep track of how many failed and print somewhere else. But logging all this would be very slow. At least use like KALDI_VLOG(2) or something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't worry about speed [it happens only once for each minibatch], but it might be unnecessary.
I could use VLOG.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that there should be a count maintained. If you expect it to normally succeed, maybe you could print something out if it failed?
Also I don't like the name num_ok, it's ambiguous. I prefer numerator_ok.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It never fails for WSJ. It could fail a few times (for long utterances, only during the first 10 or so iterations) on harder databases (like SWBD). I guess printing something if it fails is enough if you agree?
I guess maintaining a count involves changing a few different functions/classes (like ObjectiveFunctionInfo, UpdateStats, and even ComputeChainObjfAndDeriv). Should I do it?

I will fix the name num_ok.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, don't change those classes, since they interact with other code.
But there's something else that I'd like you to change, that I just noticed.
Your changes are making the function ComputeChainObjfAndDeriv quite complicated.
I'd like you to, in chain-training.cc but not the header, introduce a new function
ComputeChainObjfAndDerivE2e, and if supervision.e2e is true, call that function.
That will keep the main current code path simple and free of your changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

print "Exporting to ", outdir, "..."
spks = {}

f = open(os.path.join(outdir, 'text'), 'w')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While changing this file, also try to use with open(***, 'w') as f: approach.


durs = []
for u in utts:
durs += [u.dur]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use durs.append(u.dur). It is easier to understand. += has ambiguous behavior and is hard to read. Same thing for list concatenations elsewhere.


durs = []
d = start_dur
f = open(os.path.join(args.dir, 'allowed_durs.txt'), 'wb')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use with construct here too. You can give both handles in the same statement.

utt2spk = read_kaldi_mapfile(os.path.join(dir, 'utt2spk'))
for utt in wav_scp:
if utt in text and utt in utt2dur and utt in utt2spk:
utts += [Utterance(utt, wav_scp[utt], utt2spk[utt], text[utt], utt2dur[utt])]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utts.append(Utterance(***)) is the usual way to do it.

f = open(os.path.join(args.dir, 'allowed_durs.txt'), 'wb')
f2 = open(os.path.join(args.dir, 'allowed_lengths.txt'), 'wb')
while d < end_dur:
length = int(d*1000 - 25) / 10 + 1 # for the most common length of frames and overlap which is 25ms and 10ms
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this script only work for 25ms and 10ms case? If not, you can comment that it should work for other reasonable cases. Alternatively, you can get these values from args.

. ./path.sh

if [ $# != 2 ]; then
echo "Usage: "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe extract_wav_segments_data_dir.sh to be clear you are extracting wav data.

parser.add_argument('langdir', type=str)
parser.add_argument('lex', type=str,
help='lexicon.txt')
parser.add_argument('--be-silprob', type=float, default=0.8,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe needs different names like edge-silprob and between-silprob

echo "$0: estimating phone language model for the denominator graph"
mkdir -p exp/chain/e2e_base
cat data/$trainset/text | \
utils/text_to_phones.py data/lang_nosp data/local/dict_nosp/lexicon.txt | \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you can use the lexicon in data/lang_nosp/phones/align_lexicon.txt. When you do this, you also don't need position-independent option because align_lexicon.txt already has that accounted for.

and end.""")
parser.add_argument('--silprob', type=float, default=0.2,
help="Probability of optional silence between the words.")
parser.add_argument('--pos-independent', action='store_true')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my other comment about removing this option.

sil = open(args.langdir + "/phones/optional_silence.txt").readline().strip()

# unk
unk = open(args.langdir + "/oov.txt").readline().strip()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps you should name this as oov_word to be clear that it is the word and not the unk phone.

@misbullah
Copy link
Contributor

Hi,

I address some bugs in the following files:

utils/data/perturb_speed_to_allowed_lengths.py

line 212:
original: u1.wavefile = '{} sox -t wav - -t wav - speed {} | '.format(u.wavefile, speed)
need to change to be: u1.wavefile = 'sox -t wav {} -t wav - speed {} | '.format(u.wavefile, speed)

line 230:
original: u2.wavefile = '{} sox -t wav -t wav - speed {} | '.format(u.wavefile, speed)
need to change to be: u2.wavefile = 'sox -t wav {} -t wav - speed {} | '.format(u.wavefile, speed)

line 241L
original: u3.wavefile = '{} extend-wav-with-silence --extra-silence-length={} - - | '.format(u.wavefile, delta)
need to change to be: u3.wavefile = 'extend-wav-with-silence --extra-silence-length={} {} - | '.format(delta, u.wavefile)

In steps/libs/nnet3/train/chain_objf/acoustic_model.py

line 138:
original: steps/nnet3/chain/get_egs_e2e.sh
need to change to be: steps/nnet3/chain/e2e/get_egs_e2e.sh

I also found that there is no --add-deltas option for --egs.opts "--normalize-egs true --add-deltas false" in local/chain/e2e/run_tdnn_flatstart.sh

Thanks
Alim

@hhadian
Copy link
Contributor Author

hhadian commented Jan 26, 2018

Thanks, I've already fixed all of them (and other issues) and I'll push them soon.

@hhadian
Copy link
Contributor Author

hhadian commented Jan 27, 2018

I addressed the issues. I guess this is ready.
@danpovey, all the scripts directly dealing with e2e are in separate directories but there are 3 other shared scripts (text_to_phones.py, perturb_speed_to_allowed_lengths.py, and extract_wav_segments_data_dir.sh) which are in utils and utils/data. Should I move them to e.g. utils/e2e too?

@danpovey
Copy link
Contributor

danpovey commented Jan 27, 2018 via email

Copy link
Contributor

@danpovey danpovey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bunch of comments, none of them too major.
(But as we discussed offline, you'll make this much faster later on, after I merge.)

frames_per_eg=$(cat $data/allowed_lengths.txt | tr '\n' , | sed 's/,$//')

[ ! -f "$data/utt2len" ] && feat-to-len scp:$data/feats.scp ark,t:$data/utt2len
# TODO(222 --> most frequent len)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get rid of TODO.. see the comment a few lines below also.

// the extra B is for storing alpha sums
beta_.Resize(2, max_num_hmm_states_ * B, kSetZero);

// init incmoing transitions for easy access
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo. And all comments should start with a capital. According to Google guidelines they should be proper sentences.

KALDI_ASSERT(supervision.num_sequences *
supervision.frames_per_sequence == nnet_output.NumRows() &&
supervision.label_dim == nnet_output.NumCols());
exp_nnet_output_transposed_.ApplyExp();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would probably be more efficient to apply the exp on the GPU, to a copy of the matrix, and then copy to CPU.

ReadBasicType(is, binary, &label_dim);
if (!binary) {
ReadFstKaldi(is, binary, &fst);
ExpectToken(is, binary, "<End2End>");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On-disk formats of things like egs need to back-compatible. You can use PeekToken(); look in nnet-simple-compoent.cc for examples.

e2e(other.e2e), e2e_fsts(other.e2e_fsts) { }

void AppendSupervision(const std::vector<const Supervision*> &input,
bool compactify,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These functions AppendSupervision() and AddWeightToSupervisionFst() have become very long due to this e2e stuff. We try to avoid having functions longer than about 40 lines. I think the best way to keep the two parts of the code separate would be to have a statement near the beginning of those functions that checks if the 'e2e' flag is set, and if it is, recurses to a static function with a name like AppendSupervisionE2e().

po.PrintUsage();
exit(1);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this file seems to have a bunch of whitespace lines that are probably excessive.


const std::vector<int32> &phones = topo.GetPhones();

std::vector<int32> phone2num_pdf_classes (1 + phones.back());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove space before (.

u3 = copy.deepcopy(u)
u3.id = 'pv3-' + u.id
u3.speaker = 'pv3-' + u.speaker
u3.wavefile = '{} extend-wav-with-silence --extra-silence-length={} - - | '.format(u.wavefile, delta)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason you can't just use 'sox pad'?

@@ -0,0 +1,73 @@
#!/usr/bin/env python
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this script is kind of specialized so I'd rather not put it here-- please move it to one of the 'e2e' directories in steps/.

delete compact_fst;
}
}
ExpectToken(is, binary, "</FSTs>");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer these tokens to be called Fsts rather than FSTs.



std::string topo_filename = po.GetArg(1);
int dim = atoi(po.GetArg(2).c_str());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And please use ConvertStringToInteger, e.g. if (!ConvertStringToInteger(...) || dim <= 0 || dim > 10000) KALDI_ERR...

if utt in text and utt in utt2dur and utt in utt2spk:
utterances.append(Utterance(utt, wav_scp[utt], utt2spk[utt],
text[utt], utt2dur[utt]))
else:
Copy link
Contributor

@vimalmanohar vimalmanohar Feb 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try to add some count like num_success, num_fail and print them. Don't print warning for every utterance. There should also be a special check that the input data directory has been passed through extract_wav_segments script. Otherwise it would generate a lot of warnings without any information on what actually is the issue.
Also check how many utterances you got compared to the ones in wav_scp, and exit if its zero or less than some percentage.


if [ $# != 2 ]; then
echo "Usage: $0 <srcdir> <destdir>"
echo " This script copies data directory <srcdir> to <destdir> and gets"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This script will be very slow. Split the data and run the command in parallel.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you do ad --nj and --cmd options to this script, don't set the default of --nj to more than 4, for I/O reasons.

set -e -o pipefail
utils/copy_data_dir.sh $srcdir $dir

extract-segments scp:$srcdir/wav.scp $srcdir/segments ark,scp:$dir/data/wav_segments.ark,$dir/segments.scp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to not put the temporary files like $dir/segments.scp in $dir. Keep it in $dir/data.
Better to rename it to wav_segments.scp to avoid confusion with segments file.

help="""Chain frame subsampling factor.
See steps/nnet3/chain/train.py""")
parser.add_argument('--speed-perturb', type=bool, default=True,
help="""If false, no speed perturbation will occur, i.e.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type casting to bool will change "false" to True. You can instead type=str, choices=['true','false'] and then convert the string to bool yourself.

@vimalmanohar
Copy link
Contributor

vimalmanohar commented Feb 9, 2018 via email

## Add two versions for the second allowed_duration
## one version is by using speed modification using sox
## the other is by extending by silence
if not args.speed_perturb:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be no not here.

@danpovey
Copy link
Contributor

Hm, I thought this had been merged already. It seems not. I think there may be un-handled comments still though, and I see a conflict now.

@hhadian
Copy link
Contributor Author

hhadian commented Mar 3, 2018

This is almost ready (have not pushed yet). Two notes:
I didn't know there was a sox pad. I checked it now, but it seems that - unlike extend-wav-with-silence - it does not use the silence at the begin/end to make the padding (i.e. it just inserts some samples with 0 amplitude).

About extract_wav_segments.sh, I guess a better way than make it use parallel jobs is to change it so that it does not do any actual wave processing and instead writes sox trim commands in wav.scp with the segment start/end times taken from the 'segments' file. It will be faster and it will save disk space. What would you think?

@danpovey
Copy link
Contributor

danpovey commented Mar 3, 2018 via email

@hhadian
Copy link
Contributor Author

hhadian commented Mar 5, 2018

All issues are addressed.

@danpovey danpovey merged commit e74b918 into kaldi-asr:master Mar 6, 2018
LvHang pushed a commit to LvHang/kaldi that referenced this pull request Apr 14, 2018
This is a version of LF-MMI that doesn't require bootstrapping from a baseline GMM system. Conflicts:	src/chain/chain-training.cc
Skaiste pushed a commit to Skaiste/idlak that referenced this pull request Sep 26, 2018
This is a version of LF-MMI that doesn't require bootstrapping from a baseline GMM system.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

7 participants