
Conversation

@jtrmal (Contributor) commented Mar 7, 2018

This is joint work with @sw005320 and me.
Dan, we are still working on the chain/nnet training, but we thought we would go ahead and create the PR now, so that at least the data preparation and GMM stages can go through review.

@danpovey danpovey (Contributor) left a comment

A few comments.

@@ -0,0 +1,50 @@
#BeamformIt sample configuration file for AMI data (http://groups.inf.ed.ac.uk/ami/download/)

Contributor

Can you rename this to beamformit_chime5.cfg, to clarify that it relates to CHiME-5?

@@ -0,0 +1,2 @@
beam=11.0 # beam for decoding. Was 13.0 in the scripts.
Contributor

You should probably delete this (also from the source of these scripts) unless it's being used. I believe it is not used unless an option like "--config conf/decode.config" is given to decoding scripts.
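For reference, a config file like this only takes effect if a decoding call points at it explicitly; a minimal sketch, with placeholder graph/data/decode directories rather than paths from this recipe:

steps/decode.sh --config conf/decode.config --nj 8 --cmd "$decode_cmd" \
  exp/tri3/graph data/dev exp/tri3/decode_dev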

@@ -0,0 +1,283 @@
#!/bin/bash

# 1e is as 1d but instead of the --proportional-shrink option, using
Contributor

If you don't have versions a through d in this PR, please rename it to 1a.

Contributor

Done

num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
opts="l2-regularize=0.05"
output_opts="l2-regularize=0.01"
Contributor

You may find that adding "bottleneck-dim=320" (or maybe 256) to output_opts helps.
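For reference, the suggested tweak would look roughly like this in the snippet quoted above (whether 320 or 256 works better is a tuning choice):

# add a linear bottleneck to the output layer options
output_opts="l2-regularize=0.01 bottleneck-dim=320"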

Contributor

I just put "bottleneck-dim=320" in the current script for now (I used it in my old setup, but it was removed during some merge steps). It will be tuned later.

adir=$1
jdir=$2
dir=$3

Contributor

Please add some basic checks of the inputs, so that the error messages are informative if the user gives wrong inputs.
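A minimal sketch of the kind of check being asked for, reusing the positional arguments from the snippet above (the usage text and messages are illustrative only):

if [ $# -ne 3 ]; then
  echo "Usage: $0 <audio-dir> <json-dir> <output-dir>" >&2
  exit 1
fi

adir=$1
jdir=$2
dir=$3

for d in "$adir" "$jdir"; do
  # fail early with a readable message if an input directory is missing
  [ -d "$d" ] || { echo "$0: expected directory '$d' to exist" >&2; exit 1; }
done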

echo "-------------------"
echo "Maxent 3grams"
echo "-------------------"
sed 's/'${oov_symbol}'/<unk>/g' $tgtdir/train.txt | \
Contributor

Are these needed? I notice you're not treating it as an error if LIBLBFGS is not defined. It seems to me that either it's important (in which case it should be an error if it's not defined), or it's not (in which case this could be deleted).

If you want a script that tries a bunch of LMs automatically, like this one, then IMO it shouldn't live in local/; it should be a generic script that is called from local/.
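A sketch of the "treat it as an error" option, assuming LIBLBFGS is an environment variable pointing at the liblbfgs installation that the maxent LM training needs:

if [ -z "$LIBLBFGS" ]; then
  echo "$0: LIBLBFGS is not set; it is required for training the maxent LMs" >&2
  exit 1
fi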

Contributor Author

That script is fairly general because there was a lot of experimentation with the training set size, and yet it's not general enough, as it contains some CHiME-5-specific filtering (it is probable that the training set contains duplicate transcriptions of the same utterance, coming from different channels). IMO the MaxEnt models are usually the most robust way to get an ARPA-format LM -- you don't have to mess with discounting and cut-offs -- but they are not very widely accepted, so I wanted to provide a comparison against the KN and GT baselines.
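For context, the comparison amounts roughly to the following SRILM invocations (assuming an SRILM build with the maxent extension; the output file names are illustrative):

# Good-Turing (SRILM default discounting) and Kneser-Ney baselines
ngram-count -order 3 -text $tgtdir/train.txt -lm $tgtdir/3gram.gt.arpa.gz
ngram-count -order 3 -text $tgtdir/train.txt -kndiscount -interpolate \
  -lm $tgtdir/3gram.kn.arpa.gz

# Maxent LM, converted to ARPA format so the rest of the pipeline is unchanged
ngram-count -order 3 -text $tgtdir/train.txt -maxent -maxent-convert-to-arpa \
  -lm $tgtdir/3gram.me.arpa.gz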

Contributor

@jtrmal, for these max-ent LMs, are they the ones that are actually chosen by the script?
I believe they are quite slow to estimate, and I'm not sure how they behave when pruned.
I'd prefer to get rid of the dependency if there isn't really a good reason to use them.

# chime5 main directory path
# please change the path accordingly
chime5_corpus=/export/corpora4/CHiME5
json_dir=${chime5_corpus}/data/transcriptions


Maybe json_dir=${chime5_corpus}/transcriptions; there is no data/ directory in the unzipped CHiME5 directory.

Contributor

fixed

# please change the path accordingly
chime5_corpus=/export/corpora4/CHiME5
json_dir=${chime5_corpus}/data/transcriptions
audio_dir=${chime5_corpus}/data/audio


ditto

Contributor

fixed

@jtrmal (Contributor Author) commented Mar 8, 2018

@danpovey @ShigekiKarita @sw005320 I'm addressing the comments and will push today -- I want to run the data preparation pipeline to make sure it works with the new corpus location.

@sw005320 (Contributor) commented Mar 8, 2018

@jtrmal @ShigekiKarita, I just fixed the wrong-path issues. I have now confirmed that it works through stage 16.

@jtrmal jtrmal force-pushed the chime5_baseline branch from 73333e3 to 5a3b8b1 on March 8, 2018, 16:11
@jtrmal (Contributor Author) commented Mar 9, 2018

The unaddressed comments relate to the TDNN/chain script; Shinji is still waiting for his training to finish.

@kamo-naoyuki kamo-naoyuki left a comment

Don't you have to use cuda_cmd?

@sw005320 (Contributor) commented Mar 9, 2018

@kamo-naoyuki which part are you talking about?

fi

steps/nnet3/chain/train.py --stage=$train_stage \
--cmd="$decode_cmd" \


Sorry, here. Shouldn't $decode_cmd be replaced with $cuda_cmd?
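The substitution being asked about would look like this, assuming cuda_cmd is defined in cmd.sh as the GPU queue command (for example, queue.pl with a GPU request); the remaining options, elided here, would stay as in the script:

# in cmd.sh (illustrative)
export cuda_cmd="queue.pl --gpu 1"

# in the chain script, the only change would be the --cmd option:
steps/nnet3/chain/train.py --stage=$train_stage \
  --cmd="$cuda_cmd" \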

@jtrmal (Contributor Author) commented Mar 9, 2018 via email

@jtrmal (Contributor Author) commented Mar 9, 2018 via email

…nd right channel information according to the other channel information format
@kamo-naoyuki

ok, I understood.

@jtrmal (Contributor Author) commented Mar 10, 2018 via email

@danpovey (Contributor) commented Mar 10, 2018 via email

@sw005320 (Contributor)

@jtrmal @danpovey I'm just reporting the current status. I removed stage 13 (the lexicon update) to strictly follow the challenge regulations, and also added a location tag for future scoring. Now I'm checking whether the recipe works from scratch: it currently runs through the data cleaning stage and will move on to the chain model. I already confirmed it worked in the previous setup, and I think the check will be finished smoothly over the weekend. (I hope. I believe.)

@sw005320 (Contributor)

@danpovey we've finished the recipe check and updated the latest results. If there is no problem, please merge it.

@danpovey danpovey merged commit 5eb57cc into kaldi-asr:master Mar 13, 2018
@sw005320 (Contributor)

Thanks!!!!

LvHang pushed a commit to LvHang/kaldi that referenced this pull request Apr 14, 2018
Skaiste pushed a commit to Skaiste/idlak that referenced this pull request Sep 26, 2018