Conversation

@francoishernandez
Contributor

Here is the recipe coming with the recent release of TED-LIUM corpus version 3.

I copied the ability to download the LMs from David's PR, and made the same for the RNNLM models. Could you add these models to kaldi-asr.org? (I added a dummy link in local/ted_download_rnnlm.sh for now.)

I did not add what Dan is mentioning about the ivector training here yet.

Also, please note that the "join_suffix" normalization script present in r2 has already been applied to the LM data available in the r3 tarball, which explains its removal from this recipe.

@danpovey
Contributor

@david-ryan-snyder, do you have time to deal with the model upload?

export train_cmd="queue.pl"
export decode_cmd="queue.pl --mem 4G"

host=$(hostname -f)
Contributor

Please remove the stuff from this line onwards from this file. We really shouldn't be putting these cluster-specific things in these files. (I know this was copied from elsewhere.)

Contributor Author

Sorry about that, copied it from s5_r2, you might want to remove it from there as well.

# this does some data-cleaning. It actually degrades the GMM-level results
# slightly, but the cleaned data should be useful when we add the neural net and chain
# systems. If not we'll remove this stage.
local/run_cleanup_segmentation.sh
Contributor

did you ever test whether this data cleanup is helpful?

Contributor Author

Not yet.

@david-ryan-snyder
Contributor

david-ryan-snyder commented May 23, 2018

Sure, I'll upload the LM.

@david-ryan-snyder
Contributor

@francoishernandez,

You can find the model here: http://kaldi-asr.org/models/m5
Here's a direct link: http://kaldi-asr.org/models/5/tedlium_rnnlm.tgz

I can update the Kaldi version (that currently says TODO) after this is merged.

@xiaohui-zhang
Contributor

@francoishernandez @vince62s any comments? I know it might not be easy to recover those <unk>s...

BTW, I've got results:
With cleanup (it helps, so we'd better keep the cleanup stage):
%WER 7.53 [ 1339 / 17783, 215 ins, 329 del, 795 sub ] exp/chain_cleaned/tdnnf_1a/decode_dev_rescore/wer_10
%WER 7.87 [ 2164 / 27500, 297 ins, 610 del, 1257 sub ] exp/chain_cleaned/tdnnf_1a/decode_test_rescore/wer_9

Without cleanup:
%WER 7.93 [ 1411 / 17783, 210 ins, 422 del, 779 sub ] exp/chain/tdnnf_1a/decode_dev_rescore/wer_9
%WER 8.23 [ 2262 / 27500, 292 ins, 790 del, 1180 sub ] exp/chain/tdnnf_1a/decode_test_rescore/wer_9

My results are still worse than the reported numbers though... (7.2/7.5 on page 6)

@francoishernandez
Contributor Author

Hi,
I agree it might not be easy to recover those <unk>. After removing those at the beginning and end of utterances, about 100k of them remain -- in a total corpus of 4.8M words. I had a look at a few random files and compared them with the original unaligned transcripts. It seems a lot of those 'intra-utterance' <unk> are punctuation, so they might not interfere much.
You're right about the lexicon being noisy; this is not helping. Your work on lexicon learning looks very interesting for optimizing such systems.
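As a rough illustration of that kind of count, here is a minimal shell sketch (the file contents below are invented for illustration; in the real setup this would run on a Kaldi-style transcript file such as the hypothetical data/train/text):

```shell
# Count <unk> tokens vs. total words in a Kaldi-style text file
# (the first field on each line is the utterance id, so it is stripped).
# The sample data is made up for illustration only.
text=$(mktemp)
cat > "$text" <<'EOF'
talk1-0001 <unk> hello world <unk>
talk1-0002 this is a test <unk>
EOF
unk=$(grep -o '<unk>' "$text" | wc -l | tr -d ' ')
total=$(cut -d' ' -f2- "$text" | wc -w | tr -d ' ')
echo "unk=$unk total=$total"   # -> unk=3 total=9
rm -f "$text"
```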

About the results, I checked our original run and I see two possible sources of difference:

  1. the changes in the ivector training routine suggested by Dan (LDA --> PCA)
  2. the LMs are downloaded from David's run here and seem a bit different from the ones we trained

Did you change anything else in the pipeline?

Also, I checked the tri3 numbers for the same run where we got 7.2/7.5:
%WER 18.4 | 507 17783 | 84.8 10.8 4.4 3.2 18.4 91.7 | -0.056 | exp/tri3/decode_dev_rescore/score_14_0.0/ctm.filt.filt.sys
%WER 16.4 | 1155 27500 | 86.1 10.6 3.3 2.5 16.4 86.4 | -0.025 | exp/tri3/decode_test_rescore/score_14_0.0/ctm.filt.filt.sys

So, we're also 0.4/0.3 points below, but that might be less of an issue than on tdnnf.

@francoishernandez
Contributor Author

Hey,
In fact, it might come from the LM. The default script bypasses the ted_train_lm step by downloading David's LM, but the ted_train_lm script also takes into account (new) data from the audio train set (2x the amount in s5_r2).
Could you try with our LMs instead?

@xiaohui-zhang
Contributor

hi @francoishernandez, with your LM I re-decoded and here are the results:
xzhang@b02:/export/b04/xzhang/kaldi/egs/tedlium/s5_r3$ cat exp/tri3/decode_test_rescore/we* | utils/best_wer.sh
%WER 16.64 [ 4575 / 27500, 679 ins, 915 del, 2981 sub ]
xzhang@b02:/export/b04/xzhang/kaldi/egs/tedlium/s5_r3$ cat exp/tri3/decode_dev_rescore/we* | utils/best_wer.sh
%WER 18.95 [ 3369 / 17783, 567 ins, 841 del, 1961 sub ]

xzhang@b02:/export/b04/xzhang/kaldi/egs/tedlium/s5_r3$ cat exp/chain_cleaned/tdnnf_1a/decode_dev_rescore/we* | utils/best_wer.sh
%WER 7.52 [ 1337 / 17783, 214 ins, 329 del, 794 sub ]
xzhang@b02:/export/b04/xzhang/kaldi/egs/tedlium/s5_r3$ cat exp/chain_cleaned/tdnnf_1a/decode_test_rescore/we* | utils/best_wer.sh
%WER 7.87 [ 2164 / 27500, 297 ins, 610 del, 1257 sub ]

Not better than before. I guess we should first try to match the tri3 numbers... I didn't make any other changes. Could you also try starting a clean run with the checked-in scripts?

@francoishernandez
Contributor Author

francoishernandez commented Jun 12, 2018

Ok thanks for these results.
Yes I already started a new run with the checked-in recipe. It's at tri2 right now, I'll keep you posted.

@francoishernandez
Contributor Author

Ok guys,

I reran the checked-in recipe with the released data, and got exactly the same tri3 results as @xiaohui-zhang:
%WER 18.84 [ 3351 / 17783, 553 ins, 852 del, 1946 sub ] exp/tri3/decode_dev_rescore/wer_17
%WER 16.68 [ 4586 / 27500, 682 ins, 907 del, 2997 sub ] exp/tri3/decode_test_rescore/wer_15

I noticed a difference in scores between sclite and score_basic on the same lattices (sclite being better). The original run from the paper gives these results for tri3 with score_basic:
%WER 18.75 [ 3334 / 17783, 591 ins, 792 del, 1951 sub ] exp/tri3/decode_dev_rescore/wer_14
%WER 16.58 [ 4559 / 27500, 667 ins, 909 del, 2983 sub ] exp/tri3/decode_test_rescore/wer_15

As a reminder, they were the following with sclite:
%WER 18.4 | 507 17783 | 84.8 10.8 4.4 3.2 18.4 91.7 | -0.056 | exp/tri3/decode_dev_rescore/score_14_0.0/ctm.filt.filt.sys
%WER 16.4 | 1155 27500 | 86.1 10.6 3.3 2.5 16.4 86.4 | -0.025 | exp/tri3/decode_test_rescore/score_14_0.0/ctm.filt.filt.sys

So, we're almost closing the gap here.

The thing is, right before submitting the paper and releasing the corpus, we chose to remove a few talks that added unnecessary weight to the archive for only a few noisy utterances (around 200, almost empty or containing speech inside songs). The small remaining difference might come from there.

As for the chain results, here are the ones of the run from the paper, with score_basic:
exp/chain_cleaned/tdnnf_1a/decode_dev_rescore/wer_10
%WER 7.37 [ 1310 / 17783, 213 ins, 316 del, 781 sub ]
exp/chain_cleaned/tdnnf_1a/decode_test_rescore/wer_9
%WER 7.55 [ 2075 / 27500, 280 ins, 588 del, 1207 sub ]

We're not closing the gap here yet. I suspect it comes from the modification to the ivector training strategy. I've launched a run starting from the checked-in tri3, with the original ivector strategy, to see where that leads.

Once this is done I'll update the results based on score_basic to avoid confusion.

@jtrmal
Contributor

jtrmal commented Jun 15, 2018 via email

@francoishernandez
Contributor Author

Thanks Yenda for mentioning this!
I made a full run from scratch and updated the results accordingly. I will update the paper as well (final deadline is the 24th).
We chose to keep sclite scores in the paper, so I added a column in the run_tdnnf_1a.sh header to mention both sclite and score_basic results.

@xiaohui-zhang
Contributor

Thanks @francoishernandez for updating. Maybe you can also remove those "garbage" sentences from the released dataset?

@francoishernandez
Contributor Author

Yes, that's what we did. (The original run from the paper had 2397 files; the released data has 2351.)

@xiaohui-zhang
Contributor

Cool, thanks for clarifying.

@vince62s
Contributor

Let us know if you guys need anything else to merge.
Cheers.
Vincent.

Contributor

@danpovey left a comment

I looked through it quickly and noticed a few issues.

# JHU cluster (or most clusters using GridEngine, with a suitable
# conf/queue.conf).
export train_cmd="queue.pl"
export decode_cmd="queue.pl --mem 4G"
\ No newline at end of file
Contributor

please add missing newline

Contributor Author

addressed

@@ -0,0 +1,233 @@
#!/bin/bash

# See run_tdnnf_1a.sh for comparative results.
Contributor

you should still always include the current result in any given example script, even if there is nothing to compare with. But please don't make TDNN and TDNN-F part of different sequences: call them both 'tdnn'.

# Final train prob -0.0802 -0.0899
# Final valid prob -0.0980 -0.0974
# Final train prob (xent) -1.1450 -0.9449
# Final valid prob (xent) -1.2498 -1.0002
Contributor

please have the compare.sh script also output the number of parameters.
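For instance, a minimal sketch of how such a count could be extracted (the `num-parameters:` line format and the canned output below are assumptions; in compare.sh the info string would come from running `nnet3-info` on each model):

```shell
# Hypothetical sketch: pull the parameter count out of nnet3-info-style
# output. The canned $info string stands in for something like
#   info=$(nnet3-info exp/chain_cleaned/tdnnf_1a/final.mdl)
info="num-parameters: 9432116
component-node name=tdnn1.affine component=tdnn1.affine input=lda"
num_params=$(printf '%s\n' "$info" | awk '/^num-parameters:/ {print $2}')
echo "num-parameters: $num_params"   # -> num-parameters: 9432116
```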

Contributor Author

addressed


echo "$0: creating neural net configs using the xconfig parser";

num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
Contributor

I thought I mentioned this before, but this example script is kind of broken. The xconfig refers to shell variables $opts, $linear_opts, $output_opts (and maybe others) which are not defined.

Contributor Author

Removed these in 75e9d60

--feat.cmvn-opts "--norm-means=false --norm-vars=false" \
--chain.xent-regularize 0.1 \
--chain.leaky-hmm-coefficient 0.1 \
--chain.l2-regularize 0.00005 \
Contributor

You probably want this set to zero (especially after fixing the accidental omission of l2 regularization at the model level).

Contributor Author

addressed


. ./path.sh

export LC_ALL=C
Contributor

this should be unnecessary, as path.sh should export LC_ALL=C.

dir=data/local/dict_nosp
mkdir -p $dir

srcdict=db//TEDLIUM_release-3/TEDLIUM.152k.dic
Contributor

remove double slash.

@danpovey
Contributor

Are you OK to wait to get results with the fixed TDNN-F training run, or do you want me to merge it now and update the results later?

@francoishernandez
Contributor Author

francoishernandez commented Jun 29, 2018 via email

@danpovey
Contributor

danpovey commented Jun 29, 2018 via email

@francoishernandez
Contributor Author

francoishernandez commented Jun 29, 2018 via email

@vince62s
Contributor

Dan, if you don't mind, I think it is good to merge as is. It will match the paper and serve as the baseline.
It is always good to have a baseline (even an imperfect one) that people can play with and try to improve.
The paper will be presented by François at SPECOM.
Thanks.

@danpovey
Contributor

Can you at least put a warning at the top of the TDNN-F script saying that it has some problems (you can mention the specific problems if you want) and should not be used as an example to copy from?
I'll try to run a more up-to-date (and fixed) setup soon.

@danpovey danpovey merged commit fdb6774 into kaldi-asr:master Jul 12, 2018
dpriver pushed a commit to dpriver/kaldi that referenced this pull request Sep 13, 2018
Skaiste pushed a commit to Skaiste/idlak that referenced this pull request Sep 26, 2018