[wip] Tedlium r3 recipe #2442
Conversation
@david-ryan-snyder, do you have time to deal with the model upload?
egs/tedlium/s5_r3/cmd.sh Outdated
export train_cmd="queue.pl"
export decode_cmd="queue.pl --mem 4G"

host=$(hostname -f)
Please remove the content from this line onward from this file. We really shouldn't be putting these cluster-specific things in these files. (I know this was copied from elsewhere.)
Sorry about that; I copied it from s5_r2. You might want to remove it from there as well.
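For reference, a trimmed cmd.sh along the lines being asked for would keep only the generic queue.pl settings. This is a sketch based on the two lines kept in this diff; run.pl can be substituted for queue.pl to run everything locally:

# Generic settings for clusters using GridEngine with a suitable conf/queue.conf.
# Cluster-specific overrides should live in conf/queue.conf or in local edits,
# not in the checked-in recipe.
export train_cmd="queue.pl"
export decode_cmd="queue.pl --mem 4G"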
# this does some data-cleaning. It actually degrades the GMM-level results
# slightly, but the cleaned data should be useful when we add the neural net and chain
# systems. If not we'll remove this stage.
local/run_cleanup_segmentation.sh
did you ever test whether this data cleanup is helpful?
Not yet.
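A quick hypothetical check, once both GMM systems exist (the exp/tri3 and exp/tri3_cleaned directory names below are assumptions; adjust them to whatever local/run_cleanup_segmentation.sh actually writes):

# Compare the best WER of the baseline and cleaned GMM systems on the dev set.
for d in exp/tri3/decode_dev exp/tri3_cleaned/decode_dev; do
  [ -d "$d" ] && cat "$d"/wer_* | utils/best_wer.sh
done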
Sure, I'll upload the LM.
You can find the model here: http://kaldi-asr.org/models/m5. I can update the Kaldi version (which currently says TODO) after this is merged.
@francoishernandez @vince62s any comment? I know it might not be easy to recover those s... BTW, I've got results (without cleanup). My results are still worse than the reported numbers though... (7.2/7.5 on page 6)
Hi, about the results, I checked our original run and I see two possible sources of difference.
Did you change anything else in the pipeline? Also, I checked the tri3 numbers for the same run where we got 7.2/7.5: we're also 0.4/0.3 points below there, but that might be less of an issue than on the TDNN-F.
Hey,
hi @francoishernandez, with your LM I re-decoded and here are the results:
xzhang@b02:/export/b04/xzhang/kaldi/egs/tedlium/s5_r3$ cat exp/chain_cleaned/tdnnf_1a/decode_dev_rescore/we* | utils/best_wer.sh
Not better than before. I guess we should first try to match the tri3 numbers... I didn't make other changes. Could you try starting a clean run with the checked-in scripts also?
Ok, thanks for these results.
Ok guys, I reran the checked-in recipe with the released data, and got the exact same tri3 results as @xiaohui-zhang:
%WER 18.84 [ 3351 / 17783, 553 ins, 852 del, 1946 sub ] exp/tri3/decode_dev_rescore/wer_17
%WER 16.68 [ 4586 / 27500, 682 ins, 907 del, 2997 sub ] exp/tri3/decode_test_rescore/wer_15
I noticed a difference in scores between sclite and score_basic on the same lattices (sclite being better). The original run from the paper gives these results for tri3 with score_basic:
%WER 18.75 [ 3334 / 17783, 591 ins, 792 del, 1951 sub ] exp/tri3/decode_dev_rescore/wer_14
%WER 16.58 [ 4559 / 27500, 667 ins, 909 del, 2983 sub ] exp/tri3/decode_test_rescore/wer_15
As a reminder, they were the following with sclite:
%WER 18.4 | 507 17783 | 84.8 10.8 4.4 3.2 18.4 91.7 | -0.056 | exp/tri3/decode_dev_rescore/score_14_0.0/ctm.filt.filt.sys
%WER 16.4 | 1155 27500 | 86.1 10.6 3.3 2.5 16.4 86.4 | -0.025 | exp/tri3/decode_test_rescore/score_14_0.0/ctm.filt.filt.sys
So, we're almost closing the gap here. The thing is, right before submitting the paper and releasing the corpus, we chose to remove a few of the talks which represented unnecessary weight in the archive for only a few noisy utterances (around 200, almost empty or speech inside songs). The small remaining difference might come from there.
As for the chain results, here are the ones from the run for the paper, with score_basic:
exp/chain_cleaned/tdnnf_1a/decode_dev_rescore/wer_10 %WER 7.37 [ 1310 / 17783, 213 ins, 316 del, 781 sub ]
exp/chain_cleaned/tdnnf_1a/decode_test_rescore/wer_9 %WER 7.55 [ 2075 / 27500, 280 ins, 588 del, 1207 sub ]
We're not closing the gap yet here. I suspect it might come from the modification in the ivector training strategy. I launched a run starting from the checked-in tri3, with the original ivector strategy, to see where that leads us. Once this is done I'll update the results based on score_basic to avoid confusion.
If you are comparing sctk vs kaldi scoring, the differences in scores do not have to correspond to one AM being better than the other -- sctk can be smart when it comes to scoring fragments and unks, the default kaldi scoring script not so much. I'd suggest looking at/comparing the sctk alignment files (*.pra?) vs the kaldi scoring wer_details/per_utt files.
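A rough way to do that comparison for one utterance might look like the following. The paths and grep patterns are assumptions based on the usual sclite (pralign) and score_kaldi output layouts, and the *.pra file may first need to be produced by adding pralign to sclite's output options:

# Hypothetical utterance id; pick one with a suspicious error count.
utt=SomeTalkId-0001234-0005678
# sctk per-utterance alignment (the *.pra file Yenda mentions):
grep -A 8 "id: ($utt" exp/tri3/decode_dev_rescore/score_14_0.0/*.pra
# kaldi per-utterance alignment details:
grep "^$utt " exp/tri3/decode_dev_rescore/scoring_kaldi/wer_details/per_utt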
Thanks Yenda for mentioning this!
Thanks @francoishernandez for updating. Maybe you can also remove those "garbage" sentences from the released dataset?
Yes, that's what we did. (Original run from the paper had 2397 files; the released data has 2351.)
Cool, thanks for clarifying.
Let us know if you guys need anything else to merge.
danpovey left a comment
I looked through it quickly and noticed a few issues.
egs/tedlium/s5_r3/cmd.sh Outdated
# JHU cluster (or most clusters using GridEngine, with a suitable
# conf/queue.conf).
export train_cmd="queue.pl"
export decode_cmd="queue.pl --mem 4G"   (no newline at end of file)
please add missing newline
addressed
@@ -0,0 +1,233 @@
#!/bin/bash

# See run_tdnnf_1a.sh for comparative results.
you should still always include the current result in any given example script, even if there is nothing to compare with. But please don't make TDNN and TDNN-F part of different sequences: call them both 'tdnn'.
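For reference, other chain recipes record this with a comment block at the top of the run script, something like the sketch below. The compare script name and layout are borrowed from other egs directories, and the WER values are placeholders to be filled in from the actual decode, not results of this run:

# Results (fill in with this recipe's own numbers):
# local/chain/compare_wer.sh exp/chain_cleaned/tdnn1a_sp
# System                       tdnn1a_sp
# WER on dev (rescored)             x.xx
# WER on test (rescored)            x.xx
# Final train prob               -0.xxxx
# Final valid prob               -0.xxxx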
# Final train prob          -0.0802   -0.0899
# Final valid prob          -0.0980   -0.0974
# Final train prob (xent)   -1.1450   -0.9449
# Final valid prob (xent)   -1.2498   -1.0002
please have the compare.sh script also output the number of parameters.
addressed
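For anyone following along, a minimal way for the compare script to report model size, assuming nnet3-am-info is available via path.sh and that its output contains a num-parameters line (as it does for nnet3 models in general):

# Append a parameter-count row for each experiment directory given as an argument.
echo -n "# Num-params          "
for dir in "$@"; do
  printf "%12s" "$(nnet3-am-info "$dir/final.mdl" 2>/dev/null | grep num-parameters | awk '{print $2}')"
done
echo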
echo "$0: creating neural net configs using the xconfig parser";

num_targets=$(tree-info $tree_dir/tree | grep num-pdfs | awk '{print $2}')
I thought I mentioned this before, but this example script is kind of broken. The xconfig refers to shell variables $opts, $linear_opts, $output_opts (and maybe others) which are not defined.
Removed these in 75e9d60
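For context, in other TDNN-F recipes these variables are defined just above the xconfig block, roughly as below. The values here are only illustrative, following the general pattern of existing run_tdnn scripts, and note the model-level l2-regularize, which relates to the comment on --chain.l2-regularize further down:

# Per-layer options substituted into the xconfig lines; values are examples only.
opts="l2-regularize=0.002"
linear_opts="l2-regularize=0.002 orthonormal-constraint=-1.0"
output_opts="l2-regularize=0.0005"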
--feat.cmvn-opts "--norm-means=false --norm-vars=false" \
--chain.xent-regularize 0.1 \
--chain.leaky-hmm-coefficient 0.1 \
--chain.l2-regularize 0.00005 \
you probably want this set to zero (especially after fixing the accidental omission of l2 regularization at the model level).
addressed
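Concretely, the requested change amounts to something like this in the steps/nnet3/chain/train.py invocation (a fragment of the options already shown above, with only the last flag changed; the rest of the command stays as in the checked-in script):

steps/nnet3/chain/train.py \
  --feat.cmvn-opts "--norm-means=false --norm-vars=false" \
  --chain.xent-regularize 0.1 \
  --chain.leaky-hmm-coefficient 0.1 \
  --chain.l2-regularize 0.0 \
  ...  # remaining options as in the checked-in script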
. ./path.sh

export LC_ALL=C
this should be unnecessary, as path.sh should export LC_ALL=C.
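For reference, the usual egs path.sh already ends with that export, along these general lines (a sketch of the common pattern, not necessarily this recipe's exact file):

export KALDI_ROOT=`pwd`/../../..
[ -f $KALDI_ROOT/tools/env.sh ] && . $KALDI_ROOT/tools/env.sh
export PATH=$PWD/utils/:$KALDI_ROOT/tools/openfst/bin:$PWD:$PATH
[ ! -f $KALDI_ROOT/tools/config/common_path.sh ] && echo "Missing $KALDI_ROOT/tools/config/common_path.sh" && exit 1
. $KALDI_ROOT/tools/config/common_path.sh
export LC_ALL=C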
dir=data/local/dict_nosp
mkdir -p $dir

srcdict=db//TEDLIUM_release-3/TEDLIUM.152k.dic
remove double slash.
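i.e., something like the following (the surrounding existence check is just a common pattern in these prep scripts, not code from this PR):

srcdict=db/TEDLIUM_release-3/TEDLIUM.152k.dic
[ ! -r "$srcdict" ] && echo "$0: expected source dictionary $srcdict to exist" && exit 1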
Are you OK to wait to get results with the fixed TDNN-F training run, or do you want me to merge it now and update the results later?
Do you think this will change the results a lot? (The only modification in the end was the 'useless' l2 regularization option, no?) I would vote to merge it as is, so that anyone can do further runs and tuning.
The l2 is quite important: if you don't have it, it trains too slowly. I think that's why the TDNN-F barely gave any improvement. But actually that recipe could be improved substantially anyway, for other reasons. Maybe the best thing is to just remove that recipe entirely and leave the 1a; then later on, I or someone else can add a good-performing TDNN-F recipe.
Ok, as you wish; we just wanted to have it somewhere to go with the paper. But I can keep my branch up for that purpose.
Dan, if you don't mind, I think it is good to merge as is. It will match the paper and be the baseline.
Can you at least put a warning at the top of the TDNN-F script that the script has some problems (you can mention the specific problems if you want) and should not be used as an example to copy from?
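Something along these lines at the top of run_tdnnf_1a.sh would probably do (the wording is only a suggestion, summarizing the problems discussed above):

#!/bin/bash
# WARNING: this TDNN-F setup has known problems: in particular, model-level
# l2 regularization was accidentally omitted, so it trains too slowly and
# barely improves over the plain TDNN recipe.  It is kept to match the
# TED-LIUM 3 paper; please do not use it as a template for new recipes.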
Here is the recipe coming with the recent release of TED-LIUM corpus version 3.
I copied the ability to download the LMs from David's PR, and did the same for the RNNLM models. Could you add these models to kaldi-asr.org? (I added a dummy link in local/ted_download_rnnlm.sh for now.)
I have not yet added what Dan mentioned about the ivector training here.
Also, please note that the "join_suffix" normalization script present in r2 has already been applied to the LM data available in the r3 tarball, which explains its absence from this recipe.
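For what it's worth, until the models are hosted, local/ted_download_rnnlm.sh can stay a thin wrapper like the sketch below; the URL is a placeholder (matching the dummy link mentioned above) and the destination directory name is an assumption:

#!/bin/bash
# Sketch of local/ted_download_rnnlm.sh: fetch and unpack the pre-trained RNNLM.
url=http://kaldi-asr.org/models/REPLACE_WITH_REAL_LINK.tar.gz  # placeholder URL
dir=exp/rnnlm_lstm_1a                                          # assumed destination
mkdir -p $dir
if ! wget -qO $dir/rnnlm.tar.gz "$url"; then
  echo "$0: failed to download RNNLM from $url"
  exit 1
fi
tar -xzf $dir/rnnlm.tar.gz -C $dir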