Skip to content

Conversation

@xiaohui-zhang
Copy link
Contributor

… compatiblity issues.

Copy link
Contributor

@danpovey danpovey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't gone through all of this... let me see how you respond to these issues... see if you can apply these comments as globally as possible.

paste \
<(cat $lexicon | awk '{print $1}' | uconv -f $encoding -t $encoding -x "$icu_transform") \
<(cat $lexicon | sed $'s/^[^ \t][^ \t]*[ \t]//g') \
<(cat $lexicon | perl -ape 's/^[^ \t][^ \t]*[ \t]//g;') \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because this regex is more complicated you should check with some example input that it does what you think it does (different progrmas have different conventions).

tab=$'\t'
ls -1 $wav_train/id$sid/*.wav \
| sed "s/\(.*\)\/\(.*\).wav/${sid2}_\2${tab}\1\/\2.wav/" \
| perl -ape "s/(.*)\/(.*).wav/${sid2}_\2\t\1\/\2.wav/;" \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't assume that these advanced features are compatible; you'll have to test it. I suggest to just figure out a single line that you think should probably match, and try both versions. However, in this particular case you could probably leave the sed there. It's the newline which we really have a problem with, the tab can be dealt with using the existing solution.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep that's what I did regarding tests. Do you still want to switch back to our old solution using sed?

tab=$'\t'
ls -1 $wav_dir/*/s${sid}_*.wav \
| sed "s/\(.*\)\/\(.*\)\/s.*_\(.*\).wav/${sid2}_\3_\2${tab}\1\/\2\/s${sid}_\3.wav/" \
| perl -ape "s/(.*)\/(.*)\/s.*_(.*).wav/${sid2}_\3_\2$\t\1\/\2\/s${sid}_\3.wav/;" \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, for complex things like this you should either test somehow, or leave the sed command.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep. I actually tested every complex perl command I wrote... The major feature not compatible I've encountered that is that "(" is already a special symbol in perl so we don't need the backslash before it.

@danpovey danpovey merged commit 7ed7311 into kaldi-asr:master Oct 11, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

2 participants