@kkm000
Contributor

@kkm000 kkm000 commented Apr 1, 2019

Ok, here are my changes so far. The only thing I did not do here is change the configure default to MKL, so this may possibly be a WIP.

I debugged the script on 20 different Docker images, ranging from deprecated systems to the latest stable branches: https://github.com/kkm000/install_mkl. The tests use the expect tool, since the script asks to run sudo, sudo asks for the password, etc.

Now, how do we handle that transition overall? Just change the default blindly to MKL and see what the fallout is? I would at least post a heads-up to the kaldi-help list. The current default is ATLAS, which is not good at all. What I am thinking of doing is maybe adding some detection code (MKL in the recent packaging is very easy to detect: it's either in the system (just compiles) or in a well-known directory, /opt/intel/mkl). Except I do not know where to look for it on a Mac. My co-worker has a Mac; I'll email him to bring it, but I do not know if he still has it, and I do not remember if he's coming back from vacation today.

Generally, regarding OpenBLAS detection, I think invoking a test compile is a better approach than poking around trying to find whether it's in the system in a known location. If it is, a compile with -l... will do the trick. If not, then it's likely missing. In any case, it's probably better to build OpenBLAS from source. At the very least, it might be better optimized for the user's system.
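A minimal sketch of the test-compile idea, assuming a C++ compiler is reachable as $CXX (falling back to c++). The helper name have_lib is hypothetical and is not part of Kaldi's configure:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if a trivial program links against -l<name>.
# Any extra arguments (for example -L/some/dir) are passed to the compiler.
have_lib() {
  local name=$1 tmp rc
  shift
  tmp=$(mktemp -d) || return 1
  printf 'int main() { return 0; }\n' > "$tmp/probe.cc"
  # Libraries go after the source file so the linker resolves them in order.
  "${CXX:-c++}" "$tmp/probe.cc" -o "$tmp/probe" "$@" "-l$name" >/dev/null 2>&1
  rc=$?
  rm -rf "$tmp"
  return $rc
}
```

For example, `have_lib openblas && echo "system OpenBLAS links"`. Note that this only proves the library can be linked; a real probe would also include the header and call a function from it.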

Running the reference CBLAS will probably be disastrously slow anyway.

So, in the end, I'm thinking we should, in the case the user did not ask for a specific math lib:

  • Check if MKL is there, and use it.
  • Check for OpenBLAS in tools/. Then check for it in the system. What we can do about packaging, I dunno yet. RHEL packages seem to package <cblas.h>, which should work as is. Debian packages use a known filename; I'll see if we can adapt it.
  • If OpenBLAS is found, print a note that MKL is better, and use OpenBLAS.
  • If not, print an error and ask to use the option to specify the library.

I.e., do not probe for either the reference BLAS or ATLAS, because these are rather inferior options.
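The probe order above could be sketched roughly as follows. This is not Kaldi's actual configure logic: the function names, the default directories (/opt/intel/mkl and ../tools/OpenBLAS/install) and the header file names checked are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch of the proposed probe order; NOT the real configure code.
# Function names and default paths are illustrative assumptions.

# Succeeds if a trivial program links with -lopenblas from the system.
system_openblas_links() {
  local out rc
  out=$(mktemp) || return 1
  printf 'int main() { return 0; }\n' |
    "${CXX:-c++}" -x c++ - -o "$out" -lopenblas >/dev/null 2>&1
  rc=$?
  rm -f "$out"
  return $rc
}

# Prints MKL or OPENBLAS on stdout; notes and errors go to stderr.
detect_mathlib() {
  local mkl_root=${1:-/opt/intel/mkl}
  local openblas_root=${2:-../tools/OpenBLAS/install}

  # 1. MKL in its well-known location.
  if [ -f "$mkl_root/include/mkl.h" ]; then
    echo MKL
    return 0
  fi

  # 2. OpenBLAS under tools/, then OpenBLAS from the system.
  if [ -f "$openblas_root/include/cblas.h" ] || system_openblas_links; then
    echo "Note: MKL is usually faster than OpenBLAS." >&2
    echo OPENBLAS
    return 0
  fi

  # 3. No fallback to ATLAS or the reference BLAS.
  echo "Error: neither MKL nor OpenBLAS found; specify the library explicitly." >&2
  return 1
}
```

Calling `detect_mathlib` with no arguments uses the assumed defaults; passing two directories makes the order easy to exercise against fake trees.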

What do you think? Am I making sense?

Close #3078

@danpovey
Contributor

danpovey commented Apr 1, 2019 via email

@kkm000
Contributor Author

kkm000 commented Apr 1, 2019

Thanks for checking with MKL on Mac! I'll try to understand what it did not like about check_dependencies.sh.

After that, running check_dependencies.sh didn't print anything about MKL

Can you please figure out where the MKL files, includes and libs, ended up installed? Was it also /opt/intel, or some Mac-specific location?

I'll update the missing MKL message. I remember I pretty much did s/ATLAS/MKL/, but you are right, it became confusing. install_mkl.sh can work on Linux only (Intel provides only rpm and deb packages on the feed, and only for Linux). So the message is misleading for Mac and should point to Intel's site instead.

I have not touched configure yet, so no changes there.

@danpovey
Contributor

danpovey commented Apr 1, 2019 via email

@kkm000
Contributor Author

kkm000 commented Apr 1, 2019

Super, thank you! Now I know everything I need, except why bash on Mac is not like bash everywhere else and did not digest line 152, and why the script printed the line mac:tools: at the end of the message. My fellow Mac owner will be working from home this whole week, but I'll ask around.

@danpovey
Contributor

danpovey commented Apr 2, 2019 via email

@kkm000
Contributor Author

kkm000 commented Apr 2, 2019

It's bash 3.2, actually. It requires the case ... (pattern1|pattern2) syntax when used inside a subshell.
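To illustrate (a hypothetical snippet, not the actual line 152 of the script): as far as I know, bash before 4.0 takes the first unbalanced ")" of a case pattern as the end of an enclosing $(...), so each pattern needs the optional leading parenthesis. The function name and distro names here are made up for the example.

```shell
#!/usr/bin/env bash
# Hypothetical example; distro names and the function name are illustrative.
distro_family() {
  local family
  # The leading "(" on every pattern keeps parentheses balanced, which old
  # parsers such as bash 3.2 need inside a $(...) command substitution.
  family=$(case $1 in
             (debian|ubuntu)      echo debian ;;
             (rhel|centos|fedora) echo redhat ;;
             (*)                  echo unknown ;;
           esac)
  echo "$family"
}
```

The same form is accepted by newer bash versions as well, so it is safe to use everywhere.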

The missing file is handled by 2>/dev/null; the evalled string is empty.
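A sketch of that pattern, under the assumption that the script evaluates shell assignments read from an optional file; load_optional is a hypothetical name, not a function from install_mkl.sh:

```shell
#!/usr/bin/env bash
# Hypothetical helper: evaluate shell assignments from an optional file.
# If the file is missing, cat prints nothing (its error goes to /dev/null),
# so eval receives an empty string, does nothing, and returns success.
load_optional() {
  eval "$(cat "$1" 2>/dev/null)"
}
```

Since eval executes whatever the file contains, this is only reasonable for files the system itself owns.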

@kkm000
Contributor Author

kkm000 commented Apr 2, 2019

Do you think the steps I described in the text for trying OpenBLAS if MKL is not installed are sensible?

/cc @jtrmal

@danpovey
Contributor

danpovey commented Apr 2, 2019 via email

@kkm000
Contributor Author

kkm000 commented Apr 2, 2019

The main question is about detecting the matrix library in configure:

Running the reference CBLAS will probably be disastrously slow anyway.

So, in the end, I'm thinking we should, in the case the user did not ask for a specific math lib:

  • Check if MKL is there, and use it.
  • Check for OpenBLAS in tools/. Then check for it in the system. What we can do about packaging, I dunno yet. RHEL packages seem to package <cblas.h>, which should work as is. Debian packages use a known filename; I'll see if we can adapt it.
  • If OpenBLAS is found, print a note that MKL is better, and use OpenBLAS.
  • If not, print an error and ask to use the option to specify the library.

I.e., do not probe for either the reference BLAS or ATLAS, because these are rather inferior options.

What do you think? Am I making sense?

@kkm000
Contributor Author

kkm000 commented Apr 2, 2019

Only in case the user did not specify an explicit library.

@danpovey
Contributor

danpovey commented Apr 2, 2019 via email

@kkm000
Contributor Author

kkm000 commented Apr 2, 2019

The current default in configure is ATLAS on Linux, and that's not good. What should we use?

@danpovey
Contributor

danpovey commented Apr 2, 2019 via email

@kkm000
Contributor Author

kkm000 commented Apr 2, 2019

Aha, I see.

elif [ "`uname`" == "Darwin" ]; then
  . . .
  if [ ! -e /System/Library/Frameworks/Accelerate.framework ]; then
    failure "Need the Accelerate framework to compile on Darwin."
  fi

So it's always Accelerate on Mac. And for Linux, I'll just change to MKL for now.

@kkm000
Contributor Author

kkm000 commented Apr 2, 2019

$ ./configure
Configuring KALDI to use MKL
Configuring ...
Checking compiler g++ ...
Checking OpenFst library in /home/kkm/work/kaldi2/tools/openfst ...
Checking cub library in /home/kkm/work/kaldi2/tools/cub ...
Doing OS specific configurations ...
On Linux: Checking for linear algebra header files ...
Configuring MKL library directory: Found: /opt/intel/mkl/lib/intel64
MKL configured with threading: sequential, libs: -L/opt/intel/mkl/lib/intel64 -Wl,-rpath=/opt/intel/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_core -lmkl_sequential
MKL include directory configured as: /opt/intel/mkl/include
Configuring MKL threading as sequential
MKL threading libraries configured as -ldl -lpthread -lm
Using Intel MKL as the linear algebra library.
Intel(R) Math Kernel Library Version 2019.0.2 Product Build 20190118 for Intel(R) 64 architecture applications
Successfully configured for Linux with MKL libs from /opt/intel/mkl
CUDA will not be used! If you have already installed cuda drivers and cuda toolkit, try using --cudatk-dir=... option. Note: this is only relevant for neural net experiments
Info: configuring Kaldi not to link with Speex (don't worry, it's only needed if you intend to use 'compress-uncompress-speex', which is very unlikely)
SUCCESS
To compile: make clean -j; make depend -j; make -j
 ... or e.g. -j 10, instead of -j, to use a specified number of CPUs
@kkm000 kkm000 changed the title from "[WIP-ish] Simplify installation and document the use of Intel MKL" to "Simplify installation and document the use of Intel MKL" on Apr 2, 2019
@danpovey
Contributor

danpovey commented Apr 2, 2019 via email

@danpovey
Contributor

danpovey commented Apr 2, 2019

@huangruizhe can you please test this branch on our grid and see if it compiles, and also compare the speed (e.g. of matrix-lib-test) with our current installation? Please show the output of the check_dependencies.sh and configure scripts.

@huangruizhe
Contributor

Got it.

@huangruizhe
Contributor

@danpovey It seems it requires sudo to install this branch. Here are the commands and outputs:

extras/check_dependencies.sh

extras/check_dependencies.sh: Intel MKL is not installed. Run extras/install_mkl.sh to install it.
 ... You can also use other matrix algebra libraries. For information, see:
 ... http://kaldi-asr.org/doc/matrixwrap.html

extras/install_mkl.sh

extras/install_mkl.sh: Your system is using debian-style package management.
extras/install_mkl.sh: You must be root to install MKL.

Restart this script using the 'sudo' command, as:

  sudo extras/install_mkl.sh -sp debian intel-mkl-64bit-2019.2-057

We recommend adding the '-sp debian' options to skip the MKL and distro detection, since this has already been done. This minimizes the number of programs invoked with the root privileges to keep your system safe from unexpected or erroneous changes. Also, if you are setting the CC environment variable, sudo might not allow it to propagate to the command that it invokes.

Run the above sudo command now? [Y/n]:n

Could you help me with this? Thanks!

@kkm000
Contributor Author

kkm000 commented Apr 4, 2019

@huangruizhe, just chiming in to confirm that what you are seeing is the correct behavior so far.

BTW,

Run the above sudo command now? [Y/n]:n 

The n at the end means that you (or something doing it for you on the node) actually entered the n. You may try a y and see what happens. Depending on the account settings, it may simply continue, or it may ask for your own password (to confirm it's still you at the console) and then continue the install. If the account you were logging into is set up differently, sudo may refuse to do anything at all, or it may ask for the root password instead, which you may or may not know. I mean, if it's not allowed by the sudo security setup, you won't be able to do anything anyway, so there is no harm in trying; but if it is allowed, you may still be able to run the setup to the end.

If your CPU supports the AVX512 instruction set, I would be surprised if you do not see at least a 3× speed improvement over OpenBLAS. If it's only AVX2, the figure would be more modest; my guess is a 1.4 to 2.0+ times improvement. Very interesting, actually; I'm eager to see the result.

The matrix-lib-speed-test is not built by default; you can build and time it with this command sequence (even probably w/o building the whole Kaldi) from the src/ directory:

# From the src/ directory,
$ pwd
/home/kkm/work/kaldi2/src
# 1. Make sure the base/ and matrix/ libraries are built, as matrix depends only on base
make -j8 -C base all && make -j8 -C matrix all
# Then build and run only that test. There is no simple way to build it separately
# from running, we ignore its timing the first time.
make -C matrix test TESTFILES=matrix-lib-speed-test
# The second time the same command will not build anything, and just run the test,
# so the reported timing is fair.
time make -C matrix test TESTFILES=matrix-lib-speed-test

Full output on my machine:

make: Entering directory '/home/kkm/work/kaldi2/src/matrix'
Running matrix-lib-speed-test ... 2s... SUCCESS matrix-lib-speed-test
make: Leaving directory '/home/kkm/work/kaldi2/src/matrix'

real 0m1.761s
user 0m1.719s
sys 0m0.021s

I'd say the test is a bit too short for the result to be amenable to a real comparison, but as a ballpark baseline it's perfectly OK. The above figures are for a build with the same MKL version on quite a modern high-end CPU, which we can check by querying the /proc filesystem:
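Since a single run of under two seconds is noisy, one option is to repeat the run and keep the fastest time. A small sketch, assuming bash; best_of is a hypothetical helper name, not something provided by Kaldi:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command N times and print the fastest
# wall-clock time in seconds. The body runs in a subshell, so the locale
# change and the variables do not leak into the caller.
best_of() (
  export LC_ALL=C   # "." as the decimal separator for both `time` and awk
  n=$1
  shift
  best=""
  for ((i = 0; i < n; i++)); do
    # `time` reports on the brace group's stderr, which 2>&1 routes into $t;
    # the command's own output is discarded.
    t=$( { TIMEFORMAT=%R; time "$@" >/dev/null 2>&1; } 2>&1 )
    if [ -z "$best" ] || awk -v a="$t" -v b="$best" 'BEGIN { exit !(a < b) }'; then
      best=$t
    fi
  done
  echo "$best"
)
```

After the test binary has been built once, something like `best_of 5 make -C matrix test TESTFILES=matrix-lib-speed-test` would give a steadier figure, although it still includes the overhead of make itself.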

$ cat /proc/cpuinfo | egrep '^(model name|cache size|flags)' | head -3
model name : Intel(R) Core(TM) i9-7960X CPU @ 2.80GHz
cache size : 22528 KB
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req flush_l1d

Below is a cleaner way to spot these AVX512 instruction subsets among this mess of CPU capability flags (I'm using head because the output repeats as many times as there are CPU cores, usually identical)

$ cat /proc/cpuinfo | grep '^flags' | head -1 | egrep -o 'avx512\w+'
avx512f
avx512dq
avx512cd
avx512bw
avx512vl

So we have the avx512f, avx512dq, avx512cd, avx512bw and avx512vl instruction subsets, and MKL will derive its largest performance gains in matrix ops from these. But my CPU is of a desktop variety; Xeons sometimes have a larger cache than the 22MB this CPU has, which is even better for large matrices.
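For a quick yes/no answer on other machines, the flag check can be wrapped in a small function. This is only a sketch for Linux-style cpuinfo files; simd_level is a hypothetical name, and treating avx512f as the marker for AVX512 support is a simplifying assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the widest SIMD level listed in a cpuinfo file.
# Prints avx512, avx2, avx, baseline, or unknown (if no flags line is found).
simd_level() {
  local cpuinfo=${1:-/proc/cpuinfo} flags
  flags=$(grep -m1 '^flags' "$cpuinfo" 2>/dev/null) || { echo unknown; return 1; }
  # Drop everything up to the colon and pad with spaces so that each flag
  # can be matched as a whole word.
  case " ${flags#*:} " in
    (*" avx512f "*) echo avx512 ;;
    (*" avx2 "*)    echo avx2 ;;
    (*" avx "*)     echo avx ;;
    (*)             echo baseline ;;
  esac
}
```

Running `simd_level` with no argument reads /proc/cpuinfo; passing a file name makes it usable on a cpuinfo dump copied from a grid node.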

The 2.80GHz figure is a bit misleading, as the CPU has an advanced modern clock control. The clock actually varies during computation and reaches 4.6GHz, but with AVX512 units in use, will probably drop to 3.2GHz or so, because of the large amount of heat these computations produce.

It would be very interesting to see what kind of performance figures you'd finally get on MKL vs OpenBLAS, and if you post the output of the same cat /proc/cpuinfo | egrep '^(model name|cache size|flags)' | head -3 command, that would show your CPU type, too (unless the Uni for some reason does not allow outsiders to know the grid computer hardware specs; Dan should know if it's ok).

Thanks for your work, I really appreciate it, and while you are unable to get MKL on the nodes, you may configure with OpenBLAS and get the reference run time with it, to compare with the MKL runtime later, if you have time.

@huangruizhe
Contributor

@kkm000 Thanks for the kind explanation!

I will try to get it to run, replicate your experiments, and find out the speedup. For the moment, I do not have sudo access, so I cannot install MKL. Here is what I get if I type "Y":

Sorry, user r***g is not allowed to execute 'extras/install_mkl.sh -sp debian intel-mkl-64bit-2019.2-057' as root on a1*.c***.edu. 

Let us see how Dan would comment.

and while you are unable to get MKL on the nodes, you may configure with OpenBLAS and get the reference run time with it, to compare with the MKL runtime later,

Would you point me to how to configure it with OpenBLAS? Is it the current configuration of Kaldi? Perhaps I should read some info here. Thanks again!

@kkm000
Contributor Author

kkm000 commented Apr 4, 2019

I see, thanks for checking. Actually, I was likely wrong to mention OpenBLAS specifically, sorry. What Dan initially asked was

and also compare the speed (e.g. of matrix-lib-test) with our current installation

I do not know what your "current installation" is; I incorrectly assumed it was OpenBLAS (as it's usually the fastest choice AFAIK), but it may be something else. So basically it's all about doing exactly the same test as above using the regular configuration that you normally use. To be comparable, both tests should preferably be performed on the same machine, if that's doable (I do not know if it's possible, as all clusters are different).

@danpovey
Contributor

danpovey commented Apr 4, 2019 via email

@kkm000
Contributor Author

kkm000 commented Apr 5, 2019

Dan, I am not sure I understand which of the two options you mean: just omit the 'exec sudo', or give the user the commands to run instead of scripting them? Both options look inferior, the latter more so. If the first, there is less potential for error, and I'm adding command line options to skip as much stuff as possible (the full message, which also explains it to the user, is in @huangruizhe's comment above):

Restart this script using the 'sudo' command, as:

  sudo $0 -sp $distro $package
... more blah blah ...
if [ -t 0 ]; then
  echo; read -ep "Run the above sudo command now? [Y/n]:"
  case $REPLY in
    ''|[Yy]*) set -x; exec sudo "$0" -sp "$distro" "$package"
...

Nothing wrong with that, really. Say yes or retype the command? To me, yes is less error-prone.

If you mean to "just give them commands to run", then it's not as simple as it sounds, and sometimes worse than ugly, for apt in particular. I won't copy the whole kaboom here, but look at this, including old/new apt detection: https://github.com/kaldi-asr/kaldi/blob/a4785af4e/tools/extras/install_mkl.sh#L213-L243

I did it pretty much the best recommended way. The old apt-key import way of dropping the signing key into the common store is not considered the best practice today, but it is the only way to do it with apt before v1.2. This version is already rare in the wild (Debian 7 and Ubuntu 14, both EOL), but all instructions you'd find for setting up apt repos suggest doing that, for simplicity, because apt does not have a way to do it both correctly and simply; choose one. I'm faithfully doing the setup the long but most secure way (as I always do on my own machine). So from the security standpoint, I am doing it significantly better than even Intel's own instructions say. I'm even going as far as to give the notice that their apt is not the latest and how to restore the currently recommended, more secure setup when/if they upgrade the distro. And even further, I'm printing that notice after apt-get has finished its churn, so maybe the user would really read it...

I've run serious interactive (using 'expect') tests of this script for correctness on 20 different Linuxes, here's the test harness: https://github.com/kkm000/install_mkl.

Every command that is run in elevated privilege state is printed (set -x).

As a final, although quite weak, argument: I'm with a SaaS company that works with customers' data, which means one break-in and we're out. So every line I write for privileged code is habitually passed through my internal "what-if" filter. Not a great argument, really, as everyone, me included, is biased in estimating their own cognitive biases, but still, what's certain is that I'm very likely much more security-conscious than not...

If this does not sound convincing, then I think it is better to simply drop the script and point to Intel's site. I do respect and understand your security diligence, but I'm sure this script is not the most dangerous piece of software among those which an average user casually runs using sudo five times a day, really.

@danpovey
Contributor

danpovey commented Apr 5, 2019 via email

@danpovey danpovey merged commit faa7ff8 into kaldi-asr:master Apr 7, 2019
kkm000 pushed a commit to kkm000/kaldi that referenced this pull request Apr 9, 2019
Establish MATHLIB as the single ground-truth source for the chosen matrix algebra library.
@kkm000
Contributor Author

kkm000 commented Apr 9, 2019

X-ref: Follow-up fix in #3216.

kkm000 pushed a commit that referenced this pull request Apr 9, 2019
Establish MATHLIB as the single ground-truth source for the chosen matrix algebra library.
@kkm000 kkm000 deleted the 19-use-mkl branch April 23, 2019 07:36
danpovey pushed a commit to danpovey/kaldi that referenced this pull request Jun 19, 2019
Establish MATHLIB as the single ground-truth source for the chosen matrix algebra library.