DEV Community

João M.C. Teixeira


Parallelized vectorization with Dask - a Monte-Carlo example

Today, I came across an article on Medium about parallelization in Python (here). I used that post as an example to practice vectorization principles with NumPy; you can read my previous post on DEV here. The performance gain obtained on a single core with NumPy is outstanding.

Can we improve the performance of the vectorized Monte-Carlo approach even further?

Dask offers a NumPy-like interface with automated parallelization. So, let us try it!

This is the solution I came up with to compute the number pi using a Monte-Carlo approach; in other words, it reproduces the same algorithm as in the posts mentioned above, but with Dask. Here, I am using the default configuration; I am not exploring tweaks in Dask to gain further performance. I find it amazing how Dask keeps the memory profile really low. After all, Dask managed the parallelization across my laptop's 8 threads and the available memory seamlessly.

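import time

import dask.array as da

start = time.time()
sample = 10_000_000_000  # <- this is huge!
# Draw (x, y) points uniformly in the square [-1, 1] x [-1, 1]
xxyy = da.random.uniform(-1, 1, size=(2, sample))
# Distance of each point from the origin
norm = da.linalg.norm(xxyy, axis=0)
# Lazily count the points that fall inside the unit circle
summ = da.sum(norm <= 1)
insiders = summ.compute()  # triggers the parallel computation
pi = 4 * insiders / sample
print("pi ~= {}".format(pi))
print("Finished in: {:.2f}s".format(time.time()-start))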

On my laptop:

pi ~= 3.141615808
Finished in: 107.14s
CPU~Quad core Intel Core i7-8550U (-MT-MCP-) speed/max~800/4000 MHz Kernel~4.15.0-99-generic x86_64 Mem~7178.7/32050.2MB HDD~2250.5GB(56.6% used) Procs~300 Client~Shell inxi~2.3.56
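As far as I understand it, the low memory profile comes from Dask splitting the array into chunks and processing them piece by piece, instead of materializing all the samples at once. The following is only a minimal sketch of that idea, not the exact run above: the sample size is much smaller, and the chunk length (`chunk`) is a hypothetical value I picked for illustration, whereas the original code relies on Dask's default chunking.

```python
import dask.array as da

sample = 10_000_000  # far smaller than the 10_000_000_000 used above, for a quick run
chunk = 1_000_000    # hypothetical chunk length, chosen only for illustration

# Same algorithm as above, but with an explicit chunk layout
xxyy = da.random.uniform(-1, 1, size=(2, sample), chunks=(2, chunk))
norm = da.linalg.norm(xxyy, axis=0)
summ = da.sum(norm <= 1)

# The (2, sample) array is split into 1 x 10 blocks
print(xxyy.numblocks)  # (1, 10)

# Nothing has been computed so far; compute() runs the task graph
insiders = summ.compute()
pi_estimate = 4 * insiders / sample
print("pi ~= {}".format(pi_estimate))
```

Each block can be generated and reduced to a partial count independently, so the full array never has to be held in memory at once. The best chunk size likely depends on the machine, so treat the value above as a starting point rather than a recommendation.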

Additional notes:

It is possible to write this statement:

summ = da.sum(norm <= 1)

using masked arrays:

mask = da.ma.masked_inside(norm, 0, 1)
trues = da.ma.getmaskarray(mask)
summ = da.sum(trues)

Yet this latter form is slower: it takes about 20% more time on my machine.
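To check that the two forms really count the same points, and to get a rough feeling for the time difference, something like the sketch below could be used. It assumes a much smaller sample than the one in the post, and the names `summ_bool`, `summ_mask` and `timings` are mine, introduced only for this illustration. The timings are indicative at best and will vary between machines and runs.

```python
import time

import dask
import dask.array as da

sample = 10_000_000  # smaller than in the post, so that it runs quickly

xxyy = da.random.uniform(-1, 1, size=(2, sample))
norm = da.linalg.norm(xxyy, axis=0)

# Form 1: plain boolean comparison
summ_bool = da.sum(norm <= 1)

# Form 2: masked arrays
mask = da.ma.masked_inside(norm, 0, 1)
trues = da.ma.getmaskarray(mask)
summ_mask = da.sum(trues)

# Computing both in one call shares the same random points,
# so the two counts can be compared directly
count_bool, count_mask = dask.compute(summ_bool, summ_mask)
print("same count:", int(count_bool) == int(count_mask))

# Rough timing of each form on its own
timings = {}
for name, lazy in (("comparison", summ_bool), ("masked", summ_mask)):
    start = time.perf_counter()
    lazy.compute()
    timings[name] = time.perf_counter() - start
    print("{}: {:.2f}s".format(name, timings[name]))
```

Since the norm is never negative, masking the values inside [0, 1] selects exactly the points with `norm <= 1`, which is why both counts should agree.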

What are your thoughts?
Cheers
