Effective Numerical Computation in NumPy and SciPy
This document provides an overview of effective numerical computation in NumPy and SciPy. It discusses how Python can be used for numerical computation tasks like differential equations, simulations, and machine learning. While Python is initially slower than languages like C, libraries like NumPy and SciPy allow Python code to achieve sufficient speed through techniques like broadcasting, indexing, and using sparse matrix representations. The document provides examples of how to efficiently perform tasks like applying functions element-wise to sparse matrices and calculating norms. It also presents a case study for efficiently computing a formula that appears in a machine learning paper using different sparse matrix representations in SciPy.
Overview of effective numerical computation using NumPy and SciPy by Kimikazu Kato, highlighting the focus on practical case studies.
Discusses applications of numerical computation in various fields and emphasizes Python's productivity and ease of use.
Python's speed issues compared to C; emphasizes writing Python-friendly code over direct translations from C for performance.
Recommendations for better performance in Python using NumPy: avoid loops, utilize library capabilities, and focus on efficient coding techniques.Explains broadcasting with examples; how it applies operations to arrays efficiently in different dimensions.
Illustrates indexing in NumPy, demonstrating how to modify array elements using index arrays.
Definition and advantages of sparse matrices; introduces scipy.sparse and its matrix types for efficient storage and computation.
Examples of creating and operating on sparse matrices in SciPy, comparing performance of different sparse matrix types. Detailed explanation of internal structures of csr_matrix and csc_matrix, highlighting their efficiency and performance benchmarks.
Introduces case studies showing practical applications of numerical computation techniques using sparse matrices.Dealing with diagonal sparse matrices and implementing efficient calculations for recommendation algorithms.
Performance comparison of different methods for handling sparse and dense matrices in terms of computational speed.
Summarizes key takeaways about avoiding loops in Python, understanding sparse matrix structures, and prioritizing efficient mathematical formulations.
Thanks to individuals who contributed advice during the presentation.
About Myself KimikazuKato Chief Scientists at Silver Egg Technology Co., Ltd. Ph.D in Computer Science Background in Mathematics, Numerical Computation, Algorithms, etc. <2 year experience in Python >10 year experience in numerical computation Now designing algorithms for recommendation system, and doing research about machine learning and data analysis. 2 / 35
3.
This talk... isabout effective usage of NumPy/SciPy is NOT exhaustive introduction of capabilities, but shows some case studies based on my experience and interest 3 / 35
4.
Table of Contents Introduction Basics about NumPy Broadcasting Indexing Sparse matrix Usage of scipy.sparse Internal structure Case studies Conclusion 4 / 35
5.
Numerical Computation Differentialequations Simulations Signal processing Machine Learning etc... Why Numerical Computation in Python? Productivity Easy to write Easy to debug Connectivity with visualization tools Matplotlib IPython Connectivity with web system Many frameworks (Django, Pyramid, Flask, Bottle, etc.) 5 / 35
6.
But Python isVery Slow! Code in C #include <stdio.h> int main() { int i; double s=0; for (i=1; i<=100000000; i++) s+=i; printf("%.0fn",s); } Code in Python s=0. for i in xrange(1,100000001): s+=i print s Both of the codes compute the sum of integers from 1 to 100,000,000. Result of benchmark in a certain environment: Above: 0.109 sec (compiled with -O3 option) Below: 8.657 sec (80+ times slower!!) 6 / 35
7.
Better code importnumpy as np a=np.arange(1,100000001) print a.sum() Now it takes 0.188 sec. (Measured by "time" command in Linux, loading time included) Still slower than C, but sufficiently fast as a script language. 7 / 35
8.
Lessons Python isvery slow when written badly Translate C (or Java, C# etc.) code into Python is often a bad idea. Python-friendly rewriting sometimes result in drastic performance improvement 8 / 35
9.
Basic rules forbetter performance Avoid for-sentence as far as possible Utilize libraries' capabilities instead Forget about the cost of copying memory Typical C programmer might care about it, but ... 9 / 35
Broadcasting >>> importnumpy as np >>> a=np.array([0,1,2]) >>> a*3 array([0, 3, 6]) >>> b=np.array([1,4,9]) >>> np.sqrt(b) array([ 1., 2., 3.]) A function which is applied to each element when applied to an array is called a universal function. 11 / 35
Refernces Gabriele Lanaro,"Python High Performance Programming," Packt Publishing, 2013. Stéfan van der Walt, Numpy Medkit 14 / 35
15.
Sparse matrix Definedas a matrix in which most elements are zero Compressed data structure is used to express it, so that it will be... Space effective Time effective 15 / 35
16.
scipy.sparse The classscipy.sparse has mainly three types as expressions of a sparse matrix. (There are other types but not mentioned here) lil_matrix : convenient to set data; setting a[i,j] is fast csr_matrix : convenient for computation, fast to retrieve a row csc_matrix : convenient for computation, fast to retrieve a column Usually, set the data into lil_matrix, and then, convert it to csc_matrix or csr_matrix. For csr_matrix, and csc_matrix, calcutaion of matrices of the same type is fast, but you should avoid calculation of different types. 16 / 35
Merit of knowingthe internal structure Setting csr_matrix or csc_matrix with its internal structure is much faster than setting lil_matrix with indices. See the benchmark of setting ý ý ý 20 / 35
21.
from scipy.sparse importlil_matrix, csr_matrix import numpy as np from timeit import timeit def set_lil(n): a=lil_matrix((n,n)) for i in xrange(n): a[i,i]=2. if i+1n: a[i,i+1]=1. return a def set_csr(n): data=np.empty(2*n-1) indices=np.empty(2*n-1,dtype=np.int32) indptr=np.empty(n+1,dtype=np.int32) # to be fair, for-sentence is intentionally used # (using indexing technique is faster) for i in xrange(n): indices[2*i]=i data[2*i]=2. if in-1: indices[2*i+1]=i+1 data[2*i+1]=1. indptr[i]=2*i indptr[n]=2*n-1 a=csr_matrix((data,indices,indptr),shape=(n,n)) return a print lil:,timeit(set_lil(10000), number=10,setup=from __main__ import set_lil) print csr:,timeit(set_csr(10000), number=10,setup=from __main__ import set_csr) 21 / 35
22.
Result: lil: 11.6730761528 csr: 0.0562081336975 Remark When you deal with already sorted data, setting csr_matrix or csc_matrix with data, indices, indptr is much faster than setting lil_matrix But the code tend to be more complicated if you use the internal structure of csr_matrix or csc_matrix 22 / 35
Case 1: Norms If 2 is dense: norm=np.dot(v,v) Ï2 Ï % 2% Expressed as product of matrices. (dot means matrix product, but you don't have to take transpose explicitly.) When is sparse, suppose that is expressed as matrix: 2 2 g * norm=v.multiply(v).sum() (multiply() is element-wise product) This is because taking transpose of a sparse matrix changes the type. 24 / 35
Case 2: Applyinga function to all of the elements of a sparse matrix A universal function can be applied to a dense matrix: import numpy as np a=np.arange(9).reshape((3,3)) a array([[0, 1, 2], [3, 4, 5], [6, 7, 8]]) np.tanh(a) array([[ 0. , 0.76159416, 0.96402758], [ 0.99505475, 0.9993293 , 0.9999092 ], [ 0.99998771, 0.99999834, 0.99999977]]) This is convenient and fast. However, we cannot do the same thing for a sparse matrix. 26 / 35
27.
from scipy.sparse importlil_matrix a=lil_matrix((3,3)) a[0,0]=1. a[1,0]=2. b=a.tocsr() np.tanh(b) 3x3 sparse matrix of type 'type 'numpy.float64'' with 2 stored elements in Compressed Sparse Row format This is because, for an arbitrary function, its application to a sparse matrix is not necessarily sparse. However, if a universal function satisfies , the density is preserved. Then, how can we compute it? 27 / 35
28.
Use the internalstructure!! The positions of the non-zero elements are not changed after application of the function. Keep indices and indptr, and just change data. Solution: b = csr_matrix((np.tanh(a.data), a.indices, a.indptr), shape=a.shape) 28 / 35
29.
Case 3: Formulawhich appears in a paper In the algorithm for recommendation system [1], the following formula appears: øø * g where is dense matrix, and D is a diagonal matrix defined from a given array as: % ý * Here, (which corresponds to the number of users or items) is big and (which means the number of latent factors) is small. [1] Hu et al. Collaborative Filtering for Implicit Feedback Datasets, ICDM, 2008. * 29 / 35
30.
Solution 1: Thereis a special class dia_matrix to deal with a diagonal sparse matrix. import scipy.sparse as sparse import numpy as np def f(a,d): a: 2d array of shape (n,f), d: 1d array of length n dd=sparse.diags([d],[0]) return np.dot(a.T,dd.dot(a)) 30 / 35
Solution 3: û ) û ) g g û ) û ) This is equivalent to the broadcasting! def h(a,d): return np.dot(a.T*d,a) ü ü ü * * û *) ý * ü ü g ü * * * * û *) * 32 / 35
Conclusion Try notto use for-sentence, but use libraries' capabilities instead. Knowledge about the internal structure of the sparse matrix is useful to extract further performance. Mathematical derivation is important. The key is to find a mathematically equivalent and Python-friendly formula. Computational speed does not necessarily matter. Finding a better code in a short time is valuable. Otherwise, you shouldn't pursue too much. 34 / 35
35.
Acknowledgment I wouldlike to thank (@shima__shima) who gave me useful advice in Twitter. 35 / 35