Introduction to NumPy for Machine Learning Programmers
This document serves as an introduction to NumPy for machine learning programmers, detailing its basic usage, performance optimization techniques, and practical applications such as ridge regression and non-negative matrix factorization. It emphasizes the importance of leveraging built-in libraries for efficient computations in Python and includes various code examples to illustrate these concepts. The presentation concludes by encouraging the study of source code, particularly from scikit-learn, to enhance understanding of machine learning algorithms.
Introduction to NumPy targeting those implementing ML in Python. Overview includes preliminaries, usage, indexing, broadcasting, scikit-learn case study, and conclusion.
Kimikazu Kato, Chief Scientist at Silver Egg Technology with expertise in algorithm design and a Ph.D. in Computer Science.
Highlighting Python's inefficiencies compared to C, demonstrating performance benchmarks, and discussing optimal coding practices for enhanced speed.
Guidelines for better performance in Python through NumPy, focusing on indexing, broadcasting, and memory management.
Examples of Boolean indexing in NumPy arrays, including comparisons and conditions, supplemented with a pandas DataFrame example.
Explaining Ridge regression through scikit-learn, highlighting key equations, the role of matrices, and performance enhancements.
Introduction to NMF, the algorithm's approach to approximating a matrix by non-negative factors, and its application in tasks like face part detection.
Summary of best practices using NumPy and references for further reading on scikit-learn and NumPy enhancements.
Who am I?

Kimikazu Kato
Chief Scientist at Silver Egg Technology
Algorithm designer for a recommendation system
Ph.D. in computer science (Master's degree in math)
Python is Very Slow!

Code in C:

    #include <stdio.h>

    int main() {
        int i;
        double s = 0;
        for (i = 1; i <= 100000000; i++) s += i;
        printf("%.0f\n", s);
        return 0;
    }

Code in Python:

    s = 0.
    for i in xrange(1, 100000001):
        s += i
    print s

Both codes compute the sum of the integers from 1 to 100,000,000. Result of a benchmark in a certain environment:

C: 0.109 sec (compiled with the -O3 option)
Python: 8.657 sec (80+ times slower!!)
Lessons:
- Python is very slow when written badly.
- Translating C (or Java, C#, etc.) code directly into Python is often a bad idea.
- Python-friendly rewriting sometimes results in a drastic performance improvement, as the sketch below shows.
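For illustration (this snippet is not from the original slides), here is a Python-friendly rewrite of the benchmark above: NumPy pushes the loop into compiled C code.

    import numpy as np

    # Vectorized equivalent of the Python loop above: the summation
    # itself runs in compiled C code inside NumPy.
    # Caveat: np.arange allocates the whole array (~800 MB of float64 here).
    s = np.arange(1, 100000001, dtype=np.float64).sum()
    print(s)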
Basic rules for better performance:
- Avoid for-loops as far as possible.
- Utilize the libraries' capabilities instead.
- Forget about the cost of copying memory. A typical C programmer might care about it, but ...
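As a minimal NumPy illustration of Boolean indexing (a sketch added here for context, leading into the pandas version below):

    >>> import numpy as np
    >>> a = np.array([1, -2, 3, -4, 5])
    >>> a > 0                    # comparison yields a Boolean array
    array([ True, False,  True, False,  True])
    >>> a[a > 0]                 # a Boolean array used as an index
    array([1, 3, 5])
    >>> a[(a > 0) & (a < 4)]     # conditions combined with & and |
    array([1, 3])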
Cf. In Pandas

    >>> import pandas as pd
    >>> import numpy as np
    >>> df = pd.DataFrame(np.random.randn(5, 3), columns=["A", "B", "C"])
    >>> df
              A         B         C
    0  1.084117 -0.626930 -1.818375
    1  1.717066  2.554761 -0.560069
    2 -1.355434 -0.464632  0.322603
    3  0.013824  0.298082 -1.405409
    4  0.743068  0.292042 -1.002901

    [5 rows x 3 columns]
    >>> df[df.A > 0.5]
              A         B         C
    0  1.084117 -0.626930 -1.818375
    1  1.717066  2.554761 -0.560069
    4  0.743068  0.292042 -1.002901

    [3 rows x 3 columns]
    >>> df[(df.A > 0.5) & (df.B > 0)]
              A         B         C
    1  1.717066  2.554761 -0.560069
    4  0.743068  0.292042 -1.002901

    [2 rows x 3 columns]
Case Study 1: Ridge Regression (sklearn.linear_model.Ridge)

$$\min_w \|y - Xw\|_2^2 + \alpha \|w\|_2^2$$

$X$, $y$: input and output of the training data; $\alpha$: hyperparameter.

The optimum is given as:

$$w = (X^T X + \alpha I)^{-1} X^T y$$

The corresponding part of the code:

    K = safe_sparse_dot(X, X.T, dense_output=True)
    try:
        dual_coef = _solve_cholesky_kernel(K, y, alpha)
        coef = safe_sparse_dot(X.T, dual_coef, dense_output=True).T
    except linalg.LinAlgError:

(sklearn/linear_model/ridge.py, L338-343)
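For reference, a minimal NumPy sketch of the closed-form solution above (an illustration only, not scikit-learn's actual code, which as shown works with the kernel matrix $K = XX^T$ and a Cholesky-based solver):

    import numpy as np

    def ridge_closed_form(X, y, alpha):
        """Solve min_w ||y - Xw||^2 + alpha * ||w||^2 via the normal equations.

        Minimal sketch only; sklearn's Ridge solves the dual system
        (X X^T + alpha*I) dual_coef = y when that is cheaper.
        """
        n_features = X.shape[1]
        A = X.T @ X + alpha * np.eye(n_features)
        # Solving the linear system is more stable and faster than
        # explicitly forming the matrix inverse.
        return np.linalg.solve(A, X.T @ y)

    # Usage with synthetic data:
    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 5))
    y = X @ np.array([1.0, 2.0, 0.0, -1.0, 0.5]) + 0.1 * rng.standard_normal(100)
    w = ridge_closed_form(X, y, alpha=1.0)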
flat

    class flatiter(builtins.object)
     |  Flat iterator object to iterate over arrays.
     |
     |  A flatiter iterator is returned by ``x.flat`` for any array x.
     |  It allows iterating over the array as if it were a 1-D array,
     |  either in a for-loop or by calling its next method.
     |
     |  Iteration is done in C-contiguous style, with the last index
     |  varying the fastest. The iterator can also be indexed using basic
     |  slicing or advanced indexing.
     |
     |  See Also
     |  --------
     |  ndarray.flat : Return a flat iterator over an array.
     |  ndarray.flatten : Returns a flattened copy of an array.
     |
     |  Notes
     |  -----
     |  A flatiter iterator cannot be constructed directly from Python code
     |  by calling the flatiter constructor.

In short, x.flat is a reference to the elements of the array x, and can be used like a one-dimensional array.
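A short illustrative session (a sketch added here, not from the original slides):

    >>> import numpy as np
    >>> x = np.arange(6).reshape(2, 3)
    >>> x.flat[4]            # index as if x were one-dimensional (C order)
    4
    >>> x.flat[[0, 5]] = -1  # assignment writes through to x itself
    >>> x
    array([[-1,  1,  2],
           [ 3,  4, -1]])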
Case Study 2: NMF (sklearn.decomposition.nmf)

NMF = Non-negative Matrix Factorization
Successful in face part detection
Idea of NMF

Approximate the input matrix as a product of two smaller non-negative matrices:

$$X \approx HW^T, \qquad W_{ij} \ge 0,\; H_{ij} \ge 0$$

Notation:
- Parameter set: $\Theta = (W, H)$; $\theta_i$: the $i$-th element of $\Theta$
- Error function: $f(\Theta) = \|X - HW^T\|_F^2$
Algorithm of NMF

Projected gradient descent (Lin 2007):

$$\Theta^{(k+1)} = P\left[\Theta^{(k)} - \alpha \nabla f(\Theta^{(k)})\right]$$

where $P[x]_i = \max(0, x_i)$.

Convergence condition:

$$\left\|\nabla^P f(\Theta^{(k)})\right\| \le \epsilon \left\|\nabla^P f(\Theta^{(1)})\right\|$$

where

$$\nabla^P f(\Theta)_i = \begin{cases} \nabla f(\Theta)_i & \text{if } \theta_i > 0 \\ \min(0, \nabla f(\Theta)_i) & \text{if } \theta_i = 0 \end{cases}$$

(Note: $\theta_i \ge 0$.)
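A schematic NumPy translation of this update rule (a fixed-step sketch for illustration only; Lin (2007) chooses the step size by a backtracking line search, and scikit-learn's implementation differs in detail):

    import numpy as np

    def nmf_pgd(X, r, alpha=1e-3, eps=1e-4, max_iter=500):
        """Schematic projected gradient descent for X ~ H W^T."""
        m, n = X.shape
        rng = np.random.default_rng(0)
        H = rng.random((m, r))
        W = rng.random((n, r))

        def grads(W, H):
            R = H @ W.T - X              # residual of the approximation
            return R.T @ H, R @ W        # df/dW, df/dH (up to a factor of 2)

        def pg_norm(W, H, gW, gH):
            # Projected gradient: the full gradient where theta_i > 0,
            # min(0, gradient) on the boundary theta_i == 0.
            pW = np.where(W > 0, gW, np.minimum(gW, 0.0))
            pH = np.where(H > 0, gH, np.minimum(gH, 0.0))
            return np.sqrt((pW ** 2).sum() + (pH ** 2).sum())

        gW, gH = grads(W, H)
        initial = pg_norm(W, H, gW, gH)
        for _ in range(max_iter):
            gW, gH = grads(W, H)
            if pg_norm(W, H, gW, gH) <= eps * initial:
                break
            W = np.maximum(0.0, W - alpha * gW)   # P[x]_i = max(0, x_i)
            H = np.maximum(0.0, H - alpha * gH)
        return W, H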
Conclusion
- Avoid for-loops; use NumPy/SciPy's capabilities.
- Mathematical derivation is important.
- You can learn a lot from the source code of scikit-learn.
References

Official
- scikit-learn documentation

For beginners of NumPy/SciPy
- Gabriele Lanaro, "Python High Performance Programming," Packt Publishing, 2013.
- Stéfan van der Walt, "Numpy Medkit."
- "Python Scientific Lecture Notes."

Algorithm of NMF
- C.-J. Lin, "Projected Gradient Methods for Non-negative Matrix Factorization," Neural Computation 19, 2007.