NumPy Essentials: Arrays and Vectorization
Part 1: Getting Started
import numpy as np

# Create your first array
arr = np.array([1, 2, 3, 4, 5])
print(arr)  # [1 2 3 4 5]
What happened: We converted a Python list into a NumPy array - the foundation of scientific computing.
Part 2: Arrays vs Lists
# Python list
python_list = [1, 2, 3, 4, 5]
print(type(python_list))  # <class 'list'>

# NumPy array
numpy_array = np.array([1, 2, 3, 4, 5])
print(type(numpy_array))  # <class 'numpy.ndarray'>
Key difference: A list stores references to arbitrary Python objects, while an array stores values of a single data type in one contiguous block of memory - much faster for math!
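To make the difference concrete, here is a small sketch (the variable names are illustrative, not from the tutorial) showing that an array holds one data type and a fixed number of bytes per element, while a list happily mixes types:

```python
import numpy as np

# A list can mix types; each element is a separate Python object.
mixed_list = [1, 2.5, "three"]

# An array stores a single dtype in one contiguous buffer.
ints = np.array([1, 2, 3], dtype=np.int64)
promoted = np.array([1, 2.5, 3])  # the integers are upcast to float64

print(ints.dtype, ints.itemsize, ints.nbytes)  # int64 8 24
print(promoted.dtype)                          # float64
```

The fixed element size is what lets NumPy run tight compiled loops over the data instead of inspecting each object in turn.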
Part 3: Array Properties
arr = np.array([1, 2, 3, 4, 5])
print(arr.shape)  # (5,) - 5 elements in 1 dimension
print(arr.size)   # 5 - total number of elements
print(arr.dtype)  # int64 on most platforms - data type
Intuition: Shape tells you the dimensions, size tells you total elements.
Part 4: Creating Arrays
# Zeros
zeros = np.zeros(5)  # [0. 0. 0. 0. 0.]

# Ones
ones = np.ones(3)  # [1. 1. 1.]

# Range
range_arr = np.arange(10)  # [0 1 2 3 4 5 6 7 8 9]

# Evenly spaced
linspace = np.linspace(0, 1, 5)  # [0. 0.25 0.5 0.75 1. ]
Practice: Create arrays filled with specific values or patterns.
Part 5: 2D Arrays (Matrices)
# Create a 2D array
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
print(matrix.shape)  # (2, 3) - 2 rows, 3 columns
print(matrix.size)   # 6 - total elements
Visualization: Think of it as a table with rows and columns.
Part 6: Array Creation Shortcuts
# 2D zeros
zeros_2d = np.zeros((3, 4))  # 3x4 matrix of zeros

# Identity matrix
identity = np.eye(3)  # 3x3 identity matrix

# Random numbers
random_arr = np.random.random(5)  # 5 random numbers in [0, 1)
Use cases: Initialize matrices for machine learning, create test data.
Part 7: Array Indexing
arr = np.array([10, 20, 30, 40, 50])

# Single element
print(arr[0])   # 10 - first element
print(arr[-1])  # 50 - last element

# Multiple elements
print(arr[1:4])  # [20 30 40] - slice notation
Rule: Indexing and slicing use the same syntax as Python lists. One difference: an array slice is a view of the original data, not a copy (see Part 23).
Part 8: 2D Array Indexing
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

# Single element
print(matrix[0, 1])  # 2 - row 0, column 1

# Entire row
print(matrix[1, :])  # [4 5 6] - row 1, all columns

# Entire column
print(matrix[:, 2])  # [3 6] - all rows, column 2
Syntax: [row, column] - comma separates dimensions.
Part 9: The Magic of Vectorization
# Python way (slow)
python_list = [1, 2, 3, 4, 5]
result = []
for x in python_list:
    result.append(x * 2)

# NumPy way (fast)
numpy_array = np.array([1, 2, 3, 4, 5])
result = numpy_array * 2  # [2 4 6 8 10]
Vectorization: Apply operations to entire arrays at once - no loops needed!
Part 10: Element-wise Operations
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Basic operations
print(a + b)  # [5 7 9] - addition
print(a - b)  # [-3 -3 -3] - subtraction
print(a * b)  # [4 10 18] - multiplication
print(a / b)  # [0.25 0.4 0.5] - division
Key insight: Operations happen element-by-element automatically.
Part 11: Broadcasting
# Array and scalar
arr = np.array([1, 2, 3, 4])
result = arr + 10  # [11 12 13 14]

# Different shapes
a = np.array([[1, 2, 3]])   # 1x3
b = np.array([[10], [20]])  # 2x1
result = a + b              # 2x3 result
# [[11 12 13]
#  [21 22 23]]
Broadcasting: NumPy compares shapes dimension by dimension and virtually stretches any dimension of size 1 to match the other array, without copying data.
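The rule can be checked without doing any arithmetic. The sketch below assumes NumPy 1.20 or newer, where `np.broadcast_shapes` is available, and also shows what happens when two shapes cannot be reconciled:

```python
import numpy as np

a = np.array([[1, 2, 3]])    # shape (1, 3)
b = np.array([[10], [20]])   # shape (2, 1)

# Shapes are compared right to left; a dimension of size 1 stretches to match.
print(np.broadcast_shapes(a.shape, b.shape))  # (2, 3)

result = a + b
print(result)
# [[11 12 13]
#  [21 22 23]]

# Shapes that disagree on a dimension where neither is 1 cannot be broadcast.
try:
    np.ones((2, 3)) + np.ones((2, 4))
    broadcast_failed = False
except ValueError:
    broadcast_failed = True
print(broadcast_failed)  # True
```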
Part 12: Mathematical Functions
arr = np.array([1, 4, 9, 16])

# Common functions
print(np.sqrt(arr))  # [1. 2. 3. 4.] - square root
print(np.log(arr))   # natural logarithm
print(np.exp(arr))   # exponential
print(np.sin(arr))   # sine
Advantage: All functions work element-wise across entire arrays.
Part 13: Array Statistics
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Basic statistics
print(np.mean(data))    # 5.5 - average
print(np.median(data))  # 5.5 - middle value
print(np.std(data))     # 2.87 - standard deviation
print(np.sum(data))     # 55 - total
Use case: Quick analysis of datasets without writing loops.
Part 14: Array Reshaping
arr = np.array([1, 2, 3, 4, 5, 6])

# Reshape to 2x3
reshaped = arr.reshape(2, 3)
print(reshaped)
# [[1 2 3]
#  [4 5 6]]

# Flatten back to 1D
flat = reshaped.flatten()  # [1 2 3 4 5 6]
Rule: Total elements must stay the same (2×3 = 6 elements).
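A short sketch of the rule in action: passing `-1` asks NumPy to infer one dimension from the element count, and a shape that does not multiply out to the same total raises a `ValueError`. The names `auto` and `reshape_error` are just illustrative.

```python
import numpy as np

arr = np.arange(1, 7)       # [1 2 3 4 5 6]

# -1 lets NumPy work out the missing dimension: 6 elements / 3 rows = 2 columns
auto = arr.reshape(3, -1)
print(auto.shape)           # (3, 2)

# 4 * 2 = 8, but arr has only 6 elements, so this fails
try:
    arr.reshape(4, 2)
    reshape_error = None
except ValueError as exc:
    reshape_error = str(exc)
print(reshape_error)
```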
Part 15: Boolean Indexing
data = np.array([1, 5, 3, 8, 2, 9])

# Create boolean mask
mask = data > 4  # [False True False True False True]

# Filter data
filtered = data[mask]  # [5 8 9]

# One-liner
big_numbers = data[data > 4]  # [5 8 9]
Power: Select elements based on conditions without loops.
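A mask is useful for more than filtering. As a small sketch (the cap value of 4 is an arbitrary choice for illustration), the same mask can count matches, locate them, or drive a conditional replacement with `np.where`:

```python
import numpy as np

data = np.array([1, 5, 3, 8, 2, 9])
mask = data > 4

# True counts as 1, so summing a mask counts the matches
count = mask.sum()
print(count)                      # 3

# Positions of the matching elements
positions = np.flatnonzero(mask)
print(positions)                  # [1 3 5]

# Replace matching elements with 4, keep the rest (returns a new array)
capped = np.where(mask, 4, data)
print(capped)                     # [1 4 3 4 2 4]
```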
Part 16: Array Concatenation
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Concatenate
combined = np.concatenate([a, b])  # [1 2 3 4 5 6]

# Stack vertically
stacked = np.vstack([a, b])
# [[1 2 3]
#  [4 5 6]]
Use case: Combine datasets or results from different computations.
Part 17: Matrix Operations
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Element-wise multiplication
element_wise = A * B
# [[ 5 12]
#  [21 32]]

# Matrix multiplication
matrix_mult = A @ B
# [[19 22]
#  [43 50]]
Difference: * is element-wise, @ is true matrix multiplication.
Part 18: Performance Comparison
import time

# Large arrays
size = 1000000
a = np.random.random(size)
b = np.random.random(size)

# Time NumPy
start = time.perf_counter()
result = a + b
numpy_time = time.perf_counter() - start
print(f"NumPy time: {numpy_time:.4f} seconds")
# Often 10-100x faster than an equivalent pure-Python loop, depending on the operation and hardware
Why faster: NumPy uses optimized C code under the hood.
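The snippet in this part times only the NumPy side. A rough way to see the gap yourself is to time the same addition over plain lists; the sketch below does that. The exact ratio depends on your machine and NumPy build, so no specific speedup is claimed here.

```python
import time
import numpy as np

size = 1_000_000
rng = np.random.default_rng(0)
a = rng.random(size)
b = rng.random(size)

# Plain Python lists holding the same values
a_list, b_list = a.tolist(), b.tolist()

# Pure Python: one interpreted addition per element
start = time.perf_counter()
list_result = [x + y for x, y in zip(a_list, b_list)]
list_time = time.perf_counter() - start

# NumPy: a single vectorized call
start = time.perf_counter()
numpy_result = a + b
numpy_time = time.perf_counter() - start

print(f"Pure Python: {list_time:.4f} s")
print(f"NumPy:       {numpy_time:.4f} s")
print(f"Ratio:       {list_time / numpy_time:.0f}x")
```

For more reliable numbers, repeat each measurement several times (for example with the standard-library `timeit` module) rather than trusting a single run.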
Part 19: Common Patterns
# Generate data
x = np.linspace(0, 10, 100)  # 100 points from 0 to 10
y = np.sin(x)                # Sine wave

# Select values near the peaks (above 0.9)
peaks = y[y > 0.9]

# Normalize data (zero mean, unit standard deviation)
normalized = (y - np.mean(y)) / np.std(y)
Real-world: Data generation, filtering, and preprocessing.
Part 20: Advanced Indexing
arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

# Fancy indexing
rows = [0, 2]
cols = [1, 2]
result = arr[rows, cols]  # [2 9] - elements at (0,1) and (2,2)

# Boolean indexing with conditions
mask = (arr > 3) & (arr < 8)  # Multiple conditions
filtered = arr[mask]          # [4 5 6 7]
Power: Extract complex patterns from data with simple syntax.
Part 21: Array Sorting
data = np.array([3, 1, 4, 1, 5, 9, 2, 6])

# Sort array
sorted_data = np.sort(data)  # [1 1 2 3 4 5 6 9]

# Get sort indices (the order of the tied 1s is not guaranteed by the default sort)
indices = np.argsort(data)  # [1 3 6 0 2 4 7 5]

# Sort 2D array
matrix = np.array([[3, 1], [4, 2]])
sorted_matrix = np.sort(matrix, axis=1)  # Sort each row
Use case: Order data for analysis or find top/bottom values.
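As a sketch of the top/bottom use case, `np.argsort` can pull out the k largest values together with their positions. The choice of `k = 3` is arbitrary:

```python
import numpy as np

data = np.array([3, 1, 4, 1, 5, 9, 2, 6])
k = 3

# argsort is ascending, so take the last k indices and reverse them
top_idx = np.argsort(data)[-k:][::-1]
print(top_idx)        # [5 7 4]
print(data[top_idx])  # [9 6 5]

# The k smallest values are simply the first k of the sorted array
bottom_values = np.sort(data)[:k]
print(bottom_values)  # [1 1 2]
```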
Part 22: Working with NaN
data = np.array([1, 2, np.nan, 4, 5])

# Check for NaN
has_nan = np.isnan(data)  # [False False True False False]

# Remove NaN
clean_data = data[~np.isnan(data)]  # [1. 2. 4. 5.]

# NaN-aware functions
mean_ignore_nan = np.nanmean(data)  # 3.0
Real data: Often contains missing values. NaN propagates through ordinary functions such as np.mean, so either filter it out or use the NaN-aware variants.
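The sketch below shows why the NaN-aware functions matter: a single NaN poisons an ordinary mean, and NaN never compares equal to itself, which is why `np.isnan` is needed to detect it. Filling the gap with the mean is just one possible strategy, used here for illustration.

```python
import numpy as np

data = np.array([1, 2, np.nan, 4, 5])

# An ordinary mean propagates the NaN
plain_mean = np.mean(data)
print(plain_mean)            # nan

# The NaN-aware version skips it
safe_mean = np.nanmean(data)
print(safe_mean)             # 3.0

# NaN is not equal to anything, including itself
print(np.nan == np.nan)      # False

# Fill missing entries with the NaN-aware mean
filled = np.where(np.isnan(data), safe_mean, data)
print(filled)                # [1. 2. 3. 4. 5.]
```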
Part 23: Array Memory and Views
arr = np.array([1, 2, 3, 4, 5])

# Slicing creates a view (shares memory)
view = arr[1:4]
view[0] = 999
print(arr)  # [1 999 3 4 5] - original changed!

# Copy creates new array
copy = arr.copy()
copy[0] = 777
print(arr)  # [1 999 3 4 5] - original unchanged
Memory efficiency: Views save memory, copies ensure independence.
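If you are unsure whether an operation gave you a view or a copy, you can ask NumPy directly. This sketch uses `np.shares_memory` and the `.base` attribute; note that fancy indexing (indexing with a list) returns a copy even though it looks similar to slicing.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

view = arr[1:4]          # basic slice: a view
copy = arr[1:4].copy()   # explicit copy
fancy = arr[[1, 2, 3]]   # fancy indexing: a copy

print(np.shares_memory(arr, view))   # True
print(np.shares_memory(arr, copy))   # False
print(np.shares_memory(arr, fancy))  # False

# A view remembers the array it was sliced from
print(view.base is arr)              # True
```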
Part 24: Practical Example - Data Analysis
# Simulate temperature data
days = 30
temperatures = np.random.normal(25, 5, days)  # Mean 25°C, std 5°C

# Analysis
avg_temp = np.mean(temperatures)
hot_days = np.sum(temperatures > 30)
cold_days = np.sum(temperatures < 20)
temp_range = np.max(temperatures) - np.min(temperatures)

print(f"Average: {avg_temp:.1f}°C")
print(f"Hot days (>30°C): {hot_days}")
print(f"Cold days (<20°C): {cold_days}")
print(f"Temperature range: {temp_range:.1f}°C")
Real application: Weather data analysis with just a few lines.
Part 25: Image Processing Example
# Create a simple "image" (2D array)
image = np.random.randint(0, 256, (100, 100))  # 100x100 grayscale, values 0-255

# Basic operations
bright_image = np.clip(image + 50, 0, 255)  # Brighten, clipped to the valid range
dark_image = image * 0.5                    # Darken (result is float64)
threshold = image > 128                     # Binary threshold

# Image statistics
print(f"Average brightness: {np.mean(image):.1f}")
print(f"Bright pixels: {np.sum(image > 200)}")
Application: Images are just arrays of numbers - perfect for NumPy.
Part 26: Scientific Computing
# Simulate a simple physics problem
t = np.linspace(0, 10, 1000)  # Time from 0 to 10 seconds (named t to avoid shadowing the time module)
gravity = 9.81                # m/s²
initial_velocity = 50         # m/s

# Calculate position (physics equation)
position = initial_velocity * t - 0.5 * gravity * t**2

# Find maximum height
max_height = np.max(position)
max_time = t[np.argmax(position)]
print(f"Maximum height: {max_height:.1f}m at {max_time:.1f}s")
Power: Solve complex scientific problems with vectorized operations.
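The numerical answer can be sanity-checked against the closed-form result. The height peaks when the velocity `initial_velocity - gravity * t` reaches zero, that is at `t = v0 / g`, where the height is `v0**2 / (2 * g)`. The sketch below compares the two; the sampled maximum can only be slightly lower than the exact peak, because the grid may not land exactly on it.

```python
import numpy as np

gravity = 9.81            # m/s²
initial_velocity = 50.0   # m/s

# Numerical estimate on a grid of 1000 time samples
t = np.linspace(0, 10, 1000)
position = initial_velocity * t - 0.5 * gravity * t**2
numeric_peak = position.max()
numeric_time = t[position.argmax()]

# Closed-form result from setting the velocity to zero
exact_time = initial_velocity / gravity
exact_peak = initial_velocity**2 / (2 * gravity)

print(f"Numeric: {numeric_peak:.2f} m at {numeric_time:.2f} s")
print(f"Exact:   {exact_peak:.2f} m at {exact_time:.2f} s")
```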
Part 27: Performance Tips
# Sample data for the comparisons below
large_array = np.random.random(1000000)

# Avoid Python loops
# BAD:
result = []
for x in large_array:
    result.append(x**2)

# GOOD:
result = large_array**2

# Use built-in functions
# BAD:
total = 0
for x in large_array:
    total += x

# GOOD:
total = np.sum(large_array)
Golden rule: If you're writing a loop, there's probably a NumPy function for it.
Part 28: Common Mistakes
# Mistake 1: Creating arrays in loops
# BAD:
arr = np.array([])
for i in range(1000):
    arr = np.append(arr, i)  # Slow! Each call copies the whole array

# GOOD:
arr = np.arange(1000)  # Fast!

# Mistake 2: Not using vectorization
# BAD:
result = np.zeros(len(arr))
for i in range(len(arr)):
    result[i] = arr[i] * 2

# GOOD:
result = arr * 2
Efficiency: Pre-allocate arrays and use vectorized operations.
Part 29: Next Steps
# What you can do with NumPy:
# 1. Data analysis (pandas builds on NumPy)
# 2. Machine learning (scikit-learn uses NumPy)
# 3. Image processing (OpenCV, PIL)
# 4. Scientific computing (SciPy)
# 5. Deep learning (TensorFlow, PyTorch)

# Example: least-squares linear regression (no intercept term) in one line
X = np.random.random((100, 2))
y = np.random.random(100)
weights = np.linalg.lstsq(X, y, rcond=None)[0]
Foundation: NumPy is the base for much of the Python scientific ecosystem.
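Fitting random targets does not show whether the fit works, so here is a sketch on synthetic data with a known answer. The slope of 3, intercept of 2, and noise level are made-up values for illustration; the column of ones is what gives the model an intercept term.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data: y = 3x + 2 plus a little Gaussian noise
x = rng.uniform(0, 10, 200)
y = 3.0 * x + 2.0 + rng.normal(0, 0.1, 200)

# Design matrix: one column for x, one column of ones for the intercept
X = np.column_stack([x, np.ones_like(x)])

# lstsq returns (solution, residuals, rank, singular values)
(slope, intercept), *_ = np.linalg.lstsq(X, y, rcond=None)

print(f"slope ≈ {slope:.2f}, intercept ≈ {intercept:.2f}")
```

The recovered values should land close to 3 and 2; they will not match exactly because of the added noise.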
Key Takeaways
- Arrays > Lists: Faster, more memory efficient for numerical data
- Vectorization: Apply operations to entire arrays at once
- Broadcasting: Automatically handle different array shapes
- Boolean indexing: Filter data with conditions
- No loops: NumPy functions are optimized - use them!
- Shape matters: Understanding dimensions is crucial
- Memory views: Slicing shares memory, copying creates new arrays
Practice Challenge
# Create a 10x10 matrix of random numbers
# Find all numbers greater than 0.5
# Calculate their average
# Replace numbers less than 0.3 with 0
matrix = np.random.random((10, 10))
mask = matrix > 0.5
high_values = matrix[mask]
average = np.mean(high_values)
matrix[matrix < 0.3] = 0

print(f"Found {len(high_values)} values > 0.5")
print(f"Their average: {average:.3f}")
Master these concepts and you'll have a solid foundation for data science, machine learning, and scientific computing!