Posted on Jul 6

NumPy Essentials: Arrays and vectorization

Part 1: Getting Started

    import numpy as np

    # Create your first array
    arr = np.array([1, 2, 3, 4, 5])
    print(arr)  # [1 2 3 4 5]

What happened: We converted a Python list into a NumPy array - the foundation of scientific computing.

Part 2: Arrays vs Lists

    # Python list
    python_list = [1, 2, 3, 4, 5]
    print(type(python_list))  # <class 'list'>

    # NumPy array
    numpy_array = np.array([1, 2, 3, 4, 5])
    print(type(numpy_array))  # <class 'numpy.ndarray'>

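Key difference: A list stores references to arbitrary Python objects, while an array stores values of a single type in one contiguous block - that layout is what makes array math fast.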
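To make that difference concrete, here is a small sketch (the variable names are only for illustration) showing that a list keeps each element's own type, while an array converts everything to one shared dtype:

```python
import numpy as np

# A list happily mixes types; each element keeps its own type
mixed_list = [1, 2.5, 3]
print([type(x).__name__ for x in mixed_list])  # ['int', 'float', 'int']

# An array picks a single dtype that can hold every value
mixed_array = np.array(mixed_list)
print(mixed_array.dtype)  # float64

# Every element occupies the same number of bytes
print(mixed_array.itemsize)  # 8
```

Because every element has the same size, NumPy can step through the data without checking types along the way.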

Part 3: Array Properties

    arr = np.array([1, 2, 3, 4, 5])
    print(arr.shape)  # (5,) - 5 elements in 1 dimension
    print(arr.size)   # 5 - total number of elements
    print(arr.dtype)  # int64 on most platforms - data type

Intuition: Shape tells you the dimensions, size tells you total elements.

Part 4: Creating Arrays

    # Zeros
    zeros = np.zeros(5)  # [0. 0. 0. 0. 0.]

    # Ones
    ones = np.ones(3)  # [1. 1. 1.]

    # Range
    range_arr = np.arange(10)  # [0 1 2 3 4 5 6 7 8 9]

    # Evenly spaced
    linspace = np.linspace(0, 1, 5)  # [0. 0.25 0.5 0.75 1. ]

Practice: Create arrays filled with specific values or patterns.

Part 5: 2D Arrays (Matrices)

    # Create a 2D array
    matrix = np.array([[1, 2, 3], [4, 5, 6]])
    print(matrix.shape)  # (2, 3) - 2 rows, 3 columns
    print(matrix.size)   # 6 - total elements

Visualization: Think of it as a table with rows and columns.

Part 6: Array Creation Shortcuts

    # 2D zeros
    zeros_2d = np.zeros((3, 4))  # 3x4 matrix of zeros

    # Identity matrix
    identity = np.eye(3)  # 3x3 identity matrix

    # Random numbers
    random_arr = np.random.random(5)  # 5 random numbers in [0, 1)

Use cases: Initialize matrices for machine learning, create test data.

Part 7: Array Indexing

    arr = np.array([10, 20, 30, 40, 50])

    # Single element
    print(arr[0])   # 10 - first element
    print(arr[-1])  # 50 - last element

    # Multiple elements
    print(arr[1:4])  # [20 30 40] - slice notation

Rule: Same syntax as Python lists. One difference: a NumPy slice is a view into the original array, not a copy, so slicing stays cheap even for large arrays.

Part 8: 2D Array Indexing

    matrix = np.array([[1, 2, 3], [4, 5, 6]])

    # Single element
    print(matrix[0, 1])  # 2 - row 0, column 1

    # Entire row
    print(matrix[1, :])  # [4 5 6] - row 1, all columns

    # Entire column
    print(matrix[:, 2])  # [3 6] - all rows, column 2

Syntax: [row, column] - comma separates dimensions.

Part 9: The Magic of Vectorization

    # Python way (slow)
    python_list = [1, 2, 3, 4, 5]
    result = []
    for x in python_list:
        result.append(x * 2)

    # NumPy way (fast)
    numpy_array = np.array([1, 2, 3, 4, 5])
    result = numpy_array * 2  # [2 4 6 8 10]

Vectorization: Apply operations to entire arrays at once - no loops needed!

Part 10: Element-wise Operations

    a = np.array([1, 2, 3])
    b = np.array([4, 5, 6])

    # Basic operations
    print(a + b)  # [5 7 9] - addition
    print(a - b)  # [-3 -3 -3] - subtraction
    print(a * b)  # [4 10 18] - multiplication
    print(a / b)  # [0.25 0.4 0.5] - division

Key insight: Operations happen element-by-element automatically.

Part 11: Broadcasting

    # Array and scalar
    arr = np.array([1, 2, 3, 4])
    result = arr + 10  # [11 12 13 14]

    # Different shapes
    a = np.array([[1, 2, 3]])    # 1x3
    b = np.array([[10], [20]])   # 2x1
    result = a + b               # 2x3 result

Broadcasting: NumPy automatically expands arrays to compatible shapes.
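Here is a short sketch of what "compatible" means in practice. It reuses the 1x3 and 2x1 arrays from this part and adds a hypothetical array `c` whose shape does not line up:

```python
import numpy as np

a = np.array([[1, 2, 3]])    # shape (1, 3)
b = np.array([[10], [20]])   # shape (2, 1)

# Size-1 dimensions are stretched to match the other array
result = a + b
print(result.shape)  # (2, 3)
print(result)
# [[11 12 13]
#  [21 22 23]]

# Dimensions that differ where neither side is 1 cannot be broadcast
c = np.array([1, 2, 3, 4])   # shape (4,)
try:
    a + c
    compatible = True
except ValueError:
    compatible = False
print(compatible)  # False
```

Shapes are compared from the last dimension backwards: each pair must be equal, or one of them must be 1.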

Part 12: Mathematical Functions

    arr = np.array([1, 4, 9, 16])

    # Common functions
    print(np.sqrt(arr))  # [1. 2. 3. 4.] - square root
    print(np.log(arr))   # natural logarithm
    print(np.exp(arr))   # exponential
    print(np.sin(arr))   # sine

Advantage: All functions work element-wise across entire arrays.

Part 13: Array Statistics

    data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

    # Basic statistics
    print(np.mean(data))    # 5.5 - average
    print(np.median(data))  # 5.5 - middle value
    print(np.std(data))     # 2.87 - standard deviation
    print(np.sum(data))     # 55 - total

Use case: Quick analysis of datasets without writing loops.

Part 14: Array Reshaping

    arr = np.array([1, 2, 3, 4, 5, 6])

    # Reshape to 2x3
    reshaped = arr.reshape(2, 3)
    print(reshaped)
    # [[1 2 3]
    #  [4 5 6]]

    # Flatten back to 1D
    flat = reshaped.flatten()  # [1 2 3 4 5 6]

Rule: Total elements must stay the same (2×3 = 6 elements).
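A quick sketch of that rule in action. The `-1` shortcut lets NumPy work out one dimension for you, and a shape that doesn't fit the element count raises an error:

```python
import numpy as np

arr = np.arange(6)  # [0 1 2 3 4 5]

# -1 asks NumPy to infer that dimension from the total size
auto = arr.reshape(3, -1)
print(auto.shape)  # (3, 2)

# 6 elements cannot fill a 4x2 layout, so this raises ValueError
try:
    arr.reshape(4, 2)
    reshape_ok = True
except ValueError:
    reshape_ok = False
print(reshape_ok)  # False
```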

Part 15: Boolean Indexing

    data = np.array([1, 5, 3, 8, 2, 9])

    # Create boolean mask
    mask = data > 4  # [False True False True False True]

    # Filter data
    filtered = data[mask]  # [5 8 9]

    # One-liner
    big_numbers = data[data > 4]  # [5 8 9]

Power: Select elements based on conditions without loops.

Part 16: Array Concatenation

    a = np.array([1, 2, 3])
    b = np.array([4, 5, 6])

    # Concatenate
    combined = np.concatenate([a, b])  # [1 2 3 4 5 6]

    # Stack vertically
    stacked = np.vstack([a, b])
    # [[1 2 3]
    #  [4 5 6]]

Use case: Combine datasets or results from different computations.

Part 17: Matrix Operations

    # Two 2x2 matrices
    A = np.array([[1, 2], [3, 4]])
    B = np.array([[5, 6], [7, 8]])

    # Element-wise multiplication
    element_wise = A * B  # [[5 12] [21 32]]

    # Matrix multiplication
    matrix_mult = A @ B  # [[19 22] [43 50]]

Difference: * is element-wise, @ is true matrix multiplication.
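With square matrices both operators run, so the difference is easy to miss. A sketch with non-square shapes (chosen here just for illustration) shows the shape rule behind `@`:

```python
import numpy as np

A = np.ones((2, 3))   # 2 rows, 3 columns
B = np.ones((3, 4))   # 3 rows, 4 columns

# Inner dimensions match (3 and 3), so the product has shape (2, 4)
product = A @ B
print(product.shape)  # (2, 4)
print(product[0, 0])  # 3.0 - sum of three 1*1 terms

# Element-wise * fails here because (2, 3) and (3, 4) do not broadcast
try:
    A * B
    elementwise_ok = True
except ValueError:
    elementwise_ok = False
print(elementwise_ok)  # False
```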

Part 18: Performance Comparison

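    import time

    # Large arrays
    size = 1000000
    a = np.random.random(size)
    b = np.random.random(size)

    # Time NumPy (perf_counter is the clock intended for short timings)
    start = time.perf_counter()
    result = a + b
    numpy_time = time.perf_counter() - start
    print(f"NumPy time: {numpy_time:.4f} seconds")
    # Often many times faster than an equivalent pure-Python loop;
    # the exact factor depends on the operation and the machine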

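Why faster: NumPy runs the loop in compiled C over a contiguous block of same-typed values, so it skips the per-element overhead of the Python interpreter.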
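The snippet in this part only times the NumPy side. If you want to see the gap yourself, here is a rough sketch using the standard-library `timeit` module. The function names and the array size are my own choices, and the speedup you get will vary by machine:

```python
import timeit
import numpy as np

size = 100_000
py_a = list(range(size))
py_b = list(range(size))
np_a = np.arange(size)
np_b = np.arange(size)

def add_lists():
    # Pure-Python element-wise addition
    return [x + y for x, y in zip(py_a, py_b)]

def add_arrays():
    # Vectorized addition
    return np_a + np_b

# Both approaches give the same values
same = add_lists() == add_arrays().tolist()
print(same)  # True

# Run each version 20 times and compare total time
python_time = timeit.timeit(add_lists, number=20)
numpy_time = timeit.timeit(add_arrays, number=20)
print(f"Python: {python_time:.4f}s, NumPy: {numpy_time:.4f}s")
print(f"Speedup: {python_time / numpy_time:.0f}x")
```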

Part 19: Common Patterns

    # Generate data
    x = np.linspace(0, 10, 100)  # 100 points from 0 to 10
    y = np.sin(x)                # Sine wave

    # Keep the values near the peaks (everything above 0.9)
    peaks = y[y > 0.9]

    # Normalize data (zero mean, unit standard deviation)
    normalized = (y - np.mean(y)) / np.std(y)

Real-world: Data generation, filtering, and preprocessing.

Part 20: Advanced Indexing

    arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

    # Fancy indexing
    rows = [0, 2]
    cols = [1, 2]
    result = arr[rows, cols]  # [2 9] - elements at (0,1) and (2,2)

    # Boolean indexing with conditions
    mask = (arr > 3) & (arr < 8)  # Multiple conditions
    filtered = arr[mask]          # [4 5 6 7]

Power: Extract complex patterns from data with simple syntax.

Part 21: Array Sorting

    data = np.array([3, 1, 4, 1, 5, 9, 2, 6])

    # Sort array
    sorted_data = np.sort(data)  # [1 1 2 3 4 5 6 9]

    # Get sort indices (positions in data, in sorted order)
    indices = np.argsort(data)  # [1 3 6 0 2 4 7 5]

    # Sort 2D array
    matrix = np.array([[3, 1], [4, 2]])
    sorted_matrix = np.sort(matrix, axis=1)  # Sort each row

Use case: Order data for analysis or find top/bottom values.
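For the "top/bottom values" use case, one possible approach (a sketch, not the only way) is to slice the result of `argsort`:

```python
import numpy as np

data = np.array([3, 1, 4, 1, 5, 9, 2, 6])

# argsort is ascending, so the last three indices point at the largest values
top3_idx = np.argsort(data)[-3:][::-1]
print(top3_idx)        # [5 7 4]
print(data[top3_idx])  # [9 6 5]

# The three smallest values
bottom3 = np.sort(data)[:3]
print(bottom3)  # [1 1 2]
```

Keeping the indices, not just the values, is handy when you need to look up matching rows in another array.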

Part 22: Working with NaN

    data = np.array([1, 2, np.nan, 4, 5])

    # Check for NaN
    has_nan = np.isnan(data)  # [False False True False False]

    # Remove NaN
    clean_data = data[~np.isnan(data)]  # [1. 2. 4. 5.]

    # NaN-aware functions
    mean_ignore_nan = np.nanmean(data)  # 3.0

Real data: Often contains missing values - NumPy handles them gracefully.

Part 23: Array Memory and Views

    arr = np.array([1, 2, 3, 4, 5])

    # Slicing creates a view (shares memory)
    view = arr[1:4]
    view[0] = 999
    print(arr)  # [1 999 3 4 5] - original changed!

    # Copy creates new array
    copy = arr.copy()
    copy[0] = 777
    print(arr)  # [1 999 3 4 5] - original unchanged

Memory efficiency: Views save memory, copies ensure independence.
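If you're ever unsure whether you are holding a view or a copy, you can ask NumPy directly. A small sketch using `np.shares_memory` and the `.base` attribute:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

view = arr[1:4]          # basic slice: a view
copy = arr[1:4].copy()   # explicit copy
fancy = arr[[1, 2, 3]]   # fancy indexing: also a copy

print(np.shares_memory(arr, view))   # True
print(np.shares_memory(arr, copy))   # False
print(np.shares_memory(arr, fancy))  # False

# .base points back to the array a view was taken from
print(view.base is arr)   # True
print(copy.base is None)  # True
```

Note that fancy indexing (indexing with a list of positions) returns a copy even though it looks a lot like slicing.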

Part 24: Practical Example - Data Analysis

    # Simulate temperature data
    days = 30
    temperatures = np.random.normal(25, 5, days)  # Mean 25°C, std 5°C

    # Analysis
    avg_temp = np.mean(temperatures)
    hot_days = np.sum(temperatures > 30)
    cold_days = np.sum(temperatures < 20)
    temp_range = np.max(temperatures) - np.min(temperatures)

    print(f"Average: {avg_temp:.1f}°C")
    print(f"Hot days (>30°C): {hot_days}")
    print(f"Cold days (<20°C): {cold_days}")
    print(f"Temperature range: {temp_range:.1f}°C")

Real application: Weather data analysis with just a few lines.

Part 25: Image Processing Example

    # Create a simple "image" (2D array)
    image = np.random.randint(0, 256, (100, 100))  # 100x100 grayscale

    # Basic operations
    bright_image = image + 50  # Brighten
    dark_image = image * 0.5   # Darken
    threshold = image > 128    # Binary threshold

    # Image statistics
    print(f"Average brightness: {np.mean(image):.1f}")
    print(f"Bright pixels: {np.sum(image > 200)}")

Application: Images are just arrays of numbers - perfect for NumPy.
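One caveat worth a sketch: the `image` in this part uses NumPy's default integer type, so `image + 50` simply produces values above 255. Real image data is often stored as 8-bit integers (that's an assumption here, not something the example does), and there the same addition wraps around instead. Clipping is one way to handle it:

```python
import numpy as np

# A tiny 8-bit "image" with one dark and one bright pixel
image = np.array([[10, 250]], dtype=np.uint8)

# uint8 arithmetic wraps around: 250 + 50 = 300, stored as 300 - 256 = 44
wrapped = image + np.uint8(50)
print(wrapped)  # [[60 44]]

# Widen the type first, then clip back into the valid 0-255 range
bright = np.clip(image.astype(np.int16) + 50, 0, 255).astype(np.uint8)
print(bright)   # [[ 60 255]]
```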

Part 26: Scientific Computing

    # Simulate a simple physics problem
    time = np.linspace(0, 10, 1000)  # Time from 0 to 10 seconds
    gravity = 9.81                   # m/s²
    initial_velocity = 50            # m/s

    # Calculate position (physics equation)
    position = initial_velocity * time - 0.5 * gravity * time**2

    # Find maximum height
    max_height = np.max(position)
    max_time = time[np.argmax(position)]
    print(f"Maximum height: {max_height:.1f}m at {max_time:.1f}s")

Power: Solve complex scientific problems with vectorized operations.
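It's good practice to sanity-check a numerical answer against a known formula when one exists. For constant gravity, the peak is at t = v0 / g with height v0² / (2g). The sketch below compares that with the grid search from this part (the names `t_peak` and `h_peak` are my own):

```python
import numpy as np

gravity = 9.81          # m/s²
initial_velocity = 50   # m/s

# Closed-form results for constant gravity
t_peak = initial_velocity / gravity            # time of maximum height
h_peak = initial_velocity**2 / (2 * gravity)   # maximum height

# Numerical result on the same time grid as the example
t = np.linspace(0, 10, 1000)
position = initial_velocity * t - 0.5 * gravity * t**2

print(f"Analytic:  {h_peak:.1f}m at {t_peak:.1f}s")
print(f"Numerical: {np.max(position):.1f}m at {t[np.argmax(position)]:.1f}s")
```

The two lines should agree to the printed precision. A finer grid in `np.linspace` would bring the numerical value even closer.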

Part 27: Performance Tips

    large_array = np.random.random(1000000)  # sample input for the examples

    # Avoid Python loops
    # BAD:
    result = []
    for x in large_array:
        result.append(x**2)

    # GOOD:
    result = large_array**2

    # Use built-in functions
    # BAD:
    total = 0
    for x in large_array:
        total += x

    # GOOD:
    total = np.sum(large_array)

Golden rule: If you're writing a loop, there's probably a NumPy function for it.

Part 28: Common Mistakes

    # Mistake 1: Creating arrays in loops
    # BAD:
    arr = np.array([])
    for i in range(1000):
        arr = np.append(arr, i)  # Slow!

    # GOOD:
    arr = np.arange(1000)  # Fast!

    # Mistake 2: Not using vectorization
    # BAD:
    result = np.zeros(len(arr))
    for i in range(len(arr)):
        result[i] = arr[i] * 2

    # GOOD:
    result = arr * 2

Efficiency: Pre-allocate arrays and use vectorized operations.
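Sometimes a loop really is the natural fit, for example when each value depends on the previous one. Here is a sketch of pre-allocation for that case, using a made-up decay recurrence, followed by a vectorized form that happens to exist for it:

```python
import numpy as np

n = 1000

# Allocate the result once instead of growing it with np.append
running = np.empty(n)
running[0] = 1.0
for i in range(1, n):
    running[i] = running[i - 1] * 0.99  # simple decay recurrence

# For this particular recurrence a vectorized form exists as well
vectorized = 0.99 ** np.arange(n)
print(np.allclose(running, vectorized))  # True
```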

Part 29: Next Steps

    # What you can do with NumPy:
    # 1. Data analysis (pandas builds on NumPy)
    # 2. Machine learning (scikit-learn uses NumPy)
    # 3. Image processing (OpenCV, PIL)
    # 4. Scientific computing (SciPy)
    # 5. Deep learning (TensorFlow, PyTorch)

    # Example: Linear regression in one line
    X = np.random.random((100, 2))
    y = np.random.random(100)
    weights = np.linalg.lstsq(X, y, rcond=None)[0]

Foundation: NumPy is the base for the entire Python scientific ecosystem.
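The one-liner in this part fits random targets, so its weights don't mean much. To see `np.linalg.lstsq` recover something recognizable, here is a sketch on synthetic data from a known line. The slope 3, intercept 2, and noise level are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the run is repeatable

# Synthetic data from a known line: y = 3x + 2, plus a little noise
x = rng.random(100)
y = 3 * x + 2 + rng.normal(0, 0.01, 100)

# Add a column of ones so the fit can learn an intercept
X = np.column_stack([x, np.ones_like(x)])
slope, intercept = np.linalg.lstsq(X, y, rcond=None)[0]

print(f"slope ~ {slope:.2f}, intercept ~ {intercept:.2f}")
```

Without the column of ones, the fitted line would be forced through the origin.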

Key Takeaways

Arrays > Lists: Faster, more memory efficient for numerical data
Vectorization: Apply operations to entire arrays at once
Broadcasting: Automatically handle different array shapes
Boolean indexing: Filter data with conditions
No loops: NumPy functions are optimized - use them!
Shape matters: Understanding dimensions is crucial
Memory views: Slicing shares memory, copying creates new arrays

Practice Challenge

    # Create a 10x10 matrix of random numbers
    # Find all numbers greater than 0.5
    # Calculate their average
    # Replace numbers less than 0.3 with 0

    matrix = np.random.random((10, 10))
    mask = matrix > 0.5
    high_values = matrix[mask]
    average = np.mean(high_values)
    matrix[matrix < 0.3] = 0

    print(f"Found {len(high_values)} values > 0.5")
    print(f"Their average: {average:.3f}")

Master these concepts and you'll have a solid foundation for data science, machine learning, and scientific computing!

DEV Community