Concurrency and parallelism are two techniques for running multiple tasks at once. Python offers several options for handling tasks concurrently and in parallel, and choosing between them can be confusing.

Explore the tools and libraries available for properly implementing concurrency and parallelism in Python, and how they differ.

Understanding Concurrency and Parallelism

Concurrency and parallelism refer to two fundamental principles of task execution in computing. Each has its distinct characteristics.

  1. Concurrency is the ability of a program to manage multiple tasks without necessarily executing them at the exact same instant. It revolves around interleaving tasks, switching between them quickly enough that they appear to run simultaneously.
  2. Parallelism, on the other hand, involves executing multiple tasks at literally the same time, typically by taking advantage of multiple CPU cores or processors. True simultaneous execution lets tasks finish faster, making parallelism well suited for computationally intensive operations.

The Importance of Concurrency and Parallelism

The need for concurrency and parallelism in computing cannot be overstated. Here's why these techniques matter:

  1. Resource utilization: Concurrency allows for efficient utilization of system resources, ensuring that tasks are actively making progress rather than idly waiting for external resources.
  2. Responsiveness: Concurrency can improve the responsiveness of applications, especially in scenarios involving user interfaces or web servers.
  3. Performance: Parallelism is crucial for achieving optimal performance, particularly in CPU-bound tasks like complex calculations, data processing, and simulations.
  4. Scalability: Both concurrency and parallelism are essential for building scalable systems.
  5. Future-proofing: As hardware trends continue to favor multicore processors, the ability to harness parallelism will become increasingly necessary.

Concurrency in Python

You can achieve concurrency in Python using threading and asynchronous programming with the asyncio library.

Threading in Python

Threading is a Python concurrency mechanism that lets you create and manage multiple threads of execution within a single process. Threads are suitable for certain types of tasks, particularly I/O-bound work that spends most of its time waiting and therefore benefits from concurrent execution.

Python's threading module provides a high-level interface for creating and managing threads. Although the Global Interpreter Lock (GIL) prevents threads from executing Python bytecode in true parallel, they can still achieve concurrency by interleaving tasks efficiently.

The code below shows an example implementation of concurrency using threads. It uses the Python requests library to send HTTP requests, a common I/O-bound task, and the time module to measure execution time.

import requests
import time
import threading

urls = [
    'https://www.google.com',
    'https://www.wikipedia.org',
    'https://www.makeuseof.com',
]

# function to request a URL
def download_url(url):
    response = requests.get(url)
    print(f"Downloaded {url} - Status Code: {response.status_code}")


# Execute without threads and measure execution time
start_time = time.time()

for url in urls:
    download_url(url)

end_time = time.time()
print(f"Sequential download took {end_time - start_time:.2f} seconds\n")


# Execute with threads, resetting the time to measure new execution time
start_time = time.time()
threads = []

for url in urls:
    thread = threading.Thread(target=download_url, args=(url,))
    thread.start()
    threads.append(thread)

# Wait for all threads to complete
for thread in threads:
    thread.join()

end_time = time.time()
print(f"Threaded download took {end_time - start_time:.2f} seconds")

Running this program, you should see how much faster the threaded requests are than the sequential requests. Although the difference is just a fraction of a second, you get a clear sense of the performance improvement when using threads for I/O-bound tasks.
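If you prefer not to manage threads by hand, the standard library's concurrent.futures module wraps the same pattern in a thread pool. The sketch below is one way to express the threaded download with ThreadPoolExecutor, reusing the urls list and download_url function from above; the max_workers value is just an illustrative choice.

import requests
from concurrent.futures import ThreadPoolExecutor

urls = [
    'https://www.google.com',
    'https://www.wikipedia.org',
    'https://www.makeuseof.com',
]

def download_url(url):
    response = requests.get(url)
    print(f"Downloaded {url} - Status Code: {response.status_code}")

# The pool starts the threads and waits for them all when the block exits
with ThreadPoolExecutor(max_workers=3) as executor:
    executor.map(download_url, urls)

The executor handles starting and joining the threads for you, which removes the bookkeeping loops from the earlier example.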


Asynchronous Programming With Asyncio

asyncio provides an event loop that manages asynchronous tasks called coroutines. Coroutines are functions that you can pause and resume, making them ideal for I/O-bound tasks. The library is particularly useful for scenarios where tasks involve waiting for external resources, such as network requests.
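Before adapting the download example, it helps to see the pause-and-resume behavior in isolation. The minimal sketch below uses asyncio.sleep to stand in for an I/O wait; the task names and delays are arbitrary.

import asyncio

# A coroutine suspends at each await, letting the event loop run other tasks
async def task(name, delay):
    print(f"{name} started")
    await asyncio.sleep(delay)  # stands in for waiting on a network response
    print(f"{name} finished after {delay}s")

async def main():
    # Both coroutines make progress during each other's waits,
    # so the total runtime is about 2 seconds rather than 3
    await asyncio.gather(task("A", 2), task("B", 1))

asyncio.run(main())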

You can modify the previous request-sending example to work with asyncio:

import asyncio
import aiohttp
import time

urls = [
    'https://www.google.com',
    'https://www.wikipedia.org',
    'https://www.makeuseof.com',
]

# asynchronous function to request URL
async def download_url(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            content = await response.text()
            print(f"Downloaded {url} - Status Code: {response.status}")

# Main asynchronous function
async def main():
    # Create a list of tasks to download each URL concurrently
    tasks = [download_url(url) for url in urls]

    # Gather and execute the tasks concurrently
    await asyncio.gather(*tasks)

start_time = time.time()

# Run the main asynchronous function
asyncio.run(main())

end_time = time.time()

print(f"Asyncio download took {end_time - start_time:.2f} seconds")

With this code, you download the web pages concurrently by taking advantage of asynchronous I/O. For I/O-bound work, this can be more efficient than threading because a single event loop avoids the overhead of creating and switching between threads.


Parallelism in Python

You can implement parallelism using Python's multiprocessing module, which allows you to take full advantage of multicore processors.

Multiprocessing in Python

Python's multiprocessing module provides a way to achieve parallelism by creating separate processes, each with its own Python interpreter and memory space. This effectively bypasses the Global Interpreter Lock (GIL), making it suitable for CPU-bound tasks.

import requests
import multiprocessing
import time

urls = [
    'https://www.google.com',
    'https://www.wikipedia.org',
    'https://www.makeuseof.com',
]

# function to request a URL
def download_url(url):
    response = requests.get(url)
    print(f"Downloaded {url} - Status Code: {response.status_code}")

def main():
    # Create a multiprocessing pool with a specified number of processes
    num_processes = len(urls)
    pool = multiprocessing.Pool(processes=num_processes)

    start_time = time.time()
    pool.map(download_url, urls)
    end_time = time.time()

    # Close the pool and wait for all processes to finish
    pool.close()
    pool.join()

    print(f"Multiprocessing download took {end_time-start_time:.2f} seconds")

# Guard the entry point so worker processes don't re-run this code on import
if __name__ == "__main__":
    main()

In this example, multiprocessing spawns a separate process for each URL, allowing the download_url function to run in parallel. The main() call sits behind an if __name__ == "__main__" guard so that the worker processes, which import the script, don't re-execute it.


When to Use Concurrency or Parallelism

The choice between concurrency and parallelism depends on the nature of your tasks and the available hardware resources.

You can use concurrency when dealing with I/O-bound tasks, such as reading and writing to files or making network requests, and when memory constraints are a concern.

Use multiprocessing when you have CPU-bound tasks that can benefit from true parallelism, or when you need robust isolation between tasks so that one task's failure can't affect the others. The sketch below shows the CPU-bound case.
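The download example above is I/O-bound, so it doesn't show multiprocessing at its best. The sketch below applies the same Pool pattern to an illustrative CPU-bound workload, summing squares; the function name and input sizes are made up for the example.

import multiprocessing
import time

# An illustrative CPU-bound task: sum the squares of the first n integers
def sum_of_squares(n):
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    numbers = [10_000_000] * 4

    # Sequential baseline
    start_time = time.time()
    results = [sum_of_squares(n) for n in numbers]
    print(f"Sequential run took {time.time() - start_time:.2f} seconds")

    # Parallel run: one worker process per input
    start_time = time.time()
    with multiprocessing.Pool(processes=len(numbers)) as pool:
        results = pool.map(sum_of_squares, numbers)
    print(f"Multiprocessing run took {time.time() - start_time:.2f} seconds")

On a machine with at least four cores, the parallel run should finish in roughly a quarter of the sequential time, because each process runs on its own core with its own interpreter and GIL.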

Take Advantage of Concurrency and Parallelism

Parallelism and concurrency are effective ways of improving the responsiveness and performance of your Python code. It's important to understand the differences between these concepts and select the most effective strategy.

Python offers the tools and modules you need to make your code more effective through concurrency or parallelism, regardless of whether you're working with CPU-bound or I/O-bound processes.