In Python, parallel programming allows you to run multiple tasks concurrently, making efficient use of multi-core processors. There are several libraries and techniques to achieve parallelism, each suited for different use cases.
1. Using the `multiprocessing` Module

The `multiprocessing` module is one of the most common ways to perform parallel programming in Python. It bypasses the Global Interpreter Lock (GIL) and utilizes multiple CPU cores.
Basic Example of Parallelism with `multiprocessing`:

```python
import multiprocessing

# Function to be run in parallel
def worker(number):
    print(f"Worker {number}")

if __name__ == "__main__":
    # Create multiple processes
    processes = []
    for i in range(5):  # Creating 5 worker processes
        p = multiprocessing.Process(target=worker, args=(i,))
        processes.append(p)
        p.start()

    # Wait for all processes to complete
    for p in processes:
        p.join()
```
2. Using `Pool` for Process Pooling

If you need to run the same function multiple times with different inputs, `multiprocessing.Pool` provides a convenient way to manage a pool of processes.
Example with `Pool`:

```python
import multiprocessing

# Function to be run in parallel
def square(n):
    return n * n

if __name__ == "__main__":
    # Create a Pool of 4 workers
    with multiprocessing.Pool(4) as pool:
        results = pool.map(square, [1, 2, 3, 4, 5])
    print(results)  # Output: [1, 4, 9, 16, 25]
```
`pool.map` applies the function to each item in the list in parallel.
3. Using the `concurrent.futures` Module

The `concurrent.futures` module provides a high-level interface for asynchronous execution of tasks using either threads (`ThreadPoolExecutor`) or processes (`ProcessPoolExecutor`).
Example with `ThreadPoolExecutor` (for I/O-bound tasks):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_data(url):
    print(f"Fetching data from {url}")
    # Simulate an I/O task
    return f"Data from {url}"

if __name__ == "__main__":
    urls = ['url1', 'url2', 'url3', 'url4']
    # Create a ThreadPoolExecutor with 4 threads
    with ThreadPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(fetch_data, urls))
    print(results)
```
- Use `ThreadPoolExecutor` for I/O-bound tasks (e.g., network requests, file I/O).
Example with `ProcessPoolExecutor` (for CPU-bound tasks):

```python
from concurrent.futures import ProcessPoolExecutor

def compute_square(n):
    return n * n

if __name__ == "__main__":
    numbers = [1, 2, 3, 4, 5]
    # Create a ProcessPoolExecutor with 4 processes
    with ProcessPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(compute_square, numbers))
    print(results)  # Output: [1, 4, 9, 16, 25]
```
- Use `ProcessPoolExecutor` for CPU-bound tasks (e.g., mathematical calculations, data processing).
4. Using `asyncio` for Asynchronous I/O

`asyncio` is designed for asynchronous programming and works well for I/O-bound tasks. It doesn’t provide true parallelism (it doesn’t use multiple cores), but it allows concurrent I/O operations in a single thread.
Example with `asyncio`:

```python
import asyncio

async def fetch_data(url):
    print(f"Fetching data from {url}")
    await asyncio.sleep(1)  # Simulating an I/O-bound task
    return f"Data from {url}"

async def main():
    urls = ['url1', 'url2', 'url3', 'url4']
    tasks = [fetch_data(url) for url in urls]
    results = await asyncio.gather(*tasks)
    print(results)

if __name__ == "__main__":
    asyncio.run(main())
```
`asyncio` uses an event loop to handle concurrent tasks. It is more lightweight than threading or multiprocessing and is ideal for workloads like web scraping and API requests.
5. Using `joblib` for Parallelism

`joblib` is another library often used for parallel processing, especially in scientific computing, machine learning, and other heavy computations.
Example with `joblib`:

```python
from joblib import Parallel, delayed

def square(n):
    return n * n

if __name__ == "__main__":
    results = Parallel(n_jobs=4)(delayed(square)(i) for i in range(5))
    print(results)
```
`joblib` is convenient for large-scale parallel tasks that can be distributed over multiple cores, with very little boilerplate.
6. When to Use Each Approach

- `multiprocessing`: Use for CPU-bound tasks requiring multiple processes and the ability to bypass the GIL.
- `concurrent.futures`: Use for thread-based or process-based parallelism behind a simple, uniform API.
- `asyncio`: Use for I/O-bound tasks, especially when you need to handle many concurrent operations (like HTTP requests) without using threads or processes.
- `joblib`: Use for parallelizing heavy computations or processing large datasets.
| Library | Best For |
|---|---|
| `multiprocessing` | CPU-bound tasks, utilizing multiple cores |
| `concurrent.futures` | Simplified parallel execution (threads or processes) |
| `asyncio` | Asynchronous I/O tasks (non-blocking) |
| `joblib` | Parallel tasks in scientific computing or data processing |