Monday, January 20, 2025

How To Do Parallel Programming In Python?

In Python, parallel programming allows you to run multiple tasks concurrently, making efficient use of multi-core processors. There are several libraries and techniques to achieve parallelism, each suited for different use cases.

1. Using the multiprocessing Module

The multiprocessing module is one of the most common ways to perform parallel programming in Python. It sidesteps the Global Interpreter Lock (GIL) by running each task in a separate process with its own interpreter, so CPU-bound work can make use of multiple CPU cores.

Basic Example of Parallelism with multiprocessing:

import multiprocessing

# Function to be run in parallel
def worker(number):
    print(f"Worker {number}")

if __name__ == "__main__":
    # Create multiple processes
    processes = []
    for i in range(5):  # Creating 5 worker processes
        p = multiprocessing.Process(target=worker, args=(i,))
        processes.append(p)
        p.start()

    # Wait for all processes to complete
    for p in processes:
        p.join()
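The example above only prints from each worker; if the parent process needs the results back, one common option is to pass a multiprocessing.Queue to each worker. A minimal sketch:

```python
import multiprocessing

def worker(number, queue):
    # Put the result on the shared queue instead of printing it
    queue.put(number * number)

if __name__ == "__main__":
    queue = multiprocessing.Queue()
    processes = [
        multiprocessing.Process(target=worker, args=(i, queue))
        for i in range(5)
    ]
    for p in processes:
        p.start()
    for p in processes:
        p.join()

    # Drain the queue; arrival order depends on which process finished first
    results = [queue.get() for _ in range(5)]
    print(sorted(results))  # [0, 1, 4, 9, 16]
```

For larger result payloads, draining the queue before joining avoids blocking; for the small integers here either order works.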

2. Using Pool for Process Pooling

If you need to run the same function multiple times with different inputs, multiprocessing.Pool provides a convenient way to manage a pool of processes.

Example with Pool:

import multiprocessing

# Function to be run in parallel
def square(n):
    return n * n

if __name__ == "__main__":
    # Create a Pool of 4 workers
    with multiprocessing.Pool(4) as pool:
        results = pool.map(square, [1, 2, 3, 4, 5])
    print(results)  # Output: [1, 4, 9, 16, 25]
  • pool.map applies the function to each item in the list in parallel.
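pool.map passes only a single argument to each call. For functions that take several arguments, Pool.starmap unpacks each tuple in the input list. A small sketch (the power function is illustrative, not from the original example):

```python
import multiprocessing

def power(base, exponent):
    return base ** exponent

if __name__ == "__main__":
    pairs = [(2, 3), (3, 2), (4, 2)]
    # starmap unpacks each tuple into positional arguments
    with multiprocessing.Pool(4) as pool:
        results = pool.starmap(power, pairs)
    print(results)  # [8, 9, 16]
```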

3. Using the concurrent.futures Module

The concurrent.futures module provides a high-level interface for asynchronous execution of tasks using either threads (ThreadPoolExecutor) or processes (ProcessPoolExecutor).

Example with ThreadPoolExecutor (for I/O-bound tasks):

import time
from concurrent.futures import ThreadPoolExecutor

def fetch_data(url):
    print(f"Fetching data from {url}")
    time.sleep(0.1)  # Simulate an I/O-bound task (e.g., a network request)
    return f"Data from {url}"

if __name__ == "__main__":
    urls = ['url1', 'url2', 'url3', 'url4']

    # Create a ThreadPoolExecutor with 4 threads
    with ThreadPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(fetch_data, urls))

    print(results)
  • Use ThreadPoolExecutor for I/O-bound tasks (e.g., network requests, file I/O).
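executor.map returns results in input order, even if later tasks finish first. When tasks take uneven time, executor.submit combined with as_completed yields each result as soon as its task finishes. A sketch, with fetch_data redefined as a stub to keep the example self-contained:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch_data(url):
    # Stand-in for a real network request
    return f"Data from {url}"

if __name__ == "__main__":
    urls = ['url1', 'url2', 'url3', 'url4']
    with ThreadPoolExecutor(max_workers=4) as executor:
        # Map each future back to the URL it was submitted with
        futures = {executor.submit(fetch_data, url): url for url in urls}
        for future in as_completed(futures):
            print(f"{futures[future]} -> {future.result()}")
```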

Example with ProcessPoolExecutor (for CPU-bound tasks):

from concurrent.futures import ProcessPoolExecutor

def compute_square(n):
    return n * n

if __name__ == "__main__":
    numbers = [1, 2, 3, 4, 5]

    # Create a ProcessPoolExecutor with 4 processes
    with ProcessPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(compute_square, numbers))

    print(results)  # Output: [1, 4, 9, 16, 25]
  • Use ProcessPoolExecutor for CPU-bound tasks (e.g., mathematical calculations, data processing).
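For large inputs, each item sent between processes has serialization overhead. The chunksize argument of executor.map batches items into a single inter-process message; the value of 500 below is an arbitrary illustration:

```python
from concurrent.futures import ProcessPoolExecutor

def compute_square(n):
    return n * n

if __name__ == "__main__":
    numbers = range(10_000)
    with ProcessPoolExecutor(max_workers=4) as executor:
        # Send work in batches of 500 items to reduce IPC overhead
        results = list(executor.map(compute_square, numbers, chunksize=500))
    print(results[:5])  # [0, 1, 4, 9, 16]
```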

4. Using asyncio for Asynchronous I/O

asyncio is designed for asynchronous programming and works well for I/O-bound tasks. It doesn’t provide true parallelism (it doesn’t use multiple cores), but it allows concurrent I/O operations in a single thread.

Example with asyncio:

import asyncio

async def fetch_data(url):
    print(f"Fetching data from {url}")
    await asyncio.sleep(1)  # Simulating I/O-bound task
    return f"Data from {url}"

async def main():
    urls = ['url1', 'url2', 'url3', 'url4']
    tasks = [fetch_data(url) for url in urls]
    results = await asyncio.gather(*tasks)
    print(results)

if __name__ == "__main__":
    asyncio.run(main())
  • asyncio uses an event loop to handle concurrent tasks. It is more lightweight than threading or multiprocessing and ideal for tasks like web scraping, API requests, etc.
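When the number of concurrent tasks grows large, an asyncio.Semaphore caps how many coroutines run at once, which helps avoid overwhelming a remote server. A minimal sketch (the limit of 2 is arbitrary):

```python
import asyncio

async def fetch_data(url, semaphore):
    # At most 2 coroutines pass this point at a time
    async with semaphore:
        await asyncio.sleep(0.1)  # Stand-in for real I/O
        return f"Data from {url}"

async def main():
    semaphore = asyncio.Semaphore(2)
    urls = ['url1', 'url2', 'url3', 'url4']
    tasks = [fetch_data(url, semaphore) for url in urls]
    # gather preserves the input order of tasks
    return await asyncio.gather(*tasks)

if __name__ == "__main__":
    print(asyncio.run(main()))
    # ['Data from url1', 'Data from url2', 'Data from url3', 'Data from url4']
```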

5. Using joblib for Parallelism

joblib is another library often used for parallel processing, especially in scientific computing, machine learning, or heavy computations.

Example with joblib:

from joblib import Parallel, delayed

def square(n):
    return n * n

if __name__ == "__main__":
    results = Parallel(n_jobs=4)(delayed(square)(i) for i in range(5))
    print(results)
  • joblib is easier to use when dealing with large-scale parallel tasks that can be distributed over multiple cores.

6. When to Use Each Approach

  • multiprocessing: Use for CPU-bound tasks requiring multiple processes and the ability to bypass the GIL.
  • concurrent.futures: Use for either thread-based or process-based parallelism with a simple, uniform API.
  • asyncio: Use for I/O-bound tasks, especially when you need to handle many concurrent tasks (like HTTP requests) without using threads or processes.
  • joblib: Use for parallelizing heavy computations or processing large datasets.

 

Library              Best For
multiprocessing      CPU-bound tasks, utilizing multiple cores
concurrent.futures   Simplified parallel execution (threads or processes)
asyncio              Asynchronous I/O tasks (non-blocking)
joblib               Parallel tasks in scientific computing or data processing

 
