Multiprocessing Pools in Python

Python ships with a multiprocessing module that allows your code to run functions in parallel by offloading calls to separate processes, which the operating system can schedule across the available processors.

In this guide, we will explore what a Pool in multiprocessing is and when it is useful.

A Python snippet to play with

Let’s take the following code.

import random
import time


def calculate_something(i):
    time.sleep(5)
    print(random.randint(10, 100) * i)


for i in range(5):
    calculate_something(i)

This script will take about 5 × 5 = 25 seconds to complete, because each of the five calls sleeps for 5 seconds and they run one after another.

We loop through 5 times and call a function that calculates something for us. We use time.sleep to pretend like the function is doing more work than it is. This gives us a good reason to look into doing things in parallel.
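
To check that estimate without waiting 25 seconds, here is a minimal sketch that times the same kind of loop with a much shorter sleep. The `delay` parameter and the `run_sequentially` helper are illustrative names that are not part of the snippet above.

```python
import time


def calculate_something(i, delay):
    # Stand-in for real work: just sleep for `delay` seconds
    time.sleep(delay)
    return i


def run_sequentially(count, delay):
    # Run the calls one after another and return the elapsed seconds
    start = time.perf_counter()
    for i in range(count):
        calculate_something(i, delay)
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = run_sequentially(count=5, delay=0.1)
    # Expect roughly count * delay seconds, so about half a second here
    print(f"{elapsed:.2f} seconds")
```

The elapsed time grows with the number of calls, which is exactly the cost that running them in parallel is meant to avoid.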

Introducing Multiprocessing

Multiprocessing is pretty simple. Do all the above, but instead of doing all the operations in a single process, hand each one off to a separate process so that they can run simultaneously.

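import multiprocessing
import random
import time


def calculate_something(i):
    time.sleep(5)
    print(random.randint(10, 100) * i)


# This guard is needed on platforms that start child processes by
# re-importing this file (Windows, and macOS by default)
if __name__ == "__main__":
    processes = []
    for i in range(5):
        p = multiprocessing.Process(target=calculate_something, args=(i,))
        processes.append(p)
        p.start()

    # Wait for every process to finish
    for p in processes:
        p.join()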

Now they all run in parallel, so the whole thing completes in around 5 seconds.

But what if you had 1000 items in your loop, and only 4 processors on your machine?

This is where Pools shine.
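
As a rough back-of-the-envelope sketch of that scenario, the arithmetic below assumes every item takes 5 seconds (as in the snippet above), that at most one item runs per processor at a time, and that scheduling overhead is ignored. The `estimated_seconds` helper is a hypothetical name used only for illustration.

```python
import math


def estimated_seconds(items, parallel_tasks, seconds_per_item):
    # Items are processed in rounds of `parallel_tasks` at a time
    rounds = math.ceil(items / parallel_tasks)
    return rounds * seconds_per_item


if __name__ == "__main__":
    print(estimated_seconds(1000, 1, 5))  # one at a time: 5000
    print(estimated_seconds(1000, 4, 5))  # four at a time: 1250
```

Real workloads will not match these numbers exactly, but the estimate shows why it pays to keep every processor busy rather than running the items one by one.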

Introducing Pools

Multiprocessing was easy, but Pools are even easier!

Let’s convert the above code to use pools:

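import multiprocessing
import random
import time


def calculate_something(i):
    time.sleep(5)
    print(random.randint(10, 100) * i)


if __name__ == "__main__":
    # Leave one CPU free for the rest of the system
    pool = multiprocessing.Pool(multiprocessing.cpu_count() - 1)
    for i in range(1000):
        # args must be a tuple, hence the trailing comma
        pool.apply_async(calculate_something, args=(i,))
    pool.close()  # no more tasks will be submitted
    pool.join()  # wait for all queued tasks to finish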

So what’s actually happening here?

We create a pool with multiprocessing.Pool() and tell it to start one fewer worker process than we have CPUs. The reason for this is to avoid locking up the machine for other tasks.

So let’s say we have 8 CPUs in total. This means the pool will start 7 worker processes and run at most 7 tasks at a time. The first worker to finish takes the next task from the queue, and so it continues until all 1000 tasks have been completed.
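
One thing worth knowing about apply_async is that it returns a result object, and an exception raised inside a worker only shows up when you call get() on that object. The sketch below is one possible way to collect return values and surface errors. It assumes the function returns its value instead of printing it, and the `run_pool` helper is a hypothetical name.

```python
import multiprocessing


def calculate_something(i):
    # Return the value instead of printing it so the parent can collect it
    return i * 10


def run_pool(count, workers):
    # Leaving the with-block terminates the pool, so collect results inside it
    with multiprocessing.Pool(workers) as pool:
        pending = [
            pool.apply_async(calculate_something, args=(i,))
            for i in range(count)
        ]
        # get() waits for the task and re-raises any exception from the worker
        return [result.get() for result in pending]


if __name__ == "__main__":
    print(run_pool(count=5, workers=2))  # [0, 10, 20, 30, 40]
```

The results come back in submission order because get() is called on the result objects in the order they were created, regardless of which worker finished first.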

Note: if you only have 2 processors, you might want to remove the -1 from multiprocessing.cpu_count()-1. Otherwise, the pool will have a single worker and run only one task at a time!
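
That note can be turned into a small helper. This is only a sketch of one possible rule: the `pick_worker_count` name and the "more than two CPUs" threshold are assumptions made for illustration. It also guards against a single-CPU machine, where cpu_count()-1 would be 0 and multiprocessing.Pool(0) raises a ValueError.

```python
import multiprocessing


def pick_worker_count(cpus):
    # Hypothetical rule: keep one CPU free only when there are more than two
    if cpus > 2:
        return cpus - 1
    # Never go below one worker, since a pool needs at least one process
    return max(1, cpus)


if __name__ == "__main__":
    workers = pick_worker_count(multiprocessing.cpu_count())
    print(f"Using {workers} worker processes")
```

The returned value can then be passed straight to multiprocessing.Pool() in place of the hard-coded expression.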
