Python ships with a multiprocessing module that allows your code to run functions in parallel by offloading calls to available processors.
In this guide, we will explore the concept of Pools and how to use them.
A Python snippet to play with
Let’s take the following code.
import random, time

def calculate_something(i):
    time.sleep(5)
    print(random.randint(10, 100) * i)

for i in range(5):
    calculate_something(i)
This loop will take about 5 * 5 seconds to complete, or 25 seconds in total.
We loop 5 times and call a function that calculates something for us. We use time.sleep to pretend the function is doing more work than it is. That gives us a good reason to look into doing things in parallel.
Multiprocessing is pretty simple: do all of the above, but instead of running every operation on a single process, hand each one off to something that can run it simultaneously.
import random, time, multiprocessing

def calculate_something(i):
    time.sleep(5)
    print(random.randint(10, 100) * i)

processes = []
for i in range(5):
    p = multiprocessing.Process(target=calculate_something, args=(i,))
    processes.append(p)
    p.start()

for p in processes:
    p.join()
Now they all run in parallel, and the whole thing completes in around 5 seconds.
But what if you had 1000 items in your loop, and only 4 processors on your machine?
This is where Pools shine.
Multiprocessing was easy, but Pools are even easier!
Let’s convert the above code to use pools:
import random, time, multiprocessing

def calculate_something(i):
    time.sleep(5)
    print(random.randint(10, 100) * i)

pool = multiprocessing.Pool(multiprocessing.cpu_count() - 1)
for i in range(1000):
    pool.apply_async(calculate_something, args=(i,))

pool.close()
pool.join()
So what’s actually happening here?
We create a multiprocessing.Pool() and tell it to use one fewer CPU than we have. The reason for this is to avoid locking up the machine for other tasks.
So let’s say we have 8 CPUs in total, this means the pool will allocate 7 to be used and it will run the tasks with a max of 7 at a time. The first CPU to complete will take the next task from the queue, and so it will continue until all 1000 tasks have been completed.
Note: if you only have 2 processors, you might want to remove the -1 from multiprocessing.cpu_count() - 1. Otherwise, it will only do things on a single CPU!
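One way to keep the code safe on any machine, rather than editing it per box, is to clamp the worker count with max() so it never drops to zero. A small sketch:

```python
import multiprocessing

# max() guards against ending up with zero workers on a single-core machine;
# on bigger machines this still leaves one CPU free for other tasks
workers = max(1, multiprocessing.cpu_count() - 1)

pool = multiprocessing.Pool(workers)
pool.close()
pool.join()
print(f"pool used {workers} worker(s)")
```

This way the same script runs everywhere, from a 1-core container to an 8-core workstation.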