Parallel programming in Python: mpi4py (part 2)

In part 1 of this post, we introduced the mpi4py module (MPI for Python), which provides an object-oriented interface for Python resembling the message passing interface (MPI) and enables Python programs to exploit multiple processors on multiple compute nodes.

The mpi4py module provides methods for communicating various types of Python objects in different ways. In part 1 of this post we showed how to communicate generic Python objects between MPI processes; the methods for doing this have names in all lowercase letters. It is also possible to send buffer-like objects, where the data is exposed in a raw format and can be accessed without copying, directly between MPI processes. The methods for doing this start with an uppercase letter.

In this post we continue introducing the mpi4py module, with a focus on the direct communication of buffer-like objects using the latter type of methods (that is, those starting with a capital letter), including Send, Recv, Isend, Irecv, Bcast, and Reduce, as well as Scatterv and Gatherv, which are vector variants of Scatter and Gather, respectively.
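
To give a flavour of these methods, below is a minimal sketch of Send and Recv communicating a NumPy array between two MPI processes. The script name is hypothetical; it would be launched with, for example, mpirun -n 2 python send_recv.py.

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    # Send the raw NumPy buffer directly (no pickling involved).
    data = np.arange(10, dtype=np.float64)
    comm.Send(data, dest=1, tag=11)
elif rank == 1:
    # The receive buffer must be allocated in advance.
    data = np.empty(10, dtype=np.float64)
    comm.Recv(data, source=0, tag=11)
    print("rank 1 received:", data)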


Parallel programming in Python: mpi4py (part 1)

In previous posts we have introduced the multiprocessing module, which makes it possible to parallelize Python programs on shared memory systems. The limitation of the multiprocessing module is that it does not support parallelization over multiple compute nodes (i.e. on distributed memory systems). To overcome this limitation and enable cross-node parallelization, we can use MPI for Python, that is, the mpi4py module. This module provides an object-oriented interface that resembles the message passing interface (MPI), and hence allows Python programs to exploit multiple processors on multiple compute nodes. The mpi4py module supports both point-to-point and collective communications for Python objects as well as buffer-like objects. This post will briefly introduce the use of the mpi4py module in communicating generic Python objects, via all-lowercase methods including send, recv, isend, irecv, bcast, scatter, gather, and reduce.
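
As a first taste, below is a minimal sketch of send and recv communicating a generic Python object between two MPI processes; the dictionary is a hypothetical example object, and the script would be launched with, for example, mpirun -n 2 python example.py.

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    # Any picklable Python object can be sent with the lowercase methods.
    data = {"value": 42, "label": "demo"}
    comm.send(data, dest=1, tag=0)
elif rank == 1:
    data = comm.recv(source=0, tag=0)
    print("rank 1 received:", data)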


Parallel programming in Python: multiprocessing (part 2)

In the previous post we introduced the Pool class of the multiprocessing module. In this post we move on to the Process class, which makes it possible to have direct control over individual processes.

A process can be created by providing a target function and its input arguments to the Process constructor. The process is then launched with the start method, and the join method waits for it to finish. Below is a very simple example that prints the square of a number.

import multiprocessing as mp

def square(x):
    print(x * x)

if __name__ == "__main__":
    # Create a process that runs square(5), start it,
    # and wait for it to finish.
    p = mp.Process(target=square, args=(5,))
    p.start()
    p.join()


Parallel programming in Python: multiprocessing (part 1)

Parallel programming solves big numerical problems by dividing them into smaller sub-tasks, and hence reduces the overall computational time on multi-processor and/or multi-core machines. Parallel programming is well supported in traditional programming languages like C and FORTRAN, which are suitable for “heavy-duty” computational tasks. Traditionally, Python has been considered not to support parallel programming very well, partly because of the global interpreter lock (GIL). However, things have changed over time. Thanks to the development of a rich variety of libraries and packages, the support for parallel programming in Python is now much better.

This post (and the following part) will briefly introduce the multiprocessing module in Python, which effectively side-steps the GIL by using subprocesses instead of threads. The multiprocessing module provides many useful features and is very suitable for symmetric multiprocessing (SMP) and shared memory systems. In this post we focus on the Pool class of the multiprocessing module, which controls a pool of worker processes and supports both synchronous and asynchronous parallel execution.
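
As an illustrative sketch of the two execution modes (the square function and the pool size are hypothetical examples, not taken from the original post), the Pool class can be used along these lines:

import multiprocessing as mp

def square(x):
    return x * x

if __name__ == "__main__":
    with mp.Pool(processes=4) as pool:
        # Synchronous execution: map blocks until all results are ready.
        print(pool.map(square, range(8)))

        # Asynchronous execution: map_async returns immediately
        # with an AsyncResult; get() waits for the results.
        result = pool.map_async(square, range(8))
        print(result.get())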


Scalability: strong and weak scaling

High performance computing (HPC) clusters are able to solve big problems using a large number of processors. This is also known as parallel computing, where many processors work simultaneously to deliver far greater computational power and to significantly reduce the total computational time. In this context, scalability or scaling indicates the ability of hardware and software to deliver more computational power when the amount of resources is increased. For HPC clusters, it is important that they are scalable; in other words, the capacity of the whole system can be proportionally increased by adding more hardware. For software, scalability is sometimes referred to as parallelization efficiency: the ratio between the actual speedup and the ideal speedup obtained when using a certain number of processors.

In this post we focus on software scalability and discuss two common types of scaling. The speedup in parallel computing can be straightforwardly defined as

speedup = t1 / tN

where t1 is the computational time for running the software using one processor, and tN is the computational time for running the same software with N processors. Ideally, we would like the speedup to be linear and equal to the number of processors (speedup = N), as that would mean that every processor contributes 100% of its computational power. Unfortunately, this is a very challenging goal for real applications to attain.
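
To make the definitions concrete, here is a small sketch that computes the speedup and the parallel efficiency; the timings are hypothetical and purely for illustration.

def speedup(t1, tN):
    """Speedup relative to the run time on one processor."""
    return t1 / tN

def efficiency(t1, tN, N):
    """Parallel efficiency: actual speedup divided by the ideal speedup N."""
    return speedup(t1, tN) / N

# Hypothetical timings, purely for illustration:
# 100 s on 1 processor, 30 s on 4 processors.
print(speedup(100.0, 30.0))        # ~3.33 (the ideal speedup would be 4)
print(efficiency(100.0, 30.0, 4))  # ~0.83, i.e. about 83% efficiency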
