Basic MPI


Overview

MPI (Message Passing Interface) is a standard interface for the message passing paradigm of parallel computing. This is a model of cooperating processes that work on separate data spaces and exchange messages when they need to share or communicate data. The model implies an active role on the part of both the sending process and the receiving process.

The machine model underlying MPI is that of a multicomputer: a distributed memory machine with processors tied together via some kind of interconnection fabric. The multicomputer model ignores the underlying topology of the fabric, and presumes there are two types of memory accesses: local and distant. However, it is possible to impose a topology model on MPI, an advanced topic reserved for later.

Although MPI has an underlying distributed memory model, it can be used for

  1. Distributed memory machines
  2. Shared memory machines
  3. Arrays of SMPs ("clusters")
  4. Networks of workstations
  5. Heterogeneous networks of machines
This is because the logical programming model does not have to match the physical machine architecture. Generally you can get more efficient code if your programming model matches the machine's physical hardware, but the advantage of MPI is that by making only minimal assumptions about the hardware, your code will be portable across the wide variety of actual systems listed above.

Ancient history

In the 1980s, distributed memory machine manufacturers all developed message passing libraries for their machines - it was the only effective way to use them. However, each company had its own model of message passing and its own bindings to standard languages such as C and Fortran. Codes running on one machine could be ported to another only with great effort, and often it was not just a matter of translating one message passing function call into another, since the underlying models differed. However, some general forms emerged. Most had send-receive primitives for point-to-point communication, where one processor sends a message and another receives it. (The other flavor of communication is collective: an entire set of processors is involved, e.g. in a broadcast of a message from one processor to all other processors.)

A send primitive typically had the format send(address, length, destination, id). Here address is the location in memory where the message data begins, length is the length of the message in bytes, destination is the identifier of the processor the message is sent to, and id is a number used to distinguish between different kinds of messages.

The receive operation was similar: receive(address, buffsize, source, id, length), where address and buffsize describe the buffer the incoming data is to be placed in, source identifies the sending processor, id selects which kind of message to accept, and length returns the size of the message actually received.

This approach can handle virtually all that is needed by a distributed memory model of computing, but it has shortcomings. Messages of noncontiguous data and sophisticated data objects also need to be communicated. Partitioning the processes into groups that work on heterogeneous parts of the code is needed in some applications. Finally, for extremely large scale computations you may want to tie together machines of different architectures, possibly with different internal binary representations.

The few portable message passing systems were mostly university or national lab research projects, and were incomplete, lacked vendor support, and were inefficient since they introduced another layer on top of the machines' "native" message passing libraries. Out of those, the only real survivor is PVM (Parallel Virtual Machine), partly because it was the first that tried to get large scale vendor support.

Appearance of MPI

MPI is an industry standard, prompted by the recognition on the part of parallel system purchasers that code development was not cost effective on those machines. Typically it took three or more years to port and validate a major engineering or scientific code - but the parallel systems became outdated every two years.

Basic MPI Concepts

An MPI program consists of multiple processes, each with its own address space. Each process runs the same program (SPMD model), but has a unique number that identifies it. If there are p MPI processes participating in a single program, a process's identifying number is an integer between 0 and p-1 and is called its "rank". In the statement of most SPMD algorithms, there are lines like
   if (myid == 0) { ... }
   else { ... }
which is read "if my process identification number is zero, then do the following; otherwise, do something else". This is how MIMD programs are built on an SPMD model: each process runs the same program, which branches depending on the process's ID number - in MPI, that ID number is its rank. Note that most distributed memory machines can be run directly in MIMD mode, with each processor actually running a completely different program. However, SPMD is the model most often used to emulate MIMD actions. The reasons are partly psychological - it is easier to have a single source code to write and examine.
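As a concrete example, here is a complete (if trivial) SPMD program in C built on this branching idea; the MPI calls it uses are described later in this section, and the printed messages are purely illustrative:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int myid, numprocs;

        MPI_Init(&argc, &argv);                   /* start up MPI */
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);     /* my rank, 0 .. p-1 */
        MPI_Comm_size(MPI_COMM_WORLD, &numprocs); /* p = number of processes */

        if (myid == 0) {
            /* only the rank-0 process executes this branch */
            printf("I am process 0 of %d; I will play the master\n", numprocs);
        } else {
            /* every other process executes this branch */
            printf("I am process %d of %d; I will be a worker\n", myid, numprocs);
        }

        MPI_Finalize();                           /* shut down MPI */
        return 0;
    }

Every process runs the same executable; the rank alone determines which branch it takes.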

Each message in MPI consists of the data sent and a header. The header contains

  1. The rank of the sender
  2. The rank of the receiver
  3. A message identifier number called its tag
  4. A communicator identification
The MPI standard guarantees that the integers 0-32767 can be used as valid tag numbers, but most implementations allow far more.

One basic concept in MPI is that of a communicator group: a set of MPI processes that are grouped together in working on a problem and can send messages to each other. To start, we will use the default communicator group MPI_COMM_WORLD, which sets a single context and involves all the processes running. This is a predefined communicator, of type MPI_Comm. Later we will cover more details about the concepts of communicators and contexts. But to get an idea of why different communicator groups may be needed in a single program, consider what happens if we are running an MPI program that calls a math library which was also built to use parallelism via MPI. To keep process number 3 in our program from getting confused with process number 3 as defined by the library (and to keep one from receiving a message intended for the other), we need an additional identifier to distinguish them. This additional identification is the communicator group.
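As a preview of how a library protects itself, it can duplicate the communicator it is handed: messages sent on the duplicate can never match receives posted on the original, even if ranks and tags coincide. A minimal sketch, assuming a hypothetical library routine lib_solve():

    #include <mpi.h>

    /* Hypothetical library routine: all of its internal traffic goes over a
       private duplicate of the caller's communicator, so its messages cannot
       be confused with the caller's own. MPI_Comm_dup is collective, so every
       process of user_comm must call lib_solve. */
    void lib_solve(MPI_Comm user_comm)
    {
        MPI_Comm lib_comm;
        MPI_Comm_dup(user_comm, &lib_comm);   /* same processes, new context */

        /* ... the library does its own MPI_Send/MPI_Recv on lib_comm ... */

        MPI_Comm_free(&lib_comm);             /* release the duplicate */
    }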

Basic MPI Functions

Although MPI has over 120 different functions that can be invoked, all parallel programs can be built using just six (a short example using all six follows the list):
  1. MPI_INIT() initializes MPI in a program.
  2. MPI_COMM_SIZE() returns the number of cooperating processes.
  3. MPI_COMM_RANK() returns the process identifier for the process that invokes it.
  4. MPI_SEND() sends a message.
  5. MPI_RECV() receives a message.
  6. MPI_FINALIZE() cleans up and terminates MPI.
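Here is a sketch of a complete program that uses all six: every process other than 0 sends its rank to process 0, which receives and prints each value. The tag value 99 is arbitrary.

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int myid, numprocs, value, i;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);

        if (myid == 0) {
            /* receive one integer from each of the other processes */
            for (i = 1; i < numprocs; i++) {
                MPI_Recv(&value, 1, MPI_INT, i, 99, MPI_COMM_WORLD, &status);
                printf("process 0 received %d from process %d\n", value, i);
            }
        } else {
            /* every other process sends its rank to process 0 */
            value = myid;
            MPI_Send(&value, 1, MPI_INT, 0, 99, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }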
For our purposes, there are a few more that are useful right from the start:
  1. MPI_BCAST(): send a message from one processor to all the others in the specified communicator group.
  2. MPI_ALLREDUCE(): perform a reduction operation, and make the reduced scalar available to all participating processes.
The last one is useful for most dot products, since it is typically the case that the resulting scalar is needed by all the processors (a sketch is given after the next list). For performance evaluation, we can also use:
  1. MPI_WTIME(): returns a double giving the wall-clock time in seconds, measured from some arbitrary fixed point in the past (often the start of the program or 1 January 1970); it is the differences between calls that are meaningful.
  2. MPI_WTICK(): returns a double giving the resolution of MPI_WTIME(), in seconds.
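To make the dot product remark concrete, here is a sketch (the local arrays and their contents are made up purely for illustration): each process computes a partial sum over the entries it owns, MPI_Allreduce adds the partial sums so that every process gets the full result, and MPI_Wtime() brackets the computation.

    #include <stdio.h>
    #include <mpi.h>

    #define NLOCAL 1000   /* number of entries each process owns (illustrative) */

    int main(int argc, char *argv[])
    {
        double x[NLOCAL], y[NLOCAL], local_dot = 0.0, global_dot, t0, t1;
        int i, myid;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);

        for (i = 0; i < NLOCAL; i++) {   /* fill the local pieces with made-up data */
            x[i] = 1.0;
            y[i] = (double) myid;
        }

        t0 = MPI_Wtime();

        for (i = 0; i < NLOCAL; i++)     /* partial dot product over local entries */
            local_dot += x[i] * y[i];

        /* add up the partial sums; every process receives the global result */
        MPI_Allreduce(&local_dot, &global_dot, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        t1 = MPI_Wtime();

        if (myid == 0)
            printf("dot product = %g, time = %g seconds (resolution %g)\n",
                   global_dot, t1 - t0, MPI_Wtick());

        MPI_Finalize();
        return 0;
    }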
We will not use all of them immediately. Here are some details about the ones we need right away; this is for the C language versions. MPI_Send and MPI_Recv are the most basic send/receive pair. Their required arguments are the address of the message buffer, the number of elements in it, the MPI datatype of those elements, the rank of the destination (for a send) or source (for a receive), the message tag, and the communicator; MPI_Recv additionally takes a pointer to a status structure, described below. Note that, unlike the generic receive described earlier, there is no argument that returns the length of the message actually received. Also, if the arriving message is too large to fit into the receive buffer, it is an error; depending on the implementation and the error handler in effect, this can abort your program or leave you with truncated data.
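For reference, the C prototypes of this pair are (MPI-3 bindings; earlier versions lack the const):

    int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
                 int dest, int tag, MPI_Comm comm);
    int MPI_Recv(void *buf, int count, MPI_Datatype datatype,
                 int source, int tag, MPI_Comm comm, MPI_Status *status);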

The final argument of the MPI_Recv function gives information about the message as actually received. It is a C structure of type MPI_Status with at least three fields: MPI_SOURCE, MPI_TAG, and MPI_ERROR. So if, for example, the receive used the wild-card MPI_ANY_SOURCE for the source argument, then status.MPI_SOURCE will contain the rank of the process that sent the received message. Note that the MPI_Status structure does not necessarily have a field for the count of data items actually received; you should use the function MPI_Get_count() for that.
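The following sketch shows the status fields and MPI_Get_count() in action after a wild-card receive; the tag values and message lengths are made up for illustration, and the code assumes fewer than 100 processes so every message fits in the buffer.

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int myid, numprocs, i, count;
        double buf[100];          /* receive buffer; 100 is an arbitrary capacity */
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);
        MPI_Comm_size(MPI_COMM_WORLD, &numprocs);

        if (myid == 0) {
            /* accept one message from each other process, in arrival order */
            for (i = 1; i < numprocs; i++) {
                MPI_Recv(buf, 100, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
                         MPI_COMM_WORLD, &status);
                MPI_Get_count(&status, MPI_DOUBLE, &count);  /* items received */
                printf("got %d doubles from rank %d (tag %d)\n",
                       count, status.MPI_SOURCE, status.MPI_TAG);
            }
        } else {
            /* each worker sends 'myid' doubles, so the counts differ per message */
            for (i = 0; i < myid; i++)
                buf[i] = (double) myid;
            MPI_Send(buf, myid, MPI_DOUBLE, 0, /* tag = */ myid, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }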

