The Message Passing Interface (MPI) lets the processes of a parallel application communicate by exchanging messages. MPI programs initialize and finalize a communication environment, and most communication occurs through point-to-point send and receive operations between pairs of processes. Collective communication routines such as broadcast, scatter, and gather allow all processes in a communicator to participate in the communication.
Overview of MPI (Message Passing Interface) and its importance in parallel computing.
Describes the message passing model and its four primary operation classes: Environment Management, Data Movement, Collective Computation, and Synchronization.
Explains the basic structure of an MPI program, including the necessary headers and initializing/terminating the MPI environment.
Details routines for managing the MPI environment, e.g., initializing, finalizing, getting processor name, and timing functions.
Describes communication concepts in MPI, including communicators, ranks, and methods to determine size and rank of MPI processes.
Illustration of a basic Hello World MPI program showcasing the use of MPI_Init, MPI_Comm_size, and MPI_Comm_rank.
Covers point-to-point communication in MPI, including send/receive operations, message composition, and the importance of message IDs.
Differences between blocking and non-blocking send/receive operations, emphasizing use cases and handling tasks during communication.
Details about the MPI send and receive functions, including parameters and type compatibility in communication.
Explains how to compute π using numerical integration across multiple processes with MPI, including code structure and communication.
Describes collective communication concepts like broadcast, scatter, gather, and reduction operations among multiple processes.
Usage of collective operations for efficient data distribution at program beginning and results collection at the end.
Explains MPI barrier synchronization, in which a process waits until all processes reach the same point.
Continuation of the PI computation code, detailing summation, reduction, and processing across multiple processors.
Describes the programming environment necessary for MPI, including compilers and software components.
Overview of various MPI implementations and of Cray's MPI-3.0 implementation, optimized for Cray hardware.
Instructions on compiling MPI programs with command specifics for C, C++, and Fortran.
Describes the execution modes for MPI programs: interactive and batch mode with scripting examples.
Discusses error handling techniques in MPI, including behavior on errors and possible overrides.
The concluding slide thanking the audience for their attention.
The Message Passing Model
Applications that do not share a global address space need a message passing framework: an application passes messages among processes in order to perform a task. Almost any parallel application can be expressed with the message passing model.
Four classes of operations: Environment Management, Data Movement/Communication, Collective Computation/Communication, Synchronization.
General MPI Program Structure
Header file – include "mpi.h" (C/C++) or include 'mpif.h' (Fortran)
Initialize MPI environment – MPI_Init(..)
Terminate MPI environment – MPI_Finalize()
#include "mpi.h" #include <stdio.h> intmain( int argc, char *argv[] ) { MPI_Init( &argc, &argv ); printf( "Hello, world!n" ); MPI_Finalize(); return 0; } Header File – include "mpi.h" – include 'mpif.h' Initialize MPI Env. – MPI_Init(..) Terminate MPI Env. – MPI_Finalize() General MPI Program Structure
Environment Management Routines
Group of routines used for interrogating and setting the MPI execution environment.
MPI_Init – Initializes the MPI execution environment. This function must be called in every MPI program.
MPI_Finalize – Terminates the MPI execution environment. This function should be the last MPI routine called in every MPI program; no other MPI routines may be called after it.
Environment Management Routines (continued)
MPI_Get_processor_name – Returns the processor name and the length of the name. The buffer for "name" must be at least MPI_MAX_PROCESSOR_NAME characters in size. What is returned into "name" is implementation dependent; it may not be the same as the output of the "hostname" or "host" shell commands.
MPI_Get_processor_name (&name, &resultlength)
MPI_Wtime – Returns elapsed wall-clock time in seconds (double precision) on the calling processor.
MPI_Wtime ()
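The two routines can be combined to report which node ran a piece of work and how long it took. The sketch below is illustrative only; the variable names (name, resultlength, t0, t1) and the timed region are placeholders, not part of the slides.

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    char name[MPI_MAX_PROCESSOR_NAME];
    int resultlength;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Get_processor_name(name, &resultlength);   /* fills "name"; length returned in resultlength */

    t0 = MPI_Wtime();                              /* start of timed region */
    /* ... work to be timed ... */
    t1 = MPI_Wtime();                              /* end of timed region */

    printf("%s: elapsed %f seconds\n", name, t1 - t0);
    MPI_Finalize();
    return 0;
}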
Communication
Communicator – All MPI communication occurs within a group of processes.
Rank – Each process in the group has a unique identifier.
Size – Number of processes in a group or communicator.
The default, pre-defined communicator is MPI_COMM_WORLD, which is the group of all processes.
Environment / Communication
MPI_Comm_size – Returns the total number of MPI processes in the specified communicator, such as MPI_COMM_WORLD. If the communicator is MPI_COMM_WORLD, this is the number of MPI tasks available to your application.
MPI_Comm_size (comm, &size)
MPI_Comm_rank – Returns the rank of the calling MPI process within the specified communicator. Initially, each process is assigned a unique integer rank between 0 and (number of tasks - 1) within the communicator MPI_COMM_WORLD. This rank is often referred to as a task ID. If a process becomes associated with other communicators, it will have a unique rank within each of these as well.
MPI_Comm_rank (comm, &rank)
MPI – Hello World Example

#include <mpi.h>
#include <iostream>
using namespace std;

int main(int argc, char **argv)
{
    int rank;
    int size;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    cout << "Hello, I'm process " << rank << " of " << size << endl;
    MPI_Finalize();
    return 0;
}

Prototype: int MPI_Init(int *argc, char ***argv); MPI_Init(NULL, NULL) may be used when the command-line arguments are not needed.

Sample output with three processes (not necessarily sorted!):
Hello, I'm process 0 of 3
Hello, I'm process 2 of 3
Hello, I'm process 1 of 3
Communication
Point-to-point communication: transfers a message from one process to another process.
❖ It involves an explicit send and receive, which is called two-sided communication.
❖ Message: data + (source + destination + communicator)
❖ Almost all of the MPI commands are built around point-to-point operations.
MPI Send and Receive
The foundation of communication is built upon send and receive operations among processes. Almost every function in MPI can be implemented with basic send and receive calls.
1. Process A decides a message needs to be sent to process B.
2. Process A then packs up all of its necessary data into a buffer for process B.
3. These buffers are often referred to as envelopes, since the data is being packed into a single message before transmission.
4. After the data is packed into a buffer, the communication device (often a network) is responsible for routing the message to the proper location.
5. The location identifier is the rank of the destination process.
MPI Send and Receive (continued)
6. Send and Recv must occur in pairs; MPI_Send and MPI_Recv are blocking functions.
7. Even though the message is routed to B, process B still has to acknowledge that it wants to receive A's data. Once it does this, the data has been transmitted; process A is notified that the transfer is complete and may go back to work (blocking).
8. Sometimes A has to send many different kinds of messages to B. Rather than forcing B to take extra measures to differentiate all these messages,
9. MPI allows senders and receivers to attach message IDs to each message (known as tags).
10. When process B requests only a message with a certain tag number, messages with different tags are buffered by the network until B is ready for them.
More MPI Concepts
Blocking: A blocking send or receive routine does not return until the operation is complete.
-- Blocking sends ensure that it is safe to overwrite the sent data.
-- Blocking receives ensure that the data has arrived and is ready for use.
Non-blocking: A non-blocking send or receive routine returns immediately, with no information about completion.
-- The user should test for success or failure of the communication, using MPI_Wait() or MPI_Test().
-- In between, the process is free to handle other tasks.
-- Non-blocking calls make it less likely to write deadlocking code.
A minimal non-blocking sketch follows.
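The sketch below illustrates the non-blocking pattern under stated assumptions: it must be run with at least two processes, and the variable names (value, req) are invented for the example. Rank 0 posts MPI_Isend, rank 1 posts MPI_Irecv, both may do other work, and each calls MPI_Wait before reusing the buffer.

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, value = 42;
    MPI_Request req;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Post the send; the call returns immediately */
        MPI_Isend(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
    } else if (rank == 1) {
        /* Post the receive; the call returns immediately */
        MPI_Irecv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
    }

    /* ... other useful work can be done here while the transfer proceeds ... */

    if (rank <= 1) {
        MPI_Wait(&req, &status);   /* the buffer must not be reused before this completes */
        if (rank == 1)
            printf("Rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}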
MPI Send and Receive
MPI_Send (&data, count, MPI_INT, 1, tag, comm);
The library parses memory based on the starting address, the data type, and the count, for contiguous data.
MPI_Recv (void* data, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm communicator, MPI_Status* status);
Parameters: address of the data, number of elements, data type, destination/source rank, message identifier/tag (int), communicator, and (for MPI_Recv) a status object.
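A minimal paired example, assuming the program is run with at least two processes; the tag value 99 and the variable names are illustrative, not from the slides. Rank 0 sends one integer to rank 1 with a blocking send, and rank 1 receives it with a matching blocking receive.

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, data, tag = 99;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        data = 123;
        MPI_Send(&data, 1, MPI_INT, 1, tag, MPI_COMM_WORLD);           /* to rank 1 */
    } else if (rank == 1) {
        MPI_Recv(&data, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status);  /* from rank 0 */
        printf("Rank 1 received %d (tag %d)\n", data, status.MPI_TAG);
    }

    MPI_Finalize();
    return 0;
}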
MPI Datatypes
MPI datatype              C equivalent
MPI_SHORT                 short int
MPI_INT                   int
MPI_LONG                  long int
MPI_LONG_LONG             long long int
MPI_UNSIGNED_CHAR         unsigned char
MPI_UNSIGNED_SHORT        unsigned short int
MPI_UNSIGNED              unsigned int
MPI_UNSIGNED_LONG         unsigned long int
MPI_UNSIGNED_LONG_LONG    unsigned long long int
MPI_FLOAT                 float
MPI_DOUBLE                double
MPI_LONG_DOUBLE           long double
MPI_BYTE                  char
➢ MPI predefines its primitive data types.
➢ Primitive data types are contiguous.
➢ MPI also provides facilities for you to define your own data structures based upon sequences of the MPI primitive data types.
➢ Such user-defined structures are called derived data types.
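As a brief sketch of a derived data type (the "point" of three doubles is invented for the example, and at least two processes are assumed), MPI_Type_contiguous builds a new type from a sequence of primitive types so that it can be sent as a single unit:

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank;
    double triple[3];
    MPI_Datatype point_t;                 /* derived type: 3 contiguous doubles */
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Type_contiguous(3, MPI_DOUBLE, &point_t);
    MPI_Type_commit(&point_t);            /* a derived type must be committed before use */

    if (rank == 0) {
        triple[0] = 1.0; triple[1] = 2.0; triple[2] = 3.0;
        MPI_Send(triple, 1, point_t, 1, 0, MPI_COMM_WORLD);            /* one "point", not 3 doubles */
    } else if (rank == 1) {
        MPI_Recv(triple, 1, point_t, 0, 0, MPI_COMM_WORLD, &status);
        printf("Rank 1 got (%g, %g, %g)\n", triple[0], triple[1], triple[2]);
    }

    MPI_Type_free(&point_t);
    MPI_Finalize();
    return 0;
}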
Compute π by Numerical Integration
N processes (0, 1, ..., N-1); master process: process 0.
Divide the computational task into N portions; each processor computes its own (partial) sum. At the end, the master (processor 0) collects all partial sums and forms the total sum.
Basic set of MPI functions used:
◦ Init
◦ Finalize
◦ Send
◦ Recv
◦ Comm Size
◦ Rank
MPI_Init(&argc, &argv);                       // Initialize
MPI_Comm_size(MPI_COMM_WORLD, &num_procs);    // Get # processors
MPI_Comm_rank(MPI_COMM_WORLD, &myid);

N = # intervals used to do the integration ...
w = 1.0/(double) N;
mypi = 0.0;                                   // My partial sum (per MPI process)

Compute my part of the partial sum based on:
  1. myid
  2. num_procs

if ( I am the master of the group ) {
    for ( i = 1; i < num_procs; i++) {
        receive the partial sum from MPI process i;
        add the partial sum to my own partial sum;
    }
    print final total;
} else {
    send my partial sum to the master of the MPI group;
}
MPI_Finalize();
Compute PI by Numerical Integration – C code

#include "mpi.h"
#include <stdlib.h>         // for atoi
#include <iostream>
using namespace std;

double f(double x);         // integrand, e.g. 4.0/(1.0 + x*x) for pi

int main(int argc, char *argv[])
{
    int N;                      // Number of intervals
    double w, x;                // width and x point
    int i, myid, num_procs;
    double mypi, others_pi;

    MPI_Init(&argc, &argv);                        // Initialize
    MPI_Comm_size(MPI_COMM_WORLD, &num_procs);     // Get # processors
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    N = atoi(argv[1]);
    w = 1.0/(double) N;
    mypi = 0.0;                 // Each MPI process has its own copy of every variable
Compute PI by Numerical Integration – C code (continued)

    /* --------------------------------------------------------------------
       Every MPI process computes a partial sum for the integral
       -------------------------------------------------------------------- */
    for (i = myid; i < N; i = i + num_procs) {
        x = w*(i + 0.5);
        mypi = mypi + w*f(x);
    }

With P = total number of processes and N = total number of rectangles:
Process 0 computes the sum of f(w*(0.5)), f(w*(P+0.5)), f(w*(2P+0.5)), ...
Process 1 computes the sum of f(w*(1.5)), f(w*(P+1.5)), f(w*(2P+1.5)), ...
Process 2 computes the sum of f(w*(2.5)), f(w*(P+2.5)), f(w*(2P+2.5)), ...
Process 3 computes the sum of f(w*(3.5)), f(w*(P+3.5)), f(w*(2P+3.5)), ...
Process 4 computes the sum of f(w*(4.5)), f(w*(P+4.5)), f(w*(2P+4.5)), ...
Compute PI by Numerical Integration – C code (continued)

    if ( myid == 0 ) {      // Now put the sum together...
        // Proc 0 collects; the others send their data to proc 0
        for (i = 1; i < num_procs; i++) {
            MPI_Recv(&others_pi, 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            mypi += others_pi;
        }
        cout << "Pi = " << mypi << endl << endl;    // Output...
    } else {
        // The other processes send their partial sum to process 0
        MPI_Send(&mypi, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}
Collective Communication
Communications that involve all processes in a group.
One to all: ♦ Broadcast ♦ Scatter (personalized)
All to one: ♦ Gather
All to all: ♦ Allgather ♦ Alltoall (personalized)
"Personalized" means each process gets different data.
Collective Communication
In a collective operation, processes must reach the same point in the program code in order for the communication to begin. The call to the collective function is blocking.
Collective Communication
Broadcast: The root process sends the same piece of data to all processes in a communicator group.
Scatter: Takes an array of elements and distributes the elements to the processes in order of process rank.
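A small sketch of a broadcast; the variable n and the value 100 are placeholders. Only the root initializes the value, every rank calls MPI_Bcast, and afterwards all ranks hold the same value.

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, n = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
        n = 100;                                  /* only the root knows the value initially */

    /* Every rank must call MPI_Bcast; afterwards all ranks hold n == 100 */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

    printf("Rank %d has n = %d\n", rank, n);
    MPI_Finalize();
    return 0;
}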
Collective Communication
Gather: Takes elements from many processes and gathers them to one single process. This routine is highly useful in many parallel algorithms, such as parallel sorting and searching.
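A scatter/gather sketch; the array contents and the increment are invented for the example. The root scatters one integer to each rank (in rank order), every rank works on its piece, and MPI_Gather collects the updated pieces back on the root, again in rank order.

#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int rank, size, piece, i;
    int *sendbuf = NULL, *recvbuf = NULL;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {                       /* the root prepares one element per rank */
        sendbuf = (int *) malloc(size * sizeof(int));
        recvbuf = (int *) malloc(size * sizeof(int));
        for (i = 0; i < size; i++)
            sendbuf[i] = 10 * i;
    }

    /* Each rank receives one element of the root's array */
    MPI_Scatter(sendbuf, 1, MPI_INT, &piece, 1, MPI_INT, 0, MPI_COMM_WORLD);

    piece += 1;                            /* local work on the scattered piece */

    /* The root collects the updated pieces back */
    MPI_Gather(&piece, 1, MPI_INT, recvbuf, 1, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        for (i = 0; i < size; i++)
            printf("recvbuf[%d] = %d\n", i, recvbuf[i]);
        free(sendbuf);
        free(recvbuf);
    }

    MPI_Finalize();
    return 0;
}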
Collective Computation
Reduce: Takes an array of input elements on each process and returns an array of output elements to the root process. The output elements contain the reduced result.
Reduction operations: max, min, sum, product, logical and bitwise operations.
MPI_Reduce (&local_sum, &global_sum, 1, MPI_FLOAT, MPI_SUM, 0, MPI_COMM_WORLD)
Allgather: Just like MPI_Gather, the elements from each process are gathered in order of their rank, except this time the elements are gathered to all processes.
Alltoall: An extension of MPI_Allgather. The j-th block from process i is received by process j and stored in its i-th block. Useful in applications like matrix transposes or FFTs.
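A short sketch of MPI_Allgather; the choice to gather each process's rank number is purely illustrative. Every rank contributes one integer and, afterwards, every rank holds the full list.

#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int rank, size, i, *all;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    all = (int *) malloc(size * sizeof(int));

    /* Each rank sends its rank number; every rank receives the whole array */
    MPI_Allgather(&rank, 1, MPI_INT, all, 1, MPI_INT, MPI_COMM_WORLD);

    printf("Rank %d sees:", rank);
    for (i = 0; i < size; i++)
        printf(" %d", all[i]);
    printf("\n");

    free(all);
    MPI_Finalize();
    return 0;
}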
Collectives: Use
If one process reads data from disk or the command line, it can use a broadcast or a scatter to get the information to the other processes. Likewise, at the end of a program run, a gather or reduction can be used to collect summary information about the run.
However, a more common scenario is that the result of a collective is needed on all processes. Consider the computation of the standard deviation, sigma = sqrt( (1/N) * sum_i (x_i - mu)^2 ) with mu = (1/N) * sum_i x_i, and assume that every processor stores just one x_i value. You can compute mu by doing a reduction followed by a broadcast, but it is better to use a so-called allreduce operation, which does the reduction and leaves the result on all processors.
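A sketch of the allreduce approach for computing mu; using the rank number as each process's single x_i value is just for illustration. MPI_Allreduce sums the local values and leaves the total on every rank, so no separate broadcast is needed.

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size;
    double xi, sum, mu;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    xi = (double) rank;                    /* this rank's single data value */

    /* The sum of all xi ends up on every rank: reduction + "broadcast" in one call */
    MPI_Allreduce(&xi, &sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    mu = sum / size;                       /* every rank now knows the mean */
    printf("Rank %d: mu = %f\n", rank, mu);

    MPI_Finalize();
    return 0;
}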
Synchronization
MPI_Barrier(MPI_Comm comm) – Provides the ability to block the calling process until all processes in the communicator have reached this routine.

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, nprocs;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    printf("Hello, world. I am %d of %d\n", rank, nprocs);
    fflush(stdout);

    MPI_Finalize();
    return 0;
}
Compute PI by Numerical Integration – C code (using MPI_Reduce)

    h   = 1.0 / (double) n;
    sum = 0.0;
    for (i = myid + 1; i <= n; i += numprocs) {
        x = h * ((double)i - 0.5);
        sum += f(x);
    }
    mypi = h * sum;

    MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (myid == 0) {
        printf("pi is approximately %.16f\n", pi);
        endwtime = MPI_Wtime();    /* startwtime is assumed to have been recorded earlier with MPI_Wtime() */
        printf("wall clock time = %f\n", endwtime - startwtime);
    }
    MPI_Finalize();
Programming Environment
A Programming Environment (PrgEnv) is a set of related software components, such as compilers, scientific software libraries, implementations of parallel programming paradigms, batch job schedulers, and other third-party tools, all of which cooperate with each other.
Current environments on Cray: PrgEnv-cray, PrgEnv-gnu and PrgEnv-intel.
Implementations of MPI
Examples of different implementations:
▪ MPICH - developed by Argonne National Laboratory (free)
▪ LAM/MPI - developed by Indiana University, OSC, and Notre Dame (free)
▪ MPI/Pro - commercial product
▪ Apple's Xgrid
▪ OpenMPI
The Cray XC40 provides an implementation of the MPI-3.0 standard via the Cray Message Passing Toolkit (MPT), which is based on the MPICH 3 library and optimised for the Cray Aries interconnect.
All programming environments (PrgEnv-cray, PrgEnv-gnu and PrgEnv-intel) can use the MPI library implemented by Cray.
Compiling an MPI Program
Compilation depends upon the implementation of MPI. Some standard implementations: MPICH / OpenMPI.
Language    Wrapper compiler name
C           mpicc
C++         mpicxx or mpic++
Fortran     mpifort (for v1.7 and above); mpif77 and mpif90 (for older versions)
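For example (the file names are illustrative), the Hello World programs above could be compiled with the MPICH/OpenMPI wrapper compilers as follows; on a Cray system, the compiler wrappers provided by the loaded PrgEnv would typically be used instead:

    mpicc   -o hello_c   hello.c      # C source
    mpicxx  -o hello_cxx hello.cpp    # C++ source
    mpifort -o hello_f   hello.f90    # Fortran source (wrapper name for v1.7 and above)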
Running an MPI Program
Execution modes:
Interactive mode:
    mpirun -np <number_of_processors> <name_of_executable>
Batch mode: using a job script (details on the SERC webpage):

    #!/bin/bash
    #PBS -N jobname
    #PBS -l nodes=1:ppn=16
    #PBS -l walltime=1:00:00
    #PBS -e /path_of_executable/error.log
    cd /path_of_executable
    NPROCS=`wc -l < $PBS_NODEFILE`
    HOSTS=`cat $PBS_NODEFILE | uniq | tr '\n' "," | sed 's|,$||'`
    mpirun -np $NPROCS --host $HOSTS /name_of_executable
Error Handling
Most MPI routines include a return/error code parameter. However, according to the MPI standard, the default behavior of an MPI call is to abort if there is an error, so you will probably not be able to capture a return/error code other than MPI_SUCCESS (zero). The standard does provide a means to override this default error handler; consult the error handling section of the relevant MPI Standard documentation located at http://www.mpi-forum.org/docs/. The types of errors displayed to the user are implementation dependent.
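As a sketch of overriding the default handler, the standard routine MPI_Comm_set_errhandler can attach MPI_ERRORS_RETURN to MPI_COMM_WORLD so that error codes are returned to the caller instead of aborting; the deliberately invalid send below (destination rank equal to the communicator size) exists only to trigger an error.

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rc, size, data = 1, len;
    char msg[MPI_MAX_ERROR_STRING];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Return error codes instead of aborting (the default is MPI_ERRORS_ARE_FATAL) */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    /* Deliberately invalid: rank "size" does not exist in MPI_COMM_WORLD */
    rc = MPI_Send(&data, 1, MPI_INT, size, 0, MPI_COMM_WORLD);
    if (rc != MPI_SUCCESS) {
        MPI_Error_string(rc, msg, &len);
        printf("MPI_Send failed: %s\n", msg);
    }

    MPI_Finalize();
    return 0;
}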