CSCI 6356 Fall 2000
Xiannong Meng Programming Assignment One
Assigned: Thursday September 7, 2000 Due: Thursday September 21, 2000

A complete MPI program can be written using just six MPI functions: MPI_Init(), MPI_Comm_size(), MPI_Comm_rank(), MPI_Send(), MPI_Recv(), and MPI_Finalize(). Use the man pages to find the detailed interface descriptions of these functions. Here is a simple example in which the control process sends a message to all other processes. Each process that receives the message sends its process ID (its rank) back to the control process.

/* This is a simple MPI program. It uses the six basic MPI functions to make */
/* a complete MPI program. */
/* Classroom demonstration */
/* Xiannong Meng UTPA-CS    */
/* modified sep-5-2000     */

#include <stdio.h>   /* printf */
#include <stdlib.h>
#include <string.h>  /* strcpy, strlen */
#include "mpi.h"
int main(int argc, char **argv )
{
	char message[20];
	int i,rank, id, size, type=99;
	MPI_Status status;

	MPI_Init(&argc, &argv);              /* start MPI */
	MPI_Comm_size(MPI_COMM_WORLD,&size); /* get communicator size */
	MPI_Comm_rank(MPI_COMM_WORLD, &rank);/* get my own rank */
	if(rank == 0) {                      /* parent process  */
	   strcpy(message, "Hello, world");
	   for (i=1; i<size; i++)            /* send message to every one */
	       MPI_Send(message, strlen(message)+1, MPI_CHAR, i,
			type, MPI_COMM_WORLD);
	   for (i=1; i<size; i++)            /* try to receive from every */
					     /* one */
	     {
	       MPI_Recv(&id, 1, MPI_INT, i, 
			type, MPI_COMM_WORLD, &status);
	       printf("Id from node %d is %d\n",i,id);
	     }
	} 
  	else                                /* child process */
	  {
	    MPI_Recv(message, 20, MPI_CHAR, 0,
		     type, MPI_COMM_WORLD, &status);
	    MPI_Send(&rank,1,MPI_INT,0,type,MPI_COMM_WORLD);

	  }
      	MPI_Finalize();                     /* clean up MPI */
	return 0;
}

Your exercise is to do the following using the above program as a starting point.

Have the child process send a message back to the control process instead of just an integer. When the control process expects an integer, it knows how many integers it will receive; in the example above, it is one. When receiving a message, however, it can't tell directly from the message how many characters it contains. There are two possible ways of dealing with this: one is to send the length of the message as an integer before sending the message; the other is to use the MPI_Get_count() function to get the number of characters received. You should try out both methods. Use the man pages to find out how MPI_Get_count() works.
As an extra, but more interesting, exercise, have the child process send its host name to the parent process rather than just a plain greeting message. To get the host name of the local computer, you need to use the system call gethostname. Use the man page to find out how to use it.
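The gethostname call mentioned above can be tried on its own before it is wired into the MPI program. The following is a minimal sketch, assuming a POSIX system; the helper name get_host_name and the buffer size HOST_BUF_LEN are illustrative choices, not part of the assignment.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* ask for the POSIX declaration of gethostname */
#endif
#include <stddef.h>
#include <unistd.h>   /* gethostname() */

#define HOST_BUF_LEN 256   /* assumed large enough for typical host names */

/* Copy the local host name into buf, which has room for len bytes.
 * Returns 0 on success and -1 on failure. */
int get_host_name(char *buf, size_t len)
{
    if (buf == NULL || len == 0)
        return -1;
    if (gethostname(buf, len) != 0)
        return -1;
    buf[len - 1] = '\0';   /* a truncated name is not guaranteed to be terminated */
    return 0;
}
```

Since the result is an ordinary null-terminated string, it could be sent in the same way as the greeting message in the example program.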
In the above step, the child process sends information directly back to the parent process. In this exercise, write the program such that process i sends a greeting message to process $(i+1) \bmod p$, where p is the total number of processes participating in the computation. Should process i send its message to process i+1 first and then receive the message from process i-1? Or should it receive first and then send? Does it make any difference? What happens when the program is run on one processor?
A key issue in the performance analysis of programs for distributed-memory parallel systems is the cost of communication relative to the cost of computation. In this third step of the assignment, you are to write an MPI program that estimates the cost of computation and the cost of communication.
We'll consider a "computation" to consist of the multiplication of two floats (why not two integers?):
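For the ring exercise, the neighbor ranks can be computed with modular arithmetic. The sketch below assumes the ring runs from process i to process (i+1) mod p; the function names ring_next and ring_prev are hypothetical helpers, and the code deliberately leaves the send/receive ordering question open.

```c
/* Rank that process `rank` sends to in a ring of p processes. */
int ring_next(int rank, int p)
{
    return (rank + 1) % p;
}

/* Rank that process `rank` receives from in a ring of p processes.
 * Adding p before taking the remainder keeps the result non-negative,
 * because in C the expression (0 - 1) % p evaluates to -1. */
int ring_prev(int rank, int p)
{
    return (rank - 1 + p) % p;
}
```

With p equal to 1, both helpers return 0, so the single process is its own neighbor.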
```
float  x, y, z;

z = x * y;
```
This simple assignment statement involves the operations of fetching operands, computing the product, and storing the result.
The cost of communication can be divided into three parts.
1. Start-up time
2. Transmission time
3. Forwarding time
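In this exercise, we focus on the communication cost between any pair of computers, so we will ignore forwarding time. Start-up time is the duration needed for MPI to set up the communication links. Transmission time is the time needed to send the message to its destination. The most commonly used model of the cost of communication is linear:

$$t_{\text{comm}} = t_{\text{startup}} + n \cdot t_{\text{data}}$$

where $n$ is the message length and $t_{\text{data}}$ is the transmission time per unit of data. Thus we can measure $t_{\text{startup}}$ and $t_{\text{data}}$ in order to get $t_{\text{comm}}$.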
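Under a linear cost model of the form time = start-up + n × per-byte time, the two parameters can be estimated from measured (message size, time) pairs with an ordinary least-squares line fit. The sketch below is one possible way to do that, not a required part of the assignment; the function name fit_linear_cost and its parameter names are assumptions made for illustration.

```c
#include <stddef.h>

/* Least-squares fit of time[i] ~ startup + per_byte * bytes[i].
 * Returns 0 on success, or -1 if fewer than two points are given or
 * all message sizes are equal (the slope is then undefined). */
int fit_linear_cost(const double *bytes, const double *time, size_t n,
                    double *startup, double *per_byte)
{
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    double denom;
    size_t i;

    if (n < 2)
        return -1;
    for (i = 0; i < n; i++) {
        sx  += bytes[i];
        sy  += time[i];
        sxx += bytes[i] * bytes[i];
        sxy += bytes[i] * time[i];
    }
    denom = (double)n * sxx - sx * sx;
    if (denom == 0.0)
        return -1;

    *per_byte = ((double)n * sxy - sx * sy) / denom;   /* slope     */
    *startup  = (sy - *per_byte * sx) / (double)n;     /* intercept */
    return 0;
}
```

The intercept of the fitted line estimates the start-up time, and the slope estimates the transmission time per byte.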
We can measure elapsed time using the MPI function
```
   double MPI_Wtime(void);    /* MPI wall-clock time */
```
which returns the wall-clock time in seconds that has elapsed since some arbitrary point in the past. The difference between two calls therefore gives the time elapsed between them. Thus one might measure the computation time as follows.
```
float  x, y, z;
double start, elapsed;

start = MPI_Wtime();
z = x * y;
elapsed = MPI_Wtime() - start;
```
We can find the resolution of MPI_Wtime() by calling the MPI function
```
   double MPI_Wtick(void);    /* for MPI time resolution */
```
If the computation takes less time than the resolution, the difference between the two MPI_Wtime() readings may be zero.
In order to take the timings, you should execute the operations repeatedly and take the average time. For computation timings, you should use arrays of very large size rather than scalars (single variables) to reduce the cache effect.
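The repeat-and-average approach described above might be sketched as follows. This is only one possible structure under stated assumptions: the function name average_multiply_time is hypothetical, the timer is passed in as a function pointer with the same signature as MPI_Wtime(), and clock_seconds is a stand-in timer (CPU time from the C library) so that the sketch runs without MPI. The measured time still includes the loop overhead.

```c
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Stand-in timer so the sketch runs without MPI. It reports CPU time,
 * not wall-clock time; in the MPI program, pass MPI_Wtime instead. */
double clock_seconds(void)
{
    return (double)clock() / (double)CLOCKS_PER_SEC;
}

/* Average time in seconds of one float multiplication, loop overhead included.
 * n    : array length (choose it large to reduce cache effects)
 * reps : number of passes over the arrays
 * now  : timer returning seconds as a double
 * Returns a negative value on invalid arguments or allocation failure. */
double average_multiply_time(size_t n, int reps, double (*now)(void))
{
    float *x, *y, *z;
    volatile float sink;
    double start, elapsed;
    size_t i;
    int r;

    if (n == 0 || reps <= 0 || now == NULL)
        return -1.0;

    x = malloc(n * sizeof *x);
    y = malloc(n * sizeof *y);
    z = malloc(n * sizeof *z);
    if (x == NULL || y == NULL || z == NULL) {
        free(x); free(y); free(z);
        return -1.0;
    }
    for (i = 0; i < n; i++) {      /* initialize the operands */
        x[i] = 1.0f + (float)(i % 7);
        y[i] = 0.5f;
    }

    start = now();
    for (r = 0; r < reps; r++)
        for (i = 0; i < n; i++)
            z[i] = x[i] * y[i];
    elapsed = now() - start;

    sink = z[n - 1];   /* read a result so the work is not trivially unused */
    (void)sink;
    free(x); free(y); free(z);
    return elapsed / ((double)reps * (double)n);
}
```

An optimizing compiler may still simplify the repeated passes, so the optimization level used for the measurement is worth noting alongside the results.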
For the communication timings, you should repeatedly send messages from process 0 to process 1 and back to process 0. That is, the core of the communication timings would look as follows.
```
   if (myRank == 0)
   {
       /* send to process 1 */
       MPI_Send(message, count, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
       /* receive back from process 1 */
       MPI_Recv(message, count, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &status);
   }
   else /* myRank == 1 */
   {
       /* receive from process 0 */
       MPI_Recv(message, count, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
       /* send back to process 0 */
       MPI_Send(message, count, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
   }
```
Output The program should print the average cost of a computation, and a series of average costs for communication: the cost for a message of 0 bytes, 10 bytes, 20 bytes, ... 100 bytes, 200 bytes, ... 1000 bytes.
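The list of message sizes can be generated rather than typed in by hand. The sketch below assumes the series means steps of 10 bytes from 0 to 100, followed by steps of 100 bytes up to 1000; the function name message_sizes and the constant NUM_MESSAGE_SIZES are illustrative.

```c
#include <stddef.h>

#define NUM_MESSAGE_SIZES 20   /* 11 sizes from 0 to 100, then 9 from 200 to 1000 */

/* Fill sizes[] with 0, 10, ..., 100, 200, ..., 1000 (in bytes).
 * Returns the number of entries written, or 0 if sizes is NULL or
 * capacity is smaller than NUM_MESSAGE_SIZES. */
size_t message_sizes(int *sizes, size_t capacity)
{
    size_t k = 0;
    int n;

    if (sizes == NULL || capacity < NUM_MESSAGE_SIZES)
        return 0;
    for (n = 0; n <= 100; n += 10)       /* 0, 10, ..., 100   */
        sizes[k++] = n;
    for (n = 200; n <= 1000; n += 100)   /* 200, ..., 1000    */
        sizes[k++] = n;
    return k;
}
```

Each entry could then serve as the count argument of the MPI_Send() and MPI_Recv() calls in the timing loop.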
Discussion You should include a discussion of your results, either as a separate file or as a section in your code (comments). The discussion should include the following.
1. How did you take the computation timings? Did you consider the loop overhead? Why?
2. What do you think is the effect of the call to MPI_Wtime() itself? Does it skew the result (i.e., what about the time spent on calling MPI_Wtime())?
3. Instead of using elapsed time, one might use actual CPU time. Would this make any difference in the final values of the timing? Discuss the pros and cons of using CPU timing vs. elapsed timing.

Appendix For brief instructions on how to compile and run an MPI program, check out <http://www.cs.panam.edu/~meng/Course/CS6356/MPI/instructions> for UNIX-based MPI. Also check out an example of a simple MPI program at <http://www.cs.panam.edu/~meng/Course/CS6356/MPI/hello.c>, which is included in this assignment description.

Hand-in: Turn in the three programs on paper. Make sure they are properly documented.

Xiannong Meng
Tue Sep 5 10:35:44 CDT 2000