ARSC T3D Users' Newsletter 66, December 22, 1995

MPI on the ARSC T3D

There are two versions of MPI (the Message Passing Interface) available on the ARSC T3D. They are:

  • the EPCC/CRI version 1.4a of MPI, in the default locations:
    
      /usr/include/mpp and /mpp/lib
    
  • the 1.0.11 Argonne/Mississippi State version in:
    
      /usr/local/examples/mpp/mpich/include           
    and
    
      /usr/local/examples/mpp/mpich/lib/cray_t3d/t3d
    
Below is a sample makefile that builds an executable for each version; both targets compile the same ring program for a fixed partition of 8 PEs (the -X8 flag):

  ------------------------------makefile------------------------------------------

  MPIINC = /usr/local/examples/mpp/mpich/include
  MPILIB = /usr/local/examples/mpp/mpich/lib/cray_t3d/t3d/libmpi.a

  14AINC=  /usr/include/mpp
  14ALIB=  /mpp/lib

  argonne:
          /mpp/bin/f90 -dp -X8 ring_mpi_mpich.f -I$(MPIINC) $(MPILIB)

  epcc14:
          /mpp/bin/f90 -dp -X8 ring_mpi_mpich.f -I$(14AINC) -lmpi

  clean:
          -rm *.o a.out mppcore
  
  ---------------------------end of makefile--------------------------------------
For the PVM user, MPI will look familiar: it is a similar collection of library routines for passing messages between programs running on different processors. Below is a sample MPI program that measures the time it takes for a message to be sent around a ring of processors. The program is from Dr. Alan Wallcraft, a scientist with the Naval Research Laboratory at Stennis Space Center, Mississippi:

  ------------------------sample MPI Fortran Program------------------------------

        PROGRAM RING
        IMPLICIT NONE
  C
        INTEGER          MPROC,NPROC
        COMMON/CPROCI/   MPROC,NPROC
  C
  C**********
  C*
  C 1)  PROGRAM TIMING A 'RING' FOR VARIOUS BUFFER LENGTHS.
  C
  C 2)  MPI VERSION.
  C*
  C**********
  C
        INCLUDE "mpif.h"
  C
        INTEGER MPIERR,MPIREQ(4),MPISTAT(MPI_STATUS_SIZE,4)
        INTEGER MYPE,MYPEM,MYPEP,NPES
  C
  *     REAL*8 MPI_Wtime
        REAL*8 T0,T1
  C
        INTEGER I,IRING,N2,NN,NR
        REAL*4  BUFFER(8192)
  C
  C     INITIALIZE.
  C
        CALL MPI_INIT(MPIERR)
        CALL MPI_COMM_RANK(MPI_COMM_WORLD, MYPE, MPIERR)
        CALL MPI_COMM_SIZE(MPI_COMM_WORLD, NPES, MPIERR)
        MYPEM = MOD( NPES + MYPE - 1, NPES)
        MYPEP = MOD(        MYPE + 1, NPES)
  C
        IF     (MYPE.EQ.0) THEN
          WRITE(6,*) 
          WRITE(6,*) 'NPES,MYPE,MYPE[MP] = ',NPES,MYPE,MYPEM,MYPEP
          WRITE(6,*) 
          CALL FLUSH(6)
        ENDIF
        CALL MPI_BARRIER(MPI_COMM_WORLD,MPIERR)
  C
        DO I= 1,8192
          BUFFER(I) = I
        ENDDO
  C
  C     SMALL BUFFER TIMING LOOP.
  C
        DO N2= 1,9
          DO I= -1,0
            NN = 2**N2 + I
            NR = 8192/(2**N2)
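  C           NN = MESSAGE LENGTH IN WORDS, NR = NUMBER OF TRIPS AROUND
  C           THE RING (NR*NN NEVER EXCEEDS THE 8192-WORD BUFFER).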
            CALL MPI_BARRIER(MPI_COMM_WORLD,MPIERR)
            T0 = MPI_Wtime()
            DO IRING= 0,NR-1
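  C             PE 0 SENDS FIRST AND THEN RECEIVES; ALL OTHER PES
  C             RECEIVE FIRST AND THEN SEND, SO THE BLOCKING SENDS
  C             CANNOT DEADLOCK AROUND THE RING.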
              IF     (MYPE.EQ.0) THEN
                CALL MPI_SEND(BUFFER(1+IRING*NN),NN,MPI_REAL4,
       +                      MYPEP, 9901, MPI_COMM_WORLD,
       +                      MPIERR)
                CALL MPI_RECV(BUFFER(1+IRING*NN),NN,MPI_REAL4,
       +                      MYPEM, 9901, MPI_COMM_WORLD,
       +                      MPISTAT, MPIERR)
              ELSE
                CALL MPI_RECV(BUFFER(1+IRING*NN),NN,MPI_REAL4,
       +                      MYPEM, 9901, MPI_COMM_WORLD,
       +                      MPISTAT, MPIERR)
                CALL MPI_SEND(BUFFER(1+IRING*NN),NN,MPI_REAL4,
       +                      MYPEP, 9901, MPI_COMM_WORLD, 
       +                      MPIERR)
              ENDIF
            ENDDO
            T1 = MPI_Wtime()
  *         CALL MPI_BARRIER(MPI_COMM_WORLD,MPIERR)
            IF     (MYPE.EQ.0) THEN
              WRITE(6,6000) NN,(T1-T0)*1.0D6/(NR*NPES)
              CALL FLUSH(6)
            ENDIF
          ENDDO
        ENDDO
        CALL MPI_BARRIER(MPI_COMM_WORLD,MPIERR)
        CALL MPI_FINALIZE(MPIERR)
        STOP
   6000 FORMAT(' BUFFER = ',I6,'   TIME =',F10.1,' Microsec')
  C     END OF RING.
        END

  -----------------------end of sample MPI Fortran program------------------------
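In the timing loop, PE 0 sends before it receives while every other PE receives before it sends; with blocking MPI_SEND calls this ordering is what keeps the ring from deadlocking. The program also declares a request array, MPIREQ, that the blocking version never uses. A non-blocking variant of one ring exchange might look like the sketch below (this is not part of Dr. Wallcraft's program, and RBUF and SBUF are illustrative separate receive and send buffers):

  C     SKETCH ONLY (NOT IN THE ORIGINAL PROGRAM): POST THE RECEIVE
  C     FIRST ON EVERY PE, THEN SEND, THEN WAIT FOR THE RECEIVE.
        CALL MPI_IRECV(RBUF, NN, MPI_REAL4, MYPEM, 9901,
       +               MPI_COMM_WORLD, MPIREQ(1), MPIERR)
        CALL MPI_SEND (SBUF, NN, MPI_REAL4, MYPEP, 9901,
       +               MPI_COMM_WORLD, MPIERR)
        CALL MPI_WAIT (MPIREQ(1), MPISTAT(1,1), MPIERR)
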
As with PVM, MPI provides a library of communication routines, and the common constants and variable names come from an include file (mpif.h for Fortran). A typical result for the above program, using the Argonne/Mississippi State version of MPI on 8 PEs of ARSC's T3D:

  NPES,MYPE,MYPE[MP] = 8,  0,  7,  1

  BUFFER =      1   TIME =      49.7 Microsec
  BUFFER =      2   TIME =      49.5 Microsec
  BUFFER =      3   TIME =      61.4 Microsec
  BUFFER =      4   TIME =      50.2 Microsec
  BUFFER =      7   TIME =      62.0 Microsec
  BUFFER =      8   TIME =      51.1 Microsec
  BUFFER =     15   TIME =      38.4 Microsec
  BUFFER =     16   TIME =      38.2 Microsec
  BUFFER =     31   TIME =      39.4 Microsec
  BUFFER =     32   TIME =      38.9 Microsec
  BUFFER =     63   TIME =      40.8 Microsec
  BUFFER =     64   TIME =      40.0 Microsec
  BUFFER =    127   TIME =      43.0 Microsec
  BUFFER =    128   TIME =      42.4 Microsec
  BUFFER =    255   TIME =      47.7 Microsec
  BUFFER =    256   TIME =      47.5 Microsec
  BUFFER =    511   TIME =      57.6 Microsec
  BUFFER =    512   TIME =      56.8 Microsec
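
The TIME column is the time per message: the program divides the elapsed time T1-T0 for the loop by NR*NPES, the number of trips around the ring times the number of hops per trip. For example, with NN = 512 there are NR = 8192/512 = 16 trips around the 8-PE ring, so the reported 56.8 microseconds corresponds to an elapsed loop time of about 56.8 x 16 x 8, or roughly 7300 microseconds.
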
Using the same distribution tape as for the T3D, I installed the Argonne/Mississippi State version of MPI on our local network of SGI Indys connected by Ethernet. I then ran the exact same MPI program under both T3D versions and under the version running on the SGIs. Below is a summary of the timings:

Table 1

MPI times (microseconds) for sending messages around a ring of 8 processors

  message size  T3D (Argonne/  T3D (EPCC/  Network of SGI Indys
  in 32b words  Mississippi)    CRI 1.4a)  (Argonne/Mississippi)
                 CRI's f90     CRI's cf77       SGI's f77

    1               49.7          40.4           1403.3
    2               49.5          43.5           1482.7
    3               61.4          43.0           1446.9
    4               50.2          53.1           1440.7
    7               62.0          54.2           1505.8
    8               51.1          55.9           1553.0
   15               38.4          58.6           1452.0
   16               38.2          59.8           1528.6
   31               39.4          65.6           1525.6
   32               38.9          65.2           1519.7
   63               40.8          76.5           1569.1
   64               40.0          75.4           1597.8
  127               43.0          92.3           1816.9
  128               42.4          91.1           1741.8
  255               47.7         140.8           2266.3
  256               47.5         140.9           2256.2
  511               57.6         208.6           3255.5
  512               56.8         210.2           3276.6
Because the EPCC/CRI version doesn't support the real*4 datatype of the CRI f90 compiler, the timings for the EPCC/CRI version of MPI had to be made with the cf77 compiler. That compiler, like all CRI compilers since the late 1970s, treats real*4 as real*8 without warning the user.
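
If the exact message sizes are not the point, one way to sidestep this datatype mismatch is to declare the buffer with the default REAL type and pass the predefined MPI_REAL datatype, which every MPI implementation provides and which always matches the compiler's default real size. This is only a sketch of the idea, not a change made for the timings above; on the T3D the default real is 64 bits, so the messages would be twice as long as in the 32-bit-word table.

  C     SKETCH: THE DEFAULT REAL TYPE AND THE PREDEFINED MPI_REAL
  C     DATATYPE ALWAYS MATCH, SO THE SAME SOURCE WORKS WITH EITHER
  C     MPI LIBRARY.
        REAL    BUFFER(8192)
        CALL MPI_SEND(BUFFER(1+IRING*NN), NN, MPI_REAL,
       +              MYPEP, 9901, MPI_COMM_WORLD, MPIERR)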

Documentation

For the EPCC/CRI version of MPI there is a PostScript user manual in /usr/local/examples/mpp/mpi/user.ps, and there are sample Fortran and C programs and a makefile in the directory /usr/local/examples/mpp/mpi/test. These files are very specific to the EPCC/CRI implementation on the T3D.

For the Argonne/Mississippi State version of MPI there is much more extensive documentation, but it is not specific to the T3D. By adding the line below (to a .cshrc file, for example):

  setenv MANPATH "$MANPATH":/usr/local/examples/mpp/mpich/man
the user has access to more than 350 manpages, including a complete collection of routine manpages, accessible, for example, as:

  man MPI_SUM
In the directory /usr/local/examples/mpp/mpich/doc there are manuals for the Argonne/Mississippi State version of MPI:

  functions.ps.Z  - a 15-page quick reference of functions and their
                    calling sequences
  guide.ps.Z      - a 32-page user's guide
  install.ps.Z    - a 25-page installation guide, which I used for both the
                    T3D and our network of SGI workstations
  mpiman.ps.Z     - a 125-page (draft) reference manual
  adiman.ps.Z     - a description of the ADI interface to MPI

Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Archives: Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.