ARSC T3D Users' Newsletter 39, June 9, 1995

MPI on the T3D

ARSC has received the formal release of CRI/EPCC MPI for the T3D from the Edinburgh Parallel Computing Centre (EPCC) and has installed it. There are now two implementations of MPI on the ARSC T3D:

  1. The MPICH version 0.1a from Argonne and Mississippi State described in ARSC T3D newsletter #34 (5/5/95)
  2. The EPCC/CRI initial release
To the user, the EPCC/CRI release consists of two include files:

  $(PATH)/include/mpi.h     # for C programs
  $(PATH)/include/mpif.h    # for Fortran programs
and a library:

  $(PATH)/lib/libmpi.a
Currently at ARSC, PATH is /usr/local/examples/mpp/mpi; shortly it will change to /mpp. The directory /usr/local/examples/mpp/mpi/test contains some examples that run correctly on ARSC's T3D.
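
For a quick sanity check beyond the supplied tests, here is a minimal MPI C program; it is only a sketch and uses nothing beyond standard MPI-1 calls and the mpi.h header listed above:

  #include <stdio.h>
  #include "mpi.h"

  /* Minimal MPI check: each PE reports its rank and the total PE count. */
  int main(int argc, char **argv)
  {
      int rank, size;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);
      printf("PE %d of %d says hello\n", rank, size);
      MPI_Finalize();
      return 0;
  }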

There is a PostScript user's guide that I will e-mail to anyone interested in using the EPCC/CRI version of MPI. It includes instructions for compiling, running, and debugging T3D programs that use MPI.
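
The guide is the authoritative source for the exact commands. As a rough sketch only (the compiler driver name here is my assumption, not taken from the release), building such a program against this installation amounts to pointing the compiler at the include file and library listed above:

  # assumed T3D C driver name; see the user's guide for the exact invocation
  /mpp/bin/cc -I/usr/local/examples/mpp/mpi/include hello.c \
              -o hello -L/usr/local/examples/mpp/mpi/lib -lmpi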

A description of the EPCC/CRI implementation is given in the article "The MPI Message Passing Standard on the Cray T3D" by Lyndon J. Clark, which appears in the Fall 1994 CUG Proceedings. The paper describes the MPI functions and gives examples of MPI code. There is also a wealth of MPI information available at http://www.mcs.anl.gov/mpi.

If any ARSC user has problems using MPI, please contact Mike Ess and he will relay them to EPCC. Here is the list of known bugs that accompanies the EPCC/CRI release of MPI:


  >This is the list of known problems in release 1.1a, the current major release. 
  > 
  > Release and Installation 
  > 
  >      Problems with the release and installation mechanism:
  > 
  >        1. The ./install script should make use of cpset rather than cp -p. 
  >           This means that it may be necessary to change the owner of the 
  >           installed files to bin in keeping with other installed libraries 
  >           and headers.
  > 
  >        2. The confidence check makefile should specify the default include 
  >           path using -I/usr/include/mpp for Fortran compilation; it is not 
  >           required for C compilation.
  > 
  >        3. The MPI library is not compiled with -hjump which means that a 
  >           problem relating to limits in the branching distance within an 
  >           executable can occur. This problem is only prevalent for very large 
  >           applications and the symptoms are clear; if this problem arises,
  >           contact us to obtain a library that uses jumps instead of branches.
  > 
  > MPI Conformance 
  > 
  >     This MPI implementation aims to conform to the standard dated 5 May 1994. 
  >     Minor errata changes have been incorporated into the standard and are 
  >     under correction at present; the implementation will be upgraded when 
  >     these changes have stabilized. Known problems with conformance to the 
  >     pre-errata MPI standard:
  > 
  >        1. The passing of Fortran CHARACTER arrays and strings as message data
  >           to all relevant MPI calls is erroneous. This can cause an address 
  >           error fault or message/memory corruption.
  > 
  >        2. The MPI Profiling Interface (Chapter 8 of the MPI standard) has not
  >           been implemented: no name-shifted (PMPI_) calls are defined.
  > 
  >        3. Completion of MPI_Status objects for immediate send requests is 
  >           incorrect. In particular MPI_Get_count will return zero, regardless
  >           of the actual send count.
  > 
  >        4. Passing MPI_BOTTOM to MPI_Pack and MPI_Unpack as the unpacked
  >           message buffer address is erroneously interpreted as incorrect, 
  >           resulting in an MPI exception (of class MPI_ERR_ARG).
  > 
  >        5. User buffer detach operation, MPI_Buffer_detach, always returns 
  >           null buffer address and zero size.
  > 
  >        6. Group and communicator free operations, MPI_{Group,Comm}_free 
  >           erroneously fail to change the value of the {group, communicator} 
  >           handle passed as INOUT argument to MPI_{GROUP,COMM}_NULL.
  > 
  >        7. Group comparison operator, MPI_Group_compare, erroneously gives 
  >           MPI_UNEQUAL for some group pairs that should be MPI_SIMILAR; also 
  >           gives MPI_CONGRUENT for different groups with the same members and 
  >           order, instead of MPI_IDENT.
  > 
  >        8. Datatype count and element query, MPI_Get_{count,elements}, do not 
  >           give MPI_UNDEFINED as expected when the received message does not 
  >           contain an integral number of datatype (elements).
  > 
  >        9. Explicit setting of lower-bound, using MPI_LB datatype, with 
  >           displacement greater than the displacement of previous elements of a
  >           structure type is not effective: lower-bound is given as the 
  >           smallest previous displacement.
  > 
  > Software Faults 
  > 
  >      Known bugs in the MPI implementation:
  > 
  >        1. There is a race condition within MPI_Scatter[v] which can result 
  >           in an address error. This is possible with a tight series of 
  >           scatters, followed by an MPI_Barrier, especially prevalent with 
  >           communicators covering fewer processes than MPI_COMM_WORLD.
  > 
  >        2. Resetting the error handler to either of the predefined handlers:
  >           MPI_ERRORS_ARE_FATAL and MPI_ERRORS_RETURN is erroneous and causes 
  >           an address error during error handling for the subject communicator.
  > 
  >        3. Constructing derived datatypes of zero overall size will result in 
  >           an internal MPI exception (class MPI_ERR_INTERN) within the 
  >           datatype constructor, or may cause a Floating Point Exception when 
  >           used in communications.
  > 
  >        4. Passing MPI_OP_NULL, or a handle referring to a freed user 
  >           operator, to MPI global reduction calls (eg. MPI_Reduce) results 
  >           in an address error.
  > 
  >        5. Use of MPI_Request_free on active communication requests can cause 
  >           an address error in subsequent communications calls.
  > 
  > Software Limitations 
  > 
  >      Known limitations in the MPI implementation:
  > 
  >        1. The number of outstanding communications relating to any process 
  >           is limited. Once this limit is exceeded the application will abort 
  >           with an informative error message. This is not an MPI exception, 
  >           meaning that it is not handled by an attached user error handler
  >           routine.
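
Until a corrected release arrives, some of these items can be worked around in user code. For example, conformance item 6 above (MPI_{Group,Comm}_free not resetting the handle) can be sidestepped by nulling the handle yourself after freeing it; the helper below is just a sketch of that idea, not part of the release:

  #include "mpi.h"

  /* Workaround sketch for conformance item 6: this release does not reset
   * the handle to MPI_GROUP_NULL/MPI_COMM_NULL on free, so do it by hand
   * to keep later validity tests honest. */
  static void free_group_and_comm(MPI_Group *group, MPI_Comm *comm)
  {
      if (*group != MPI_GROUP_NULL) {
          MPI_Group_free(group);
          *group = MPI_GROUP_NULL;   /* compensate for the known bug */
      }
      if (*comm != MPI_COMM_NULL) {
          MPI_Comm_free(comm);
          *comm = MPI_COMM_NULL;     /* compensate for the known bug */
      }
  }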

New T3D Batch PE Limits

All active users of the ARSC T3D have had their batch PE limit increased to 128. This gives them access to the 128-PE, 8-hour queues that run on the weekends. If you need your T3D UDB limits changed, please contact Mike Ess.

New Fortran Compiler

An upgraded version of the cf77 compiler is available on Denali at the paths:

  /mpp/bin/cft77new 
and

  /mpp/bin/cf77new
For the default versions we have:

/mpp/bin/cf77 -V

  Cray CF77_M   Version 6.0.4.1 (6.59)   05/25/95 13:36:39
  Cray GPP_M    Version 6.0.4.1 (6.16)   05/25/95 13:36:39
  Cray CFT77_M  Version 6.2.0.4 (227918) 05/25/95 13:36:39
and for this new version:

/mpp/bin/cf77new -V

  Cray CF77_M   Version 6.0.4.1 (6.59)   05/25/95 13:37:26
  Cray GPP_M    Version 6.0.4.1 (6.16)   05/25/95 13:37:26
  Cray CFT77_M  Version 6.2.0.9 (259228) 05/25/95 13:37:27
This new compiler fixes a potential race condition in shared memory accesses and an inlining problem with the F90 intrinsics MINLOC and MAXLOC.

I have completed my testing of this compiler, and it will become the default on June 20. I encourage users to try it before then.
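
To try it, simply substitute the new driver name when compiling and loading; for example (the source file name below is only a placeholder):

  /mpp/bin/cf77new -o prog prog.f    # compile and load with the new compiler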

List of Differences Between T3D and Y-MP

The current list of differences between the T3D and the Y-MP is:
  1. Data type sizes are not the same (Newsletter #5)
  2. Uninitialized variables are different (Newsletter #6)
  3. The effect of the -a static compiler switch (Newsletter #7)
  4. There is no GETENV on the T3D (Newsletter #8)
  5. Missing routine SMACH on T3D (Newsletter #9)
  6. Different Arithmetics (Newsletter #9)
  7. Different clock granularities for gettimeofday (Newsletter #11)
  8. Restrictions on record length for direct I/O files (Newsletter #19)
  9. Implied DO loop is not "vectorized" on the T3D (Newsletter #20)
  10. Missing Linpack and Eispack routines in libsci (Newsletter #25)
  11. F90 manual for Y-MP, no manual for T3D (Newsletter #31)
  12. RANF() and its manpage differ between machines (Newsletter #37)
I encourage users to e-mail in differences that they have found, so we all can benefit from each other's experience.
Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.