ARSC T3D Users' Newsletter 25, March 3, 1995

ARSC T3D Future Upgrades

Next week we will be testing the upgrade to the T3D 1.2 Programming Environment (libraries, tools, and compilers). If all goes well, it will be on the system in three weeks.

We are also planning to install CF90 and C++ for the T3D. This will come after the upgrade to the 1.2 P.E. I am interested in hearing from users who want to use the CF90 and C++ products as soon as they are available.

Upgrade to the T3D Memory

On February 7th, ARSC upgraded the memory on each PE from 2MWs to 8MWs. If any users have questions about this, please contact Mike Ess. We have run into problems with the per-user limit on mppcore size, which was set too low for the 8MW nodes. On March 3rd, all T3D users had their mppcore file limit increased to allow a core file from 8 PEs. If you are planning to debug with core files on more than 8 PEs, then please contact Mike Ess. If your setting is too low, you will get an error message like:

  mppexec: user UDB core limit reached, mppcore dump terminated
The T3D can create tremendously large mppcore files, and the storage for these files is charged against your service unit allocation. If you're not going to use mppcore files, delete them as soon as you no longer need them. These files appear in the directory of the executable that aborted, under the name "mppcore". Y-MP jobs that abort produce a core file called "core".

With the increased memory size, another problem has arisen. A user tried to create his own restart files by writing out large data arrays from all PEs at the same time. With the 8MW nodes, these arrays can be much larger than before, and at times the write operations hang, taking the T3D application with them. This shouldn't be the case, but as a work-around the user 'serialized' the write operations so that they proceed one at a time. This could be implemented in Fortran with a shared variable; in this case the user used a PVM message from PE N to PE N+1 to notify PE N+1 that its write could begin.
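To illustrate the idea, here is a minimal C sketch of such a token-passing scheme. It uses a SHMEM flag instead of the PVM message the user actually sent (a substitution on my part), write_my_data() is a hypothetical stand-in for the application's own output code, and it assumes _MPP_N_PES is available from mpp/globals.h alongside _MPP_MY_PE:

  #include <stdio.h>
  #include <mpp/shmem.h>
  #include <mpp/globals.h>

  static long token = 0;      /* symmetric flag: 0 = not yet my turn */

  void write_my_data(void)    /* hypothetical per-PE restart output */
  {
    /* ... write this PE's arrays here ... */
  }

  main()
  {
    int  me   = _MPP_MY_PE;
    int  npes = _MPP_N_PES;
    long one  = 1;

    if (me != 0)
      shmem_wait(&token, 0);  /* spin until PE me-1 changes our flag */

    write_my_data();          /* only one PE is writing at any time */

    if (me + 1 < npes)
      shmem_put(&token, &one, 1, me + 1);  /* pass the token along */

    barrier();
  }

PE 0 writes first, then each PE waits for the flag set by its predecessor before writing, so the writes go out strictly one at a time.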

New SHMEM Manuals from CRI

One of the recipients of the ARSC T3D newsletter alerted me to postscript copies of the SHMEM manuals that I wrote about in Newsletter #24.

  SN-2516 SHMEM Technical guide for Fortran Users
  SN-2517 SHMEM Technical guide for C Users
I can e-mail these postscript versions to ARSC users who request them.

FMlib Available on ARSC's T3D

The FM library of communication functions is available and running on the ARSC T3D. The required include file, "fm.h", is in the directory

  /usr/local/examples/mpp/include
and the needed library is:

  /usr/local/examples/mpp/lib/libFM.a
These functions come in send/receive pairs similar to PVM's, but they have much smaller latencies than the PVM functions on the T3D. One set of functions uses the fetch-and-increment hardware and the other set uses the atomic swap hardware. A brief description of the functions follows:

  void FM_initialize(void);   /* Setup */
  void FM_set_parameter(int param_id, int value);
  /* Change maximum message size or number of buffers */

  typedef void FM_shandler( );
  void FMf_send_4( int node, FM_shandler *fptr, ... );
  /* Send a 4-word message, with the four words passed as arguments
     to the handler, which is executed on the receiving PE node. */

  typedef void FM_lhandler( void *, int);
  void FMf_send(int node, FM_lhandler *fptr, void *buf, int byte_count);
  /* Send a message of length byte_count, with the handler
     executed on the receiving PE node. */

  int FMf_extract_1( void );
  /* retrieve the next message sent to the current node */
  int FMf_extract( void );
  /* a more general message retrieve */
The functions with the FMf_ prefix use the fetch-and-increment hardware; a parallel set of functions with the FMi_ prefix uses the atomic swap hardware. Further details are on the WWW page:

  http://www-csag.cs.uiuc.edu/projects/communication/t3d-fm.html
The idea of passing a function to be executed on the receiving node is intriguing, but I haven't figured out a use for it. Because these functions are oriented towards small latencies, they seem specialized for small (< 4KB) messages. I'll do some timing comparisons for next week's newsletter. Below is an example program and its output when run on a 2-PE machine:

  #include <stdio.h>
  #include <mpp/shmem.h>
  #include <mpp/globals.h>
  #include <fm.h>
  int num_incoming_msgs = 0;

  void remote_handler_short(int i1, int i2, int i3, int i4) {
    num_incoming_msgs++;
    printf( " %d %d %d %d\n", i1, i2, i3, i4 );
  }
  void remote_handler_long(void* buf, int byte_count ) {
    int* buffer = (int *)buf;
    int i;
    num_incoming_msgs++;  
    printf( " %d %d", num_incoming_msgs, ( byte_count / 8 ) );
    for(i = 0; i < (byte_count/8); i++) printf(" %d",buffer[i]);
    printf( "\n" );
  }       
  main() {
   int i;
   int buf[32];
   int my_pe = _MPP_MY_PE;
   for( i = 0; i < 32; i++ ) buf[ i ] = 0;
   FM_set_parameter( MAX_MSG_SIZE_FINC, 1024 );
   FM_set_parameter( MSG_BUFFER_SIZE_FINC, 512 );
   FM_initialize();
   if (my_pe == 0) {              /* PE 0 sends 10 short and 10 long messages */
     for( i = 0; i < 32; i++ ) buf[ i ] = i;
     for (i = 0; i < 10; i++) {
       FMf_send_4(1, remote_handler_short, i, i+1, i+2, i+3 );
       FMf_send(1, remote_handler_long, (void *)buf, (i+1)*8 );
     }    
   }
   else if (my_pe == 1) {         /* PE 1 extracts messages until all 20 arrive */
     while ( num_incoming_msgs < 2 * 10 ) {
       FMf_extract_1();
       FMf_extract();
     }
   }      
   barrier();
  }

   0 1 2 3
   2 1 0
   1 2 3 4
   4 2 0 1
   2 3 4 5
   6 3 0 1 2
   3 4 5 6
   8 4 0 1 2 3
   4 5 6 7
   10 5 0 1 2 3 4
   5 6 7 8
   12 6 0 1 2 3 4 5
   6 7 8 9
   14 7 0 1 2 3 4 5 6
   7 8 9 10
   16 8 0 1 2 3 4 5 6 7
   8 9 10 11
   18 9 0 1 2 3 4 5 6 7 8
   9 10 11 12
   20 10 0 1 2 3 4 5 6 7 8 9
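For those who want to experiment before next week's timing comparisons, a measurement of the send-side cost might look something like the sketch below. It uses only the FMlib calls listed above plus gettimeofday(); the FM_set_parameter values are copied from the example above rather than tuned, the message count is arbitrary, and it times only the cost of injecting short messages on PE 0, not a round trip:

  #include <stdio.h>
  #include <sys/time.h>
  #include <mpp/shmem.h>
  #include <mpp/globals.h>
  #include <fm.h>

  #define NMSGS 100      /* kept small so we stay within the message buffers */

  int received = 0;

  void count_handler(int i1, int i2, int i3, int i4) {
    received++;          /* runs on the receiving PE for each short message */
  }

  double seconds(void) {
    struct timeval tv;
    gettimeofday( &tv, (struct timezone *)0 );
    return tv.tv_sec + tv.tv_usec * 1.0e-6;
  }

  main() {
    int i;
    double t0, t1;
    FM_set_parameter( MAX_MSG_SIZE_FINC, 1024 );
    FM_set_parameter( MSG_BUFFER_SIZE_FINC, 512 );
    FM_initialize();
    if (_MPP_MY_PE == 0) {
      t0 = seconds();
      for (i = 0; i < NMSGS; i++)
        FMf_send_4( 1, count_handler, i, i, i, i );
      t1 = seconds();
      printf( "average cost per FMf_send_4: %g usec\n",
              (t1 - t0) * 1.0e6 / NMSGS );
    }
    else if (_MPP_MY_PE == 1) {
      while ( received < NMSGS ) FMf_extract_1();  /* drain short messages */
    }
    barrier();
  }

The granularity of gettimeofday on the T3D is coarse (see difference #7 in the list below), so a larger message count or a round-trip measurement would give more trustworthy numbers; this only shows the shape of such a test.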

Libsci Routines and Improved Speed on the T3D

"Linpack" when it was introduced in 1979 by Jack Dongarra, Cleve Moler, J.R. Bunch and Pete Steward, it was not a benchmark but a collection of linear algebra routines and the "Linpack Benchmark" grew from an appendix of the Linpack Users' Guide that compared various machines solving a 100x100 system of linear equations. The Linpack Benchmark measures the speed of only two of the routines in the package:

  sgefa - factors a dense square matrix
  sgesl - uses the above factorization to solve a system of equations

The naming convention for the Linpack routines is straightforward:

  sgefa
  ^^ ^
  || |
  || +--- fa for factorization (sl for solve, ...)
  |+----- ge for general matrix (pp for packed, positive definite, ...)
  +------ s  for single precision (d for double precision, ...)
In 1992, the Lapack package was introduced as a replacement for Linpack and Eispack (a package of eigenvalue routines). This library was more comprehensive and faster than the previous packages. It contains replacements for the Linpack routines sgefa and sgesl, and the Linpack benchmark can be modified to use the Lapack routines with only the following changes:

  c     call sgefa(a,lda,n,ipvt,info)
        call sgetrf( n, n, a, lda, ipvt, info )

  c     call sgesl(a,lda,n,ipvt,b,0)
        call sgetrs( "n", n, 1, a, lda, ipvt, b, n, info )
The naming conventions and the arguments passed are similar to those of the Linpack routines. With these two changes to the Linpack benchmark, we can compare the single-PE results from the last newsletter with the single-PE results of the Linpack benchmark run with the Lapack routines:

  Problem      Linpack Mflop rates     Lapack Mflop rates
    size   Fortran BLAS1  Libsci BLAS1
     1          .19          .13              .04
     2          .45          .30              .15
     3          .90          .30              .03
     4         1.50         1.04              .06
     5         2.21         1.28              .09
    10         5.04         2.92             2.55
    20         8.72         6.07             5.70
    40        11.04        10.72            13.48
    50        10.93        12.29            15.95
    60        11.10        13.49            20.13
    70        11.21        13.38            21.12
    80        11.43        13.31            24.89
    90        11.55        13.46            25.15
   100        11.64        13.75            27.73
   200        12.08        17.30            36.82
   300        12.03        19.66            41.51
   400        11.84        21.20            44.12
   500        11.60        22.26            45.40
   600        11.36        23.03            46.73
   700        11.21        23.61            47.80
   800        11.14        24.14            48.57
   900        10.97        24.40            48.90
  1000        10.83        24.69            49.47
  1500        10.15        25.62            51.04
  2000         9.83        26.18            51.87
  2500         9.70        24.61            52.30
  2600         9.70        26.49            52.42
The Lapack routines on the T3D are a good improvement over the Linpack routines. Neither Linpack nor Eispack is in libsci for the T3D; the T3D version of libsci has only the Lapack routines. On an older machine like the Y-MP, Linpack and Eispack are provided in libsci for legacy codes. The Y-MP libsci also contains all of the Lapack routines.

List of Differences Between T3D and Y-MP

The current list of differences between the T3D and the Y-MP is:
  1. Data type sizes are not the same (Newsletter #5)
  2. Uninitialized variables are different (Newsletter #6)
  3. The effect of the -a static compiler switch (Newsletter #7)
  4. There is no GETENV on the T3D (Newsletter #8)
  5. Missing routine SMACH on T3D (Newsletter #9)
  6. Different Arithmetics (Newsletter #9)
  7. Different clock granularities for gettimeofday (Newsletter #11)
  8. Restrictions on record length for direct I/O files (Newsletter #19)
  9. Implied DO loop is not "vectorized" on the T3D (Newsletter #20)
  10. Missing Linpack and Eispack routines in libsci (Newsletter #25)
I encourage users to e-mail in differences that they have found, so we all can benefit from each other's experience.
Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.