ARSC T3E Users' Newsletter 116, March 21, 1997

T3E Overview

This article briefly touches on many aspects of the T3E and its user environment. We will go into the details in coming issues. Today's simple question is: what are the first things you will notice about the ARSC T3E?

Standalone system

First, the T3E is a standalone system: there is no front-end system such as the Y-MP. So what handles the compiling and other tasks that the front end used to perform? Answering that requires a brief description of the structure of the T3E.

OS, Command, and Application Processors

The T3E consists of a collection of processors connected in a three-dimensional torus, much as on the T3D. However, on the T3D all of these processors were available for user tasks, whereas the T3E divides them into three classes. OS processors perform low-level support for the operating system. Command processors support users' interactive sessions, so this is where compiling and other work formerly done on the front end now runs. Application processors run the parallel jobs that users submit to the system.

grmview, mppview, xmppview

A new utility, grmview, shows in detail how the processors are distributed among these three classes and how they are currently being used. The number of processors in each class can change to reflect the load on the system. mppview replaces mppinfo for describing processor usage, and a new graphical interface, xmppview, gives a 3-D view of the system and various aspects of processor usage.

No more powers of two

Users are no longer restricted to processor counts that are a power of two: any number of processors can be used. Users now have greater freedom to request the most suitable number of processors for the current job, e.g. the minimum number needed to obtain sufficient memory. Removing this constraint should result in greater throughput and improved overall turnaround.
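Finding the minimum number of processors that provides sufficient memory is a simple ceiling division. The module below is a hypothetical sketch: min_pes and next_pow2 are illustrative names, and the per-PE memory you pass in must be whatever is actually available to a program on the ARSC T3E (an assumption to check, not a figure from this article). next_pow2 shows what the old T3D rule would have forced.

```fortran
       module pe_sizing
       implicit none
       contains

!      Hypothetical helper: the smallest number of PEs whose combined
!      memory holds the job, i.e. ceiling(total_mb / mb_per_pe).
!      Integer arithmetic sidesteps floating-point rounding.
       integer function min_pes (total_mb, mb_per_pe)
         integer, intent(in) :: total_mb, mb_per_pe
         min_pes = (total_mb + mb_per_pe - 1) / mb_per_pe
       end function min_pes

!      For comparison: the count the T3D would have required, i.e.
!      the smallest power of two that is >= n.
       integer function next_pow2 (n)
         integer, intent(in) :: n
         next_pow2 = 1
         do while (next_pow2 < n)
           next_pow2 = 2 * next_pow2
         end do
       end function next_pow2

       end module pe_sizing
```

For example, with figures chosen only for illustration, a job needing 1100 MBytes on PEs with 200 MBytes each gets min_pes(1100, 200) = 6 on the T3E, where the T3D would have required next_pow2(6) = 8.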

Fortran 90

The Fortran 77 compiler has been replaced by a Fortran 90 compiler, which should accept all current f77 codes. Users can test f77 code under the f90 compiler on denali.

Performance

Performance has improved. The observed bandwidth between processors has increased from 120 MByte/sec on the T3D to 320 MByte/sec on the T3E for SHMEM, and from 15 MByte/sec to 80 MByte/sec for MPI. Early experience shows that code brought over from the T3D runs 2-3 times faster without any modification.
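To see what these bandwidth figures mean for a single message, one can estimate transfer times. This is a minimal sketch under stated assumptions: the quoted rates are treated as sustained, start-up latency is ignored, and the module and function names are hypothetical.

```fortran
       module transfer_time
       implicit none
       contains

!      Seconds needed to move mbytes MBytes at a sustained bandwidth
!      of rate MByte/sec.  Start-up latency is ignored (a simplifying
!      assumption that makes short messages look too cheap).
       real function transfer_seconds (mbytes, rate)
         real, intent(in) :: mbytes, rate
         transfer_seconds = mbytes / rate
       end function transfer_seconds

!      How many times faster the same bandwidth-bound transfer runs
!      at new_rate than at old_rate: the ratio of the two times.
       real function transfer_speedup (mbytes, old_rate, new_rate)
         real, intent(in) :: mbytes, old_rate, new_rate
         real :: t_old, t_new
         t_old = transfer_seconds(mbytes, old_rate)
         t_new = transfer_seconds(mbytes, new_rate)
         transfer_speedup = t_old / t_new
       end function transfer_speedup

       end module transfer_time
```

With the figures quoted above, a hypothetical 8-MByte SHMEM transfer drops from about 0.067 s to 0.025 s, a factor of about 2.7, and the same transfer through MPI drops from about 0.53 s to 0.1 s, about 5.3. Because latency is ignored, the factor is just the ratio of the bandwidths and does not depend on message size; for short messages, where latency dominates, the real gain may differ.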

SHMEM

With the SHMEM library, SHMEM_PUT and SHMEM_GET now give similar bandwidths, so users can choose whichever best matches the data transfer pattern. Special hardware now keeps the caches coherent, so calls to SHMEM_UDC_FLUSH and SHMEM_SET_CACHE are no longer necessary; the cache coherency functions are no-ops on the T3E. Users are nevertheless advised to keep these calls in their code so that it remains portable between the T3D and T3E. Improved routing in the torus, which avoids contention and hot spots, means that data sent via SHMEM_PUT may not arrive in the same order as the SHMEM_PUT calls were issued. This is particularly important for users who decide whether a transfer is complete by reading the value of the last data item in it. SHMEM_FENCE can be used to enforce the ordering of transfers, or one may test the entire message with SHMEM_WAIT.

Malleable executables, mpprun

Users can compile for a fixed number of processors and then run the program simply by typing its name at the command line. Malleable executables, for which the number of processors is not specified at compile time, conceptually replace the T3D's plastic executables. To run one, use the command:

  mpprun -n NPES a.out [ARGS]

where on the T3D you would have used:

  [mppexec] a.out [ARGS] -npes NPES

Timings on Multiple Cray Platforms (revisited)

Newsletter #99 presented a Fortran subroutine that used the "gethmc" system call to obtain the system clock frequency and worked across Cray platforms. The programmer could bracket a section of code with a pair of calls to "irtc" to count the clock ticks spent in it, and then use the clock speed to compute the elapsed time in seconds, the Mflop/s rate of that code, and so on.
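The arithmetic in that paragraph can be sketched as follows. This is a minimal sketch, not the Newsletter #99 routine: the module and function names are hypothetical, the tick count would come from a pair of irtc calls, and the clock rate from a routine such as iget_clktck in the module below. Both are passed in as arguments so the arithmetic can be checked on any compiler.

```fortran
       module tick_math
       implicit none
!      A 64-bit integer kind, so long tick counts cannot overflow on
!      compilers whose default integer is only 32 bits.
       integer, parameter :: ik = selected_int_kind(18)
       contains

!      Elapsed seconds for a tick count (e.g. the difference of two
!      irtc readings), given tps, the clock rate in ticks per second.
       double precision function elapsed_seconds (ticks, tps)
         integer(ik), intent(in) :: ticks, tps
         elapsed_seconds = dble(ticks) / dble(tps)
       end function elapsed_seconds

!      Mflop/s rate for a code section that performed nflops
!      floating-point operations in the given number of ticks.
!      irtc supplies only the ticks; nflops must be counted by hand.
       double precision function mflops_rate (nflops, ticks, tps)
         double precision, intent(in) :: nflops
         integer(ik), intent(in) :: ticks, tps
         mflops_rate = nflops / elapsed_seconds(ticks, tps) / 1.0d6
       end function mflops_rate

       end module tick_math
```

For instance, on yukon, which reports 300000000 ticks per second in the results below, a hypothetical section timed at 600000000 ticks has run for 2 seconds; if it performed 10**9 floating-point operations, mflops_rate gives 500 Mflop/s.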

"gethmc" has been replaced in the PE 2.0 libraries by a POSIX-compliant routine, PXFSYSCONF. Here's an f90 module that can now be used across platforms to get the clock speed (adapted from the "man" page for PXFSYSCONF):


cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
       module timing
       implicit none

       contains

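       integer function iget_clktck ()
c        Return the system clock rate in ticks per second.  PXFCONST
c        looks up the integer code for the name 'CLK_TCK', which is
c        then passed to PXFSYSCONF.  Returns 0 if that lookup fails.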
         integer ival, ierr, iclktck
         ival = 0
         iclktck = 0
         call pxfconst ('CLK_TCK',iclktck,ierr)
         if (ierr.ne.0) then
           print *,'FAIL: error from pxfconst = ',ierr
           goto 9999
         endif
         call pxfsysconf (iclktck, ival, ierr)
         if (ierr.ne.0) then
           print *,'FAIL: error from pxfsysconf = ',ierr
         endif

9999     continue
         iget_clktck = ival
       end function iget_clktck

       end module timing
cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc

Here's a test program:


cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
        program prog
        use timing
        implicit none

        integer :: ticks_per_sec

        ticks_per_sec = iget_clktck ()
        print*, "ticks per sec: ", ticks_per_sec
        end
cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc

And here are results from runs on the three ARSC platforms:


  yukon (T3E):
   ticks per sec:  300000000
  
  denali (Y-MP):
   ticks per sec:  166666667
  
  denali (T3D):
   ticks per sec:  150015001

A few T3E WWW Pages

A full description of the ARSC T3E can be found at: http://www.arsc.edu/pubs/bulletins/T3Erelease.shtml

Data on the T3E in general can be found at: http://www.cray.com/PUBLIC/product-info/T3E/overview.html

NERSC's good tutorial and other T3E documentation are at: http://www.nersc.gov/training/links.html

Quick-Tip Q & A


A: {{ How would you condense every occurrence of multiple blank lines in a
      file into a single blank line? }}

    # Here's a perl script that will do it (except that multiple 
    # blanks at the top will come out as two).   There are certainly
    # other ways to do this, using sed, for instance, but why mess
    # with sed when you've got perl? (Yes, the T3E is getting perl.)
    ########

    #!/usr/local/bin/perl

    while (<STDIN>) {            # Load all input lines into one string
      $f .= $_;
    }
  
    $f =~ s/\n\n[\n]*/\n\n/g;    # Replace all groups of two or more
                                 #  consecutive newlines with exactly two.
    print $f;                    # Output result.


Q: In Cray's Programming Environment 2.0, how can you tell which versions
   of the libraries, compilers, etc. will be used as the current default?

[ Answers, questions, and tips graciously accepted. ]


Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.