ARSC T3D Users' Newsletter 54, September 30, 1995

New Charging Algorithm at ARSC

As of October 1, 1995, ARSC will change its algorithm for calculating Service Units (SUs). The current algorithm is:


    1.000*Y-MP(CPU-hours)
  + 0.050*T3D(CPU-hours)
  + 0.005*Memory(MWord-hours)
  + 0.003*your Denali files(GByte-hours)
    ------------------------
  = total SUs

The new algorithm as of October 1 is:

    1.000*Y-MP(CPU-hours)
  + 0.010*T3D(CPU-hours)
  + 0.005*Memory(MWord-hours)
  + 0.003*your Denali files(GByte-hours)
    ------------------------
  = total SUs

The only change is the charge for using the ARSC T3D:

  from:  0.050*T3D(CPU-hours)
    to:  0.010*T3D(CPU-hours)

This five-fold reduction in the charge for T3D time is part of our effort to boost T3D utilization. It brings our rates in line with those of other T3D sites and signals our wish to become a production site for T3D applications. If you have questions about this change, contact Mike Ess.
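
To see what the change means for a given account, here is a minimal Python sketch of the two formulas. The rates are the ones quoted above; the dictionary keys and the service_units function are hypothetical names for illustration, not an ARSC accounting tool.

```python
# Sketch of the ARSC SU formulas above (rates copied from the text).
# The key names and service_units() are illustrative, not an ARSC tool.

OLD_RATES = {
    "ymp_cpu_hours": 1.000,       # Y-MP CPU-hours
    "t3d_cpu_hours": 0.050,       # T3D CPU-hours (before October 1, 1995)
    "memory_mword_hours": 0.005,  # memory, MWord-hours
    "denali_gbyte_hours": 0.003,  # Denali files, GByte-hours
}
# Only the T3D rate changes.
NEW_RATES = dict(OLD_RATES, t3d_cpu_hours=0.010)


def service_units(usage, rates):
    """Total SUs: each usage figure times its rate, summed."""
    return sum(rates[key] * amount for key, amount in usage.items())


if __name__ == "__main__":
    # Hypothetical month: 10 Y-MP CPU-hours and 1000 T3D CPU-hours.
    usage = {"ymp_cpu_hours": 10.0, "t3d_cpu_hours": 1000.0}
    print(f"old: {service_units(usage, OLD_RATES):.1f} SUs")  # old: 60.0 SUs
    print(f"new: {service_units(usage, NEW_RATES):.1f} SUs")  # new: 20.0 SUs
```

For this hypothetical usage the T3D part of the bill drops from 50 to 10 SUs; Y-MP, memory and Denali file charges are unchanged.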

T3E Information at the Alaska CUG

CUG (the Cray User Group) is the best place to get information about CRI and its users. In the coming weeks I'll be sending out more of what I learned at this CUG, but for this issue I'd like to share what I learned about the T3E. These are just my notes from various speakers, and I couldn't write as fast as they could talk.

From Jeff Brooks's talk on single-PE optimization:


       T3D                            T3E
       ---                            ---
  DEC's EV4 chip              DEC's EV5 chip
  6 clock pipeline            4 clock pipeline
  150 MHz                     300 MHz
  8KB data cache              first 8KB data cache
     direct mapped              direct mapped       
     write back                 write back
                              secondary 96KB cache
                                3 way set associative
                                write thru 
                                cache coherency by CRI
  1 read ahead stream         6 read ahead streams

  From the STREAM benchmark (T3E results are simulated; units are MB/s):

    operation    IBM 6000/590      T3D      T3E

       copy            600         205      543
       scale           533         151      514
       summation       655         151      583
       triad           655         106      606

  (This benchmark measures the bandwidth of the memory system as
   timed by a DO loop executing the above operations.)
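
For readers who have not seen STREAM, the four kernels and the way a loop time becomes an MB/s figure can be sketched in Python. This assumes STREAM's usual conventions: 8-byte words, 2 words moved per iteration for copy and scale and 3 for add (the "summation" row above) and triad, 1 MB = 10^6 bytes, and a scalar q = 3.0. Pure Python loops measure the interpreter rather than the memory system, so timings from this sketch say nothing about the machines in the table; the real benchmark is compiled Fortran or C.

```python
from array import array
import time

# 8-byte words read or written per loop iteration, as STREAM counts them.
WORDS_PER_ITER = {"copy": 2, "scale": 2, "add": 3, "triad": 3}


def mb_per_s(kernel, n, seconds):
    """Bandwidth in MB/s (1 MB = 10**6 bytes) for one pass over n elements."""
    return WORDS_PER_ITER[kernel] * 8 * n / seconds / 1e6


def run_stream(n, q=3.0):
    """Run each kernel once over arrays of length n; return (a, b, c, times)."""
    a = array("d", [1.0]) * n
    b = array("d", [2.0]) * n
    c = array("d", [0.0]) * n
    times = {}

    t0 = time.perf_counter()
    for j in range(n):              # copy:  c = a
        c[j] = a[j]
    times["copy"] = time.perf_counter() - t0

    t0 = time.perf_counter()
    for j in range(n):              # scale: b = q*c
        b[j] = q * c[j]
    times["scale"] = time.perf_counter() - t0

    t0 = time.perf_counter()
    for j in range(n):              # add ("summation" above): c = a + b
        c[j] = a[j] + b[j]
    times["add"] = time.perf_counter() - t0

    t0 = time.perf_counter()
    for j in range(n):              # triad: a = b + q*c
        a[j] = b[j] + q * c[j]
    times["triad"] = time.perf_counter() - t0
    return a, b, c, times


if __name__ == "__main__":
    n = 200_000
    *_, times = run_stream(n)
    for kernel, secs in times.items():
        print(f"{kernel:6s} {mb_per_s(kernel, n, secs):8.1f} MB/s")
```

The byte counts are why triad and add move 50% more data per iteration than copy and scale, which matters when comparing rows of the table.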

From Steve Johnson's talk on new hardware from CRI:

  Design started in 4Q 1992
  Targeted to be 3X better than the T3D in these components:
    CPU speed
    Memory 
    Interconnect speed
  Entry-level price: $0.5M
  1 PE per node (currently 2 PEs per node on the T3D)
  1 I/O controller per 4 PEs
    (currently 1 I/O node per 64 PEs on T3D)
  Can use CRI's SCX connection to a CRI PVP machine
  Available March 1996

From Irene Qualters' talk in the General Session:

  Target is 3X the performance of the T3D
  From simulations:
    5X   faster on some applications
    4.2X faster on the harmonic mean for the Livermore Loops
  Will have "serverized Unix" I/O nodes and system server nodes

From Gary Geissler's talk "T3E Overview and Status":

CRI is currently waiting for custom chips to come back from its IC vendor. The official product announcement will be at the end of November, so everything in this report is subject to change. CRI is on target with its long-term goals for the T3X family (based on 2048 PEs):


  1993 T3D   Peak performance       300 Gflop/s
  1996 T3E   Peak performance         1 Tflop/s
  1998 T3SN  Sustained performance    1 Tflop/s  (SN = scalable node)

The T3E is implemented entirely in CMOS, a proven, mature technology, and the CMOS implementation made an air-cooled version possible. The T3E will be "binary compatible" with T3D executables, but additional performance will come from recompiling the source. It is not intended to be a collection of workstations, but it can be used in that mode. It will be available in configurations of 16 to 2048 PEs, in multiples of 4 PEs, in late March 1996.

The T3E uses the DEC Alpha 21164 (EV5) chip. The design starts with the 300 MHz version, but it can also use the 380 MHz and 450 MHz versions of the chip. The EV5 can issue four instructions per clock period: 1 floating-point add, 1 floating-point multiply, and 2 integer instructions. In contrast, the EV4 chip in the T3D can issue only 2 instructions per clock period.
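
The peak figures in the T3X table follow from PEs x clock rate x floating-point results per clock. Here is a minimal sketch, assuming 1 floating-point result per clock for the EV4 and 2 for the EV5 (its separate add and multiply pipes, as described above):

```python
# Peak flop-rate arithmetic for a 2048-PE system.
# Assumption: EV4 (T3D) produces at most 1 floating-point result per clock;
# EV5 (T3E) produces 2 (one add and one multiply).


def peak_gflops(pes, mhz, flops_per_clock):
    """Peak Gflop/s = PEs x clock rate (MHz) x flops per clock / 1000."""
    return pes * mhz * flops_per_clock / 1000.0


if __name__ == "__main__":
    print(peak_gflops(2048, 150, 1))  # T3D, 150 MHz EV4  -> 307.2
    print(peak_gflops(2048, 300, 2))  # T3E, 300 MHz EV5  -> 1228.8
    print(peak_gflops(2048, 450, 2))  # T3E, 450 MHz EV5  -> 1843.2
```

The results, 307.2 Gflop/s and 1228.8 Gflop/s, round to the 300 Gflop/s and 1 Tflop/s quoted above; under the same assumptions a 450 MHz EV5 would put a 2048-PE T3E at about 1.8 Tflop/s peak.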

With a single PE per node, memory is now 8-way bank interleaved, with a peak bandwidth of 1.2 GB/s. The memory system will have 6 read-ahead streams, as opposed to the single stream on the T3D. These high-speed registers behave like the T3D's single stream, which is enabled with the rdahead flag on the loader. In simulations, their prefetch capability gave better performance than an off-chip cache.
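
Why do 6 streams matter? A loop like the triad reads two arrays at once, and a single stream keeps getting reassigned between them. The toy model below counts prefetch hits at cache-line granularity. It is a simplified, hypothetical model, not CRI's hardware logic: it assumes a stream is allocated on every miss, always prefetches the next cache line of the same array, and that the least recently used stream is replaced when all are busy.

```python
# Toy model of read-ahead stream buffers (hypothetical, simplified).
# A "reference" is (array_name, cache_line_index). Each stream remembers
# the next line it expects; a reference that matches a stream is a hit.


def load_refs(arrays, n_lines):
    """Line-level load references for a loop that reads each array in
    step, e.g. ["b", "c"] for the loads of a triad a = b + q*c."""
    return [(name, i) for i in range(n_lines) for name in arrays]


def count_stream_hits(refs, n_streams):
    """Count references served by a stream (n_streams >= 1), LRU replacement."""
    streams = []  # next expected (array, line) per stream; most recent last
    hits = 0
    for name, line in refs:
        if (name, line) in streams:
            hits += 1
            streams.remove((name, line))  # hit: stream advances, re-added below
        elif len(streams) == n_streams:
            streams.pop(0)                # miss: evict least recently used
        streams.append((name, line + 1))  # this stream now expects next line
    return hits


if __name__ == "__main__":
    n_lines = 1000
    for arrays in (["a"], ["b", "c"]):
        for n in (1, 6):
            hits = count_stream_hits(load_refs(arrays, n_lines), n)
            print(f"{len(arrays)} array(s), {n} stream(s): "
                  f"{hits} of {len(arrays) * n_lines} line loads hit")
```

In this model, one stream gives no hits at all for a loop reading two arrays, while a loop over a single array hits on every line after the first; with six streams both loops stream cleanly.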

The T3E's torus interconnect is more efficient and more consistent than the T3D's. It can adaptively deviate from the usual path between two PEs to avoid contention at an intermediate PE.
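
The idea of adaptive routing can be illustrated with a small sketch of minimal routing on a 3D torus. This is a generic textbook scheme, not a description of the T3E router: with no load information it corrects the x, then y, then z coordinate (dimension-order routing), and with a hypothetical per-node load map it chooses, at each hop, the least-loaded neighbor that still moves the packet closer to its destination.

```python
# Generic minimal routing on a torus (illustrative, not CRI's router).


def ring_step(src, dst, size):
    """Step (+1, -1, or 0) along one dimension, taking the shorter way around."""
    forward = (dst - src) % size
    if forward == 0:
        return 0
    return 1 if forward <= size - forward else -1


def hops(src, dst, dims):
    """Minimal hop count between two nodes of a torus with the given dims."""
    return sum(min((d - s) % n, (s - d) % n) for s, d, n in zip(src, dst, dims))


def route(src, dst, dims, load=None):
    """List of nodes visited from src to dst.

    load=None: dimension-order routing (x, then y, then z).
    load={node: congestion}: at each hop pick the least-loaded neighbor
    among those that still reduce the distance (minimal adaptive routing).
    """
    node = tuple(src)
    path = [node]
    while node != tuple(dst):
        candidates = []
        for i, n in enumerate(dims):
            step = ring_step(node[i], dst[i], n)
            if step:
                nxt = list(node)
                nxt[i] = (node[i] + step) % n
                candidates.append(tuple(nxt))
        if load is None:
            node = candidates[0]  # lowest-numbered dimension first
        else:
            node = min(candidates, key=lambda c: load.get(c, 0))
        path.append(node)
    return path


if __name__ == "__main__":
    dims = (4, 4, 4)
    print(route((0, 0, 0), (1, 1, 0), dims))
    # [(0, 0, 0), (1, 0, 0), (1, 1, 0)]
    print(route((0, 0, 0), (1, 1, 0), dims, load={(1, 0, 0): 5}))
    # [(0, 0, 0), (0, 1, 0), (1, 1, 0)]
```

In the second case the busy node at (1, 0, 0) is bypassed through (0, 1, 0), and the path is still the minimal two hops.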

With 4 PEs per board (or module), there will be 1 SCX channel per board to control I/O. (SCX is CRI's I/O technology beyond HIPPI; there may be more details in future newsletters.)

Hardware on Display

There was a T3E chassis on the floor in the commons area at the Alaska CUG. On the last day of CUG, a table next to the chassis displayed the CPU board, memory daughter cards, power supply, and cooling elements.

Announcement from the Pharaoh: a T3D Optimization Conference

This message is addressed to those working on application development on the CRI T3D computer system. Please forward it freely to anyone who might be interested at your site.

              Pittsburgh Supercomputing Center (PSC),
                    Ohio Supercomputing Center (OSC), and
           Arctic Region Supercomputing Center (ARSC)

                 (a MetaCenter Regional Alliance)

                       ----together with---- 

                     CRAY Research, Inc. (CRI) 

                     are pleased to announce a 

  ***************************************************************
  *                                                             *
  *  Meeting on the Optimization of Codes for CRAY MPP Systems  *
  *                     January 24-26, 1996                     *
  *                  Pittsburgh,  Pennsylvania                  *
  *                                                             *
  ***************************************************************

  The purpose of this meeting is to bring together developers of
  T3D code and promote discussions of their experiences on the
  T3D, enabling them to further optimize the performance of their
  codes on the T3D and T3E. Selected presenters will deliver
  brief talks describing T3D projects, implementation design
  decisions, optimization strategies, resulting code performance,
  and any circumstances inhibiting further optimization of the
  code. 

  In addition to the talks, there will be opportunities for
  formal and informal discussions among the participants. For
  those participants interested in collaborating on code
  development or testing their newly acquired ideas, our
  state-of-the-art training facility will be available for the
  duration of the meeting.

  Registrations are currently being accepted from those
  interested in presenting and/or attending the meeting. 

      ----- Application Deadline: October 16, 1995 -----

  More details can be found by opening http://www.psc.edu/ and
  following the "Hotlist" link.

  ===============================================================

  REGISTRATION INFORMATION:

  The registration fee for this 3-day meeting is $75, which
  includes breakfast and lunch for the 3 days and the cost of
  handout materials.

  Housing and travel are the responsibility of participants, but
  we will assist you in making reservations. Group rates on local
  hotel accommodations are available on a first-come,
  first-served basis.

  If you are interested in being a presenter and/or attending the
  meeting, please return your completed registration form by
  October 16, 1995 to: 

    Workshop Application Committee
    ATTN: Anne Marie Zellner
    Pittsburgh Supercomputing Center
    4400 Fifth Avenue
    Pittsburgh, PA, 15213 

  You may also apply by sending the requested information via
  electronic mail to workshop@psc.edu or via fax (412/268-5832).
  Specific questions should be directed to Cathy Milligan,
  Pittsburgh Supercomputing Center's Education and Training
  Coordinator, at milligan@psc.edu (412/268-8263). 

  ===============================================================

  REGISTRATION FORM:
  
  Name: 
  
  Department: 
  
  Univ/Ind/Gov Affiliation:
  
  Address:
  
  Telephone:
  
  Electronic Mail Address:
  
  Social Security Number:                                        

  Citizenship:

  Academic Status (please mark one):

     F - Faculty
    PD - Postdoctorate
    GS - Graduate Student
    UG - Undergraduate
    UR - University Research Staff
    UN - University Non-Research Staff
    GV - Government
     I - Industrial
     O - Other

  Would you like assistance with arranging hotel accommodations
  (yes/no)?

  Briefly describe your computing background and research
  interests.

  Are you interested in being a presenter and attending the
  meeting, or just attending the meeting?

    ___ presenter                
    ___ just attending

  If you are interested in being a presenter at this meeting,
  please submit a brief abstract of your talk describing your
  T3D project and your involvement in it. Be sure to include
  details on the project's goals, implementation design
  decisions, optimization strategies, performance, and any
  circumstances inhibiting further optimization of the code.
  Please note that if there are more offers of presentations
  than can be accommodated, the organizers will select those
  that they judge to be of interest to the widest audience.

List of Differences Between T3D and Y-MP

The current list of differences between the T3D and the Y-MP is:
  1. Data type sizes are not the same (Newsletter #5)
  2. Uninitialized variables are different (Newsletter #6)
  3. The effect of the -a static compiler switch (Newsletter #7)
  4. There is no GETENV on the T3D (Newsletter #8)
  5. Missing routine SMACH on T3D (Newsletter #9)
  6. Different Arithmetics (Newsletter #9)
  7. Different clock granularities for gettimeofday (Newsletter #11)
  8. Restrictions on record length for direct I/O files (Newsletter #19)
  9. Implied DO loop is not "vectorized" on the T3D (Newsletter #20)
  10. Missing Linpack and Eispack routines in libsci (Newsletter #25)
  11. F90 manual for Y-MP, no manual for T3D (Newsletter #31)
  12. RANF() and its manpage differ between machines (Newsletter #37)
  13. CRAY2IEG is available only on the Y-MP (Newsletter #40)
  14. Missing sort routines on the T3D (Newsletter #41)
  15. Missing compiler allocation flags (Newsletter #52)
  16. Missing compiler listing flags (Newsletter #53)

I encourage users to e-mail any differences they have found, so that we can all benefit from each other's experience.

Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.