ARSC T3D Users' Newsletter 101, August 23, 1996

ARSC Upgrades to UNICOS MAX 1.3.0.2

Last Tuesday, we upgraded the T3D's MAX operating system from version 1.2.0.5 to version 1.3.0.2. The upgrade should be transparent, but you should recompile and relink all of your T3D executables, because it affects include files and libraries as well as kernel routines and the user environment.

Here are the release contents for both MAX 1.3.0.0 and 1.3.0.2. In each case, point #1 is the most important.


 
 Release contents
 ----------------
 The UNICOS MAX 1.3.0.0 release includes the following changes:

 1) Added a series of fixes designed to enhance system stability
 2) Added support for preallocation of the roll file
 3) Added binary executables for SAM, mppview, and URM
 4) Added support for Phase III I/O


 Release contents
 ----------------
 The UNICOS MAX 1.3.0.2 release includes the following changes:

 1) Added a series of fixes designed to enhance system stability
 2) Added some improvements to the XDR routines, primarily to improve
    the performance by converting numbers in large blocks.  This allows
    the conversion to vectorize on PVP systems and to execute in a small
    (icache) loop on MPP systems.
 
 

Use f90 for Loopmark Listings of T3D Codes

If you use the "-rm" flag, CRI's f90 compiler will create a listing file with loops marked and optimizations explained. It can provide this for either T3D or Y-MP compilations. This is a big improvement over cf77, which only does "loopmark listing" of Y-MP compiles.

It's nice to know how a compiler alters your code when it optimizes it. Some optimizations reduce precision. Others can produce incorrect results if you mislead the compiler (for instance, by telling the Y-MP compiler to ignore vector dependencies when it shouldn't).
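As a small illustration of why reordered arithmetic matters, here is a sketch that runs on any POSIX system and has nothing to do with either Cray compiler. It assumes awk does its arithmetic in IEEE double precision, which is the usual case, and the function name sum_orders is made up for this example. It sums the same three values in two different orders:

```shell
#!/bin/sh
# Sum the same three values in two different groupings.
# ASSUMPTION: awk uses IEEE double-precision arithmetic (typical, not guaranteed).
sum_orders() {
    awk 'BEGIN {
        a = 0.1; b = 0.2; c = 0.3
        printf "%.17g\n", (a + b) + c    # grouped left to right
        printf "%.17g\n", a + (b + c)    # regrouped
    }'
}

sum_orders
```

On an IEEE machine the two printed values differ in the last digits. An optimizer that regroups or reorders floating point operations can introduce the same kind of difference.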

In Newsletter #99, I gave a program which timed the following loop:


ccc
      parameter (N=1000000)
      a = K ! A constant
      x = a

      do i=1,N
        x = x * a
      enddo
ccc

When I compiled it for the T3D and Y-MP by setting the TARGET environment variable accordingly and then using the cf77 commands:


   T3D:   "cf77 prog.f -o t3d.exe"
   Y-MP:  "cf77 prog.f -o ymp.exe"

I was surprised by the timings:


   T3D:   200,000 mflop/s
   Y-MP:  150 mflop/s

It was easy to find out what the Y-MP compiler had done, as a recompile with the cf77 flag, -Wf"-em":


   Y-MP:  "cf77 -Wf"-em" prog.f -o ymp.exe"

produced a "loopmark listing" which showed that the loop had vectorized. Good enough.

I assumed that the T3D compiler had actually eliminated the loop, but as "loopmark listing" is not available under cf77 for T3D codes, I didn't know how to prove it. Eventually, I discovered that I could recompile with the -Wf"-cm" flag:


   T3D:   "cf77 -Wf"-cm" prog.f -o t3d.exe"

which produced a CIF (Compiler Information File). CIFs are not human-readable, but in the CIF manual I found a C program which extracts "compiler messages" from them. I copied, compiled, and ran this C program on my CIF to get the following information:


    "message at line 22: A loop was eliminated by optimization."

This was moderately satisfying, at best. As far as I know, it's the most information on T3D optimizations you can get from the cf77 compiling system (if anyone knows a better way, let me know, and I'll pass it on).

The solution I found was to use f90.

In f90, you can compile with the same flags for either T3D or Y-MP to get various human readable listing files. For instance:


   T3D:    "f90 -rm loops.f -o loops"
   Y-MP:   "f90 -rm loops.f -o loops"

will produce a listing similar to cf77's Y-MP loopmark listing. To provide a T3D vs Y-MP example, I used these compile commands on the following code:


cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
       program loops
       implicit none
       integer N
       parameter (N=1000)

       integer i
       real slamch
       real xarr(N), yarr(n), zarr(n), a, x, eps 

       eps = slamch('E') 

       a = 1.0 - eps
       x = a
       do i=1,N
         x = x * a
       enddo
       print*, "(1.0 - eps) ^ ", N, " = ", x

       do i=1,N
         xarr = i * eps
         yarr = i + eps
         zarr = eps
       enddo
   
       call dummy (xarr, yarr, zarr)

       do i=1,N
         yarr(i) = xarr(i) * i 
       enddo
       
       call dummy (xarr, yarr, zarr)

       end
ccc
       subroutine dummy (x,y,z)
       real x,y,z
        
       end
cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc

Here's an excerpt from the resulting Y-MP listing file (automatically named "loops.l"):


 

 
     12                         a = 1.0 - eps
     13                         x = a
     14     1 --------          do i=1,N
     15     1                     x = x * a
     16     1 ------->          enddo
     17                         print*, "(1.0 - eps) ^ ", N, " = ", x
     18
     19     1 --------          do i=1,N
     20    VecArrOps              xarr = i * eps
     21    ArrayOps               yarr = i + eps
     22    ArrayOps               zarr = eps
     23     1 ------->          enddo
     24
     25                         call dummy (xarr, yarr, zarr)
     26
     27     v --------          do i=1,N
     28     v                     yarr(i) = xarr(i) * i
     29     v ------->          enddo
     30

 f90 Compiler - 6 messages:
   1) <f90-6002,Scalar> A loop starting at line 14 was eliminated
                        by optimization.
   2) <f90-6204,Vector> A loop starting at line 20 was vectorized.
   3) <f90-6009,Scalar> A floating point expression involving an
                        induction variable was strength reduced
                        by optimization.  This may cause numerical
                        differences.
   4) <f90-6004,Scalar> A loop starting at line 21 was fused with
                        the loop starting at line 20.
   5) <f90-6004,Scalar> A loop starting at line 22 was fused with
                        the loop starting at line 20.
   6) <f90-6204,Vector> A loop starting at line 27 was vectorized.

Running the "explain" command on any of these messages provides even more help. Note that explain expects the message ID to begin with "cf90", not "f90" as printed in the listing. For instance:


 

 
 denali$ explain cf90-6204
  Vector code was generated for the loop.  The compiler vectorizes
  a loop when it can be determined that the meaning of the loop
  will not change by doing so.  However, the order of expression
  evaluation may change, and results may differ.  Generally, the
  vector version of a loop executes much faster than the scalar
  version.
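Message f90-6009 warns that strength reduction "may cause numerical differences." The sketch below shows one way that can happen. It is not the compiler's actual transformation, which the listing does not spell out. It assumes the multiply i * eps is replaced by a running sum, and it uses awk's double-precision arithmetic rather than Cray arithmetic. The function name count_diffs and the values 1000 and 0.1 are made up for illustration:

```shell
#!/bin/sh
# Compare i*eps (direct multiply) with a running sum (an assumed
# strength-reduced form) and print how many of the first n values differ.
# Usage: count_diffs n eps
count_diffs() {
    awk -v n="$1" -v eps="$2" 'BEGIN {
        acc = 0; diffs = 0
        for (i = 1; i <= n; i++) {
            acc += eps                       # reduced form: one add per trip
            if (acc != i * eps) diffs++      # direct form: one multiply per trip
        }
        print diffs
    }'
}

count_diffs 1000 0.1
```

With a step such as 0.1, which has no exact binary representation, the running sum drifts away from the product, so the count is greater than zero. With a step that is a power of two, such as 0.5, every add is exact and the count is zero.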

Here's the excerpt from the T3D listing:


 

 
     12                         a = 1.0 - eps
     13                         x = a
     14     1 --------          do i=1,N
     15     1                     x = x * a
     16     1 ------->          enddo
     17                         print*, "(1.0 - eps) ^ ", N, " = ", x
     18
     19     1 --------          do i=1,N
     20    ArrayOps               xarr = i * eps
     21    ArrayOps               yarr = i + eps
     22    ArrayOps               zarr = eps
     23     1 ------->          enddo
     24
     25                         call dummy (xarr, yarr, zarr)
     26
     27     1 --------          do i=1,N
     28     1                     yarr(i) = xarr(i) * i
     29     1 ------->          enddo
     30

 f90 Compiler - 4 messages:
   1) <f90-6002,Scalar> A loop starting at line 14 was eliminated by
                        optimization.
   2) <f90-6009,Scalar> A floating point expression involving an
                        induction variable was strength reduced
                        by optimization.  This may cause numerical
                        differences.
   3) <f90-6004,Scalar> A loop starting at line 21 was fused with
                        the loop starting at line 20.
   4) <f90-6004,Scalar> A loop starting at line 22 was fused with
                        the loop starting at line 20.

This consistent behavior across platforms is really nice, and it is a good reason to use f90 instead of cf77.
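Since both compiles write their listing to the same file name ("loops.l" in the example above), a second compile presumably overwrites the first listing. The sketch below prints, without running, a pair of commands per target that saves each listing under its own name. The TARGET values "cray-t3d" and "cray-ymp", the function name loopmark_cmds, and the output naming scheme are assumptions for illustration, so check them against your own environment:

```shell
#!/bin/sh
# Print (do not run) the commands to build loopmark listings of one
# Fortran source file for both targets, one listing file per target.
# ASSUMPTIONS: TARGET values "cray-t3d" and "cray-ymp" are illustrative,
# as are the output names <base>.<target>.exe and <base>.<target>.l.
loopmark_cmds() {
    src=$1
    base=${src%.f}
    for target in cray-t3d cray-ymp; do
        echo "TARGET=$target f90 -rm $src -o $base.$target.exe"
        # The listing is named <base>.l, so move it before the next compile.
        echo "mv $base.l $base.$target.l"
    done
}

loopmark_cmds loops.f
```

After reviewing the printed commands, you could pipe them to sh on the Cray host ("loopmark_cmds loops.f | sh") and then compare the two listings with diff.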

The 'mppfixpe' Command and Plastic Executables

The UNICOS command "mppfixpe" converts a plastic executable to a fixed one. In other words, it converts an executable which can use a variable number of PEs (determined at run time) to one which must always use the same number of PEs (as specified in the arguments to the command).

This may not seem like the most useful command (why would one want to sacrifice flexibility?), but there are good reasons to use fixed executables. For instance, we had visitors working on-site last week who got a 2:1 speedup in the load time of a program when they switched from plastic to fixed. This was a boon because they wanted to do multiple, short test runs, and the load time had become a major percentage of the total time spent on each run.

They used 'mppldr -X $(NPES) ...' to re-link the program with a fixed number of PEs. If they had no longer had access to the source or object files, mppldr would not have been an option, and they could have used mppfixpe instead.

For a thorough discussion of plastic and fixed executables, see Newsletter #44. Here, however, is a quick comparison:

  • Advantages of fixed executables:
    • mppldr not called on each run
    • smaller file size
  • Advantages of plastic executables:
    • number of PEs is flexible -- determined at runtime
    • can usually be converted to fixed, whenever desired, using mppfixpe
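To make the last point concrete, here is a sketch that prints, without running, one mppfixpe command per PE count. It follows the synopsis in the man page quoted below ("mppfixpe -o newname -X npes oldname"). The function name fixpe_cmds and the "oldname.npes" naming scheme for the new executables are made up for illustration:

```shell
#!/bin/sh
# Print (do not run) one mppfixpe command for each requested PE count.
# Command form taken from the man page synopsis:
#   mppfixpe -o newname -X npes oldname
# ASSUMPTION: naming the fixed executables <oldname>.<npes> is illustrative.
# Usage: fixpe_cmds plastic_executable npes [npes ...]
fixpe_cmds() {
    plastic=$1
    shift
    for npes in "$@"; do
        echo "mppfixpe -o $plastic.$npes -X $npes $plastic"
    done
}

fixpe_cmds a.out 8 16 32
```

Printing the commands first lets you review them. On the Cray host they could then be piped to sh. The plastic original is left in place, so you keep the flexibility of choosing the PE count at run time.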

This is from CRI's man page:



 

 NAME
   mppfixpe - Reconfigures a CRAY T3D absolute for a different
              number of PEs

 SYNOPSIS
   mppfixpe -o newname -X npes [-M opts] [-V] oldname

 DESCRIPTION
   The mppfixpe utility reads an existing CRAY T3D absolute (plastic
   a.out file) and, if possible, changes it so that it will execute using
   a different number of processing elements (PEs).

   A plastic a.out file refers to an a.out file on a CRAY T3D system that
   has been created without using either compiler or loader directives to
   specify (or fix) the number of processing elements.  This lets you
   specify the number of PEs at execution time.  For example:

     /mpp/bin/cft77 t.f
     /mpp/bin/mppldr t.o
     a.out npes 128

   If you fix the number of PEs on either the cf77 or the mppldr command
   line, the resulting a.out file no longer is considered to be plastic,
   and you cannot specify the number of PEs to use at run time.

   A plastic a.out file is assumed to have been targeted for 0 PEs.

   The mppfixpe utility accepts the following options:

   -o newname    Specifies the path name where the new absolute is to be
                 stored.

   -X npes       Specifies the number of PEs for which the new absolute
                 is to be configured.

   -M opts       Requests that the loader produce a map of the new
                 absolute.  The opts values are those known to mppldr(1).

   -V            Causes the mppfixpe utility to write its version
                 identification to stderr.

   oldname       Specifies the path name of the existing CRAY T3D
                 absolute.

 NOTES
   The mppldr and mppfixpe utilities assume that fairly ordinary things
   are being done.  However, if you are changing the loader's CALLXFER
   directive, things may not work the way you want.

 

Quick-Tip Q & A

Q: How can you delete a file named "-i" ???
(You would create it if, for instance, you accidentally typed "cp txt -i" instead of "cp -i txt txt2".)
A: ???
(Sorry... not till next week...)

[ Answers, questions, and tips graciously accepted. ]


Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.