
 

ARSC T3D Users' Newsletter 101, August 23, 1996


ARSC Upgrades to UNICOS MAX 1.3.0.2

Last Tuesday, we upgraded the T3D's MAX operating system from version 1.2.0.5 to version 1.3.0.2. This upgrade should be transparent, but you should recompile and relink all of your T3D executables, as it affects include and library files as well as kernel routines and the user environment.

Here are the release contents for both MAX 1.3.0.0 and 1.3.0.2. In each case, point #1 is the most important.

 | Release contents
 | ----------------
 | The UNICOS MAX 1.3.0.0 release includes the following changes:
 | 
 | 1) Added a series of fixes designed to enhance system stability
 | 2) Added support for preallocation of the roll file
 | 3) Added binary executables for SAM, mppview, and URM
 | 4) Added support for Phase III I/O
 | 
 | 
 | Release contents
 | ----------------
 | The UNICOS MAX 1.3.0.2 release includes the following changes:
 | 
 | 1) Added a series of fixes designed to enhance system stability
 | 2) Added some improvements to the XDR routines, primarily to improve
 |    the performance by converting numbers in large blocks.  This allows
 |    the conversion to vectorize on PVP systems and to execute in a small
 |    (icache) loop on MPP systems.
 | 

Use f90 for Loopmark Listings of T3D Codes

If you use the "-rm" flag, CRI's f90 compiler will create a listing file with loops marked and optimizations explained. It can provide this for either T3D or Y-MP compilations. This is a big improvement over cf77, which only does "loopmark listing" of Y-MP compiles.

It's nice to know how a compiler alters your code when it optimizes it. Some optimizations reduce precision. Others, if you mislead the compiler (for instance, by telling the Y-MP compiler to ignore vector dependencies when it shouldn't), can lead to incorrect results.

In Newsletter #99, I gave a program which timed the following loop:

ccc
      parameter (N=1000000)
      a = K ! A constant
      x = a

      do i=1,N
        x = x * a
      enddo
ccc

When I compiled it for the T3D and Y-MP by setting the TARGET environment variable accordingly and then using the cf77 commands:

   T3D:   "cf77 prog.f -o t3d.exe"
   Y-MP:  "cf77 prog.f -o ymp.exe"

I was surprised by the timings:

   T3D:   200,000 mflop/s
   Y-MP:  150 mflop/s

It was easy to find out what the Y-MP compiler had done, as a recompile with the cf77 flag, -Wf"-em":

   Y-MP:  "cf77 -Wf"-em" prog.f -o ymp.exe"

produced a "loopmark listing" which showed that the loop had vectorized. Good enough.

I assumed that the T3D compiler had actually eliminated the loop, but as "loopmark listing" is not available under cf77 for T3D codes, I didn't know how to prove it. Eventually, I discovered that I could recompile with the -Wf"-cm" flag:

   T3D:   "cf77 -Wf"-cm" prog.f -o t3d.exe"

which produced a CIF (Compiler Information File). CIFs contain data that is not human-readable, but in the CIF manual I found a C program which extracts "compiler messages" from CIFs. I copied, compiled, and ran this C program on my CIF to get the following information:

    "message at line 22: A loop was eliminated by optimization."

This was moderately satisfying, at best. As far as I know, it's the most information on T3D optimizations you can get from the cf77 compiling system (if anyone knows a better way, let me know, and I'll pass it on).
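
To make that message concrete, here is a minimal sketch (in Python rather than Fortran, and not a claim about what the T3D compiler actually did) of why such a loop is a candidate for elimination: with a and N fixed, the N multiplications collapse into the single power a**(N+1). The function names and sample values are hypothetical.

```python
import math


def product_by_loop(a, n):
    """Mirror the timed loop: start with x = a, then multiply by a n times."""
    x = a
    for _ in range(n):
        x = x * a
    return x


def product_closed_form(a, n):
    """The same quantity without a loop: a raised to the power n + 1."""
    return a ** (n + 1)


if __name__ == "__main__":
    a, n = 0.999999, 1000
    looped = product_by_loop(a, n)
    closed = product_closed_form(a, n)
    # The two values agree closely but need not match to the last bit.
    print(looped, closed, math.isclose(looped, closed, rel_tol=1e-9))
```

A timing harness wrapped around the loop version would measure almost nothing once the loop is gone, which is consistent with the implausibly high T3D rate above.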

The solution I found was to use f90.

With f90, you can compile with the same flags for either T3D or Y-MP to get various human-readable listing files. For instance:

   T3D:    "f90 -rm loops.f -o loops"
   Y-MP:   "f90 -rm loops.f -o loops"

will produce a listing similar to cf77's Y-MP loopmark listing. To provide a T3D vs Y-MP example, I used these compile commands on the following code:

cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
       program loops
       implicit none
       integer N
       parameter (N=1000)

       integer i
       real slamch
       real xarr(N), yarr(n), zarr(n), a, x, eps 

       eps = slamch('E') 

       a = 1.0 - eps
       x = a
       do i=1,N
         x = x * a
       enddo
       print*, "(1.0 - eps) ^ ", N, " = ", x

       do i=1,N
         xarr = i * eps
         yarr = i + eps
         zarr = eps
       enddo
   
       call dummy (xarr, yarr, zarr)

       do i=1,N
         yarr(i) = xarr(i) * i 
       enddo
       
       call dummy (xarr, yarr, zarr)

       end
ccc
       subroutine dummy (x,y,z)
       real x,y,z
        
       end
cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc

Here's an excerpt from the resulting Y-MP listing file (automatically named "loops.l"):

 |
 |     12                         a = 1.0 - eps
 |     13                         x = a
 |     14     1 --------          do i=1,N
 |     15     1                     x = x * a
 |     16     1 ------->          enddo
 |     17                         print*, "(1.0 - eps) ^ ", N, " = ", x
 |     18                  
 |     19     1 --------          do i=1,N
 |     20    VecArrOps              xarr = i * eps
 |     21    ArrayOps               yarr = i + eps
 |     22    ArrayOps             zarr = eps
 |     23     1 ------->          enddo
 |     24                   
 |     25                         call dummy (xarr, yarr, zarr)
 |     26                  
 |     27     v --------          do i=1,N
 |     28     v                   yarr(i) = xarr(i) * i
 |     29     v ------->          enddo
 |     30                   
 |
 | f90 Compiler - 6 messages:
 |   1) <f90-6002,Scalar> A loop starting at line 14 was eliminated 
 |                        by optimization.
 |   2) <f90-6204,Vector> A loop starting at line 20 was vectorized.
 |   3) <f90-6009,Scalar> A floating point expression involving an 
 |                        induction variable was strength reduced
 |                        by optimization.  This may cause numerical 
 |                        differences.
 |   4) <f90-6004,Scalar> A loop starting at line 21 was fused with 
 |                        the loop starting at line 20.
 |   5) <f90-6004,Scalar> A loop starting at line 22 was fused with 
 |                        the loop starting at line 20.
 |   6) <f90-6204,Vector> A loop starting at line 27 was vectorized.

Running the "explain" command on any of these messages provides even more help. Note that the message identifier passed to "explain" must start with "cf90", not "f90" as shown in the listing. For instance:

 |
 | denali$ explain cf90-6204
 |  Vector code was generated for the loop.  The compiler vectorizes
 |  a loop when it can be determined that the meaning of the loop
 |  will not change by doing so.  However, the order of expression
 |  evaluation may change, and results may differ.  Generally, the
 |  vector version of a loop executes much faster than the scalar
 |  version.
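
The warning that "results may differ" comes from floating-point addition not being associative. Here is a minimal sketch in Python, assuming a vector unit that accumulates several partial sums side by side; the function names, the strip width, and the sample data are illustrative only and are not Cray specifics.

```python
def sum_sequential(values):
    """Add left to right, as a scalar loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_in_strips(values, width=4):
    """Keep `width` partial sums, then combine them at the end."""
    partial = [0.0] * width
    for i, v in enumerate(values):
        partial[i % width] += v
    total = 0.0
    for p in partial:
        total += p
    return total


if __name__ == "__main__":
    # Values chosen so that a small term is lost next to a large one.
    data = [1.0e16, 1.0, -1.0e16, 1.0]
    print("scalar order:", sum_sequential(data))
    print("strip order: ", sum_in_strips(data, 2))
```

Both functions add the same four numbers; only the grouping changes, and that is enough to change the answer.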

Here's the excerpt from the T3D listing:

 |
 |     12                         a = 1.0 - eps
 |     13                         x = a
 |     14     1 --------          do i=1,N
 |     15     1                     x = x * a
 |     16     1 -------->          enddo
 |     17                         print*, "(1.0 - eps) ^ ", N, " = ", x
 |     18                  
 |     19     1 --------          do i=1,N
 |     20    ArrayOps               xarr = i * eps
 |     21    ArrayOps               yarr = i + eps
 |     22    ArrayOps             zarr = eps
 |     23     1 ------->          enddo
 |     24                   
 |     25                         call dummy (xarr, yarr, zarr)
 |     26                  
 |     27     1 --------          do i=1,N
 |     28     1                   yarr(i) = xarr(i) * i
 |     29     1 ------->          enddo
 |     30                   
 |
 | f90 Compiler - 4 messages:
 |   1) <f90-6002,Scalar> A loop starting at line 14 was eliminated by 
 |                        optimization.
 |   2) <f90-6009,Scalar> A floating point expression involving an 
 |                        induction variable was strength reduced
 |                        by optimization.  This may cause numerical 
 |                        differences.
 |   3) <f90-6004,Scalar> A loop starting at line 21 was fused with 
 |                        the loop starting at line 20.
 |   4) <f90-6004,Scalar> A loop starting at line 22 was fused with 
 |                        the loop starting at line 20.

This consistent behavior across platforms is really nice, and it is a good reason to use f90 instead of cf77.
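
Message f90-6009 appears in both listings and warns of "numerical differences." Strength reduction replaces a multiplication by the loop index with a running addition, and the two round differently. Here is a minimal sketch in Python, with hypothetical function names, of the two ways to form a value like i * eps:

```python
def by_multiplication(step, n):
    """Compute i * step directly for i = 1..n, as the source is written."""
    return [i * step for i in range(1, n + 1)]


def by_running_sum(step, n):
    """Strength-reduced form: each multiply is replaced by one add."""
    out = []
    acc = 0.0
    for _ in range(n):
        acc += step
        out.append(acc)
    return out


if __name__ == "__main__":
    direct = by_multiplication(0.1, 1000)
    reduced = by_running_sum(0.1, 1000)
    worst = max(abs(d - r) for d, r in zip(direct, reduced))
    print("largest difference:", worst)
```

The differences are tiny, but the running sum accumulates rounding error with every iteration, so they tend to grow with the trip count.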

The 'mppfixpe' Command and Plastic Executables

The UNICOS command "mppfixpe" converts a plastic executable to a fixed one. In other words, it converts an executable which can use a variable number of PEs (determined at run time) to one which must always use the same number of PEs (as specified in the arguments to the command).

This may not seem like the most useful command (why would one want to sacrifice flexibility?), but there are good reasons to use fixed executables. For instance, we had visitors working on-site last week who got a 2:1 speedup in the load time of a program when they switched from plastic to fixed. This was a boon because they wanted to do multiple, short test runs, and the load time had become a major percentage of the total time spent on each run.

They used 'mppldr -X $(NPES) ...' to re-link the program with a fixed number of PEs. If they had no longer had access to the source or object files, however, mppldr would not have been an option, and they could have used mppfixpe instead.

For a thorough discussion of plastic and fixed executables, see Newsletter #44. Here, however, is a quick comparison, taken from CRI's man page:

| 
| NAME
|   mppfixpe - Reconfigures a CRAY T3D absolute for a different 
|              number of PEs
| 
| SYNOPSIS
|   mppfixpe -o newname -X npes [-M opts] [-V] oldname
| 
| DESCRIPTION
|   The mppfixpe utility reads an existing CRAY T3D absolute (plastic
|   a.out file) and, if possible, changes it so that it will execute using
|   a different number of processing elements (PEs).
| 
|   A plastic a.out file refers to an a.out file on a CRAY T3D system that
|   has been created without using either compiler or loader directives to
|   specify (or fix) the number of processing elements.  This lets you
|   specify the at execution time the number of PEs.  For example:
| 
|     /mpp/bin/cft77 t.f
|     /mpp/bin/mppldr t.o
|     a.out npes 128
| 
|   If you fix the number of PEs on either the cf77 or the mppldr command
|   line, the resulting a.out file no longer is considered to be plastic,
|   and you cannot specify the number of PEs to use at run time.
| 
|   A plastic a.out file is assumed to have been targeted for 0 PEs.
| 
|   The mppfixpe utility accepts the following options:
| 
|   -o newname    Specifies the path name where the new absolute is to be
|                 stored.
| 
|   -X npes       Specifies the number of PEs for which the new absolute
|                 is to be configured.
| 
|   -M opts       Requests that the loader produce a map of the new
|                 absolute.  The opts values are those known to mppldr(1).
| 
|   -V            Causes the mppfixpe utility to write its version
|                 identification to stderr.
| 
|   oldname       Specifies the path name of the existing CRAY T3D
|                 absolute.
| 
| NOTES
|   The mppldr and mppfixpe utilities assume that fairly ordinary things
|   are being done.  However, if you are changing the loader's CALLXFER
|   directive, things may not work the way you want.
| 
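
If you need fixed executables for several PE counts, the command lines can be generated mechanically. The following is a minimal sketch in Python that only builds and prints the commands, using the -o and -X options from the synopsis above; the function name and the file names are hypothetical, and nothing is executed.

```python
def mppfixpe_command(oldname, newname, npes):
    """Build the argument list for: mppfixpe -o newname -X npes oldname."""
    if npes < 1:
        raise ValueError("npes must be a positive integer")
    return ["mppfixpe", "-o", newname, "-X", str(npes), oldname]


if __name__ == "__main__":
    # Hypothetical file names; the commands are printed, not run.
    for npes in (8, 16, 32):
        newname = "a.out.%d" % npes
        print(" ".join(mppfixpe_command("a.out", newname, npes)))
```

Printing the commands first makes it easy to check them before pasting them into a shell or a batch script.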

Quick-Tip Q & A

Q: How can you delete a file named "-i" ???
(You would create it if, for instance, you accidentally typed "cp txt -i" instead of "cp -i txt txt2".)
A: ???
(Sorry... not till next week...)

[ Answers, questions, and tips graciously accepted. ]

 


Current Editors:
Donald Bahls ARSC User Consultant ph: 907-450-8674
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020