ARSC T3D Users' Newsletter 65, December 15, 1995

NQS - T3D Miscommunication

At ARSC, we recently had an interesting situation where PEs were available but couldn't be used. At that time, the T3D mppview display looked something like this:

  \ .        .        .        .        smith    smith    smith    smith   \
   \ .        .        .        .        smith    smith    smith    smith   \
  \ .        .        .        .        smith    smith    smith    smith   \
   \ .        .        .        .        smith    smith    smith    smith   \
  \ ess      ess      ess      ess      .        .        .        .       \
   \ ess      ess      .        .        .        .        .        .       \
  \ ess      ess      ess      ess      .        .        .        .       \
   \ ess      ess      .        .        .        .        .        .       \
A user then submitted a 64-PE job through the NQS queues, but it did not run. From NQS's point of view (as shown by the qstat -a command), the 64-PE job was running! But on the T3D the job could not run, because the two idle blocks of 32 PEs were not 'torus contiguous' (a term I just made up).

NQS seems only to keep track of how many PEs are in use, not whether the free PEs are arranged in blocks that the T3D can actually allocate. This miscommunication between NQS and the T3D is a known problem, but there's not much that can be done. What made this case even worse was that the 64-PE job not only could not run but was also blocking smaller 2- and 8-PE NQS jobs from running.

At ARSC, I try to watch for these situations, but there is no operator watching the T3D, and there's not much an operator could do anyway. If you suspect the T3D is in this situation, please call Mike Ess at 907-474-5405 or send e-mail and I'll see what I can do. In the case above, if either ess or smith had stopped their jobs, the 64-PE job could have found a 'torus contiguous' block of PEs and everything would have been OK.
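The allocation constraint above can be sketched in a few lines of Fortran. This is an illustration only, not ARSC or CRI code: it models the PEs as a simple linear sequence, whereas the T3D's real partitioning rules involve the torus topology and power-of-two blocks. But it shows the essential point, that a free total of 64 PEs is not enough when the largest contiguous run of free PEs is only 32:

```fortran
c     Illustrative sketch only: PEs as a linear free/busy map.
c     The T3D's actual placement rules are more complicated.
      program contig
      integer NPES, i, nfree, run, best
      parameter( NPES = 128 )
      logical free( NPES )
c     mark PEs 1-32 and 65-96 free, the rest busy (64 PEs free total)
      do 10 i = 1, NPES
         free( i ) = ( i .le. 32 ) .or.
     &               ( i .ge. 65 .and. i .le. 96 )
   10 continue
c     count the free PEs (what NQS effectively tracks) and find the
c     longest contiguous run of free PEs (what the T3D needs)
      nfree = 0
      run   = 0
      best  = 0
      do 20 i = 1, NPES
         if( free( i ) ) then
            nfree = nfree + 1
            run   = run + 1
            if( run .gt. best ) best = run
         else
            run = 0
         endif
   20 continue
      write( 6, * ) 'free PEs:', nfree, ' largest contiguous:', best
      stop
      end
```

With this free map, nfree is 64 but best is only 32, so a 64-PE request cannot be placed until one of the busy blocks drains, even though NQS sees enough free PEs.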


The following article describes the recent joint announcement by the Portland Group and CRI at Supercomputing '95; it appeared in HPCwire:

  > Cray Research Chooses The Portland Group's HPF Product               12.06.95
  > NEWS BRIEFS                                                          LIVEwire
  > =============================================================================
  >   San Diego, Calif. -- Cray Research and The Portland Group Inc. (PGI)
  > announced at Supercomputing '95 that they have signed a letter of intent for
  > Cray to offer PGI's pghpf High Performance Fortran (HPF) compiler on all of
  > its computing systems, including the recently announced CRAY T3E(TM) scalable
  > parallel system.
  >   "We are looking forward to offering our customers and prospects an HPF
  > product," said Mike Booth, vice president of Cray's software division. "Our
  > corporate strategy is to continue to leverage leading technologies from other
  > companies, while applying Cray core competencies in parallel computing. We
  > were seeing interest from our customers in HPF and conducted a very thorough
  > technical evaluation. PGI's product clearly emerged as the HPF of choice and
  > we believe that this product can provide ease-of-use and robustness to our
  > users wanting HPF."
  >   HPF extends ISO/ANSI Fortran 90 to support implicit data parallel
  > programming. It provides all the power of Fortran 90, including array syntax,
  > array intrinsics, and dynamic storage allocation. In addition, HPF directives
  > support the distribution of data among processors, the alignment of data
  > objects to one another, and assertion of the independence of parallel loop
  > iterations. HPF is the de facto standard for implicit parallel programming
  > for shared- and distributed-memory systems.
  >   PGI's pghpf product allows users to run applications unchanged on Cray
  > systems ranging from the CRAY J90(TM) low-cost compact supercomputer to the
  > company's high-end CRAY T90 system and the company's scalable parallel
  > systems, the current CRAY T3D and the new CRAY T3E systems. The pghpf
  > compiler has already proven effective on several large applications in the
  > areas of fluid flow, wave simulation, particle simulation, and 3D reservoir
  > modeling.
  >   "This is a real milestone for HPF as well as PGI," said Douglas Miles,
  > director of marketing at PGI. "We are pleased to see Cray Research offer HPF
  > on its systems and that our product, after rigorous analysis, emerged as the
  > leader for Cray. We look forward to seeing our product move into Cray's
  > customer base and to serving the High Performance Fortran needs of these
  > users. The fact that Cray, a leader in high-performance parallel computing,
  > has selected pghpf is a strong vote of confidence in our HPF technology."
  >   The PGI HPF compiler is currently available through PGI on the CRAY CS6400
  > symmetric multi-processing server and the CRAY T3D system. Users of pghpf on
  > the CRAY T3D system can port and develop HPF applications that will run
  > unchanged on the soon-to-be-available CRAY T3E. Cray said it expects to
  > directly offer pghpf on these systems and on Cray's parallel vector
  > supercomputers beginning in early 1996. pghpf will be available on the CRAY
  > T3E when volume shipments begin next year.
  >   PGI is offering product demonstrations of pghpf and related HPF program
  > development tools in its booth (#911).
  > HPCwire has released all copyright restrictions for this item. Please feel
  > free to distribute this article to your friends and colleagues. For a free
  > trial subscription, send e-mail to

Using the PGI HPF Compiler at ARSC

Since May of this year, I have been working with the PGI HPF compiler for the T3D. Below is a description of how to access the 2.0 version of this compiler on Denali. PGI has given ARSC permission to use an evaluation copy, and both PGI and I would like to hear about your experiences.

accessing the PGI HPF compilers

  1. Add these lines to the end of your .cshrc file:
      setenv PGI /tmp/ess/pgi
      setenv MANPATH "$MANPATH":/tmp/ess/pgi/man
      setenv PATH "$PATH":/tmp/ess/pgi/t3d/bin
      setenv LM_LICENSE_FILE /tmp/ess/pgi/license.dat
  2. A typical makefile for using pghpf might be:
      F77 = /mpp/bin/cf77
      HPF = pghpf
      .SUFFIXES: .o .hpf .f
      .hpf.o:
              $(HPF) -c -Minfo -Mautopar -Mkeepftn $<
      .f.o:
              $(F77) -c $<
      smooth:        smooth.o second.o
              /mpp/bin/mppldr -o smooth smooth.o second.o
              (export MPP_NPES=1 ; smooth )
      hpf:        hpf.o second.o
              pghpf -o hpf hpf.o second.o
              (export MPP_NPES=8 ; hpf )
      clean:
              -rm -r *.o smooth hpf.f hpf mppcore
    The command 'make hpf' produces:
          pghpf -c -Minfo -Mautopar -Mkeepftn hpf.hpf
      12, 2 FORALLs generated
      39, 1 FORALL  generated
      54, 1 FORALL  generated
      70, SUM reduction generated
          /mpp/bin/cf77 -c second.f
          pghpf -o hpf hpf.o second.o
            (export MPP_NPES=8 ; hpf )
      221  0.14845792E-07  0.000070  0.295780  0.339684
    Several compiler switches are immediately useful:
      -Minfo    gives a summary of the transformations performed, such as
                the FORALLs generated
      -Mautopar tells pghpf to generate code for multiple processors
      -Mkeepftn has pghpf keep the parallelized Fortran 77 program so that
                the user can examine it
  3. The program smooth.f (see below) is the same program discussed in the Fall CUG article "An Evaluation of High Performance Fortran Products on the Cray-T3D." Besides being an HPF compiler, pghpf can also be used to parallelize Fortran 77 programs with minimal effort. A diff between smooth.f and hpf.hpf shows that the minimal changes are:
    1. describe the number of processors
    2. specify which arrays are to be distributed
    3. specify which DO loops have independent iterations
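As an illustration of these three changes, HPF directives of the following kind would be added to the Fortran 77 source. This is a hypothetical sketch, not the actual diff between smooth.f and hpf.hpf; the distribution chosen and the processor-array name procs are assumptions:

```fortran
c     Hypothetical sketch of the three HPF changes; the real hpf.hpf
c     may use different distributions and directive placement.
      parameter( M = 100, N = 100 )
      real p0( M, N ), p1( M, N )
c     1. describe the number of processors
chpf$ processors procs( number_of_processors() )
c     2. specify which arrays are to be distributed
chpf$ distribute p0( *, block ) onto procs
chpf$ distribute p1( *, block ) onto procs
c     3. assert that the DO loop iterations are independent
chpf$ independent
      do 40 j = 2, N-1
chpf$ independent
         do 30 i = 2, M-1
            p1(i,j) = ( p0(i+1,j)+p0(i-1,j)+p0(i,j+1)+p0(i,j-1)
     &               + 4.0 * p0(i,j) ) / 8.0
   30    continue
   40 continue
      end
```

With a BLOCK distribution of the second dimension, each processor owns a contiguous band of columns, and the stencil only needs communication at the band boundaries.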

    Table 1

    Times (seconds) on the T3D for each of the 3 phases of the relaxation example
               initialization  relaxation  residual
      smooth.f       0.000571   1.343860   0.448627
      hpf.hpf, 1PE   0.000494   1.456697   1.002058
      hpf.hpf, 2PE   0.000283   0.783687   0.600970
      hpf.hpf, 4PE   0.000155   0.454907   0.417283
      hpf.hpf, 8PE   0.000070   0.296023   0.338785
  4. Documentation

    There is a lot of documentation that goes with the pghpf compiler; potential users should check out:

      man pghpf 
    and there is a host of man pages for the individual HPF functions:
      ls /tmp/ess/pgi/man/man3
      all_prefix.3f         iall_prefix.3f       minval_prefix.3f
      all_scatter.3f        iall_scatter.3f      minval_scatter.3f
      all_suffix.3f         iall_suffix.3f       minval_suffix.3f
      any_prefix.3f         iany.3f              number_of_processors.3f
      any_scatter.3f        iany_prefix.3f       parity.3f
      any_suffix.3f         iany_scatter.3f      parity_prefix.3f
      copy_prefix.3f        iany_suffix.3f       parity_scatter.3f
      copy_scatter.3f       ilen.3f              parity_suffix.3f
      copy_suffix.3f        iparity.3f           popcnt.3f
      count_prefix.3f       iparity_prefix.3f    poppar.3f
      count_scatter.3f      iparity_scatter.3f   processors_shape.3f
      count_suffix.3f       iparity_suffix.3f    product_prefix.3f
      grade_down.3f         leadz.3f             product_scatter.3f
      grade_up.3f           maxloc.3f            product_suffix.3f
      hpf_alignment.3f      maxval_prefix.3f     sum_prefix.3f
      hpf_distribution.3f   maxval_scatter.3f    sum_scatter.3f
      hpf_template.3f       maxval_suffix.3f     sum_suffix.3f
      iall.3f               minloc.3f
    In the directory /tmp/ess/pgi/doc/hpf/html there is a whole collection of manuals and man pages available in HTML format:
      drwxr-xr-x   2 ess   uaf   4096 Dec 11 16:01 faq
      drwxr-xr-x   2 ess   uaf   4096 Dec 11 16:01 man1
      drwxr-xr-x   2 ess   uaf   4096 Dec 11 16:01 man3
      -r--r--r--   1 ess   uaf    964 Dec 11 16:01 pghpf.index.html
      drwxr-xr-x   2 ess   uaf   4096 Dec 11 16:01 ref_manual
      drwxr-xr-x   2 ess   uaf   4096 Dec 11 16:01 release_notes
      drwxr-xr-x   2 ess   uaf   4096 Dec 11 16:01 users_guideA
    Be careful with these HTML documents, as they can be big:
      faq                5 pages
      release notes     33 pages
      users_guide      137 pages
      reference manual 263 pages
    If you have any questions about using pghpf on Denali, please e-mail me your questions and I'll get an answer.
source code for the program smooth.f

      parameter( M = 100, N = 100, MAXTIME = 1000 )
      real p0( M, N ), p1( M, N )
      real t1, second, tset, tupdate, terror, totupdate, toterror
      real slamch, error
      integer i, j, k
c     set initial conditions: domain and spike
      t1 = second( )
      do 20 j = 1, N
         do 10 i = 1, M
            p0( i, j ) = 0.0
            p1( i, j ) = 0.0
   10    continue
   20 continue
      tset = second( ) - t1
      p0( M / 2, N / 2 ) = 1.0
c     time step loop
      totupdate = 0.0
      toterror  = 0.0
      do 100 k = 1, MAXTIME
c        smoothing stencil
         if( k/2*2 .ne. k ) then
c           odd time step
            t1 = second()
            do 40 j = 2, N-1
               do 30 i = 2, M-1
                  p1(i,j) = ( p0(i+1,j)+p0(i-1,j)+p0(i,j+1)+p0(i,j-1)
     &                      + 4.0 * p0(i,j) ) / 8.0
   30          continue
   40       continue
            tupdate = second() - t1
         else
c           even time step
            t1 = second()
            do 60 j = 2, N-1
               do 50 i = 2, M-1
                  p0(i,j) = ( p1(i+1,j)+p1(i-1,j)+p1(i,j+1)+p1(i,j-1)
     &                      + 4.0 * p1(i,j) ) / 8.0
   50          continue
   60       continue
            tupdate = second() - t1
         endif
c        calculate change
         error = 0.0
         t1 = second()
         do 80 j = 2, N-1
            do 70 i = 2, M-1
               error = error + ( p0( i, j ) - p1( i, j ) ) ** 2
   70       continue
   80    continue
         terror = second( ) - t1
c        write( 6, 600 ) k, error, tupdate, terror
c        stopping criterion:
c        error is less than the square root of the machine epsilon
         totupdate = totupdate + tupdate
         toterror  = toterror  + terror
         if( error .le. sqrt( slamch( 'e' ) ) ) goto 101
  100 continue
  101 continue
      write( 6, 600 ) k, error, tset, totupdate, toterror
  600 format( i5, e16.8, f10.6, f10.6, f10.6 )
      stop
      end

Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.