ARSC T3E Users' Newsletter 178, Sept 24, 1999

ARSC T3E PGHPF License Upgrade

ARSC is pleased to announce that the PGHPF license on yukon has been upgraded to permit jobs to use 256 processors. Documentation for pghpf 2.4 is available from ARSC's online document server.

To use pghpf on yukon, and read the man pages, remember to load the pghpf module:

  yukon$ module load pghpf

PGI Presentation At ARSC

This morning, Doug Miles of PGI gave an overview of how HPF programs should be developed and of what a user might need to consider when writing or optimizing an HPF program. He also demonstrated HPF's scaling and portability: HPF is installed and in use on systems ranging from workstation clusters to the ASCI systems.

Doug stressed several times that programs still need to be parallel and that the parallelism must be expressed carefully if the compiler is to generate good (i.e., high-performance) code. He showed several user examples, one of which cited an application that achieved 200 Mflops per processor on 128 processors of a T3E.

Here's the content of a few slides extracted from the talk. Copies of the overheads can be obtained from Guy Robinson on request.

Important PGHPF Command Line Options

  -O2           Common communication elimination, overlap optimizations
  -Mautopar     Auto-parallelize DO loops operating on distributed data
  -Moverlap     Control size of overlap (ghost) regions on distributed arrays
  -Mhpf2        Compile assuming HPF 2.0 semantics
  -Mf90         Compile for serial F90 execution
  -Mfreeform    Assume Fortran 90 freeform source
  -Minfo        Emit compile-time optimization/parallelization messages
  -Msmp         Use native shared-memory one-sided communications
  -Mprof        Enable function or line-level profiling
  -Mstats       Print comms/cpu/memory statistics at completion of run
  -Mkeepftn     Keep intermediate Fortran output file after HPF compilation
  -W0,<option>  Pass <option> to underlying Fortran compiler
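As an illustration, a build that follows this list might combine several of these options; the session below is a sketch, and the program name myprog.f90 is hypothetical:

```
  yukon$ module load pghpf
  yukon$ pghpf -O2 -Mautopar -Minfo -Mstats -o myprog myprog.f90
  yukon$ ./myprog
```

With -Minfo, the compiler reports its optimization and parallelization decisions at compile time; with -Mstats, the program prints communication/cpu/memory statistics when it finishes.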

PGHPF Tuning -- Key Elements
  • Always compile with -Minfo and pay close attention to the messages that are emitted
  • Learn to use the PGPROF profiler - it will quickly point you in the right direction
  • Don't be afraid to look at the intermediate source code generated by PGHPF - it's not that ugly!
  • Have reasonable expectations - PGHPF is consistent and good at bookkeeping but is not always smart, i.e., equivalent code sequences don't always perform equally well

When -Msmp (should) Help
  • Enables parallelization of a loop that would otherwise be scalarized - basically, any time it can eliminate pghpf_get_scalar calls
  • When the amount of data per node is very small - i.e., in general, -Msmp-compiled codes will scale better than those not compiled with -Msmp
  • Allows overlap of communication and computation by eliminating synchronization points
  • Always try it and see - helps a lot in some cases but can sometimes cause slowdowns

Runtime Library Overhead
  • Takes several thousand cycles just to get in/out of a PGHPF communication runtime library routine
  • Beware any time you see a PGHPF runtime library call in a loop (even if it's not the innermost)
  • Despite the overhead, aggregate communication runtime routines are generally more efficient than inline communications (-Msmp) if there's enough data

Avoid if Possible:
  • INHERIT or transcriptive distributions - PGHPF will generate general-case code, i.e., code which works on cyclic(k)-distributed data; very inefficient
  • CYCLIC(K) and CYCLIC distributions
  • The PROCESSORS directive - there is generally no reason to use it; PGHPF is good at picking optimal mappings, and using PROCESSORS often restricts how many CPUs the generated executable can use
  • DYNAMIC, REDISTRIBUTE, REALIGN - as with INHERIT, PGHPF will generate general-case code which works on cyclic(k)-distributed data; very inefficient

Use if Possible:
  • BLOCK or GEN_BLOCK distributions - distribute in as few dimensions as possible
  • INDEPENDENT rather than -Mautopar DO loops - INDEPENDENT loops generally use much less memory; -Mautopar DOs are converted to equivalent FORALLs
  • HPF_LIBRARY routines - communication-intensive parallel operations are generally more efficient if an equivalent HPF_LIBRARY routine exists
  • INTERFACE blocks - they allow use of the -Mhpf2 switch and generally provide PGHPF with information that enables optimizations
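As a small sketch of these recommendations (the array names and bounds are invented for illustration): a BLOCK distribution in a single dimension, plus an INDEPENDENT loop with an explicit NEW clause, might look like:

```fortran
      REAL A(1024,1024), B(1024,1024)
!HPF$ DISTRIBUTE A(BLOCK,*)
!HPF$ ALIGN B(:,:) WITH A(:,:)

!HPF$ INDEPENDENT, NEW(J)
      DO I = 2, 1023
        DO J = 2, 1023
          B(I,J) = 0.25 * (A(I-1,J) + A(I+1,J) + A(I,J-1) + A(I,J+1))
        END DO
      END DO
```

Distributing A in only its first dimension keeps the ghost regions small, and naming J in the NEW clause keeps the code conforming HPF rather than relying on PGHPF to infer it.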

Other Notes:
  • Replicate lower-dimension data structures that are largely read-only
  • PGHPF automatically determines NEW variables on INDEPENDENT loops - very convenient, but it leads to non-conforming HPF codes
  • RANDOM_NUMBER is well implemented - it executes in parallel, scales well, and has very good statistical properties
  • MATMUL and TRANSPOSE - not optimized; it is often more efficient to do these operations using inline code rather than calling the intrinsics

Using the PGPROF Profiler
  • Compile/link with -Mprof=func, execute the program as usual, and invoke pgprof on the resulting pgprof.out file
  • Look for routines that take the most time; if necessary, recompile/relink the most expensive functions with -Mprof=lines and re-execute (NOTE: -Mprof=lines can be very intrusive)
  • Study line profiles for individual calls or statements that are inefficient
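Put together, the profiling cycle described in these bullets would look roughly like the following session on yukon (myprog.f90 is a hypothetical program name):

```
  yukon$ pghpf -Mprof=func -o myprog myprog.f90
  yukon$ ./myprog                 (writes pgprof.out in the working directory)
  yukon$ pgprof pgprof.out
```

Only after pgprof has identified the expensive routines is it worth paying the cost of recompiling those routines with -Mprof=lines.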

Final Program for CUG T3E meeting

The final program for the CUG T3E meeting is now available.

Guy Returns to Denali: Report on 50th Arctic Science Meeting

As posted in the last newsletter, the 50th AAAS Arctic Science meeting was held in Denali over the weekend and the first few days of this week. A great deal of Arctic science was discussed, including climate change, social factors, future funding, economics, and wildlife populations.

As this was the 50th meeting, each evening there were a number of lectures which looked at how research had changed. Remote measurement, computers, and helicopters were cited as important developments in promoting successful research in the Arctic. At the first meeting, many of the papers presented were purely experimental or about data gathering; today, about 75% used computers, either to gather or visualize results or as part of the scientific work itself.

For those of us who spend most of our time inside working with numbers or virtual views of Alaska, it is always good to be reminded of the lengths to which some users must go to get data. Long, stressful mountain climbs, months spent on floating ice, and visits to bear dens to measure the temperature of hibernating bears were all described in great detail, often with humor about the adventures had along the way.

Many researchers felt Arctic research was healthy, indeed growing rapidly, but that care was needed not to over-publish results. While all the remote data gathering described above provides much information, longer-term change is harder to see and requires careful interpretation. Many papers presented results looking at the last century of change, drawing both on results from the modern era of data gathering and on some of the first scientific trips to Alaska. One session discussed how to go back further and extract good scientific data from the records of previous visitors and the native residents.

More in the future as the results of this meeting become available.

Quick-Tip Q & A

A:{{ The "ls -lc" command shows modification time and sorts by 
  {{ modification time. The "ls -lu" command also shows modification 
  {{ time but sorts by access time.
  {{ How can I show file access time?

  Under IRIX (and many other Unixes), but NOT under UNICOS, you can use
  the "stat" command to dump formatted inode information:

    onyx5$ stat myfile
        inode 14682780; dev 1048585; links 1; size 2662
        regular; mode is rw-------; uid 1235 (baring); gid 4 (mail)
        projid 0        st_fstype: nfs3
        change time - Fri Sep 10 02:53:05 1999 <936960785>
        access time - Mon Jul 26 08:57:57 1999 <933008277>
        modify time - Mon Jul 26 08:57:57 1999 <933008277>

  Here's a moderately satisfying solution for UNICOS users:

    Use perl's "-A" operator which returns, not access time, but time
    since last access, in days.  Here's a command to print the age of the
    file, "myfile".

      perl -e 'printf "%f", -A "myfile"'

    Korn shell users can add the following function to their .profile
    file and execute it by typing, for instance, "age myfile".

      function age {
        perl -e "printf \"$1: age in days: %f\n\", -A \"$1\""
      }

    Here's an expert version (thanks, Dale):

function age {
  perl -e "printf <<EOS
$1
%s
contents last accessed %f days ago.
contents last modified %f days ago.
inode    last modified %f days ago.
EOS
  ,'-' x length \"$1\"
  ,-A \"$1\"
  ,-M \"$1\"
  ,-C \"$1\""
}
    (I don't know how a C-Shell user would implement these functions,
    but if you send me a solution, I'll print it.)

Q: I have about 150 files in a directory and need to "rm" about 125 of
   them.  No combination of wild-card characters will select the 
   "to delete" files without including some of the "to retain" files.

   How would you approach this task?

[ Answers, questions, and tips graciously accepted. ]

Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
E-mail Subscriptions and Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.