ARSC HPC Users' Newsletter 313, April 8, 2005



ARSC Faculty Camp 2005

Faculty Camp teaches skills ranging from programming and data visualization to collaborative environments. It is a series of seminars and hands-on experiences presented by ARSC staff, UAF/ARSC Joint Faculty, and current ARSC users. "Camp" commences on August 1st and runs for three weeks.

All ARSC users are invited to apply, but Faculty Camp is geared toward UA-affiliated faculty and researchers. Applicants must submit a description of the skills they want to develop and a project (using ARSC resources) they intend to pursue. The application deadline is May 18th. If you're interested, you'll find more information at:

Also, we encourage you to attend the Faculty Camp Informational Meeting (refreshments will be provided!):

Tues. April 26, 1-2pm, West Ridge Research Building (WRRB) room 103


New Training Opportunities

ARSC welcomes Simone Sbaraglia from IBM's Advanced Computing Technology Center (ACTC) to teach several classes on IBM performance analysis tools. All users are encouraged to attend and/or schedule one-on-one time.

  All three classes are scheduled for:
    Time: 9:00 am - Noon
    Location: West Ridge Research Building (WRRB), Room 009

    Monday Apr. 18: 
      "HPC Toolkit"

    Tuesday Apr. 19: 
      "Sigma: An Infrastructure for Performance Analysis using 
      Symbolic Specification"

    Wednesday Apr. 20: 
      "Totalview Tutorial"

For complete course descriptions, see:

Simone will also be available to discuss performance analysis and optimization techniques with individual users. For more information on the classes or to schedule a consultation, email Tom Logan or call him at 907-450-8624.


IBM XLF: Floating Point Exceptions and Traps

The IBM XL Fortran compiler allows floating point exception traps to be enabled using the flag "-qflttrap". When a specified floating point exception is detected, a trap signal (SIGTRAP) is generated. By default, this causes the program to dump core; the compiler flag "-qsigtrap" alters this behavior by installing an exception handler.

Several different exception handlers are available including:

 * xl__ieee     displays the floating point exception error message, 
      the contents of the floating point registers, and a stack trace to
      stderr then continues execution of the program.  This handler
      supplies the default IEEE result for the exception, so the results
      will be identical to results without "-qflttrap" enabled.

 * xl__trce     displays the floating point exception error message, 
      the contents of the floating point registers, and a stack trace to
      stderr then terminates the program after the first exception
      without dumping core.

 * xl__trcedump acts the same as xl__trce but also produces a core dump.

There are other exception handlers available that require changes to source code to use. The aforementioned trap handlers only require that the code be recompiled with the proper compiler flags.

I'll use the following sample Fortran code to demonstrate these features:

iceberg2 1%  cat float_except.f

! ==================================================================
program test
   implicit none
   real :: val1, val2, val3
   val1 = 0.0
   val2 = 0.0
   val3 = 0.0

   call divide_by_zero(val1)
   print *,"val1=", val1

   call underflow_overflow(val2, 45.00)
   print *,"val2=", val2

   call underflow_overflow(val3, -45.00)
   print *,"val3=", val3

   print *,"program complete"

end program

subroutine divide_by_zero(value)
    real value
    integer ii

    print *, "------ divide_by_zero subroutine:---------"
    value = (value * value) / value

end subroutine

subroutine underflow_overflow(value, power)
    real value
    real power

    print *, "------ underflow_overflow subroutine: ----"
    value = 10 ** power + 1.125

end subroutine
! ==================================================================

The following compile statement enables the detection of overflow, underflow, divide by zero, and invalid floating point exceptions. It also specifies that the trap handler be set to the xl__ieee exception handler. Thus, any floating point exceptions will be echoed to stderr without ending the program execution. The -g and -qfullpath flags allow the line numbers of each call and the full path of the source code to be shown in the traceback.

  iceberg2 2% xlf90_r float_except.f -g -qfullpath \
         -qflttrap=en:ov:und:zero:inv \
         -qsigtrap=xl__ieee -o fe_ieee

If your program has more than one source file, each file must be compiled with the "-qflttrap" flag to enable trapping in that file. The Fortran libraries do not have support for flttrap.

Running the sample code yields the following:

iceberg2 3% ./fe_ieee
 ------ divide_by_zero subroutine:---------

  Signal received: SIGTRAP - Trace trap
    Signal generated for floating-point exception:
      FP invalid operation

  Instruction that generated the exception:
    fdivs fr01,fr01,fr02
    Source Operand values:
      fr01 =   0.00000000000000e+00
      fr02 =   0.00000000000000e+00

    Offset 0x000000dc in procedure divide_by_zero, near line 27 in file /gpfsu/u1/uaf/username/float_except.f
    Offset 0x00000064 in procedure test, near line 8 in file /gpfsu/u1/uaf/username/float_except.f
    --- End of call chain ---
 val1= NaNQ
 ------ underflow_overflow subroutine: ----

  Signal received: SIGTRAP - Trace trap
    Signal generated for floating-point exception:
      FP overflow

  Instruction that generated the exception:
    frsp fr01,fr01

    Offset 0x000000e0 in procedure underflow_overflow, near line 37 in file /gpfsu/u1/uaf/username/float_except.f
    Offset 0x00000130 in procedure test, near line 11 in file /gpfsu/u1/uaf/username/float_except.f
    --- End of call chain ---
 val2= INF
 ------ underflow_overflow subroutine: ----

  Signal received: SIGTRAP - Trace trap
    Signal generated for floating-point exception:
      FP underflow

  Instruction that generated the exception:
    frsp fr01,fr01

    Offset 0x000000e0 in procedure underflow_overflow, near line 37 in file /gpfsu/u1/uaf/username/float_except.f
    Offset 0x000001fc in procedure test, near line 14 in file /gpfsu/u1/uaf/username/float_except.f
    --- End of call chain ---
 val3= 1.125000000
 program complete
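
The three values printed above (NaNQ, INF, and 1.125) are the default IEEE results that xl__ieee substitutes. As a rough illustration, the same arithmetic can be sketched in Python. This is not part of the XLF toolchain and does not reproduce the traps or the handler; the helper name to_single is hypothetical, and single precision is emulated with the standard struct module.

```python
import math
import struct


def to_single(x):
    """Round a double-precision Python float to IEEE single precision."""
    try:
        return struct.unpack("f", struct.pack("f", x))[0]
    except OverflowError:
        # Too large for single precision: the IEEE default result is +/-Inf.
        return math.copysign(math.inf, x)


def divide_by_zero(value):
    """Mirror of the Fortran subroutine: (value * value) / value."""
    try:
        return to_single((value * value) / value)
    except ZeroDivisionError:
        # Python raises here; 0.0/0.0 is an invalid operation, whose
        # IEEE default result is a quiet NaN.
        return math.nan


def underflow_overflow(power):
    """Mirror of the Fortran subroutine: 10 ** power + 1.125."""
    term = to_single(10.0 ** power)  # Inf for 45.0, tiny subnormal for -45.0
    return to_single(term + 1.125)


val1 = divide_by_zero(0.0)
val2 = underflow_overflow(45.0)
val3 = underflow_overflow(-45.0)
print("val1=", val1)  # nan
print("val2=", val2)  # inf
print("val3=", val3)  # 1.125
```

Note that the first case is 0.0/0.0, which is why the handler reports "FP invalid operation" rather than a divide by zero.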

According to the XL Fortran User's Guide, the performance impact of using floating point traps is relatively low. However, it's probably best to avoid using floating point traps in debugged, production codes.

While writing this article I discovered that two of the "-qflttrap" suboptions caused programs to stop even when using the xl__ieee exception handler. These were the "imprecise" and "inexact" suboptions.

The User's Guide notes that it may not be possible for the handler to substitute results when an "imprecise" result is detected, which explains the program termination. The "inexact" suboption results in a trap for signals generated at the beginning and end of subroutines only, which would make it impossible for the handler to provide results in all but the most trivial subroutines.


For more information on exception handlers and flttrap options see "man xlf" or the IBM manual:

XL Fortran for AIX: Language Reference (SC09-4947-01) Chapter 7 XL Fortran Floating-Point Processing

This manual is available on iceberg and iceflyer in pdf format, here:



X1: Useful pat_hwpc Counters to Sample

Default use of "pat_hwpc" on the X1 gives you more data than most of us know how to interpret. If you hunger for even more, here are four additional, useful counters to sample:

P:6:3 -- Stall_VU_No_Inst

CrayDoc Description:
"CPs VU has no valid instruction"

The value reported is time running in scalar mode without any overlapping computation in vector mode. Other fields reported by pat_hwpc, such as total "Vector ops" and "Scalar ops," hide the fact that scalar and vector processing often occur simultaneously on the X1.

The Stall_VU_No_Inst counter reports the time the Vector Units are stalled because no vector instruction is available to process. If this is a large percentage of the code's total run time, the code may benefit from profiling and optimization to improve vectorization.

ARSC users are encouraged to email ARSC Consulting or call 907-450-8602 for assistance.

P:9:3 -- Stall_VLSU_LB

CrayDoc Description:
"CPs VLSU stalled waiting for load buffers (LB)"

Stall_VLSU_LB is the time the Vector Load/Store Unit is stalled because all the vector load buffers are already busy "talking" to vector cache and main memory.

A large value here may indicate that the performance of this code is limited by memory bandwidth. Again, profiling the code is the next step, but likely optimizations include restructuring bottleneck loops to increase the number of operations performed per load and/or fixing inefficiencies in the cache and/or memory access pattern.

P:12:3 -- Stall_VLSU_VM

CrayDoc Description:
"CPs VLSU stalled waiting for VU vector mask (VM)"

The vector mask registers are commonly used to vectorize loops containing conditional statements based on vector data. E.g.,

  do i=1,N
    if ( A(i) == 0.0 ) then
      B(i) = 0.0
    else
      B(i) = func(A(i))
    endif
  enddo

A large value for Stall_VLSU_VM may indicate that the complexity or nesting depth of vectorizable conditional loops is slowing the code down.
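
As a conceptual sketch only, the conditional loop above can be pictured as two steps: a comparison over all elements that builds a mask, then an operation applied only where the mask allows it. Plain Python lists stand in for vector registers here; this does not model X1 hardware or timing, and func is a hypothetical stand-in for the function called in the loop.

```python
def func(x):
    # Hypothetical stand-in for the func() called in the loop above.
    return 1.0 / x


def masked_loop(a):
    """Evaluate the conditional loop in two mask-driven steps."""
    # Step 1: compare every element at once, producing a mask.
    mask = [x == 0.0 for x in a]
    # Step 2: B starts as 0.0 everywhere; func is applied only where
    # the mask is false.
    b = [0.0] * len(a)
    for i, is_zero in enumerate(mask):
        if not is_zero:
            b[i] = func(a[i])
    return mask, b


mask, b = masked_loop([0.0, 2.0, 4.0])
print(mask)  # [True, False, False]
print(b)     # [0.0, 0.5, 0.25]
```

The second step cannot begin until the mask from the first step is ready.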

P:14:3 -- Stall_VLSU_Index

CrayDoc Description:
"CPs VLSU stalled waiting for VU index vector for gather or scatter"

A disproportionately large value suggests that vectorizable loops with indirect array access, such as this, should be evaluated:

      do i=1,N
         A( indx(i) ) = B(i) 
      enddo

As always, profiling with cray_pat is the next step.


To report data on these four counters (in addition to the usual report), invoke pat_hwpc as:

  pat_hwpc -e 'P:14:3,P:12:3,P:6:3,P:9:3' 

Here's what the output looks like for a 1-MSP user code tested on klondike:

  %  pat_hwpc -e 'P:14:3,P:12:3,P:6:3,P:9:3' aprun -n 1 ./a.out

[ ... cut ...]
Stall VU No Inst          33.351 secs   13340419864 clks         
Stall VLSU LB             19.388 secs    7755336688 clks         
Stall VLSU VM             16.299 secs    6519742910 clks         
Stall VLSU Index          15.969 secs    6387416611 clks         
[ ... cut ...]
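
Each counter line pairs a time in seconds with a count of clock periods. The sketch below, in Python purely for the arithmetic, divides one by the other and also expresses each stall as a percentage of run time. The roughly 400 MHz rate it yields is inferred from the numbers shown, not a figure stated in this article. TOTAL_SECS is a hypothetical placeholder, since the total run time is in the part of the report that was cut.

```python
# (seconds, clock periods) copied from the pat_hwpc output above.
STALLS = {
    "Stall VU No Inst": (33.351, 13340419864),
    "Stall VLSU LB": (19.388, 7755336688),
    "Stall VLSU VM": (16.299, 6519742910),
    "Stall VLSU Index": (15.969, 6387416611),
}


def clock_rate_mhz(secs, clks):
    """Clock periods per second, in MHz, implied by one counter line."""
    return clks / secs / 1.0e6


def stall_percent(stall_secs, total_secs):
    """Stall time as a percentage of total run time."""
    return 100.0 * stall_secs / total_secs


# HYPOTHETICAL total run time in seconds; substitute the value from your
# own pat_hwpc report.
TOTAL_SECS = 100.0

for name, (secs, clks) in STALLS.items():
    print(f"{name:18s} {clock_rate_mhz(secs, clks):8.2f} MHz "
          f"{stall_percent(secs, TOTAL_SECS):6.2f} % of assumed run time")
```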


A list of all available counters is given in:

  Optimizing Applications on the Cray X1(TM) Series System - S-2315-53
      Appendix B. Hardware Performance Counters

(ARSC users, read "news documents" on klondike for instructions on obtaining the above manual.)


Quick-Tip Q & A

A: [[ I often find myself comparing versions of source files trying 
   [[ to figure out what changed between the versions.  Good 
   [[ old-fashioned diff works just fine, but there's got to be a more
   [[ modern solution.
   [[ Do you know of any text editors or other tools that have file
   [[ comparison functionality built in?

# From Martin Luthi

If you are using Emacs or XEmacs, try ediff. It is almost too useful
to not try out, even if you are not hooked on Emacs (yet). Since
Emacs also fully supports CVS (or other version control systems),
this feature is even more useful. (And of course Emacs does syntax
highlighting, file access on remote machines, ....).

To get an idea, look at these screenshots.

You can invoke ediff in a running Emacs with

M-x ediff-buffers    if the files are already visited by Emacs
M-x ediff-files      if the files are not loaded in Emacs

For comparison of three files, the analogous commands are

M-x ediff-buffers3 and M-x ediff-files3.

[M-x means pressing the Meta key and hitting x while Meta is still
pressed. The Meta key is often identical to Alt (PC keyboards) or
Diamond (Sun keyboards); you can also press Esc x to obtain the same
effect.]

# Thanks to James Keenan and former co-newsletter editor Guy Robinson

tkdiff uses Tcl/Tk and diff to give a graphical view of the differences
between files. You can store your favorite diff options, such as -ib, for
each time you launch tkdiff. There is also a decent merge capability
that allows the user to move lines from one file to the other.


for more information.

# Editors' Note: tkdiff even runs on the Cray X1 (slowly), and 
# doesn't require any compilation.

# Kate Hedstrom

The emacs flavor is built into xemacs. The vi flavor comes with vim
as gvimdiff. There is also xxdiff and I've heard that mgdiff has
similar functionality. They are all trying to emulate the old SGI
command xdiff.

# Jed Brown

Try vimdiff or the gui version gvimdiff. It displays two or three files
side-by-side with differences highlighted and automatic folding of the
boring sections. You can edit the files with realtime updating of the
diff. If you want to use the mouse more, there is the Qt-based kdiff3.
An interesting tool if you want to generate patches for binary files is

# Thanks to Greg Newby for a different take on the question

For single-author projects, try RCS.  The "ci" and "co" commands
are used to check in and check out code (or other text files) and
annotate them.  This is lightweight and easy to use.

For multiple-author projects, options get more powerful and complex.
SCCS, CVS and subversion are popular, and quite functional.  The
trick, for these, is to gracefully handle situations where different
people edit the same file.

# And, from Wendy Palm

Emacs has a number of commands that provide version control
information.  You can directly compare two files (M-x diff), or use
the version control commands

There are a number of tools (free) listed here, perhaps one will
appeal to you:

Generally, I've been able to use the different options of diff and get
the information I need.  Specifically, remember the use of "-u", "-c"
and "-y".  For example, given the following simple files:

  [hubble-e0:~] palm% cat file1
  aaaaaaaaaaa
  bbbbbbbbbbb
  ccccccccccc
  ddddddddddd
  [hubble-e0:~] palm% cat file2
  aaaaaaaaaaa
  bbbbbbbbbbb
  33333333333
  ddddddddddd

diff by itself gives just the 2 lines that are different:

  [hubble-e0:~] palm% diff file1 file2
  3c3
  < ccccccccccc
  ---
  > 33333333333

diff -u gives the changes in unified context, where "-" is file1 and
"+" is file2:

  [hubble-e0:~] palm% diff -u file1 file2
  --- file1       Fri Mar 25 17:56:14 2005
  +++ file2       Fri Mar 25 17:56:27 2005
  @@ -1,4 +1,4 @@
   aaaaaaaaaaa
   bbbbbbbbbbb
  -ccccccccccc
  +33333333333
   ddddddddddd

diff -c gives the changes in context, showing 2 lines above and 2
lines below the changes where *** is file1 and --- is file2

  [hubble-e0:~] palm% diff -c file1 file2
  *** file1       Fri Mar 25 17:56:14 2005
  --- file2       Fri Mar 25 17:56:27 2005
  ***************
  *** 1,4 ****
    aaaaaaaaaaa
    bbbbbbbbbbb
  ! ccccccccccc
    ddddddddddd
  --- 1,4 ----
    aaaaaaaaaaa
    bbbbbbbbbbb
  ! 33333333333
    ddddddddddd

and, finally, diff -y shows the changes side by side, where file1
is on the left and file2 is on the right, and differing lines are
identified with "|":

  [hubble-e0:~] palm% diff -y file1 file2
  aaaaaaaaaaa                               aaaaaaaaaaa
  bbbbbbbbbbb                               bbbbbbbbbbb
  ccccccccccc                             | 33333333333
  ddddddddddd                               ddddddddddd

If you want a graphical interface, check out "xdiff".  It can handle
up to 4 files with the text presented side-by-side with the differences
color coded.
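
For readers who would rather script a comparison than run diff by hand, here is a minimal sketch using Python's standard difflib module, which produces the same unified format as "diff -u". The file contents are an assumption reconstructed from the diff output above, and the helper name unified is hypothetical.

```python
import difflib

# ASSUMED file contents, inferred from the diff output shown above.
file1 = ["aaaaaaaaaaa", "bbbbbbbbbbb", "ccccccccccc", "ddddddddddd"]
file2 = ["aaaaaaaaaaa", "bbbbbbbbbbb", "33333333333", "ddddddddddd"]


def unified(a, b):
    """Return the unified diff of two lists of lines as a list of strings."""
    return list(difflib.unified_diff(a, b, fromfile="file1",
                                     tofile="file2", lineterm=""))


for line in unified(file1, file2):
    print(line)
```

Lines prefixed with "-" come from file1 and lines prefixed with "+" from file2, as with "diff -u".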

# And in the spirit of our April 1st issue: 

Bob Robins reports that like the editors, he prints the two files, but
then he uses a light table.  "I just put the two versions one on top of
the other and the differences pop right out."

Q: Emacs isn't available on the Cray X1.  Thus, I must edit source code
   on my desktop workstation and move it to the X1 for compilation and
   testing.  It goes like this:

         forget to move updated file to X1

         make (Ooops! now I realize I don't have the updated file.)

         ftp file to X1


    ... REPEAT ...
    ... REPEAT ...
    ... REPEAT ...

  There must be a better way!    

[[ Answers, Questions, and Tips Graciously Accepted ]]

Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.