ARSC HPC Users' Newsletter 318, June 16, 2005

Michael Hart Lecture on Project Gutenberg

Michael Hart, inventor of eBooks and founder of Project Gutenberg, will be in town Tuesday and Wednesday, June 21-22. He will give a presentation, "A Million Dollar DVD," Wednesday evening, June 22, at 7:00 pm at the Noel Wien Public Library in downtown Fairbanks.

Everyone is welcome to attend. Michael's visit is sponsored by ARSC.

Details:

http://www.arsc.edu/news/archive/projectgutenberg.html

Queue Changes on Klondike

Alert klondike users have read the "Message of the day" and "news queues" and already know that we're updating the X1 queues a bit. We're replacing the 30-minute "Quick" queues with one "debug" queue and we've increased the maximum run time for jobs in "small" from 8 to 16 hours and "medium" from 8 to 12 hours.

Excerpt from "news queues":


      Name of     max MSPs   Time limit
      queue       per job    (wallclock)
      -------     --------   -----------
      debug       30         30 minutes

      small       8          16 hours
      medium      30         12 hours
      large       60         8  hours
      xlarge      120        8  hours

More details and explanations for these changes:

Debug queue:

If you're doing debugging or development work, request 30 minutes or less, as always, but submit your job to "debug" instead of "default." To do so, simply change your qsub "-q" option to:


  #PBS -q debug

Like the obsolescent "Quick" queues, the new "debug" queue has high priority for relatively rapid turnaround. When needed, ARSC staff will manually checkpoint work in other queues to allow jobs waiting in the "debug" queue to start. The "debug" queue will be pushed ahead twice per day at 9:00 AM and 2:00 PM Alaska Time.

There are two primary goals behind this change:

  1. Formerly, the single "routing" queue, "default," serviced both the standard production queues and the high-priority "Quick" queues. This made it hard to distinguish standard jobs that happen to be short from debugging/development jobs that should have higher priority.
  2. We're replacing four queues with one, which is a net simplification.

The "Quick" queues will be deleted next Wednesday, June 22. Users who forget to change their qsub scripts will suffer no ill-effects as the "default" routing queue will remain intact and continue to service all the standard queues (small, medium, large, and xlarge). See "news queues" for more detail.

Time Limits:

Many users require large processor counts to obtain sufficient memory and/or processing power, but some users have flexibility in choosing the number of processors to use. We hope the new walltime limits will encourage users not to request more processors than they need.

Communication costs in parallel programs typically grow non-linearly with increasing processor counts. Thus, using fewer processors can reduce the time spent in MPI or other communication routines and make more efficient use of the processors. Also, when a job that could run on more processors is run on fewer, each processor uses more memory, which increases memory utilization.

With the new time limits, the maximum total CPU time which individual small or medium jobs can obtain is a bit closer to that which large jobs can obtain:


  Queue:  (Max MSPs) X (Max Walltime) = (Max CPU-Time)
  ======  ============================================
  small:    8 MSPs  X  16 hrs  =  128 CPU-hrs
  medium:  30 MSPs  X  12 hrs  =  360 CPU-hrs
  large:   60 MSPs  X   8 hrs  =  480 CPU-hrs

We will be watching the effects of these changes and we welcome your feedback.

X1: Update to Default Programming Environment, June 22

During scheduled downtime on June 22, 2005, PE 5.4 will be made the default PrgEnv. The current default, PE 5.3, will be preserved as PrgEnv.old.

PE 5.4 has been available for user testing since April 20.

Following this change, the programming environments will be configured as follows:


  PrgEnv.old : PE 5.3
  PrgEnv (the default): PE 5.4
  PrgEnv.new : PE 5.4

Floating Point Exception Handling with C99

The C99 standard provides an improved interface to deal with floating point exceptions.

The C99 header file "fenv.h" describes the interface to the new floating point exception routines, and defines the following macros to describe the various floating point exception states:


  FE_DIVBYZERO   The dividend is finite/non-zero and divisor is zero.
  FE_INEXACT     The result is not represented exactly due to rounding,
                 etc.
  FE_OVERFLOW    The value is too large to represent (large positive
                 exponent).
  FE_UNDERFLOW   The value of the result is too close to zero to
                 represent (large negative exponent).
  FE_INVALID     The operation is invalid or undefined for its operands
                 (e.g. 0/0).
  FE_ALL_EXCEPT  All floating point exceptions (bitwise or'ed together).

Several functions provide the interface to deal with exceptions.

The function fetestexcept takes a mask of exceptions and returns the bitwise OR of those in the mask that are currently set. To test for all floating point exceptions we would use the following:


  flags = fetestexcept( FE_ALL_EXCEPT );

A non-zero return value indicates that an exception has occurred. The sample code, below, demonstrates one method for determining which exception(s) occurred.

Another routine, feclearexcept, resets the exception state for one or more floating point exceptions. For example, we could clear the inexact flag by using the following:

  feclearexcept( FE_INEXACT );

If desired, an exception state can be set using the function feraiseexcept. For example:

  feraiseexcept( FE_DIVBYZERO );

The following sample code includes a function designed to cause an overflow exception (cleverly named "overflow"). The testFE function runs through all possible exception values and prints any that are set:


/*
************************************************************************
************************************************************************
*/

/*
 * filename: fe_test.c
 * purpose:  demo of c99 floating point environment
 * compile:  c99 fe_test.c -lm -o fe_test
 *
 */

#include <stdio.h>
#include <stdlib.h>
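#include <fenv.h>

/* Standard C99 pragma: this file reads the floating point status flags. */
#pragma STDC FENV_ACCESS ON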

#define MY_EXCEPTIONS (FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW | FE_UNDERFLOW)

void testFE(const char * func, int flags, int exit_on);
float overflow(int val);

int main(int argc, char ** argv)
{
    printf("overflow(200)=%f\n", overflow(200));
    printf("overflow(300)=%f\n", overflow(300));

    return 0;
}


float overflow(int val)
{
    int flags=0;
    float v=1.5;

    for(int ii=0;ii<val;++ii)
        {
        v *= 1.5;
        }

    /* Check whether any of the selected exceptions have occurred and
       terminate execution if appropriate.
    */
    if ( (flags = fetestexcept(MY_EXCEPTIONS)) != 0 )
        {
        testFE(__func__, flags, FE_INEXACT | FE_OVERFLOW | FE_UNDERFLOW);
        }

    return v;
}



void testFE(const char * func, int flags, int exit_on)
{
/* testFE checks the flags input to see if any floating point exceptions
   are set.  The program will terminate if a detected exception is included
   in the exit_on mask.
*/

    const int EXCEPTION_COUNT = 5;
    int fe[] = { FE_DIVBYZERO, FE_INEXACT, FE_INVALID, FE_OVERFLOW,
                 FE_UNDERFLOW };
    char * fe_name[] = { "FE_DIVBYZERO", "FE_INEXACT", "FE_INVALID",
                "FE_OVERFLOW","FE_UNDERFLOW" };

    int exit_set=0;
    for(int ii=0;ii<EXCEPTION_COUNT;++ii)
        {
        if ( ( fe[ii] & flags ) == fe[ii] )
            {
            fprintf(stderr,"Floating Point Exception Detected\n");
            if ( func )
                {
                fprintf(stderr,"function:  %s\n", func);
                }
            fprintf(stderr,"exception: %s\n",fe_name[ii]);
            if ( ( flags & exit_on & fe[ii] ) == fe[ii] )
                {
                exit_set=1;
                }
            }
        }

    if ( exit_set )
        {
        exit(1);
        }
}
/*
************************************************************************
************************************************************************
*/

When compiling, be sure to include the math library (-lm). Here's a sample compilation and run on the ARSC P6X complex:


iceberg1 255% c99 fe_test.c -lm -o fe_test

iceberg1 256% ./fe_test
overflow(200)=247937879422739046294637934885000000.000000
Floating Point Exception Detected
function:  overflow
exception: FE_OVERFLOW

Both Cray and IBM provide some degree of support for C99. The above program works as expected with the IBM C99 compiler; however, I couldn't get it to work properly on the X1. Also be aware that the C99 exception handling functionality is not defined in the C++ standard, so using "fenv.h" with C++ files may not work. Even if it does, it may not be portable to other compilers or systems.

References:

O'Reilly's "C Pocket Reference" by Peter Prinz and Ulla Kirch-Prinz, 2003, ISBN 0-596-00436-2

Quick-Tip Q & A


A:[[ I'm trying to find the version of the IBM Fortran compiler, xlf,
  [[ but the man page doesn't list an option for this.  Where can I find
  [[ the version?


#
# Thanks to Kate Hedstrom and Charles Grassl for this solution.
#

f2n1 46% lslpp -L | grep xlf
  xlf.ndi                    8.1.1.4    C     F    XLF for AIX Non-Default
  xlfcmp                     8.1.1.6    C     F    XL Fortran Compiler
  xlfcmp.html.en_US          8.1.1.0    C     F    XL Fortran Compiler
  xlfcmp.idebug.html.en_US   8.1.1.0    C     F    Distributed Debugger
  xlfcmp.msg.en_US           8.1.1.3    C     F    XL Fortran Compiler Messages -
  xlfcmp.pdf.en_US           8.1.1.0    C     F    XL Fortran Compiler
  xlfcmp.ps.en_US            8.1.1.0    C     F    XL Fortran Compiler
  xlfrte                     8.1.1.6    C     F    XL Fortran Runtime Environment
  xlfrte.aix51               8.1.1.5    C     F    XL Fortran Runtime Environment
  xlfrte.msg.en_US           8.1.1.1    C     F    XL Fortran Runtime Messages -

The entry you want is xlfcmp, which shows compiler version 8.1.1.6.

#
# Thanks to John Skinner for several solutions.
#

The only ways I know to do this under AIX are to use the AIX lslpp cmd
to list installed software products and grep for entries ending in
Compiler. Or one can read the default /etc/*.cfg files, which should
have version details inside a comment block at the top of each file.

tcsh% lslpp -IL | grep Compiler$
  vac.C                      7.0.0.0    C     F    IBM XL C Compiler
  vacpp.cmp.core             7.0.0.0    C     F    IBM XL C/C++ Compiler
  vacpp.cmp.rte              7.0.0.0    C     F    IBM XL C/C++ Compiler
  vacpp.msg.en_US.cmp.core   7.0.0.0    C     F    IBM XL C/C++ Compiler
  xlfcmp                     9.1.0.0    C     F    XL Fortran Compiler
  xlfcmp.html.common         9.1.0.0    C     F    XL Fortran Compiler
  xlfcmp.html.en_US          9.1.0.0    C     F    XL Fortran Compiler
  xlfcmp.pdf.en_US           9.1.0.0    C     F    XL Fortran Compiler
  xlfcmp.ps.en_US            8.1.1.0    C     F    XL Fortran Compiler


tcsh% egrep "Version|Edition" /etc/{vac,xlf}.cfg
/etc/vac.cfg:* IBM XL C/C++ Enterprise Edition Version 7.0 for 52
/etc/xlf.cfg:* IBM XL Fortran Enterprise Edition V9.1


#
# Finally thanks to Charles Grassl for pointing out that XLF v9.1
# can report the version.
#


Here are three solutions to the problem:
$ lslpp -L | grep xlf
The command will list all the components of the xlf compiler.  The
version number is in the second column.

$ xlf -qlist -c myfile.f
The compiler version number is the first line of file myfile.lst.

$ xlf -qversion
Xlf version 9.1 has the new "-qversion" option which prints the version
number to standard out.


#
# Editor's Note: ARSC currently has XLF v8.1 installed on iceberg
# and iceflyer, so "-qversion" doesn't work yet.  The "-qversion" flag
# does, however, work with Visual Age C v7.0.
#




Q: Sometimes I'll run an X1 PBS script interactively instead of through
  "qsub," to test the basic syntax of the shell script.  Here's a sample
  script (with just the basics remaining):

    #PBS -l walltime=4:00:00
    #PBS -l mppe=8
    #PBS -q default

    cd $PBS_O_WORKDIR
    aprun -n 8 ./a.out

  I'm totally annoyed, though, because I usually forget that in an
  interactive run, the PBS variable PBS_O_WORKDIR doesn't get set! 
  So, when the script hits this line:

    cd $PBS_O_WORKDIR

  it cd's my session to my home directory and everything fails until I
  remember to go back and comment out the "cd $PBS_O_WORKDIR".  Then, of
  course, when I'm done with interactive tests and submit the real,
  batch, run, I forget to UNcomment the "cd", and everything fails
  again!

  Any ideas to help me out? 

[[ Answers, Questions, and Tips Graciously Accepted ]]


Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
E-mail Subscriptions:
Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.