ARSC HPC Users' Newsletter 297, August 13, 2004



Announcing ARSC Fall 2004 Science Seminars

You are invited to attend ARSC's Fall Science Seminars:

  September 8-10, 2004
  Arctic Region Supercomputing Center
  University of Alaska Fairbanks
  Fairbanks, Alaska

Wednesday September 8th:
  • "Supercomputing Applications"
    • James O'Dell, Chief Scientist, ARSC
  • "A Perspective on Ocean Modeling and Sea Ice Modeling and the Importance of Supercomputer Power to Their Evolution"
    • Albert Semtner, Professor, Naval Postgraduate School
  • "Development of the Earth System Modeling Framework"
    • Nancy Collins, Software Engineer, National Center for Atmospheric Research
Thursday September 9th:
  • "Use of Computers for Paleoclimate Research"
    • John Kutzbach, Professor, University of Wisconsin
  • TBD
    • Buck Sharpton, President's Professor, University of Alaska Fairbanks
Friday September 10th:
  • "Little Quarks and Big Computers"
    • Gerry Guralnik, Professor, Brown University
  • "On Molecular Sequence Alignment: Some Reasons for Doing It Well"
    • Tom Marr, President's Professor, University of Alaska Fairbanks

A final schedule will be released later. Check at: /arsc/support/news/hpcnews/hpcnews296/index.xml


Lectures by Visiting Mathematicians

Jerome Percus and Ora Percus of The Courant Institute of Mathematical Sciences and Department of Physics, New York University, will present the following series of lectures. All four talks will be held in the Elvey Auditorium, in the UAF Geophysical Institute:

August 24, 11:00 am
Fluids Under Tight Confinement
Jerome K. Percus
A preliminary study is made of the new phenomenology encountered when classical fluids are confined to enclosures that are of the order of the particle size in all but one spatial dimension. Self-diffusion is taken as the indicator of this phenomenology. The anomalous diffusion occurring in strictly one-dimensional flow is first reviewed, and then its extension to the single-file regime in which particles cannot pass each other. When the system enters the parametric regime in which particle exchange is first possible, a rapid transition to the characteristics of normal diffusion takes place, which is organized by the concept of "hopping time".
August 24, 3:00 pm
Can Two Wrongs Make a Right? Coin Tossing Games and Parrondo's Paradox.
Ora Percus
A number of natural and man-made activities can be cast in the form of various one-person games, and many of these appear as sequences of transitions without memory, or Markov chains. It has been observed, initially with surprise, that losing "games" can often be combined by selection, or even randomly, to result in winning games. Here, we present the analysis of such questions in concise mathematical form (exemplified by one nearly trivial case and one which has received a fair amount of prior study), showing that two wrongs can indeed make a right -- but also that two rights can make a wrong!
August 25, 11:00 am
Piecewise Homogeneous Random Walk with a Moving Boundary
Ora E. Percus
We study a random walk with nearest neighbor transitions on a one-dimensional lattice. The walk starts at the origin, as does a dividing line which moves with constant speed gamma, but the outward transition probabilities p_A and p_B differ on the right- and left- hand sides of the dividing line. This problem is solved formally by taking advantage of the analytical properties in the complex plane of an added variable generating function, and it is found that (p_A, p_B) space decomposes into four regions of distinct qualitative properties. The asymptotic probability of the walk being to the right of the moving boundary is obtained explicitly in three of the four regions. However, analysis in the fourth region is a sensitive function of the denominator of the rational fraction gamma, and encounters some surprises. Applications of random walk problems to sequential clinical trials will be mentioned.
August 25, 3:00 pm
Small Population Effects in Stochastic Population Dynamics
Jerome K. Percus
We focus on several biologically relevant situations in which small populations play a significant qualitative role, and take some first steps to incorporate such situations in the continuous dynamics format that has been so elegantly developed in the past. We first describe a small number of model systems in which the influence of small populations is evident. Then we analyze in detail a toy model, exactly solvable, that suggests a path towards the attainment of our goal, and follow this by a formal vehicle for doing so. Application to model systems, and comparison with numerical solutions, indicates the potential utility of this approach.

Tips for Better I/O Performance on the Cray X1: Part II

[ Thanks to U.S. Naval Academy Midshipman Nathan Brasher for this report on his work at ARSC this summer. See: Part I of this two-part series. ]

As described in Part I of this study, I collected I/O bandwidth data on the X1 by writing files (10 MB in the formatted cases, 100 MB in the unformatted cases) and recording the elapsed time per write. I varied several parameters, all of which are under the user's control, to determine their effect on I/O performance. These parameters and the means by which an X1 user can vary them are shown here:

  I/O Parameter         Method of setting
  =================     ======================================
  Access method         Fortran open statement
  File format           Fortran open statement
  Buffer size           Unicos/mp assign command (assign -b)
  Chunk size            Fortran source code logic
  Blocking scheme       Unicos/mp assign command (assign -s)

In graphing the results, I only show the maximum speed obtained for each test over several runs. The maximum was chosen because it was felt that this best represented the potential for high bandwidth. Also by taking the maximum bandwidth and discarding other results I was able to ignore abnormally slow times resulting from sharing disk access with other users.
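The "maximum over several runs" figure is easy to recover from a list of elapsed times. The following is a minimal sketch, not the script used for this study: the function name max_bandwidth and the one-time-per-line input format are assumptions made for illustration.

```sh
#!/bin/sh
# Hypothetical helper: print the best bandwidth (MB/s) seen over several runs.
# $1 is the number of megabytes written per run; elapsed times in seconds
# are read from standard input, one per line.
max_bandwidth() {
    awk -v mb="$1" '
        # Track the shortest positive time; the shortest run is the fastest.
        $1 > 0 && (best == 0 || $1 < best) { best = $1 }
        END { if (best > 0) printf "%.2f\n", mb / best }
    '
}

# Example: three runs that each wrote 100 MB.
printf '2.0\n0.5\n1.0\n' | max_bandwidth 100    # prints 200.00
```

Taking the shortest time, rather than the mean, discards runs that were slowed by other users' disk traffic, which is the reasoning given above.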

The four graphs all show I/O bandwidth as a function of buffer size, and have five curves plotted, one for each of five different chunk sizes. Each graph shows results for one combination of access method and file format:

  [Graph: Sequential, Formatted]
  [Graph: Sequential, Unformatted]
  [Graph: Direct, Formatted]
  [Graph: Direct, Unformatted]


I found three primary contributors to high I/O performance:

  1. unformatted files,
  2. large buffer sizes,
  3. few large write statements as opposed to many small writes.

The first effect is particularly dramatic. Writing unformatted files is typically 100-200 times faster than writing their formatted equivalents.

When the computer encounters a formatted write statement, it must interpret the format before outputting results. As you can see by comparing the graphs for unformatted I/O to their counterparts, factors like buffer size and chunk size do not have quite the same effect on formatted I/O as they do on unformatted I/O. This is because the overhead created by a format statement overwhelms the other factors in terms of significance. The result is that formatted output is slow regardless of the circumstances. To achieve top performance, only use formatted I/O if you absolutely have to.

Direct access when combined with unformatted data is even faster. I have been unable to figure out exactly why direct unformatted produced the best results, but it was consistently faster than the other formats, often as much as twice as fast as sequential unformatted.

I tried changing the record access order, thinking that this speedup was a result of the somewhat artificial conditions of our test (direct access records were, in fact, accessed in sequential order). However, direct unformatted ran at the same speed regardless of the order in which the records were written. Thus, if you are looking for every possible tweak to cut down on I/O access time on klondike, try direct access, unformatted files.

Buffer size is another important consideration. As you can see from all four graphs, larger buffer sizes always result in better performance. This was anticipated from the knowledge that memory is much faster than disk access and by storing results in a large buffer one can cut down on the frequency of disk accesses. It is worth noting as well that the Cray default buffer size of 64 KB works well in most circumstances.

It is also interesting that while large buffers can partially compensate for small chunk sizes, they do not completely solve the problem (compare the curves for "whole array" and "10,000 elem chunks" on the two graphs of unformatted I/O). The data also show that more, smaller chunks always degrade performance.

I have chosen to describe, rather than graph, the effects of the final parameter in this study, the Fortran file blocking scheme. The reason is that, in all tests, the different blocking schemes (f77, f90, COS, and unblocked) performed within 3% or so of each other. If you need to specify a blocking scheme for portability, go ahead and do so; otherwise the Cray defaults work fine.

In conclusion, this study mostly confirms what we already suspected about the slowness of mechanical disk drives: a few large writes run faster than many small writes.

If possible, store your data in large arrays and output it all at once using unformatted I/O. If you absolutely cannot use large arrays and write statements, at least allocate a large buffer size in order to compensate.
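For a quick feel for "few large writes versus many small writes" outside of Fortran, the same amount of data can be written with different write sizes using dd. This is only a sketch under stated assumptions: dd stands in for the Fortran write loops, the function name write_in_chunks is made up for illustration, and on a system with aggressive file caching the timing difference may be much smaller than what was measured on klondike.

```sh
#!/bin/sh
# Hypothetical helper: write total_bytes of zeros to a file using writes of
# chunk_bytes each. chunk_bytes is assumed to divide total_bytes evenly.
write_in_chunks() {
    file=$1 chunk_bytes=$2 total_bytes=$3
    dd if=/dev/zero of="$file" bs="$chunk_bytes" \
       count=$((total_bytes / chunk_bytes)) 2>/dev/null
}

# The same 1 MB of data, written two ways. Prefix each call with "time"
# to compare elapsed times on your own system.
tmpdir=$(mktemp -d)
write_in_chunks "$tmpdir/few_large"  1048576 1048576   # one 1 MB write
write_in_chunks "$tmpdir/many_small" 1024    1048576   # 1024 writes of 1 KB
ls -l "$tmpdir"
rm -rf "$tmpdir"
```

Both files end up the same size; only the number of write calls differs.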

I hope that this hard data on I/O performance will help users of klondike improve their computational performance. The test program follows.


program test_prog
implicit none
include 'mpif.h'

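real (kind=4), dimension(:), allocatable :: array
real, dimension(:), allocatable :: writetime
integer :: nelts, chunksz, ntests, i, rep, j, ierr
double precision :: start, end    ! mpi_wtime returns double precision

ntests  = 20        ! timed repetitions of each test
nelts   = 250000    ! array elements (4 bytes each)
chunksz = 250000    ! elements per write statement

! Allocate the work arrays before they are used
allocate(array(nelts), writetime(ntests))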

open(unit=1, status='replace', form='formatted', file='results')
write(unit=1,fmt='(I4)') ntests

do i=1, nelts
    array(i) = mod(i,10) / 10.0
end do

call mpi_init(ierr)

do i=1,ntests
    open(unit = 11, status='replace', form = 'unformatted', &
         access = 'sequential', file = 'outfile1')
    start = mpi_wtime()

    do rep = 1, 100
        do j=1, nelts, chunksz
                write(unit = 11) array(j:min(j+chunksz-1,nelts))
        end do
    end do

    end = mpi_wtime()

    writetime(i) = end-start
    close(unit = 11)
end do
write(unit = 1, fmt='(f10.6)') writetime

print *, 'Fastest Time : ',minval(writetime),' sec'
print *, 'Slowest Time : ',maxval(writetime),' sec'
print *, 'Average Time : ',sum(writetime)/ntests,' sec'

do i=1,ntests
    open(unit = 11, status='replace', form = 'formatted',  &
         access = 'sequential', file = 'outfile2')
    start = mpi_wtime()

    do rep = 1, 10
        do j=1, nelts, chunksz
                write(unit = 11, fmt='(100f4.1)')   &
                      array(j:min(j+chunksz-1,nelts))
        end do
    end do

    end = mpi_wtime()

    writetime(i) = end-start
    close(unit = 11)
end do
write(unit = 1, fmt='(f10.6)') writetime

print *, 'Fastest Time : ',minval(writetime),' sec'
print *, 'Slowest Time : ',maxval(writetime),' sec'
print *, 'Average Time : ',sum(writetime)/ntests,' sec'

do i=1,ntests
    open(unit = 11, status = 'replace', form = 'unformatted',   &
         access='direct', recl=4*chunksz, file = 'outfile3')
    start = mpi_wtime()

    do rep = 1, 100
        do j=1,nelts/chunksz
                write(unit = 11, rec=nelts*(rep-1)/chunksz+j)   &
                      array(j:min(j+chunksz-1,nelts))
        end do
    end do

    end = mpi_wtime()

    writetime(i) = end-start
    close(unit = 11)
end do
write(unit = 1, fmt='(f10.6)') writetime

print *, 'Fastest Time : ',minval(writetime),' sec'
print *, 'Slowest Time : ',maxval(writetime),' sec'
print *, 'Average Time : ',sum(writetime)/ntests,' sec'

do i=1,ntests
    open(unit = 11, status = 'replace', form = 'formatted',   &
         access='direct', recl=4*chunksz, file = 'outfile4')
    start = mpi_wtime()

    do rep = 1,10
        do j=1,nelts/chunksz
                write(unit = 11, rec=nelts*(rep-1)/chunksz+j,  &
                      fmt='(250000f4.1)') array(j:min(j+chunksz-1,nelts))
        end do
    end do

    end = mpi_wtime()

    writetime(i) = end-start
    close(unit = 11)
end do
write(unit = 1, fmt='(f10.6)') writetime

print *, 'Fastest Time : ',minval(writetime),' sec'
print *, 'Slowest Time : ',maxval(writetime),' sec'
print *, 'Average Time : ',sum(writetime)/ntests,' sec'

call mpi_finalize(ierr)

end program

Scripted Chaining of Batch Jobs and File Checks

[ Many thanks to Kate Hedstrom of ARSC, for yet another article! ]


In the new ARSC storage environment, files are now purged on the work areas. I'll be describing one way to deal with this on the IBM in your loadleveler job script. Something similar should work on the Crays.

The new mode of working might involve copying files from $ARCHIVE to the work area, for instance configuration files for a model. This can be done by hand before the job is submitted, or it can happen in a batch script. However, it can't happen in the main batch script since the compute nodes can't see $ARCHIVE. The purpose of the "data" loadleveler class on iceberg and the "work" class on iceflyer is to allow $ARCHIVE to be visible from batch jobs. We can manage this by job chaining:

  1. First phase fetches the files from $ARCHIVE and submits the second phase.
  2. Second phase does the big computation and submits the third phase.
  3. Third phase moves files to $ARCHIVE.

If the computation is really long and takes more time than the eight hours allowed by the standard class, the first job could check to see if the forcing files are still there from the previous stage before fetching them.

If the fetch fails, the batch script should not try to submit the second phase.

If the files aren't there for the second phase, the big computation should not try to run.

"if" in Shell Scripts

Let us review the syntax of "if" statements in shell scripts. Although I use tcsh for my interactive shell, I use the Korn shell (ksh) for scripts. In this case, it is exactly like the Bourne shell (sh). The only reason I switched to ksh is that you can set and export a variable in one line:

   export MP_SHARED_MEMORY=yes

rather than the sh version:

   MP_SHARED_MEMORY=yes
   export MP_SHARED_MEMORY

Back to the "if" statement. The general form is:

    if <some check>
    then
        do something
    fi

with optional elif and else clauses. The simplest form of <some check> is simply a Unix command, which returns a value to the shell indicating success or failure:

    if date
    then
       echo "date" ran
    else
       echo "date didn't run"
    fi

Another form of <some check> involves a test on some condition. There are two equivalent forms of this:

    if [ -d /usr ]
    then
       ls /usr
    fi

    if test -d /usr
    then
       ls /usr
    fi

Note that the shell is fussy about whitespace in the bracket form: "[" is itself a command, so it and the closing "]" must be set off by spaces.

Here, "-d" is a test for whether the argument is a directory. There are other tests. An incomplete list is:

     -d  directory
     -f  regular file
     -r  readable file
     -w  writable file
     -a  file exists
     -s  file exists and is not empty

Putting "if" to Work

The script for the first phase (in the data/work class) could contain lines such as:

    # Copy file to current directory then submit the big job phase
    if cp $ARCHIVE/my_dir/my_file .
    then
        /var/loadl/bin/llsubmit phase_2
    fi

If more than one file has to be copied, we could check for success after each one (here, "!" is a logical "NOT"):

    # Copy files to current directory or die
    if ! cp $ARCHIVE/my_dir/my_file_1 .
    then
        echo problem copying my_file_1
        exit 1
    fi
    if ! cp $ARCHIVE/my_dir/my_file_2 .
    then
        echo problem copying my_file_2
        exit 1
    fi
    /var/loadl/bin/llsubmit phase_2

For phase two, we want to make sure the files are there:

    # Check for needed files
    if [ ! -f my_file_1 ]
    then
        echo my_file_1 not found
        exit 1
    fi
    if [ ! -f my_file_2 ]
    then
        echo my_file_2 not found
        exit 1
    fi

    # Run the job and submit the cleanup phase if successful
    if ./my_big_job
    then
        /var/loadl/bin/llsubmit phase_3
    fi
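
With more than a couple of input files, the per-file checks for phase two can be folded into a loop. The following is one possible sketch; the function name require_files is hypothetical, and it uses "-s" instead of "-f" on the assumption that an empty input file is as useless as a missing one.

```sh
#!/bin/sh
# Hypothetical helper: succeed only if every named file exists and is not empty.
require_files() {
    missing=0
    for f in "$@"
    do
        if [ ! -s "$f" ]
        then
            echo "$f not found or empty" >&2
            missing=1
        fi
    done
    return $missing
}

# Usage in a phase-two script (file names are placeholders):
#   require_files my_file_1 my_file_2 || exit 1
```

Unlike a chain of separate checks that exit at the first problem, this reports every missing file before giving up, which saves a resubmission when several files were purged.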

Phase three, moving results back to $ARCHIVE, is left as an exercise for the reader (if you get stuck, contact ARSC consulting, who know how to find me).
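
For readers who want a starting point, here is one possible shape for phase three. It is a sketch, not the author's solution: the function name archive_results, the choice of cp followed by rm, and all file and directory names are assumptions.

```sh
#!/bin/sh
# Hypothetical helper: copy each result file into a destination directory and
# delete the local copy only after its copy succeeded. Stops at the first failure.
archive_results() {
    dest=$1
    shift
    for f in "$@"
    do
        if cp "$f" "$dest/"
        then
            rm -f "$f"
        else
            echo "problem archiving $f" >&2
            return 1
        fi
    done
}

# Usage in a phase-three script (directory and file names are placeholders):
#   archive_results "$ARCHIVE/my_dir" my_result_1 my_result_2 || exit 1
```

Removing the local copy only after a successful cp means a failed transfer leaves the results in the work area, where they can be retried before the purge gets to them.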


X1: Don't Forget to Link for SSP Mode

If your code multi-streams poorly, you should test it in single-streaming, or SSP, mode. To do so, compile all source files with "ftn -O ssp ..." or "cc -h ssp ...", and don't forget to link with "-O ssp" or "-h ssp" as well. If you link without the "ssp" option, your application will run in the default MSP mode using only one SSP per MSP.

A user had this problem last week and discovered it as follows. His was an OpenMP program, and he expected to run it with 16 OpenMP threads, one per SSP on a single shared-memory X1 node. The following command is correct, and tells the scheduler to run the application with 16 threads:

  aprun -d 16 ./a.out

However, it reported the error message:

 "aprun -d cannot exceed node size"

The problem: while he'd compiled for SSP mode, he'd linked for MSP mode, and there are only 4 MSPs per node. Relinking with "-O ssp" immediately solved the problem.

The "file" Unix command will immediately tell you if an executable file was linked in SSP or MSP mode:

  % ftn -O ssp -c tt.f
  % ftn -o tt tt.o
  % file tt
  tt:             ELF 64-bit MSB executable (not stripped) 
    MSP application NV1 - version 1
  % ftn -O ssp -o tt tt.o
  % file tt              
  tt:             ELF 64-bit MSB executable (not stripped) 
    SSP application NV1 - version 1
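
If you check link modes often, the test can be scripted. The sketch below assumes that the output of "file" contains the strings "SSP application" or "MSP application" exactly as in the sample above; the function name link_mode is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: classify an executable as SSP or MSP from the text that
# "file" prints for it (read from standard input).
link_mode() {
    awk '
        /SSP application/ { mode = "SSP" }
        /MSP application/ { mode = "MSP" }
        END { print (mode == "" ? "unknown" : mode) }
    '
}

# On the X1 this would be used as:  file tt | link_mode
# Here the sample output shown above is fed in directly.
printf 'tt: ELF 64-bit MSB executable (not stripped)\n  SSP application NV1 - version 1\n' | link_mode    # prints SSP
```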

Quick-Tip Q & A

A:[[ C and Fortran compilers let me define pre-processor macros on the
  [[command line, like this, for instance:
  [[  cc -D VERBOSE -c mysource.c
  [[ But I use makefiles, and would prefer this:
  [[   make -D VERBOSE myapp
  [[ Is there a way to pass macro setting "through" a make command to
  [[ be used as compiler options?

  # Many thanks to Brad Chamberlain: 

  My strategy is to define a Makefile variable like "MYFLAGS".  You can then
  set Makefile variables on the command line.  For example, consider the
  following Makefile:

        all:
                @echo $(MYFLAGS)

  Running this normally results in a blank line:

        > make

  Running it with a command-line assignment to MYFLAGS yields:

        > make MYFLAGS=-foo
        -foo

  You can also give multiple options using quotes:

        > make "MYFLAGS=-foo -bar"
        -foo -bar

  Thus, you can take your Makefile commands for compiling a .c file and add
  a variable name like this to the cc command specification in order to add
  additional flags (like your -D definitions) at a make command line.

  Note that command-line settings override those in files, thus I could have 
  added a line like:


  to the Makefile so that if I didn't specify anything on the command-line,
  I would've gotten the default behavior:

        > make

  The other two examples would work as before, though.

  # From Jed Brown:

  The quick and portable option is to do

    % make DEFS='-DFOO=1 -DBAR=0 -DQUX=1' target

  where makefile contains something like

    file.o : file.c
            $(CC) $(CFLAGS) $(DEFS) -c file.c

  or

    CFLAGS+= $(DEFS)

  if applicable.

  If you really want to use syntax as stated, then put

    .for VAR in FOO BAR BAZ QUX
    .ifdef $(VAR)
    DEFS+= -D$(VAR)=1
    .endif
    .endfor

  towards the top of the makefile. This will work with BSD style make, but
  GNU make has different syntax and does not support defining variables
  using "make -D VAR" syntax since "make VAR=1" is equivalent.

  [[ I've noticed that I can sometimes get colors to work with emacs,
  [[ other times I cannot.  Colors are great for syntax and variable
  [[ highlighting.
  [[ A remote linux system + a Mac terminal window ($TERM=linux) does great
  [[ with colors, but many other combinations do not (including the emacs
  [[ that ships with the Macs).  Does anyone know good ways to find out
  [[ whether color is available in a particular emacs installation, and if
  [[ so how to get colors to display?

  # Better late than never!  In fact, if you've got a really good answer to 
  # a question from 7 years ago, send it in.
  # Thanks first to Martin Luthi: 

  Make sure that global-font-lock-mode is enabled, as many
  distributions have it disabled as default.

  M-x global-font-lock-mode toggles that behaviour.
  You can either customize the variable global-font-lock-mode, change it
  in the Options menu (Syntax highlighting) in modern Emacs (>22), or
  enable it in your .emacs with:
    (global-font-lock-mode 1)
  If you run emacs within a terminal (emacs -nw), the font lock
  capabilities depend on the terminal's capabilities.

  # And another thanks to Brad Chamberlain:

  Try using "M-x list-colors-display".  This should open a *Colors*
  buffer that lists all the color names and how they look as foreground
  and background colors.  I don't have a Mac, and all my emacs buffers
  support color, but I suspect that this would be one way of proofing
  whether colors are supported by an implementation.

Q: Uh oh...

   I just deleted some files that I don't even own!  How'd that
   happen?   Why'd it let me do that!  (Oh boy... I think I'm in

     % ls -l 
     total 2048
     -rw-------    1 fred     puffball       30 Aug 11 16:57 file.junk
     -rw-------    1 horace   heatrash 21922042 Mar 18 10:01 file.priceless
     -rw-------    1 fred     puffball       30 Aug 11 16:57 file2.junk
     -rw-------    1 horace   heatrash  4808440 Mar 19 11:21 file2.priceless
     % rm -f file*
     % ls -l
     total 0

[[ Answers, Questions, and Tips Graciously Accepted ]]

Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.