ARSC HPC Users' Newsletter 268, May 2, 2003



Batch Scripting Essentials 2: Uncoupling I/O from a Main Program

[ Thanks to Jeff McAllister for Part II in this series. Part I appeared in issue 259 . ]

It's been a while, but here's the next installment on the complex topic of automating tasks necessary for supercomputing.

In this article I will describe how to de-couple I/O tasks from a main program by a crude sort of "message passing" between processes.

For example, what if your program wrote files on a regular basis (e.g., output at the end of each timestep) but needed to do something additional, like compress those files or move them around? These tasks, while not computationally intensive, can take a lot of time. You could depend on having a lot of space -- just write everything and deal with it later -- but this doesn't scale. Soon it's not just a matter of a bigger quota, because there isn't that much disk to be had anywhere. It would be nice if the auxiliary I/O processing could happen for each file as soon as it was written, and without changing much of the main model's code.

It turns out that tasks like this can often be handed off to other processes, even running in different scripts, with only minor changes to the original program.

First, some communication or synchronization must be established between the cooperating processes. (You don't, for instance, want to compress or move a file that is still being written.) There are many ways to do this, but what has always seemed simplest to me is just appending the name of each output file, right after the main program has finished writing it, to a "finished list" file.

The "helper" process running concurrently can wake up every few seconds, read this file, and determine whether there is anything new to do. (By the way, the time between checks isn't very sensitive -- each check is almost free. However, to avoid filling up more disk than necessary, the check rate should not be too much greater than the rate at which the files are created.)

Here are the steps which would need to appear in a batch submission script to start the two processes:

  1. submit (llsubmit, qsub, etc.) the script to start the helper process.
  2. start the main job.
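In outline, the main batch script might look like this (the script and program names here are hypothetical, and you would substitute qsub for llsubmit on the NQS/PBS systems):

```shell
#!/bin/sh
# Hypothetical main batch script.  "LLshrink" and "write_model" are
# made-up names for this sketch.
llsubmit LLshrink        # 1. submit the script that starts the helper
./write_model            # 2. start the main job in this script
```

The order matters: the helper is submitted first so that it is already polling by the time the main program writes its first file.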

This is just like "chaining" jobs (see "Batch Scripting Essentials I" in issue 259), except that both scripts run at the same time.

Second, the "helper" must know when the main program has terminated and, thus, when the "finished list" won't be getting any longer. That is, the "helper" must know when to quit. An easy solution is for the main program to create a special file to flag the "helper" that it's done. With a little extra logic, the helper script can detect this file and exit.
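As a sketch of how the helper's loop might look (this is an assumption about the logic, not ARSC's actual script; the file names finished.lst and done.tmp come from the demo, and the fake "main program" setup at the top exists only to make the sketch self-contained):

```shell
#!/bin/sh
# --- Fake the main program's side of the protocol, for demonstration ---
rm -f finished.lst done.tmp test.file.*
echo "data 0" > test.file.0
echo "data 1" > test.file.1
echo test.file.0 >> finished.lst   # main program: file 0 is complete
echo test.file.1 >> finished.lst   # main program: file 1 is complete
touch done.tmp                     # main program: no more files coming

# --- The helper loop itself ---
NPROCESSED=0
NLINES=0
while :; do
    if [ -f finished.lst ]; then
        NLINES=`wc -l < finished.lst | tr -d ' '`
        if [ "$NLINES" -gt "$NPROCESSED" ]; then
            NNEW=`expr $NLINES - $NPROCESSED`
            # Compress only the newly appended names.
            tail -n "$NNEW" finished.lst | while read f; do
                gzip "$f"
            done
            NPROCESSED=$NLINES
        fi
    fi
    # Quit once the flag file exists and everything listed is handled.
    if [ -f done.tmp ] && [ "$NLINES" -eq "$NPROCESSED" ]; then
        break
    fi
    sleep 5      # the check interval isn't very sensitive
done
echo "compressed $NPROCESSED file(s)"
```

Because NPROCESSED only ever grows, each listed file is compressed exactly once, and the loop cannot exit while listed files remain unhandled.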

I've written a quick demo (see:

) that shows how this system might work. To run it you will need these two files:
  1. -- a "model" model, which writes regular files and simulates the "messages" that a real program would need to provide to the helper script,
  2. -- a helper script, which compresses files as they are written.

Additionally, there are batch submission scripts for ARSC systems:

  • icehawk ( LLwrite_icehawk, LLshrink_icehawk )
  • iceflyer ( LLwrite_iceflyer, LLshrink_iceflyer )
  • yukon ( write_yukon.qsub, shrink_yukon.qsub )
  • chilkoot ( write_chilkoot.qsub, shrink_chilkoot.qsub )

Run the demo as follows:

  1. submit the "write" script. (This will submit the compress script too.)
  2. check the job listing. You should see that both scripts are active.
  3. type 'ls $WRKDIR' every few seconds. You should see files being created and compressed.

On iceflyer it should work something like this:

Submit the job:

  iceflyer 2% llsubmit LLwrite_iceflyer
  llsubmit: The job "iceflyer.3617" has been submitted.

Verify the two scripts are running:

  iceflyer 3% llq
  Id                       Owner      Submitted   ST PRI Class Running On
  ------------------------ ---------- ----------- -- --- ------------ -----------
  iceflyer.3617.0          jmtester    5/2  15:52 R  50  large        f1n1
  f1n2.487.0               jmtester    5/2  15:52 R  50  work         iceflyer

Watch the files as they are produced, then compressed:

  iceflyer 4% ls $WRKDIR
  finished.lst    test.file.0.gz  test.file.1.gz  test.file.2.gz

Both scripts should finish at approximately the same time.

If you need to compress files, you could probably use the script with no changes. Note, however, that these scripts are offered only as a demo. If you use them for your real application, you must provide your own validation and testing. That said, in your application program you would make two modifications. Your program must:

  1. append to the "finished.lst" file the names of the files the program has finished writing,
  2. create the special file, "done.tmp", just before it terminates.
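In shell terms, the main program's side of the protocol amounts to the following (the file names finished.lst and done.tmp come from the demo; "out" and the three fake timesteps are made up for this sketch, and a real model would make the equivalent calls from Fortran or C):

```shell
#!/bin/sh
# A shell "model" model: write a file per timestep, announce each one,
# then flag completion.
rm -f finished.lst done.tmp out.*
i=0
while [ "$i" -lt 3 ]; do
    echo "timestep $i output" > out.$i   # finish writing this file...
    echo "out.$i" >> finished.lst        # 1. ...then append its name
    i=`expr $i + 1`
done
touch done.tmp                           # 2. flag the helper: all done
```

Note that each name is appended only after its file is completely written, so the helper never sees a half-finished file.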

This method can be expanded to a wide variety of tasks: basically, anything that can be initiated by a small signal, relatively infrequently (there is some overhead involved, especially in all of the file opens that must occur). I've found it especially helpful for I/O and system interaction -- the stuff that scripting languages are especially good at and Fortran isn't.


No promises when, but storage abstraction and file staging will be the next article in this "series."


CUG Archives

[ From John Stephens ]

To commemorate its 25th anniversary, the Cray User Group (CUG) is compiling a complete archive of CUG Proceedings and conference programs, and needs your help. We have gaps for many of the earlier conferences.

For some, we have no record of who organized the conference. For the sixth, we have only a reference that it occurred. We have no information at all for the first two meetings.

Specifically, for Conferences 1-6 we have only the following:

  • Conf. 6: Oct. 20-22, 1980, sponsored by United Computing Systems, Inc., Kansas City, Crown Center Hotel: no documentation; known only by a reference to it in the previous conference notes.
  • Conf. 5: April 22-24, 1980, post conference report by Howard Gordon
  • Conf. 4: Oct. 23-25, 1979: post conference report by Howard Gordon; travel info. and program outline
  • Conf. 3: April 26-27, 1979: post conference report by Howard Gordon; misc. correspondence by Robert L. Cave; attendee list; tentative agenda
  • Conf. 2: no information
  • Conf. 1: no information

If you have any material or information that will help fill in our gaps, we would very much appreciate your sending it to Bob Winget for incorporation into the website. Send to:


John Stephens Director (Americas) and Publications Chair Cray User Group


CUG and Next Newsletter

The next newsletter will be delayed a week since the editors will be at the Columbus CUG. We'll see some of you there!


Quick-Tip Q & A

A:[[ I assign a full path and file name to an environment variable, with
  [[ a result similar to this:
  [[   setenv FILENAME /usr/local/include/mpi.h
  [[ all I _really_ need is the file name, like this:
  [[   mpi.h
  [[ These annoying little details are just like mosquitoes. How do I
  [[ strip off the path without writing yet another little perl script
  [[ that I'm sure has been written 10,000 times before?

  # This may be a record for number of responses when no prize was
  # involved!  Thanks to Martin Luthi, Gene McGill, Kurt Carlson, John
  # Metzner, Rich Griswold, Nic Brummell, John Skinner, and Kate Hedstrom.
  #   Here's the range of answers: 

  You can get just the filename:
    % basename $FILENAME

  Or the directory name:
    % dirname $FILENAME

  Or the filename without the suffix:
    % basename $FILENAME .h

  # For CSH users:
  Wow!  An easy one.  The shell has variable modifiers.  The :h
  modifier gives the head (the directory part):
    % echo $FILENAME:h

  And :t gives the tail (the file name):
    % echo $FILENAME:t
  # CSH, and portability:
  The basename method is the more portable of the two. The :modifier [:t,
  :h, etc] method won't work under UNICOS, AIX, or SOLARIS C shells (on
  our systems) for true environment variables. However, the :modifier
  method WILL work for regular shell variables on all these OS's.  

    % set shoo=/why/not; echo $shoo:t
  Both methods do work under the Linux csh/tcsh shell, however, for
  those of us with Linux clusters.  Is nothing in life ever really
  portable but death and taxes?!?
  # KSH, and performance:
  To accomplish this with ksh use:
    %  echo ${FILENAME##*/}
  And to get the directory instead of the filename:
    % echo ${FILENAME%/*}
  A single '#' above would strip only through the first '/' instead of
  the last '/'.  Likewise, a double '%' would strip everything from the
  first '/' onward, rather than just the last path component.  Here's
  another parsing example:
    % TEST="aaa:bbb:ccc"
    % echo ${TEST#*:}      # gives bbb:ccc
    % echo ${TEST%%:*}     # gives aaa
  These are ksh built-in capabilities, which makes them much more
  efficient than invoking external commands or perl scripts.  They are
  POSIX extensions, however, and are not available in the true Bourne
  (sh) shells still supported on some UNIX variants.
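  For a side-by-side comparison, here is a quick sketch showing that the
  external commands and the ksh built-ins produce the same results (the
  example path is the one from the question):

```shell
#!/bin/sh
# External commands vs. POSIX sh/ksh parameter expansion.
FILENAME=/usr/local/include/mpi.h
basename "$FILENAME"     # prints: mpi.h   (forks an external command)
echo "${FILENAME##*/}"   # prints: mpi.h   (built-in, no fork/exec)
dirname "$FILENAME"      # prints: /usr/local/include
echo "${FILENAME%/*}"    # prints: /usr/local/include
```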

Q: I run a series of batch jobs (NQS, LoadLeveler, PBS, whatever). Each
   run must create its own directory for output. My current method is to
   manually edit the batch script for each run, typing the name for the
   output directory, like this:


   This variable is used later in the script, e.g.:

     mkdir $OUTDIR 
     cd    $OUTDIR

   I don't care much what names are used for the directories.  Can you
   recommend a way, if there is one, to come up with reasonable names?

[[ Answers, Questions, and Tips Graciously Accepted ]]

Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
E-mail Subscriptions: Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.
Back to Top