ARSC HPC Users' Newsletter 322, August 12, 2005

Faculty Camp Talks

This year's Faculty Camp attendees will give their final presentations at 1:00 PM on Friday, August 19th in the West Ridge Research Building (WRRB) Room 010. Faculty Camp is an annual three-week introduction to High Performance Computing taught by ARSC staff, joint faculty, and other lecturers.

Chaining and PBS Dependencies

In issues 319 and 320 of the newsletter, we took a look at the dependency capabilities of PBS. We considered the following scenario: a user has an application which does periodic checkpointing but may require more than one job to complete. The user would also like to copy the output to long term storage once the computational job(s) finish. To automate this process we looked at the dependency functionality of PBS, which allows a user to specify when a job should run based on the exit status of another job (i.e. whether or not an error occurred). At that time, we had problems handling the post processing step. In this article we take a fresh look at the problem and combine "traditional" chaining techniques with the PBS dependency functionality. This method allows an arbitrary number of jobs to be chained without specifying dependencies beforehand. It also handles the post processing step. Each computation script is set up as follows:

  1. First, submit the dependent job script using the job id of the current job. PBS sets the environment variable PBS_JOBID to the job id of the running job.

     e.g.
     qsub -W depend=afternotok:$PBS_JOBID nextstep.pbs

    This job will only run when the current job step exits in error (i.e. "afternotok"). With this strategy we can avoid manually setting up dependencies as we did in issue 320.

    NOTE: We recommend that self-submitting scripts be avoided. In the past we have seen self-submitting scripts flood the queues with jobs because of simple mistakes. If you instead write a separate script for each job step, the final script in the chain simply does not submit a dependent job.

  2. Next, set up the current job and run the application as you normally would.
  3. Lastly, if the application finished without an error, submit a post processing job to copy results to long term storage or to perform other tasks on the results.

If the job runs longer than the wall clock limit, PBS kills it, which prevents the post processing script from being submitted. Because the job has then exited in error, the afternotok dependency is satisfied: PBS releases the dependent step and the next job in the chain runs. When the application completes successfully, the post processing job is submitted and run.

Let's look at a sample script which puts everything together.


klondike 1% cat step.1.pbs
#PBS -q default
#PBS -l mppe=4
#PBS -l walltime=8:00:00
#PBS -j oe
#PBS -S /bin/ksh

cd $PBS_O_WORKDIR
# 1) Submit the dependent step:
#  The script step.2.pbs will only run if step.1.pbs exits with an 
#  error (e.g. if the job was killed by PBS)
#
qsub -W depend=afternotok:$PBS_JOBID step.2.pbs

# 2) Run the executable:  
#  If the job is killed while the executable is still running 
#  the remainder of the script (step.1.pbs) will not execute.
aprun -n 4 ./a.out

# 3) Check for error status:
#  If the application didn't exit in error, submit the post processing 
#  script.
#
if [[ $? -eq 0 ]]; then
    qsub post.pbs
else
    echo "Error: The application exited with an error!"
fi 

Step 3 should also handle the post processing dilemma we ran into in issue 320.
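Following the NOTE above, one way to get a chain of separate scripts without writing each one by hand is to generate them once. The sketch below assumes a POSIX sh and defines a hypothetical helper, make_chain (not an ARSC or PBS tool), that writes step.1.pbs through step.N.pbs in the current directory. The queue, resource request, and ./a.out are simply copied from the step.1.pbs example and would need adapting. Every script except the last submits its successor with an afternotok dependency, so the chain of computational jobs can never grow beyond N.

```shell
#!/bin/sh
# make_chain N (hypothetical helper): write step.1.pbs ... step.N.pbs in the
# current directory.  Each step except the last submits the next step with
# an afternotok dependency; the last step has no dependent job, which ends
# the chain.  Queue, resources and ./a.out are copied from step.1.pbs.
make_chain() {
  nsteps=$1
  i=1
  while [ "$i" -le "$nsteps" ]; do
    script="step.$i.pbs"

    # PBS header and working directory; \$ keeps the variable literal so
    # it is expanded when the job runs, not when the script is generated.
    cat > "$script" <<EOF
#PBS -q default
#PBS -l mppe=4
#PBS -l walltime=8:00:00
#PBS -j oe
#PBS -S /bin/ksh

cd \$PBS_O_WORKDIR
EOF

    # 1) Submit the dependent step, except in the final script.
    if [ "$i" -lt "$nsteps" ]; then
      echo "qsub -W depend=afternotok:\$PBS_JOBID step.$((i + 1)).pbs" >> "$script"
    fi

    # 2) and 3) Run the application; submit post.pbs only on success.
    # The quoted 'EOF' delimiter copies this text verbatim.
    cat >> "$script" <<'EOF'
aprun -n 4 ./a.out
if [[ $? -eq 0 ]]; then
    qsub post.pbs
else
    echo "Error: The application exited with an error!"
fi
EOF
    i=$((i + 1))
  done
}
```

For example, after loading the function into your shell (e.g. with `. ./make_chain.sh`, assuming you saved it under that name), `make_chain 3` writes three scripts, and only `qsub step.1.pbs` needs to be run by hand.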

2005 Summer Intern Final Talks

The 2005 Alaska Research Summer Challenge Interns will be giving their final presentations on August 18th and 19th in the West Ridge Research Building (WRRB) Room 010.

Presentations start every half hour during the following times:


  Thursday August 18, 2005 
    10:00 AM-Noon
     1:30 PM-3:30 PM

  Friday August 19, 2005 
    10:00 AM-Noon

The Alaska Research Summer Challenge is sponsored by the National Science Foundation and gives interns a chance to develop research skills while working with scientists from the UAF community.

Quick-Tip Q & A


A:[[ I have a large number of files that I want to copy to new file 
  [[ names. Every file with the pattern "1234" somewhere within the 
  [[ name should have "1234" replaced with "4567" in the copy.
  [[ 
  [[ E.g.
  [[ ab1234.dat -> ab4567.dat 
  [[ cd1234efg  -> cd4567efg
  [[ 
  [[ Since I have a large number of files to rename I would really 
  [[ like to find a way to automate this process.  
  [[


#
# Editor's Note: This question ended up being more ambiguous than we
# anticipated, so we have accepted answers which either copy or move 
# the files to the new name.  Most of the answers here are easily
# converted from one version to the other.
#


#
# Thanks to Bill Homer for a few solutions using perl.
#

Here's one way, using Perl to do both the name transformations and the
renaming:


  $ ls -1 *1234* | perl -nle '$old=$_; s/1234/5678/; rename $old, $_'

For interactive use, you might prefer to generate mv commands to
inspect, and then pipe them to a shell for execution if they do what
you want:

  $ ls *1234* *5678*
  ab1234.dat  cd1234efg

  $ ls -1 *1234* | perl -nle '$old=$_; s/1234/5678/; print "mv $old $_"'
  mv ab1234.dat ab5678.dat
  mv cd1234efg cd5678efg

  $ ls -1 *1234* | \
    perl -nle '$old=$_; s/1234/5678/; print "mv $old $_"' | \
    ksh

  $ ls *1234* *5678*
  ab5678.dat  cd5678efg


#
# Thanks to Kate Hedstrom for sharing the perl script she uses and
# introducing us to the rename command.
#

I have a Perl script that used to come with the Perl distribution ages
ago and I've been using it ever since. It has the syntax:

  % rename 's/1234/4567/' *1234*

I've put it in:

   http://people.arsc.edu/~kate/Perl/rename

Now, there is a Linux command of the same name with a friendlier syntax:

NAME
       rename - Rename files

SYNOPSIS
       rename from to file...

DESCRIPTION
       rename  will  rename  the specified files by replacing the
       first occurrence of from in their name by to.

With that command, the same renaming is simply:

  % rename 1234 4567 *1234*


#
# Thanks to Jed Brown and Greg Newby for submitting the following ksh 
# and sed solution.
#

One solution is to roll a loop on the command line and use sed:

  % for n in *1234* ; do mv $n `echo $n | sed s/1234/5678/`; done


#
# Greg also submitted a csh version
#

  % foreach i ( *1234* )
      mv $i `echo $i | sed 's/1234/4567/g'`
    end

Explanation: 
- the *1234* is an expression that will match all files with a "1234"
as part of their names.  (I'm presuming you don't have any special
characters like spaces and $ in your filenames.)

- the "mv" takes two arguments: the old filename, and the new.  The
new filename is created by piping the old filename (via echo) to sed.
sed, the Unix stream editor, substitutes all occurrences of 1234 with
4567.

This is a general technique for simple changes.  If your filename
changes were a little more complex, it might require some additional
steps in the "for" loop.
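As an example of such additional steps, here is a sketch of a POSIX sh function, rename_pattern (a hypothetical name, not a standard command), that replaces the first occurrence of a string using shell parameter expansion instead of sed. It quotes every name, so filenames containing spaces are handled, and it skips any file whose new name already exists rather than overwriting it.

```shell
# rename_pattern OLD NEW FILE... (hypothetical helper)
# Replaces the first occurrence of OLD with NEW in each FILE name.
# Variables are global because POSIX sh has no "local".
rename_pattern() {
  old=$1
  new=$2
  shift 2
  for f in "$@"; do
    case $f in
      *"$old"*) ;;            # name contains the pattern: process it
      *) continue ;;          # otherwise leave the file alone
    esac
    prefix=${f%%"$old"*}      # text before the first OLD
    suffix=${f#*"$old"}       # text after the first OLD
    target=$prefix$new$suffix
    if [ -e "$target" ]; then
      echo "skipping $f: $target already exists" >&2
      continue
    fi
    mv -- "$f" "$target"
  done
}
```

Usage: `rename_pattern 1234 4567 *1234*`. Like the Linux rename command, this changes only the first occurrence in each name, whereas the sed versions with the `g` flag change all of them.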


#
# Thanks to Jesse Niles of ARSC for sharing yet another set of
# solutions, based on apply and xargs.
#

These are my quick-tip solutions for BSD-derived systems lucky enough
to have the 'apply' command:

For a shift of 3 for all numbers:
  % apply 'cp %1 `echo %1 | tr 1234567890 4567890123`' *1234*

For this specific case:
  % apply 'cp %1 `echo %1 | sed "s/1234/4567/g"`' *1234*

And for non-BSD-derived systems unlucky enough to not have the 'apply'
command:

For a shift of 3 for all numbers:
  % ls -1 | xargs -I% ksh -c 'cp % `echo % | tr 1234567890 4567890123`'

For this specific case:
  % ls -1 | xargs -I% ksh -c 'cp % `echo % | sed "s/1234/4567/g"`'
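To see what the "shift of 3" version does, you can run its tr step by itself. tr maps each character in its first set to the character in the same position of the second set, so every digit d becomes (d + 3) mod 10 anywhere in the name, not only inside "1234":

```shell
# Each digit is shifted by 3, wrapping 7->0, 8->1, 9->2, 0->3.
echo ab1234.dat  | tr 1234567890 4567890123   # prints ab4567.dat
echo run2005.dat | tr 1234567890 4567890123   # prints run5338.dat
```

The second line shows why the sed versions are safer when other digits appear in your filenames.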


#
# Thanks to Wendy Palm for the following submission.
#

"sed" is perfect command for this.

To replace a pattern (in this case "1234" with "4567") in any string,
echo ab1234.dat 
 sed s/1234/4567/
would result in ab4567.dat

Here's a possible script to automate changing a bunch of filenames
(provided as arguments to the script):

  % cat changename.sh
  #! /bin/sh

  oldpattern="1234"
  newpattern="4567"

  for oldfilename in "$@"
  do
    newfilename=`echo "$oldfilename" | sed s/$oldpattern/$newpattern/`
    mv "$oldfilename" "$newfilename"
  done

  % ls
  ab1234.dat cd1234efg

  % changename.sh ab1234.dat cd1234efg

  % ls
  ab4567.dat cd4567efg


#
# Last but not least, thanks to Andrew Tsai for sharing the following perl 
# one liner, which was the inspiration for this question.
# 

  % ls -1 *1234* | perl -n -e 'chomp;$a=$_;if(s/1234/5678/){`cp $a $_`}'



Q:  It takes forever to untar my .tar file, and all I really need from
    it is one file.  Has anybody written a tool to extract just one 
    file from a tar file?  This is really ridiculous!

[[ Answers, Questions, and Tips Graciously Accepted ]]


Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.