ARSC HPC Users' Newsletter 378, January 25, 2008

ARSC Summer Internship Program

The Undergraduate Research Challenge Summer Intern Program at the Arctic Region Supercomputing Center is now accepting applications for internships running June 9 - Aug. 15, 2008. Undergraduate students are invited to spend ten weeks investigating leading-edge computational science activities on the campus of the University of Alaska Fairbanks.

The ARSC Research Summer Challenge attracts competitive applications from throughout the U.S. Interns work 40 hours per week on research projects under the direction of a mentor. More information and application materials are available.

Application Deadline: March 16, 2008.

Spring Training at ARSC

ARSC is offering these training opportunities this spring:
02/08 Introduction to Midnight
02/15 Introduction to Paraview
02/22 Introduction to IDV
02/29 Introduction to Matlab
03/07 Using Fortran 90
03/18 Introduction to Gaussian
03/21 Python, 2-Day Series
03/22   (continuation) Python, 2-Day Series
03/28 Using Subversion
04/01 Using netCDF
04/15 Matlab Special Topics
04/25 Midnight Performance Analysis Tools

Running Multiple Serial Executables from a PBS Script

[ By: Don Bahls ]

Occasionally people ask us how to run a serial executable multiple times on the same node. Here's an example showing how you could run matlab:


#!/bin/bash
#PBS -q standard
#PBS -l select=1:ncpus=4:node_type=4way
#PBS -l walltime=6:00:00
#PBS -j oe

cd $PBS_O_WORKDIR

# Load the matlab module
. /usr/share/modules/init/bash
module load matlab

# start 4 matlab processes in the background
#
matlab -nodisplay < input001.m > output001.out 2>&1 &
matlab -nodisplay < input002.m > output002.out 2>&1 &
matlab -nodisplay < input003.m > output003.out 2>&1 &
matlab -nodisplay < input004.m > output004.out 2>&1 &

# issue the "wait" command so that the shell will pause until
# all the background processes have completed.
wait

# end of script
#

Since there are 4 different matlab processes running, the stderr and stdout for each matlab process are redirected to different files. You aren't strictly required to separate the output streams like this, but if your code produces a lot of output it can be difficult to tell which process created which output.

An alternative is to also incorporate the job ID into the output file names, which keeps output from separate job submissions distinct.

E.g.:

matlab -nodisplay < input001.m > output001.${PBS_JOBID}.out 2>&1 &

The key component of this script is the "wait" statement. It keeps the shell from proceeding until all of the matlab processes have completed. Without it, the job would exit immediately and PBS would kill the backgrounded matlab processes. The "wait" command is available in csh and tcsh as well as ksh and bash.

While you could request more than one node with the PBS script, the executables will all run on the first node assigned by PBS unless you make the script significantly more complicated. It's generally easiest to stick to a single node when running a serial process multiple times.
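
If you do want to spread serial tasks across several nodes, one common approach is to ssh to each host listed in $PBS_NODEFILE. The following is a minimal sketch, not a tested recipe: it assumes passwordless ssh between compute nodes, a shared filesystem, and one task per node, and the input file names are hypothetical.

#!/bin/bash
#PBS -q standard
#PBS -l select=2:ncpus=4:node_type=4way
#PBS -l walltime=6:00:00
#PBS -j oe

cd $PBS_O_WORKDIR

# launch one matlab process on each unique host in the node file
i=0
for host in $(sort -u $PBS_NODEFILE); do
    i=$((i + 1))
    input=$(printf "input%03d.m" $i)
    output=$(printf "output%03d.out" $i)
    # set up the environment and run the task from the same directory
    ssh $host "cd $PBS_O_WORKDIR; \
               . /usr/share/modules/init/bash; module load matlab; \
               matlab -nodisplay < $input > $output 2>&1" &
done

# wait for all of the remote tasks to finish
wait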

You could also run more than 4 tasks on a 4-processor node; however, performance typically drops off quickly. Also, remember that the processes share the memory of the node, so if your code has significant memory requirements, you may not be able to run even one process per core.

If 16 GB and 4 cores is insufficient, try running on one of midnight's 16-way nodes, which offer 64 GB and 16 cores each.
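
The corresponding resource request might look like the following; we're assuming the node_type syntax parallels the 4-way example above, so check the midnight documentation for the exact string:

#PBS -l select=1:ncpus=16:node_type=16way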

PathScale Fortran Compiler Optimization Flags Simplified

[ By: Ed Kornkven ]

The PathScale compilers are the default compilers on Midnight. They have a dizzying variety of flags for improving execution performance (see the Midnight man pages "man eko", "man pathf90", and "man pathcc", as well as the Midnight web pages).

It isn't feasible to try all possible combinations of compiler flags for a program, so where should a person start? Thankfully, the PathScale Compiler Suite User Guide (chapters 6 and 7, and especially section 6.5) gives some suggestions for choosing tuning flags, and in my experience a simple set of flags gives good results.

Basic optimization is specified by the familiar "-On" flags, where "n" is an optimization level from 0 to 3. "-O2" is the default; it enables extensive but conservative optimizations that are designed to improve performance without changing floating point results. This is the recommended starting point when beginning to optimize a program.
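
For example, a build that explicitly requests the default level might look like this (the file and program names are hypothetical):

$ pathf90 -O2 -o myprog myprog.f90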

As more aggressive optimization techniques are tried, you may notice undesirable effects such as:

  • compilation time greatly increasing,
  • no performance benefit from the optimization or even a slowdown, or
  • unacceptable numerical differences in the program output.

Therefore, the User Guide suggests gradually stepping up the optimization level. The next levels after "-O2", in order of increasing aggressiveness, are:

   -O3
   -O3 -OPT:Ofast
   -Ofast

The options "-OPT:Ofast" and "-Ofast" are shorthand with the following meanings:

   -OPT:Ofast   means   "-OPT:roundoff=2:Olimit=0:div_split=ON:alias=typed"
   -Ofast       means   "-O3 -ipa -OPT:Ofast"
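
As a concrete illustration, here is how the levels might be invoked on a hypothetical two-file Fortran program (the file and program names are ours, not from the User Guide):

$ pathf90 -O3 -o prog main.f90 sub.f90
$ pathf90 -O3 -OPT:Ofast -o prog main.f90 sub.f90

# When compiling and linking separately with "-Ofast" (which implies
# "-ipa"), the flag must appear at link time as well, as discussed
# below:
$ pathf90 -Ofast -c main.f90
$ pathf90 -Ofast -c sub.f90
$ pathf90 -Ofast -o prog main.o sub.o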

We're not going to get into the meaning of all those options here (see the User Guide or man pages for that), but we will note that "-ipa" enables inter-procedural analysis, a mechanism for optimizing across the whole program instead of only within individual source files. "-Ofast" implies "-ipa", which in turn has the greatest potential benefit when combined with "-O3" level optimization. Compiling with "-ipa" also requires "-ipa" at link time, since the PathScale inter-procedural analysis has its own linker.

This may be getting confusing (and we've only scratched the surface of the compiler options), so let's get back to the point of this article: try the optimization flags in increasing degrees of aggressiveness, "-O2", "-O3", "-O3 -OPT:Ofast", "-Ofast", and back off if your numerical results change unacceptably. If that happens, the User Guide offers a few more words of advice:

  1. If your code is slower at "-O3" than at "-O2", try "-O3 -LNO:prefetch=0".
  2. If you are having numerical problems with "-O3 -OPT:Ofast", try "-O3 -OPT:Ofast:ro=1" or "-O3 -OPT:Ofast:div_split=OFF" (both fall-backs are shown as full command lines below).
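
As full command lines, those fall-backs might look like the following (file and program names are hypothetical):

$ pathf90 -O3 -LNO:prefetch=0 -o prog prog.f90
$ pathf90 -O3 -OPT:Ofast:ro=1 -o prog prog.f90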

There are many other techniques available to the determined user who needs to squeeze out every bit of performance, including the "pathopt2" tool for running compiler flag experiments, more detailed fiddling with options such as "-ipa", and feedback-directed optimization, which gathers run-time profiles to aid in recompiling the code. See the User Guide for details. The few options presented here, however, seem to be adequate for most codes.

For an example of using these options on a pair of tsunami models, see Tom Logan's recent article.

Finally, readers are reminded of "pathhow-compiled", a utility for determining which compiler options were used to create an executable, which was discussed as a Quick-Tip answer in issue 368.

Quick-Tip Q & A



A: [[ I just moved my code to midnight and ran my first job.  Now
   [[ I've got a netCDF output file (output.nc).  Is there an easy way
   [[ to compare my new output file with the results I got from iceberg
   [[ for the same input?  I want to make sure the optimizations I'm
   [[ using haven't given me incorrect results.  Say I want to verify
   [[ that the differences between corresponding values in the two files
   [[ aren't larger than 0.001; is there a straightforward way to do that?
   [[
   [[ I did an ncdump of each file and started comparing them manually,
   [[ but it's taking way too long!



#
# Thanks to Jed Brown for this solution
#

There are two tools that I find quite useful for working with netCDF
files.  The netCDF Operators toolbox ( http://nco.sourceforge.net/ )
performs lots of operations on netCDF files.  There are several
command line programs that serve different functions.  Ncview
( http://meteora.ucsd.edu/~pierce/ncview_home_page.html ) is very
handy for quick visualization.  The only format restriction is that
you need to have `dimension variables' (which are a good idea anyway).

$ ncdiff one.nc two.nc one-minus-two.nc
$ ncview -minmax all one-minus-two.nc

You can look at the color plots which will show you where the
differences are occurring.  The total range of data will be shown at the
top of the screen.  You can step through time and move the active slice
around.  If you really want to automate the comparison, you can write a
script using NCO, but there are some corner cases to take care of.
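
#
# Editors: an automated check might look something like the sketch
# below.  It assumes a variable named "temp" (hypothetical) and that
# your NCO build's ncap2 provides the abs() and max() functions;
# check "man ncap2" before relying on it.
#

$ ncdiff one.nc two.nc diff.nc
$ ncap2 -O -s 'maxdiff = max(abs(temp))' diff.nc maxdiff.nc
$ ncks -H -v maxdiff maxdiff.nc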

#
# Editor's note: NCO and ncview are available on iceberg, midnight
# and ARSC Linux workstations
#


#
# Ed Kornkven shared this solution
#

While not specific to netCDF files, I have used "ndiff" for comparing
files of numbers:

  
   http://www.math.utah.edu/~beebe/software/ndiff/index.html


Ndiff is an attempt to relax the character-by-character differencing
of "diff" and instead compare files based on the numerical meaning of
their contents.  There are flags for specifying the threshold below
which a numerical difference between two values still counts as the
"same value", and that threshold can be expressed as either an
absolute or a relative error.
The source is freely available from the web site.
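
Applied to the original question, the comparison might look something
like this; the file names are hypothetical, and the exact option names
should be verified against the ndiff documentation:

$ ncdump midnight-output.nc > midnight.txt
$ ncdump iceberg-output.nc > iceberg.txt
$ ndiff --absolute-error 0.001 midnight.txt iceberg.txt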



Q: Do vi or VIM offer any way to repeat commands issued at the colon
   prompt (i.e., "ex" commands)?  For instance, I might want to execute
   the following search and replace on one line and then hop to another
   line and do it again:

       :s/-/=/g

[[ Answers, Questions, and Tips Graciously Accepted ]]


Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.