ARSC HPC Users' Newsletter 311, March 11, 2005



X1 Default Programming Environment now PE 5.3

On 03/09/2005, during scheduled downtime, ARSC made PE 5.3 the default X1 programming environment. This environment has been available for testing since Jan 19th. The previous default, PE 5.2, will remain available as PrgEnv.old until the next major PE upgrade.

The programming environments are now configured as follows:

  PrgEnv.old           : PE 5.2
  PrgEnv (the default) : PE 5.3
                       : PE


Iceberg High Performance Switch

Iceberg was one of the first systems to be deployed with IBM's High Performance Switch (HPS), also known as the Federation Switch. This switch has lower latency and improved transfer rates over previous IBM switches. Each of the p655+ nodes on iceberg has two Federation network adapters, while the p690+ nodes have four each.

A job can use either one or all of the available network adapters. The "adapter set" for a job is normally defined via the LoadLeveler "network" keyword:


# @ network.MPI = sn_single,shared,us
# @ network.MPI = sn_all,shared,us

It can also be set via the poe environment variable MP_EUIDEVICE, e.g.:

export MP_EUIDEVICE=sn_single
export MP_EUIDEVICE=sn_all

The sn_single network "adapter set" will use a single network adapter on each node when the job is run. Users familiar with Power3 and first generation Power4 may be accustomed to the adapter set css0. The adapter set css0 can be used interchangeably with sn_single to designate that only one of the Federation network adapters be used.

The sn_all network "adapter set" will use all available Federation network adapters for each task. As of HPS Service Pack 11 the primary benefit of using sn_all vs sn_single is that it provides an additional degree of fault tolerance for a job by providing a redundant communication link. The adapter set csss from previous IBM switch versions can be used interchangeably with sn_all to designate that a job use all available adapters.
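The equivalence between the old and new adapter-set names can be sketched in a few lines of shell. This is only an illustration: normalize_adapter_set is a hypothetical helper, not part of LoadLeveler or poe, and it simply encodes the css0/sn_single and csss/sn_all pairings described above.

```shell
#!/bin/sh
# Hypothetical helper: map a legacy adapter-set name to its Federation
# equivalent. It is not part of LoadLeveler or poe.
normalize_adapter_set() {
    case "$1" in
        css0|sn_single) echo sn_single ;;   # one adapter per node
        csss|sn_all)    echo sn_all ;;      # all available adapters
        *) echo "unknown adapter set: $1" >&2; return 1 ;;
    esac
}

# Example: honor an old css0 setting when exporting MP_EUIDEVICE.
MP_EUIDEVICE=$(normalize_adapter_set css0) && export MP_EUIDEVICE
echo "MP_EUIDEVICE=$MP_EUIDEVICE"
```

Since the legacy names are still accepted, a mapping like this is a convenience for keeping job scripts consistent rather than a requirement.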

RDMA transport:

During April the latest switch service pack is scheduled to be installed on iceberg. This service pack will enable Remote Direct Memory Access (RDMA) transport functionality. RDMA allows a portion of the segmentation and reassembly of messages to be offloaded to the network adapters, reducing the CPU overhead while communications are occurring. Codes using non-blocking MPI calls should benefit in particular, because of the reduced burden placed on the CPUs during communication. RDMA also enables the protocols that allow a message, or multiple messages from a single task, to be striped across different network adapters. In April or May we hope to tell you about our first experiences with RDMA and striping, so stay tuned!


Scicomp 11 and CUG 2005

----------------------------------------
IBM System Scientific User Group Meeting:
----------------------------------------
ScicomP 11: May 31 to June 3, 2005
SP-XXL: May 30 to June 3, 2005
Hosted by EPCC
University of Edinburgh
Scotland

This is a co-located meeting: ScicomP and SP-XXL will run parallel, but separate, programs.

More information:

Important Dates:

  8 April 2005 (Friday)   - Deadline for Abstracts
  1 May 2005 (Sunday)     - Early Registration Closes
  31 May 2005 (Tuesday)   - ScicomP Tutorial(s)
  1-3 June 2005 (Wed-Fri) - ScicomP Meeting

------------------------
Cray User Group Meeting:
------------------------
Announcing CUG 2005
Albuquerque, New Mexico, May 16-19, 2005
Hosted by Sandia National Laboratories
Albuquerque Marriott Pyramid North

More information:


X1 Performance Analysis, Zooming In

When profiling a code in an attempt to locate performance bottlenecks, you might discover that the subroutine which consumes most of the time is itself quite large. It can then be helpful to zoom in further, to the level of individual loops or lines. Cray's X1 performance analysis tool, PAT, gives a couple of methods for focusing in on small code blocks.

PROFILING METHOD:
=================

The first method shows the percentage of time spent on each individual line of code, and is the easiest method to use. Follow the usual procedure for running a call stack PAT experiment, and then instruct pat_report to include line numbers in the report (details below).

The report will look something like the following:

  Table 1:  -d samples%
            -b ssp,function,line

The above sample output indicates, for instance, the main program unit consumed 59.5% of the total time, of which 24.5% was spent on line 161. Similarly, the subroutine bessy1 consumed 11.1% of the total time, of which 40.8% was spent on line 301 (where this is line 301 of the source file which contains the routine bessy1).
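Because the line percentages are relative to the enclosing routine, a line's share of the total run time is the product of the two figures. The sketch below works that arithmetic for the two examples just quoted; absolute_pct is a hypothetical helper written for this illustration and is not part of PAT.

```shell
#!/bin/sh
# absolute_pct FUNC_PCT LINE_PCT  (hypothetical helper, not a PAT tool)
#   FUNC_PCT: the routine's share of total samples, in percent
#   LINE_PCT: the line's share of that routine's samples, in percent
# Prints the line's share of total samples, in percent.
absolute_pct() {
    awk -v f="$1" -v l="$2" 'BEGIN { printf "%.1f\n", f * l / 100 }'
}

absolute_pct 59.5 24.5   # main program, line 161
absolute_pct 11.1 40.8   # bessy1, line 301
```

This prints 14.6 and 4.5, so line 161 of the main program accounts for roughly 14.6% of the total time, while line 301 of bessy1 accounts for roughly 4.5%.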

Here's a script to remind you of the steps required to produce a report like this, and to actually perform the report generation step for you. (A similar script, which produces a call-tree report, was provided in issue #305.)

Script name: ./patfuncline


#!/bin/ksh
# Script:   ./patfuncline
# This script generates a PAT report with per-line sample percentages.
# Tom Baring, ARSC, Dec 2004

SCRPT=$(basename ${0})
SYNTX="Syntax: $SCRPT [-h] <instrumented_exec_name> <name_of_xf_file>"
case $1 in
  "-h" )
    echo "$SYNTX"
    echo "Preparation:"
    echo " 1. Instrument the executable: pat_build <executable_name> <instrumented_exec_name>"
    echo " 2. In the PBS script (or interactive environment):"
    echo "    [ksh]  export PAT_RT_EXPERIMENT=<call_stack_experiment>"
    echo "    [csh]  setenv PAT_RT_EXPERIMENT <call_stack_experiment>"
    echo " 3. Run the instrumented executable. E.g., mpirun -np 4 ./<instrumented_exec_name>"
    echo " 4. This produces the .xf file needed by this script "
    exit 0 ;;
esac

# Both arguments are required.
if [ $# -ne 2 ]; then
  echo "$SYNTX"
  exit 1
fi
EXEFILE=$1
XFFILE=$2
# Report file name is an assumption: the .xf name with ".rpt" substituted.
RPTFILE=${XFFILE%.xf}.rpt

D="-d samples%"
B="-b ssp,function,line"
S="-s percent=relative"

echo "Generating report:   ${RPTFILE}"
pat_report $D $B $S -i $EXEFILE -o $RPTFILE $XFFILE

TRACING METHOD:
===============

You can also define and trace any arbitrary region of code you may find interesting, as another way to "zoom in." To do so, modify your C or Fortran source code, adding calls to the PAT API routine "pat_trace_user." Bracketing the interesting regions with "pat_trace_user" makes them appear, in the PAT report, as if these regions of code were actual subroutines.

Fortran programs must "use pat_api"; C programs must "#include <pat_api.h>". See "man pat_build" for more on the PAT API.

The pat_trace_user routine takes one argument: any string you choose to identify the region being traced. Regions may not be nested, as a region ends with the next call to pat_trace_user. This example creates two regions for tracing, identified as "loop_1" and "loop_2":

       use pat_api 

       ! ... 

       call pat_trace_user ("loop_1")  !---- start region "loop_1"

       DO i = 1,c-1 ; DO j = 1,d-1
           ! do work
       END DO ; END DO

       call pat_trace_user ("loop_2")  !---- start "loop_2", end "loop_1"

       DO i = c+1,n ; DO j = d+1,m
           ! do work
       END DO ; END DO

       call pat_trace_user ("")        !---- end "loop_2"

To perform a tracing experiment, you recompile the code as usual and then run pat_build as follows:

   pat_build -w -t <trace.routines> <exec> <output_instrumented_exec>

where you've created the file <trace.routines>, which contains a list of all the identifiers appearing in the "pat_trace_user" calls, and also the names of any regular subroutines/functions you'd like to trace. (Note: in this file, regular subroutine/function names must be given in all lower-case with a trailing "_" added. They should appear as they do in the output of the "nm <exec>" command.)

For instance, the file <trace.routines> might contain this text:
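As a hedged sketch, such a file could be built from the shell as follows. The region names loop_1 and loop_2 come from the Fortran example above; the routine name MY_SOLVER and the helper to_symbol are hypothetical, and the one-name-per-line layout is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: lower-case a Fortran routine name and append the
# trailing underscore, matching the naming rule described above.
to_symbol() {
    printf '%s_\n' "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
}

# Region identifiers are listed exactly as passed to pat_trace_user.
# MY_SOLVER is a made-up routine name used only for illustration.
{
    echo loop_1
    echo loop_2
    to_symbol MY_SOLVER
} > trace.routines

cat trace.routines
```

Before relying on a generated name, it is worth comparing it against the output of "nm <exec>", since that output is the authority on how each routine actually appears.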


The file <exec> is the compiled executable. The <output_instrumented_exec> file is the output of the pat_build command.

The next step is to run the <output_instrumented_exec> binary. This will produce a file (with the extension ".xf") of output trace data. Finally, you run pat_report against the .xf file to produce the report, which will show the regions of code you identified:

  Table 1:  -d time%,cum_time%,time,traces,P,E,M
            -b ssp,function,callers



Here's a script to simplify using PAT to produce a tracing report like this.

Script name: ./pattrace


#!/bin/ksh
# Script:   ./pattrace
# This script generates a PAT tracing report.
# Tom Baring, ARSC, Mar 2005

SCRPT=$(basename ${0})
SYNTX="Syntax: $SCRPT [-h] <instrumented_exec_name> <name_of_xf_file>"
case $1 in
  "-h" )
    echo "$SYNTX"
    echo "Preparation:"
    echo " 1. Instrument the executable: pat_build -w -t <routine_list> <executable_name> <instrumented_exec_name>"
    echo " 2. Unset the PAT_RT_EXPERIMENT environment variable, if it's set"
    echo " 3. In the PBS script (or interactive environment) set PAT_RT_SUMMARY:"
    echo "    [ksh]  export PAT_RT_SUMMARY=0"
    echo "    [csh]  setenv PAT_RT_SUMMARY 0"
    echo " 4. Run the instrumented executable. E.g., mpirun -np 4 ./<instrumented_exec_name>"
    echo " 5. This produces the .xf file needed by this script "
    exit 0 ;;
esac

# Both arguments are required.
if [ $# -ne 2 ]; then
  echo "$SYNTX"
  exit 1
fi
EXEFILE=$1
XFFILE=$2
# Report file name is an assumption: the .xf name with ".rpt" substituted.
RPTFILE=${XFFILE%.xf}.rpt

B="-b ssp,function,callers"

echo "Generating report:   ${RPTFILE}"
pat_report $B -i $EXEFILE -o $RPTFILE $XFFILE

Quick-Tip Q & A

A:[[ Is there an easy way to get the contents of a web page via the
  [[ command line?  I would like to download several web pages.  It
  [[ would be nice to automate the process somehow.  Is this kind of
  [[ functionality available in any programming languages or in any
  [[ command line utilities?

# Thanks to Martin Luthi

The standard tool for this is wget. It has a bazillion options,
including link-rewriting, mirroring, extracting specific
document types...

If it is not installed, you'll find the software and documentation here

You can easily write a shell script and put it into your crontab.

If you need really sophisticated options, somebody has done it in
either Python or Perl (Google is your friend).

# Thanks to Jed Brown

Most text based web browsers have a -dump option which writes the page
to stdout. I sometimes use w3m like this if I want a quick glance at a
url. But it sounds like maybe you want to download, for instance, a
selection of web pages so you can browse them later when you are
off-line. GNU Wget is a wonderful tool for such things. It will
non-interactively mirror sites, act as a web spider, use cookies,
download recursively (very customizable) and handle unstable network
connections well.

# Thanks to Greg Newby

Three common utilities for this are wget, lynx and curl.  Lynx is a
text-only Web browser.  It's fully functional, other than
understanding secure http (https) and ignoring Javascript.  Attentive
Web page designers often test their page with Lynx, because it's
commonly used by visually impaired persons, people with low-bandwidth,
and others.

The wget and curl commands are similar, and both have a slew of
options.  They can get Web pages and associated images, and even
follow links to create a mirror of a remote site.

These three commands are not currently available on most ARSC systems,
though I found Lynx on the SGIs and curl on the MacOS systems.
They're easy to find, download and compile for your own Unix/Linux
system, though.

Here's some sample usage.  These programs can also access FTP and a
few other protocols, similar to your standard graphical Web browsers.
The question asked about Web pages, but I'll give an FTP example, too.

a) Use Lynx to interactively view Web pages (navigate with your
keyboard's arrow keys):

b) Use Lynx to download a plain text formatted copy of a Web page:
        lynx --dump
  or the HTML source code:
        lynx --source
  or save the HTML source code to a file:
        lynx --source > arsc-index.html

c) Use curl to retrieve a file via FTP and display it to the screen:
   or output to a file:
        curl > README

d) Use wget to retrieve a file.  It's saved to a local
   file automatically:
        wget  (saves "index.html" to your directory)

e) Use wget to get an eBook of Muir's Travels in Alaska,
   then unzip and use Lynx to read it:
        unzip  (unzips several files)
        lynx ./trlsk10h.htm
  or, use Lynx to format a few pages for your printer (if 'lpr' is
  set up):
        lynx --dump ./trlsk10h.htm | head -1000 | lpr

f) Use wget to download all the copies of the ARSC HPC Users' Newsletter:
        wget --mirror --no-host-dir --no-parent

  (This will create a subdirectory "support/news/" with copies of all
  the files, automatically descending to subdirectories, but not
  following external links or ascending to parent directories.)

Each command has somewhat different functionality and limitations.
They are collectively excellent for downloading content to a Unix
system via http, https or FTP.  Putting suitable commands in a shell
script or loop is easy, as is redirecting output to particular files.

A major advantage of these commands over using your home/office
systems' graphical Web browser is that the commands can run on the
Unix/Linux hosts where you are doing your work.  This avoids the need
to download to one system, then upload to another - and might even
avoid some upload/download errors that arise when binary files or
alternate character sets are involved.

# Thanks to Brad Chamberlain

Yes!  You want to use the "wget" program, available on most unix systems
I've looked at lately (most of which have been linux/gnu-oriented).  In
its most basic form, you simply use:

        wget <URL>

So, for example, to get a copy of the ARSC front page, you can do:


It's pretty easy to drop invocations of this into your favorite
scripting language to get a series of files with predictable names, for
example.  There are a number of additional options that allow you to
supply usernames and passwords (though note that these might be visible
to others via a unix ps command), etc.

Another tool which you'll want to know about if you don't already, is
lynx, which is a text-oriented web browser that uses a pine/elm style
interface.  It provides a nice way to surf the web from the command
line, which can sometimes be lighter-weight than dealing with your
browser, and the output is generally quite readable.  It also supports
downloading of files, images, etc. to save them to disk.  Lots of good
online help available from within the program as well.  I seem to
remember once discovering a way to have lynx execute a repetitive set
of commands over and over (which provided a sloppier way to grab a
bunch of web pages in series than wget), but I can't seem to remember
how offhand.  I suspect I figured it out reading its help/options.

# Thanks to Derek Bastille

The command line tool that I normally use is curl.  This tool allows
you to use a URL to access stuff via a variety of protocols.  For
getting, say, the ARSC home page you could use:

bastille >curl > file.html
  % Total    % Received % Xferd  Average Speed          Time            
                                 Dload  Upload Total    Current  Left   
100  4909  100  4909    0     0  33394      0  0:00:00  0:00:00  0:00:00  281k
bastille >head file.html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<title>Arctic Region Supercomputing Center</title>

Curl also allows you to use wildcards in a URL so that you can download
multiple pages with one command line.
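
Several of the answers above suggest wrapping wget or curl in a shell loop to fetch a series of pages. The following is a minimal sketch of that idea, not a tested production script: the function names url_to_filename and fetch_all, the FETCH override variable, and the urls.txt file name are all assumptions made for this illustration.

```shell
#!/bin/sh
# Derive a local file name from a URL: drop the scheme, then replace
# characters that are awkward in file names with underscores.
url_to_filename() {
    printf '%s\n' "$1" | sed -e 's|^[a-zA-Z]*://||' -e 's|[/?&=:]|_|g'
}

# Fetch every URL listed (one per line) in the file named by $1.
# FETCH is an assumed hook so the downloader can be swapped; it is
# called as: $FETCH <output_file> <url>. The default is "wget -q -O".
fetch_all() {
    while IFS= read -r url; do
        [ -n "$url" ] || continue        # skip blank lines
        ${FETCH:-wget -q -O} "$(url_to_filename "$url")" "$url"
    done < "$1"
}

# Usage (urls.txt is a hypothetical list of pages to download):
#   fetch_all urls.txt
#   FETCH="curl -s -o" fetch_all urls.txt
```

A script like this can be run by hand or from a crontab entry. Writing each page to a file named after its URL keeps repeated runs from overwriting unrelated pages.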

Q: Okay, I've settled on one favorite shell (ksh) which works on all six
   (yes, 6!) different flavors of Unix I use every week.  Now my problem
   is the shell initialization files, .profile and .kshrc.  I'll add a
   new alias on one system, a library path on another, a nifty prompt
   here, a function there... and now my dot files are different, and I
   can't remember which alias I've got here, which there, and it's
   making me crazy.  Has anyone else ever had this problem?  And solved
   it?

[[ Answers, Questions, and Tips Graciously Accepted ]]

Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
E-mail Subscriptions: Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.