ARSC HPC Users' Newsletter 343, July 07, 2006

$PET_HOME

The User Productivity Enhancement and Technology Transfer (PET) initiative of the High Performance Computing Modernization Program provides a set of common libraries in the $PET_HOME directory on all HPCMP resources, including Iceberg and Klondike. This ensures that many of the most common scientific libraries are available on any HPCMP resource you might use.

The environment variable $PET_HOME is defined as the location of the software installed by the PET group.


iceberg2 1% ls $PET_HOME 
MATH     bin      bin32    bin64    doc      example  include  
info     lib      lib32    lib64    libexec  man      pkgs     
share    src      tau

The installed libraries include:

Library     Purpose
---------   -----------------------------------------------------------
PETSc       Ordinary Differential Equations & Partial Differential
            Equations
FFTW        Fast Fourier Transforms
ScaLAPACK   High Performance Linear Algebra Routines for distributed
            memory message passing
LAPACK      High Performance Linear Algebra Routines
PAPI        Performance Analysis Libraries
Atlas       Automatically Tuned Linear Algebra Software
HDF         Hierarchical Data Format
NetCDF      Network Common Data Form
SuperLU     Direct solution for non-symmetric sparse matrices
TAU         Tuning and Analysis Utilities
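
A code that calls one of these libraries can usually be pointed at the
$PET_HOME copies with ordinary -I and -L flags. Here is a hypothetical
compile line; the source file name and the exact library name (FFTW 3
in this sketch) are assumptions, so check "ls $PET_HOME/lib" for what
is actually installed on your system:

  # Sketch only: build a C code against the FFTW installed under $PET_HOME.
  # Adjust the compiler name and -l flags to match your platform.
  cc -o myfft myfft.c -I$PET_HOME/include -L$PET_HOME/lib -lfftw3 -lm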

You can see the installation status for ARSC and other centers here:

http://rib.cs.utk.edu/rib3app/catalog?rh=35&class=Deployment

If you're wondering how to use any of these libraries, be sure to read the next article.

Numerical Libraries & Parallel Performance Evaluation Workshops

ARSC is pleased to offer two workshops in July taught by the PET group. Registration is required for these events; see the links below for more information.

Numerical Libraries Workshop

When:  July 11 - 12, 8:00 AM - 5:00 PM
Where: ARSC Classroom, WRRB 009
Instructors: Piotr Luszczek and Julien Langou, both Research Scientists at
  the Innovative Computing Lab at the University of Tennessee, Knoxville.
Course Description:

This course will describe widely used open-source mathematical libraries that are available at ARSC. They deliver performance, functionality, support, and adaptivity that is often on par with, if not better than, similar offerings from supercomputer vendors.

The specific libraries to be discussed are:

ARPACK is a package for eigenvalue computations that uses a reverse communication interface to solve sparse eigenvalue problems on sequential or distributed parallel platforms.

SuperLU is an LU factorization package for sparse matrices, available in sequential, threaded, and parallel versions.

ATLAS is a self-tuning library that implements the Basic Linear Algebra Subprograms (BLAS) and a portion of LAPACK.

PETSc solves PDEs and ODEs in parallel.

FFTW is a library that can greatly speed up various Fourier-type transforms on discrete data.

Target Audience: This course is geared toward application developers and support staff responsible for assisting application developers.
More Information: Registration is required for this training. Please see the following link for instructions.
http://www.arsc.edu/support/training/NumLibs20060711.html

Parallel Performance Evaluation Tools Workshop

When:  July 25 - 26, 8:00 AM - 5:00 PM
Where: ARSC Classroom, WRRB 009
Instructors: Dr. Shirley Moore & Dr. Sameer Shende
Course Description:

To meet the needs of computational scientists to evaluate the performance of their parallel scientific applications, we present four parallel performance evaluation tools: TAU, PAPI, KOJAK, and PerfSuite, all of which are available at ARSC.

This workshop will focus on performance data collection, analysis, and performance optimization.

The workshop will present how performance data can be collected using either TAU's automated instrumentation or from hardware performance counters using PAPI.
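
As a quick check before the workshop, PAPI's papi_avail utility will list the hardware counter events a given machine supports. Whether it is on your default path or lives under $PET_HOME/bin is an assumption to verify locally:

  # List the PAPI preset events supported on this host.
  papi_avail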

The bulk of the workshop will cover how to analyze the performance data collected and drill down to find performance bottlenecks and determine their causes.

The workshop will include some sample codes that illustrate the different instrumentation and measurement choices available to users.

We will attempt to collect and analyze performance data for additional user codes during the hands-on portion of the workshop. Users and developers are welcome to contact the instructors ahead of time to begin collecting data so as to have it on hand for the workshop.

Target Audience: This course is geared toward computational scientists.
More Information: Registration is required for this training. Please see the following link for instructions.
http://www.arsc.edu/support/training/PerfTools20060725.html

Postdoctoral Seminars

The ARSC Postdoctoral Seminar series continues on July 19th with a talk by Dr. Peter Webley.

Topic: Ash Dispersion Modeling of North Pacific Volcanoes
Where: ARSC Conference Room, WRRB 010
When:  July 19, 1:00 - 2:00 PM

For more information on this and other Postdoctoral Seminars, see:

http://www.arsc.edu/science/postdocseminars2006.html

Quick-Tip Q & A


A:[[ Can ftp perform recursive "get" and "put"?  I want to retrieve a 
  [[ directory and everything it contains.  If ftp can't do this, is 
  [[ there a different way?

#
# Thanks to Kate Hedstrom:
#

I would use tar to make a "tarball" of the directory tree I wanted to
transfer:

tar cvzf dir.tar.gz dir

ftp dir.tar.gz over, then unpack it:

tar xvzf dir.tar.gz

This assumes GNU tar. Leave the z out of the option list for older tars,
then run gzip or bzip2 on your own.
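
If you do make the transfer with ftp itself, the session might look
something like the following (the host name is just a placeholder).
Switch to binary mode so the compressed tarball isn't mangled:

  ftp remote.example.com
  ftp> binary
  ftp> put dir.tar.gz
  ftp> bye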


#
# Thanks to Don Morton and Rich Griswold
# 

GNU wget will do recursive gets with the -r (recursive) option.
You'll probably also want to use the --no-parent option to prevent
wget from fetching contents of parent directories.  For example,
if you wanted to get everything in the games directory of the x.org
ftp site, you'd use:

  wget -xcr --no-parent ftp://ftp.x.org/contrib/games

The 'x' in -xcr will tell wget to build a directory tree instead of
dumping everything in the same directory, and the 'c' will tell wget
to continue where it left off if the download is interrupted.


#
# Thanks to Jed Brown
#

You almost certainly want to use rsync.  It works over an ssh tunnel
(by default on modern systems).  The `-a' option means `archive', which
preserves symbolic links, permissions, ownership, etc.  It will only
copy files which are not already up to date in the destination directory.

rsync -a host:path/to/dir dest
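
A couple of variations may be useful (host and path names here are
placeholders): a trailing slash on the source copies the contents of the
directory rather than the directory itself, and -v or -n lets you watch
or rehearse the transfer:

  rsync -av host:path/to/dir/ dest   # copy the contents of dir into dest
  rsync -avn host:path/to/dir dest   # dry run: show what would be copied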


#
# Thanks to Kurt Carlson and Greg Newby 
# 

scp can copy recursively with its '-r' option, but it has the
disadvantage of encrypting everything (whether you want it to or not),
which greatly slows down the copy and consumes large amounts of CPU on
both the sending and receiving hosts for large files.

Both rcp and krcp have '-r' (recursive) options.  With rcp you are
limited to systems which permit the old less-secure R-services.
For krcp, you need to specify -X to avoid the encryption penalty.
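
For reference, the recursive forms look something like this (host and
directory names are placeholders, and the krcp line simply follows the
-X advice above; check your local man pages for the exact flags):

  scp -r dir user@host:/some/dest    # recursive, but always encrypted
  rcp -r dir host:/some/dest         # needs the old R-services enabled
  krcp -X -r dir host:/some/dest     # Kerberos rcp, skipping the encryption penalty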


# 
# And last but not least thanks to Greg Newby for this cool method
# using tar and ssh 
#

This is a method I use when I have many files in lots of directories
to transfer from one system to another.  My reasoning is that another
method, such as "scp -r" or FTP, will require many individual file
transfers between the two systems.

By using tar and a pipe, the tar program on either system handles
the individual files.  This allows the network link to transfer data
as quickly as possible, without needing to negotiate each individual
file transfer.

Here is a basic example, from my desktop Mac to nanook.arsc.edu.
This assumes I already have a kerberos ticket for nanook, can login,
and I have a kerberized version of ssh.  Let's imagine I have a whole
directory tree to send:

  tar cf - dirtree | ssh nanook.arsc.edu "tar xf -"

The "-" in tar says to write to standard output (with "c" for create)
or read from standard input (with "x" for extract).  So, I'm basically
creating a stream of data on STDOUT on my Mac, and sending it to the
"tar" command reading STDIN on nanook.

"dirtree" in this example is the name of a directory.  It will be
untarred and created on nanook.arsc.edu.

There are some twists, of course.  First, the idle process timeout
might impact your ssh session if it lasts too long.  Second, you
probably don't have sufficient disk quota in your home directory for
really big directory trees, so you'll want to change directories first.
This is easy: just cd first, and separate the tar command with a
semicolon.  The semicolon is a way of putting multiple commands on
a line:

  tar cf - dirtree | ssh nanook.arsc.edu "cd archive ; tar xf -"

In this case, I already have a symlink of my ${ARCHIVE} to
${HOME}/archive.  Do you think you could use a variable set in your
shell initialization file, such as ${ARCHIVE}?  Why not find out -- one
possibility is sketched below.
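
Here is a sketch of that idea.  It assumes the remote shell sets
${ARCHIVE} in its initialization files even for non-interactive logins;
the single quotes keep your local shell from expanding the variable, so
the remote shell expands it instead:

  tar cf - dirtree | ssh nanook.arsc.edu 'cd ${ARCHIVE} ; tar xf -'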

One of the neatest parts about this method is that you don't need to
create an intermediate tar file.  Instead, the tar file is ethereal:
it's just a series of data on the pipe, which in my example uses ssh
to cross between systems.

Tar is one of very few Unix commands for which the hyphen before an
option is not required.  So, "tar -xf -" is the same as "tar xf -".
In my example, and in daily usage, I opt to leave the optional
hyphen out.  But I'm a lazy typist -- I even have an alias for "l"
instead of "ls" to save me typing that extra "s"


Q: I have an existing library written in C/C++ that I would like to
   call from a python script.  Is there a way to call this existing
   code without reimplementing the code in python?
      

[[ Answers, Questions, and Tips Graciously Accepted ]]


Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.