ARSC HPC Users' Newsletter 289, March 26, 2004

ARSC Events and User Training

Supercomputing Lecture

2 pm Tuesday, March 30
Butrovich Building, room 109

Topic: Experience with Unified Parallel C
Speaker: Dr. Tarek El-Ghazawi

Abstract:

UPC, or Unified Parallel C, is an explicit parallel extension of ISO C that follows the Partitioned Global Address Space (PGAS) programming model. UPC combines the ease of programming of the shared memory model with the ability to control and exploit data locality, as in the message passing paradigm.

UPC has been implemented for the Cray X1, Cray T3D/T3E, SGI O2000/O3000, Beowulf Clusters, and IBM SP.

In this talk, we will share some of the benchmarking results of UPC against other popular programming paradigms using the NAS Parallel Benchmark and other workloads. Scalability and programming efforts results will be discussed. We will also show that should UPC compilers employ a number of simple optimizations, UPC will compare favorably with current popular paradigms in both execution and code development time.

The Speaker:

Dr. El-Ghazawi is a Professor in the Department of Electrical and Computer Engineering at The George Washington University, http://www.gwu.edu/ and founding Co-Director of the High-Performance Computing Lab (HPCL).

Discovery Tuesday

First Tuesday of every month.

Next topic:

Visualizing Space Weather: Real-Time Operation of the UAF Eulerian Parallel Polar Ionosphere Model

Presented by:

Sergei Maurits, ARSC Visualization Specialist

1 pm Tuesday, April 6, 2004
375C Rasmuson, ARSC Discovery Lab

More details:

http://www.arsc.edu/news/archive/disctuesday.html

ARSC User Training

9:15-11:15 am, Tuesdays and Thursdays
Gruening Building, room 211

TOPICS

30-Mar :: Storage and Managing Data
 1-Apr :: Visualization and Complex Data Presentation, Part 1
 6-Apr :: Visualization and Complex Data Presentation, Part 2
 8-Apr :: Visualization and Complex Data Presentation, Part 3
13-Apr :: Visualization and Complex Data Presentation Examples
15-Apr :: Debugging

More details:

http://www.arsc.edu/support/training.html

Using TurboMPI 3.0

[ Thanks to Don Bahls of ARSC for this article. ]

TurboMP is a set of two API libraries developed by IBM that take advantage of on-node shared memory. The libraries provide a small set of collective functions from the MPI API as well as an implementation of SHMEM. This article focuses on TurboMPI, the optimized MPI component of TurboMP.

TurboMPI provides implementations of MPI collective functions such as MPI_Barrier, MPI_Allreduce, MPI_Reduce, MPI_Bcast, MPI_Alltoall, and MPI_Alltoallv.

Usage:

The TurboMPI versions of these functions can be linked in by adding the following libraries and library path to your link line:


  -L/usr/local/lib -lturbo1 -lxlf90_r

Note that in the current release, TurboMP 3.0, the xlf90 library is required for both C and Fortran programs.

MPI-1 functionality is in the turbo1 library (libturbo1.a), while MPI-2 functionality is in the turbo2 library (libturbo2.a). The turbo2 library is a superset of turbo1 so only one of these should be included. The documentation recommends linking with turbo2 only if MPI-2 functionality is needed.

Performance:

To assess the performance gain of TurboMPI over the default IBM MPI implementation, I ran the Pallas MPI benchmark on a single P655+ node and a single P690+ node of ARSC's recently installed SP cluster, "iceberg." The benchmark was run with and without the TurboMPI libraries, and all runs, including those using the default MPI, set "MP_SHARED_MEMORY=yes".

On both the P655+ and P690+ nodes I saw the same behavior: MPI_Reduce and MPI_Allreduce performed better than the standard MPI implementations of these functions for all message sizes up to 4 MB (the largest message size in the Pallas benchmark). For messages under 1024 bytes, these calls consistently completed in less than half the time of the standard MPI calls.

With other operations, such as MPI_Bcast and MPI_Alltoall, there was no noticeable difference in performance.

Results can obviously vary quite a bit from program to program, but this quick experiment suggests that if your code spends a lot of time in MPI_Reduce or MPI_Allreduce calls, TurboMPI could be an easy way to shave some run-time off your model.

X1: make or gmake ?

Cray supports many open source utilities on the X1 (dubbed "Cray Open Source" or "COS" to distinguish them from the set of regular Cray utilities). To access the open source versions, load one of two modules provided by ARSC:


     open
     open_lastinpath

If you load "open", the paths to the COS utilities will be prepended to your path variables. If you load "open_lastinpath", the COS paths will be appended.

Some of the open source utilities are not duplicated in the standard Cray utilities (e.g., less, gnuplot, vim, seq). Other utilities, however, are duplicated, but, in some cases, provide different features and command line options (e.g., make, find, diff, tar).

The reason ARSC provides the two different modules is to make it easier for you to select which version of the utilities will execute. Here are some options for setting up your account:

  1. Load module open (the default):

    EFFECT: Always use the COS version when it exists, otherwise use Cray.

  2. Don't load module open or open_lastinpath (you'll have to comment out the "module load open" command from your .login or .profile):

    EFFECT: Always use Cray's standard utilities, never the COS version.

  3. Load module open_lastinpath (in your .login or .profile, replace "module load open" with "module load open_lastinpath"):

    EFFECT: Always use the Cray version when it exists, otherwise use the open source version.

  4. Mix and match... On top of the above blanket configurations, you can always specify explicit paths to any of the utilities, or define personal shell aliases.

    For instance, using korn shell syntax, the following would get you gnutar but standard make (not gmake), regardless of which ARSC "open" module you loaded:

    alias tar=/opt/open/open/bin/tar
    alias make=/usr/bin/make

EXAMPLE SESSION:

This is a pretty ugly picture, but if a picture's worth 1000 words, this may help clarify:


  KLONDIKE$      module list
    Currently Loaded Modulefiles:
      1) modules      5) craylibsci   9) craytools   13) open
      2) X11          6) CC          10) cal
      3) PrgEnv       7) mpt         11) totalview
      4) craylibs     8) cftn        12) pbs
  KLONDIKE$      whence make   
    /opt/open/open/bin/make
  KLONDIKE$      whence less   
    /opt/open/open/bin/less
  KLONDIKE$      module unload open
  KLONDIKE$      whence make       
    /usr/bin/make
  KLONDIKE$      whence less       
  KLONDIKE$      module load open_lastinpath
  KLONDIKE$      module list
  Currently Loaded Modulefiles:
    1) modules           6) CC               11) totalview
    2) X11               7) mpt              12) pbs
    3) PrgEnv            8) cftn             13) open_lastinpath
    4) craylibs          9) craytools
    5) craylibsci       10) cal
  KLONDIKE$      whence make                
    /usr/bin/make
  KLONDIKE$      whence less                
    /opt/open/open/bin/less

MORE ON MODULES:

Here are some basic module commands:

  • show all available modules:

    % module avail

  • list those loaded in my environment:

    % module list

  • unload

    % module unload <module_name>

  • load

    % module load <module_name>

Quick-Tip Q & A


A:[[ I almost always need my loadleveler script to start executing in the
  [[ same directory from which I "llsubmit" it.
  [[
  [[ Thus, my first executable command is generally a "cd" right back to
  [[ whatever that directory is.  Sometimes I copy a script to a new
  [[ directory and forget to change the "cd" command, and of course
  [[ then everything goes wrong.  Any advice?


  # 
  # From Matt MacLean
  # 

  From the LoadLeveler manual, the environment variable
    LOADL_STEP_INITDIR
  gives the initial working directory, i.e., the directory from which
  the job was submitted.

  So, starting a script with:
    cd ${LOADL_STEP_INITDIR}
  will set up the directory correctly.

  # 
  # From Kate Hedstrom
  # 

  Delete that cd line - it's not needed. The script will automatically
  start in the directory from which it is submitted.

  # 
  # Editor's note: 
  # 
  
  The other batch systems used at ARSC also provide environment
  variables which point back to the initial working directory:

    PBS (e.g., on the X1):       PBS_O_WORKDIR 
    NQS (e.g., SV1ex or T3E):    QSUB_WORKDIR

  Both PBS and NQS start your script in your home directory, just as if
  you had logged in. Thus, you must include an explicit "cd" to the
  correct working directory.



Q: Let's repeat a good one from 1998, issue #146:

     What's your favorite shell alias?  

   If you would, send the alias and a very brief explanation.

[[ Answers, Questions, and Tips Graciously Accepted ]]

Current Editors:
Ed Kornkven, ARSC HPC Specialist, ph: 907-450-8669
Kate Hedstrom, ARSC Oceanographic Specialist, ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
E-mail Subscriptions:
Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.