
ARSC HPC Users' Newsletter 289, March 26, 2004


ARSC Events and User Training

Supercomputing Lecture

2 pm Tuesday, March 30
Butrovich Building, room 109

Topic: Experience with Unified Parallel C
Speaker: Dr. Tarek El-Ghazawi

Abstract:

UPC, or Unified Parallel C, is an explicit parallel extension of ISO C that follows the Partitioned Global Address Space (PGAS) programming model. UPC combines the ease of programming of the shared memory model with the ability to control and exploit data locality, as in message-passing paradigms.

UPC has been implemented for the Cray X1, Cray T3D/T3E, SGI O2000/O3000, Beowulf Clusters, and IBM SP.

In this talk, we will share benchmarking results comparing UPC with other popular programming paradigms, using the NAS Parallel Benchmark and other workloads. Results on scalability and programming effort will be discussed. We will also show that if UPC compilers employ a number of simple optimizations, UPC compares favorably with currently popular paradigms in both execution time and code development time.

The Speaker:

Dr. El-Ghazawi is a Professor in the Department of Electrical and Computer Engineering at The George Washington University (http://www.gwu.edu/) and founding Co-Director of the High-Performance Computing Lab (HPCL).

Discovery Tuesday

First Tuesday of every month.

Next topic:

Visualizing Space Weather: Real-Time Operation of the UAF Eulerian Parallel Polar Ionosphere Model

Presented by:

Sergei Maurits, ARSC Visualization Specialist

1 pm Tuesday, April 6, 2004
375C Rasmuson
ARSC Discovery Lab

More details:
http://www.arsc.edu/news/disctuesday.html

ARSC User Training

9:15-11:15 am, Tuesdays and Thursdays
Gruening Building, room 211

TOPICS

30-Mar :: Storage and Managing Data
1-Apr :: Visualization and Complex Data Presentation, Part 1
6-Apr :: Visualization and Complex Data Presentation, Part 2
8-Apr :: Visualization and Complex Data Presentation, Part 3
13-Apr :: Visualization and Complex Data Presentation Examples
15-Apr :: Debugging

More details:

http://www.arsc.edu/support/training.html

 

Using TurboMPI 3.0

[ Thanks to Don Bahls of ARSC for this article. ]

TurboMP is a set of two API libraries, developed by IBM, that take advantage of on-node shared memory. Together they provide a small set of collective functions from the MPI API as well as an implementation of SHMEM. This article focuses on TurboMPI, the optimized MPI portion of TurboMP.

TurboMPI provides implementations of MPI collective functions such as MPI_Barrier, MPI_Allreduce, MPI_Reduce, MPI_Bcast, MPI_Alltoall, and MPI_Alltoallv.

Usage:

The TurboMPI versions of these functions can be linked in by adding the following libraries and library path to your link line:

  -L/usr/local/lib -lturbo1 -lxlf90_r

Note that in the current release, TurboMP 3.0, the xlf90 library is required for both C and FORTRAN programs.

MPI-1 functionality is in the turbo1 library (libturbo1.a), while MPI-2 functionality is in the turbo2 library (libturbo2.a). The turbo2 library is a superset of turbo1, so link only one of them. The documentation recommends linking with turbo2 only if MPI-2 functionality is needed.
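As a minimal sketch of the turbo1/turbo2 choice, the hypothetical shell function below prints the extra link flags for either library. The library path and library names are taken from the text above; the function name "turbo_libs" is made up for illustration.

```shell
# turbo_libs (hypothetical helper): print the TurboMPI link flags.
# Pass "mpi2" only if the code needs MPI-2 functionality; otherwise
# the MPI-1 library (libturbo1.a) is selected. In TurboMP 3.0 the
# xlf90_r library is required for both C and Fortran programs.
turbo_libs() {
  if [ "$1" = "mpi2" ]; then
    # turbo2 is a superset of turbo1; link it only when MPI-2 is needed
    printf '%s\n' "-L/usr/local/lib -lturbo2 -lxlf90_r"
  else
    # turbo1 holds the MPI-1 collectives (MPI_Allreduce, MPI_Bcast, ...)
    printf '%s\n' "-L/usr/local/lib -lturbo1 -lxlf90_r"
  fi
}
```

A C link line might then look like "mpcc_r -o myprog myprog.c $(turbo_libs)". The mpcc_r compiler wrapper is an assumption about the IBM SP environment; substitute whatever MPI compiler command you normally use.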

Performance:

To assess the performance gain from TurboMP over the default IBM MPI implementation, I ran the Pallas MPI benchmark on a single P655+ node and on a single P690+ node of ARSC's recently installed SP cluster, "iceberg." The benchmark was run with and without the TurboMPI libraries; all runs, including those using the default MPI, set "MP_SHARED_MEMORY=yes".

The results were the same on the P655+ and P690+ nodes. The TurboMPI versions of MPI_Reduce and MPI_Allreduce outperformed the standard MPI implementations of these functions for all message sizes up to 4 MB (the largest message size in the Pallas benchmark). For messages under 1024 bytes, these calls consistently completed in less than half the time of the standard MPI calls.

With other operations such as MPI_Bcast and MPI_Alltoall there were no noticeable differences in performance.

Results can obviously vary quite a bit from program to program, but this quick experiment suggests that if your code spends a lot of time in MPI_Reduce or MPI_Allreduce calls, TurboMPI could be an easy way to shave some run-time off your model.

 

X1: make or gmake ?

Cray supports many open source utilities on the X1 (dubbed "Cray Open Source" or "COS" to distinguish them from the set of regular Cray utilities). To access the open source versions, load one of two modules provided by ARSC:

     open
     open_lastinpath

If you load "open", the paths to the COS utilities will be prepended to your path variables. If you load "open_lastinpath", the COS paths will be appended.
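To see why prepending versus appending matters, the self-contained sketch below mimics the two modules with plain PATH manipulation. The directories are temporary stand-ins created for the demo (not the real /opt/open/open/bin or /usr/bin), and the real modules adjust other path variables as well. "command -v" and plain PATH lookup are used so the sketch runs in any POSIX shell, not just ksh.

```shell
#!/bin/sh
# Mimic "module load open" vs. "module load open_lastinpath" using two
# temporary directories that each contain a fake "make" script.
demo=$(mktemp -d)
mkdir -p "$demo/cray" "$demo/cos"
printf '#!/bin/sh\necho cray-make\n' > "$demo/cray/make"
printf '#!/bin/sh\necho cos-make\n'  > "$demo/cos/make"
chmod +x "$demo/cray/make" "$demo/cos/make"

base="$demo/cray:/usr/bin:/bin"

# Like "open": the COS directory is prepended, so its make is found first.
prepended=$(PATH="$demo/cos:$base"; make)

# Like "open_lastinpath": the COS directory is appended, so Cray's make wins.
appended=$(PATH="$base:$demo/cos"; make)

echo "prepended: $prepended"   # cos-make
echo "appended:  $appended"    # cray-make
rm -rf "$demo"
```

The shell searches PATH left to right and runs the first match, which is the entire difference between the two modules.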

Some of the open source utilities are not duplicated in the standard Cray utilities (e.g., less, gnuplot, vim, seq). Others are duplicated, and in some cases the two versions provide different features and command line options (e.g., make, find, diff, tar).

The reason ARSC provides the two different modules is to make it easier for you to select which version of the utilities will execute. Here are some options for setting up your account:

  1. Load module open (the default):

    EFFECT:
    Always use the COS version when it exists, otherwise use Cray.

  2. Don't load module open or open_lastinpath (you'll have to comment out the "module load open" command in your .login or .profile):

    EFFECT:
    You'd only get Cray's standard utilities, and never the COS version.

  3. Load module open_lastinpath (in your .login or .profile, replace "module load open" with "module load open_lastinpath"):

    EFFECT:
    Always use the Cray version when it exists, otherwise use the open source version.

  4. Mix and match... On top of the above blanket configurations, you can always specify explicit paths to any of the utilities, or define personal shell aliases.

    For instance, using korn shell syntax, the following would get you gnutar but standard make (not gmake), regardless of which ARSC "open" module you loaded:

    alias tar=/opt/open/open/bin/tar
    alias make=/usr/bin/make

EXAMPLE SESSION:

This is a pretty ugly picture, but if a picture's worth 1000 words, this may help clarify:

  KLONDIKE$      module list
    Currently Loaded Modulefiles:
      1) modules      5) craylibsci   9) craytools   13) open
      2) X11          6) CC          10) cal
      3) PrgEnv       7) mpt         11) totalview
      4) craylibs     8) cftn        12) pbs
  KLONDIKE$      whence make   
    /opt/open/open/bin/make
  KLONDIKE$      whence less   
    /opt/open/open/bin/less
  KLONDIKE$      module unload open
  KLONDIKE$      whence make       
    /usr/bin/make
  KLONDIKE$      whence less       
  KLONDIKE$      module load open_lastinpath
  KLONDIKE$      module list
  Currently Loaded Modulefiles:
    1) modules           6) CC               11) totalview
    2) X11               7) mpt              12) pbs
    3) PrgEnv            8) cftn             13) open_lastinpath
    4) craylibs          9) craytools
    5) craylibsci       10) cal
  KLONDIKE$      whence make                
    /usr/bin/make
  KLONDIKE$      whence less                
    /opt/open/open/bin/less
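
A quick way to audit your setup after changing modules or aliases is to ask the shell where each utility resolves. The loop below is a sketch using utilities named above; "report_utils" is a hypothetical function name, and "command -v" stands in for ksh's "whence" so the sketch also runs in other POSIX shells. Depending on the shell, "command -v" may report an alias definition instead of a path.

```shell
# report_utils (hypothetical helper): show which version of each
# duplicated utility the shell will run. "less" exists only as a COS
# utility, so "not found" for it means no "open" module is loaded.
report_utils() {
  for cmd in make find diff tar less; do
    where=$(command -v "$cmd" 2>/dev/null) || where="not found"
    printf '%-5s %s\n' "$cmd" "$where"
  done
}
report_utils
```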

MORE ON MODULES:

Here are some basic module commands: