ARSC HPC Users' Newsletter 304, November 19, 2004

ScaLAPACK Intro: Part I of IV

ScaLAPACK ( ), the "Scalable Linear Algebra Package," is the distributed-memory follow-on to LAPACK. Although source code for ScaLAPACK and the lower-level libraries on which it depends, like BLACS, BLAS, and PBLAS are freely available via netlib, the major HPC vendors put considerable effort into optimizing these routines for their specific architectures. (Cray includes ScaLAPACK routines in craylib. IBM offers the ScaLAPACK functionality through a different API and set of libraries, PESSL.)

If your code calls LAPACK routines, like SGETRF and SGETRS, and you want to dramatically expand your problem size, you might consider parallelizing it to use the corresponding ScaLAPACK routines, like PSGETRF and PSGETRS.

This series of articles demonstrates the conversion of one very simple LAPACK code to ScaLAPACK. You'll immediately discover that the greatest hurdle to using ScaLAPACK is not in calling the routines, but in set-up.

The parallel solvers expect that:

  1. the processors are arranged in a "virtual processor grid," and
  2. the data are physically distributed amongst the processors in block-cyclic layout.

The concept of a "virtual processor grid" is already known to all MPI users. When you launch an MPI program with NPROCS tasks, they are arranged for you in a 1D grid, counting from 0 to NPROCS-1. E.g., if NPROCS were 6, the tasks would be arranged as: 012345 Each task knows its unique number in this 1D grid and the extend of the grid (i.e., NPROCS).

ScaLAPACK is built on BLACS ( ), the "Basic Linear Algebra Communication Subprograms," which is similar, but arranges processors in a 2D grid. E.g., if NPROCS were 6, the tasks might be arranged in a 2x3 grid, counted as: (0,0)(0,1)(0,2) (1,0)(1,1)(1,2) Each task knows its unique row and column number in this 2D grid and the extent of the grid (i.e., NROWS and NCOLS).

The concept of block-cyclic data distribution is far more difficult to visualize. We'll touch on it, but the example in this series of articles uses utilities from the "ScaLAPACK Tools" library to hide most issues of the block-cyclic distribution from the programmer.

What follows is our example LAPACK program. It initializes and solves AX=C, where A is an n x n matrix, and X and C are vectors of length n.

      program slv
! Test program, sets up and solves a simple system of equations
! using LAPACK routines.
! Compilation: Cray X1:
!   ftn -o slv slv.f90
! Compilation: IBM Power4+ Cluster: 
!   xlf90_r -q64 -qextname -qsuffix=cpp=f90 -qipa -c slv.f90
!   xlf90_r -q64 -qipa -qextname  -o slv slv.o -lessl
! ARSC, November 2004
      implicit none
      integer   ::       n,istat,info,i,j
      real,dimension(:,:),allocatable  :: a
      real,dimension(:),allocatable  :: c
      integer,dimension(:),allocatable :: ipiv

      parameter (n = 13)

        subroutine init_my_matrix (n,a)
        implicit none
        integer :: n
        real    :: a(:,:)
        end subroutine init_my_matrix

        subroutine init_my_rhs (n,c)
        implicit none
        integer :: n
        real    :: c(:)
        end subroutine init_my_rhs
      end interface 
! -----   Allocate LHS, RHS, pivot -----
      allocate (a(n,n), stat=istat)
      if (istat/=0) stop "ERR:ALLOCATE FAILS"
      allocate (c(n), stat=istat)
      if (istat/=0) stop "ERR:ALLOCATE FAILS"
      allocate (ipiv (n), stat=istat)
      if (istat/=0) stop "ERR:ALLOCATE FAILS"
! -----    Initialize and print the matrices -----
      call init_my_matrix (n,a)
      do i=1,n
        write (6, '("a(",i3,") ",13f4.0)' ) i,(a(i,j),j=1,n)

      call init_my_rhs (n,c)
      do i=1,n
        write (6, '("c(",i3,") ",f6.2))') i,c(i)
! -----    Solve -----
      call sgetrf( n, n, a, n, ipiv, info )
      print*,'info = ', info

      call sgetrs( 'n', n, 1, a, n, ipiv, c, n, info )
      print*,'info = ', info
! -----    "Post-process" -----
!     Print the solution vector.
      do i=1,n
        write (6, '("X(",i3,") ",f6.2))') i,c(i)

!     Cleanup arrays
      deallocate (a,c,ipiv)

      end program slv
      subroutine init_my_matrix (n,a)
      implicit none
      integer :: n
      real    :: a(:,:)

      integer :: i, j
      real    :: val

      do j = 1, n
        do i = 1, n
           if (i.eq.j) then 
               a(i,j) = 20. 
               a(i,j) = 0.

      subroutine init_my_rhs ( n, c)
      implicit none
      integer :: n
      real :: c(:)

      integer :: i

      do i= 1, n
        c(i) = i*100.0/n


Here's output from a run:

%   ./slv
a(  1)  20.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
a(  2)   0. 20.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
a(  3)   0.  0. 20.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
a(  4)   0.  0.  0. 20.  0.  0.  0.  0.  0.  0.  0.  0.  0.
a(  5)   0.  0.  0.  0. 20.  0.  0.  0.  0.  0.  0.  0.  0.
a(  6)   0.  0.  0.  0.  0. 20.  0.  0.  0.  0.  0.  0.  0.
a(  7)   0.  0.  0.  0.  0.  0. 20.  0.  0.  0.  0.  0.  0.
a(  8)   0.  0.  0.  0.  0.  0.  0. 20.  0.  0.  0.  0.  0.
a(  9)   0.  0.  0.  0.  0.  0.  0.  0. 20.  0.  0.  0.  0.
a( 10)   0.  0.  0.  0.  0.  0.  0.  0.  0. 20.  0.  0.  0.
a( 11)   0.  0.  0.  0.  0.  0.  0.  0.  0.  0. 20.  0.  0.
a( 12)   0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0. 20.  0.
a( 13)   0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0. 20.
c(  1)   7.69
c(  2)  15.38
c(  3)  23.08
c(  4)  30.77
c(  5)  38.46
c(  6)  46.15
c(  7)  53.85
c(  8)  61.54
c(  9)  69.23
c( 10)  76.92
c( 11)  84.62
c( 12)  92.31
c( 13) 100.00
 info =  0
 info =  0
X(  1)   0.38
X(  2)   0.77
X(  3)   1.15
X(  4)   1.54
X(  5)   1.92
X(  6)   2.31
X(  7)   2.69
X(  8)   3.08
X(  9)   3.46
X( 10)   3.85
X( 11)   4.23
X( 12)   4.62
X( 13)   5.00

In Part II of this series, we'll start converting this code to run on a distributed memory parallel processor, using ScaLAPACK.

IBM Lightweight Core Files

Core files can be a real time saver when try to track down bugs in your code. But traditional core files require a debugger to get useful information out of them. There is another alternative for MPI programs on IBM systems such as iceberg and iceflyer- lightweight core files. These core files are significantly smaller than traditional core files and provide a stack trace and other useful information in human readable form.

Here's how to dump lightweight core files:
  1. Compile with an mpi compiler (e.g. mpxlf_r, mpcc_r, mpCC_r, etc.)

  2. Add -g to the compiler flags so that debugging information will be added to the executable. Without this flag line numbers and other useful information will not be included in the stack trace.

    E.g.: iceberg1 37% mpCC_r example.cpp -g -o example

  3. Add the environment variable MP_COREFILE_FORMAT=STDERR to your loadleveler script. This will cause the lightweight core information to be written to standard error. You may also want to add MP_LABELIO=yes in case more than one MPI task dumps core (MP_LABELIO=yes prepends the task number writing output to the each line of output). E.g.:

    iceberg1 50% cat example.ll
    # @ job_type         = parallel
    # @ node             = 1
    # @ error            = example.err
    # @ output           = example.out
    # @ environment      = COPY_ALL
    # @ tasks_per_node   = 8
    # @ network.MPI      = sn_single,shared,us
    # @ class            = standard
    # @ wall_clock_limit = 0:30:00
    # @ queue
    setenv MP_LABELIO yes

If the application crashes, it will now write the lightweight core file to standard error. Below is sample output from the example application. If you have set MP_LABELIO to yes, you can now look at the output for individual tasks using grep.

Here we are looking at the information for task 5 only:

iceberg1 53% grep ' 5:' example.err
   5:+++LCB 1.0 Fri Nov 19 19:27:57 2004 Generated by IBM AIX 5.2
   5:+++ID Node 0 Process 430208 Thread 1
   5:***FAULT "SIGSEGV - Segmentation violation"
   5:setArray__FPii : 41 # in file <example.cpp>
   5:main : 22 # in file <example.cpp>
   5:---ID Node 0 Process 430208 Thread 1
   5:+++ID Node 0 Process 430208 Thread 2
   5:_p_sigtimedwait : 0x00000490
   5:sigwait : 0x0000001c
   5:pm_async_thread : 0x0000072c
   5:_pthread_body : 0x00000100
   5:---ID Node 0 Process 430208 Thread 2

Note that you can get lightweight core files, even if you don't compile with -g. You'll get less information, but better performance. We recommend that you do not compile with -g for production runs!

Checking Your Disk Quota on the ARSC IBMs: qcheck

ARSC provides a locally written script, "qcheck," for checking your disk quota on the iceberg and iceflyer. This makes it as easy to check your quota on the IBMs as on the Cray, where you should use "quota -v".

Simply type:

  %    qcheck

You'll get a self-explanatory report like this:

  %    qcheck
     filesystem  used(MB) quota(MB) avail(MB) capacity(%)
          $HOME       1.0     100.0      99.0    1.0
        $WRKDIR    8941.1   40000.0   31058.9   22.4

Without the "qcheck" script, IBM users would need two different quota commands, one for each file system, and would have to remember to rsh to the appropriate node before running them.

Address questions regarding qcheck to

Book Review: Perl in a Nutshell

[ Thanks to Lee Higbie for this review. ]

Perl in a Nutshell , Patwardhan, Siever & Spainhour, O'Reilly, $40

In life, as in art, the beautiful moves in curves. -- Bulwer-Lytton

Whatever you think is right, is wrong. -- Slightly hyperbolic decorating advice to me from my wife, who knows engineers prefer Cartesian arrangements.

Avoid programming languages that lend themselves to contests of guessing what a line of code means. -- Corollary to Murphy's Law.

Perl is an abominable language. It is, unfortunately, the lingua franca of Unix scripting. Because we need it all the time and because of its usefulness in string processing, it is a necessary tool for our kit.

There are scores of books on Perl -- a search of returned over 7800 hits on "perl." This book is for people who know Perl and don't want to wade through hundreds of man pages to answer questions.

Perl in a Nutshell does a good job of covering most aspects of the language, from its basics to its modules and connections to CGI. Whenever I am forced to use Perl, it is the book I have on my desk. It is useful for checking items from the correct syntax for concatenation, where no two languages seem to agree, to the proper routines and modules for file operations or large number arithmetic.

A nice feature is the chapter, "Function Reference." Perl functions are listed by category and then alphabetically. When I need to open a file, for example, I go to "File Functions" at the beginning of the chapter to see my options. When I want the exact syntax and semantics of the "substr" function, I go straight to it in the alphabetical listing.

A few chapters later, they have a similar structure for the hundreds of standard modules. When I needed to handle many-digit numbers, for example, I found "bigint" and "bignum" modules there, which provided unlimited precision integers. Overkill for my use, but it worked. The reference was crucial at that time and easy to stumble across in the book.

One weak aspect of the book is the index. It does not list as many variations on terms as I need. I've added my own cross references to normal terms for various items, diacriticals and punctuation marks. For example,

  1. The list of symbols in the index omits the three (very) different types of quote marks.
  2. There are all sorts of references to the Math modules, but none to the common max and min functions.
  3. There is an index entry for the "sleep" function, but no reference from such common synonyms as "pause" or "hibernate." The more Perl programming a person does, the less important the missing synonyms in the index.

Ah well. It is a good book. I hope the next edition is even better.

Fall Training: Nov 23, LoadLeveler; Nov 30 - Dec 1, UPC Workshop

The next courses in ARSC's Fall Training calendar:

Title: LoadLeveler Details Instructor: Tom Logan, ARSC Date: November 23 at 1pm Location: WRRB 009

Title: Unified Parallel C (UPC) Workshop Visiting Instructor: Tarek El Ghazawi, George Washington University Dates: November 30 - December 1 Location: WRRB 009 and WRRB 010 More information: Please send email to:

Complete training schedule:

Santa Letters, Postmarked "North Pole"

Uncles, Aunts, Parents, and other friends of kids: here's an offer from the ARSC HPC Newsletter editors.

The town of North Pole, Alaska is a mere 15 miles from here. If you'd like a letter, postmarked "North Pole," delivered to someone, seal your letter in a stamped, addressed envelope, and instead of mailing it from your local post office, enclose it in a larger envelope and send this to us. On about December 13th we'll mail the letters from North Pole.

Plan extra time for mail to/from Alaska... If you post these to us before Dec. 6th, there should be no problem.

Send to:

Tom Baring and Don Bahls UAF / ARSC P.O. Box 756020 Fairbanks AK 99775-6020

Quick-Tip Q & A

A:[[ I'm looking for an easy, automatic way of scanning through a list of
  [[ files to change all lines like this:
  [[   #include "some_string"
  [[ to this:
  [[   #include <some_string>

  # Richard Griswold

  This is easy to do with perl:

    perl -pi -e 's/(#include )"(some_string)"/$1<$2>/' file ...

  Since you are using perl's s/// operator, you have a full range of regex 
  options available to you.  If you are a bit unsure of your changes and 
  want to keep backups, use "-pi.bak" instead of "-pi" on the command line.  
  You can also combine this with find and xargs to change files in multiple 

    find . -name "*.cpp" 

      xargs perl -pi -e 's/(#include )"(some_string)"/$1<$2>/'

  # Kate Hedstrom

  A Perl one-liner:

    perl -i -pe 's/#include "([^"]*)"/#include <$1>/'  file.c

  The pattern matches double quotes around any number of non-quote
  characters and stores it in the $1 buffer. The -i option means to
  replace the file with the changes. Otherwise it would be:

    perl -pe 's/#include "([^"]*)"/#include <$1>/'  file.c > file_new.c

  # Dale Clark

    Here's one solution:

    perl -pi -e 's/#include "some_string"/#include <some_string>/g' \
      `find . -type f`

  This 'find' expression would act on all plain files in the current
  directory tree. Some other expression (like "ls *.c") could of course
  be used in its place, as appropriate.

  # Greg Newby

  There are lots of standard Unix/Linux utilities to do this sort of
  thing, as well as more fully-featured languages with good pattern
  substitution functionality such as Perl (I expect someone else will
  present a single-line Perl script to do the same thing).  A simple
  case would be to take each file and transform it to a new file.  If
  the new file is different than the old file, we presume that
  substitutions took place.  Replace it (perhaps saving a copy).

  I'd use "sed" for this, which should be available on any Unix/Linux
  system.  There are some limitations to sed on some systems, but
  assuming you're just talking about source code files, sed is a safe
  choice (don't use sed for binary data, or files with very long lines,
  without testing first).

  Here's a quick little script for all .cpp files:


    # Loop for each .cpp file:
    for i in *.cpp ; do

      # Send the file as input to "sed".  We need a backslash
      # before special characters such as ", #, < and >:
      cat $i 
 sed 's/\#include \"some_string\"/\#include \<some_string\>/' > /tmp/$i

      # See whether the newly transformed files are different than the old:
      diff $i /tmp/$i > /dev/null 

      if [ $? -ne 0 ] ; then
        cp $i ${i}.old          # Make a backup
        cp /tmp/$i $i           # Copy the changed file over the old file


  # Wendy Palm

  It sounds like sed with some regular expressions may be one way to go.
  This miniscript will change any line that has #include "...." to
  #include <....>, accounting for an arbitrary number of whitespace
  characters (anywhere but leading whitespace).

  #! /bin/sh

  for file in $*
    cat $file 
 sed '/^#[ \t]*include[ \t]*/s/\"/\</' 
 sed '/^#[ \t]*include[ \t]*/s/\"/\>/' > $file$$
    mv $file$$ $file

Q: I was recently on a walk and discovered a phone number, that I 
   wanted to remember, posted on a tree.  I had time to cogitate, but
   nothing on which to write.

   Do you have any tips, methods, or systems for memorizing numbers?

   If you want, you could demonstrate your system using one or more of
   the following numbers (these are the ARSC Consult line, our latitude,
   and the ISBN for everyone's favorite book, "Parallel Programming in

     64 deg. 51.590 min.

[[ Answers, Questions, and Tips Graciously Accepted ]]

Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
E-mail Subscriptions: Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.
Back to Top