ARSC HPC Users' Newsletter 301, October 8, 2004

Definitions of "Supercomputer"

Here are results from the 300th Issue Contest:


Define the term: "Supercomputer."


See: /arsc/support/news/hpcnews/hpcnews300/index.xml


Many thanks to Richard Barrett, ARSC Research Liaison, and Jim O'Dell, ARSC Chief Scientist, who served as judges. I presented the list of entries, with the names of the entrants removed, to the judges. They chose their top three picks, without ranking them.


  • Bob Clark :

    Supercomputer
         Inspiration
             Translation
                 Calculations at the speed of light
             Understanding
         Inspiration

  • Derek Bastille :

    A supercomputer is one of the largest, fastest and most capable computers available at any one point in time relative to any other available systems. It can be composed of a single system or a tightly coupled collection of systems.

  • Trey White :

    Supercomputer: Where you are when you accidentally sit on your laptop.


  • Ed Anderson :

    "A supercomputer is a computing system with sustained performance within an order of magnitude of the fastest production system in the world."

  • Lee Higbie :

    A supercomputer is a computer whose description elicits a gasp from at least (4/3 pi)^e percent of the population. In other words it is a OMGWAVLCI (Oh my gosh what a very large computer indeed).

  • Derek Bastille :

    A supercomputer is any system that provokes instant geek lust and jealousy in more than 75% of the people who see it.

  • The Editor :

    It wouldn't be complete without this, attributed to Seymour Cray: "A supercomputer is a device for turning compute-bound problems into I/O-bound problems."

  • Trey White :

    Supercomputer: More than supercompute, but less than supercomputest.

  • Brad Chamberlain :

    Faster than a speeding bullet, more power-full than a locomotive, able to fill tall buildings in a single round (of funding). It's for nerds, could be a Cray, it's *supercomputer* !

  • Brad Chamberlain :

    Definition for my grandmother:

    Imagine that you have a task that takes you 1000 hours. If 1000 friends worked on it with you, it might only take one. Except that you must spend lots of time coordinating them and having them coordinate. Supercomputers are the same.

A Comparison of IBM Pwr4 and Cray X1

[ Thanks to Kate Hedstrom for this article. ]

I run an ocean circulation model known as ROMS (Regional Ocean Modeling System). I have previously described the process of porting the serial version to the X1 (issue #280). I have also been using its benchmark application to exercise the IBM and get some idea of the scalability. I have finally gotten around to performing similar timings on the X1 and can now compare the two systems. On the IBM I am using the hpmavg script described in issue #281.

The ROMS model is distributed with a handful of analytic problems predefined. It is simply a matter of asking for the corresponding C preprocessor option. One predefined problem is an idealized periodic southern ocean domain, available at three different resolutions. I am using the finest resolution (BENCHMARK3) since I want to make sure things work for problems large enough to require 64-bit memory addressing (larger than 2 GB). This domain has 2048x256x30 grid points and requires roughly 5 GB.

First, the IBM numbers:

    # cpu            wall time    max size/cpu    Mflips/cpu

      1              18646 sec      4.8 GB            385
      8               2914 sec      627 MB            308
     64                347 sec      109 MB            327
    512                 70 sec       55 MB            208

Cray MSP numbers (from pat_hwpc):

    # cpu        user+sys time    max size/cpu    Mflips/cpu

      1               2510 sec      4.7 GB           2804
      8                335 sec      3.0 GB           2641
     64                 66 sec      1.0 GB           1735

What we see from this is that each X1 processor is running much more quickly on this problem than each Pwr4 processor. However, there are a lot more processors in iceberg (the IBM) than in klondike (the X1). If we assume that each X1 processor is "worth" 8 Pwr4's, then the numbers line up surprisingly well:

     # Pwr4      # X1        Pwr4 time       X1 time

      8            1        2914 sec         2510 sec
     64            8         347 sec          335 sec
    512           64          70 sec           66 sec

This raises several questions. Is eight the correct ratio between the two systems? Which one should I run on? It will depend on the wait in the queues, of course. Also, how many processors should I use? I would say the scaling for the first factor of eight here is pretty darn good, while the second factor of eight is not as good. How good is good enough to justify the use of the additional processors?
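One way to put numbers on "how good is good enough" is to compute speedup and parallel efficiency from the timings in the first two tables. The following is only a sketch: the function name scaling is made up for illustration, and note that the IBM column is wall time while the Cray column is user+sys time, so the two sets of results are not strictly comparable.

```shell
#!/bin/sh
# Print speedup and parallel efficiency relative to the first (baseline) row.
# Input on stdin: one "ncpu seconds" pair per line, baseline first.
# "scaling" is a hypothetical helper name, not part of ROMS or any tool.
scaling() {
    awk '
        NR == 1 { base_n = $1; base_t = $2 }
        {
            speedup = base_t / $2
            eff = 100 * speedup / ($1 / base_n)
            printf "%4d cpus  %6d sec  speedup %6.1f  efficiency %3.0f%%\n", \
                   $1, $2, speedup, eff
        }'
}

echo "IBM Pwr4 (wall time):"
scaling <<EOF
1 18646
8 2914
64 347
512 70
EOF

echo "Cray X1 MSPs (user+sys time):"
scaling <<EOF
1 2510
8 335
64 66
EOF
```

With the numbers above, the IBM runs stay at roughly 80% efficiency or better through 64 processors and fall to about half at 512, while the X1 runs go from about 94% at 8 MSPs to about 59% at 64.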

For me, the biggest mystery of all is the memory used by the X1. Note the "max size/cpu" columns in the first two tables, above. The 64 MSP job required on the order of 64 GB of total memory (64 MSPs at about 1.0 GB each) for a problem that needs less than 5 GB on a single processor. This strikes me as quite incredible. For seriously large problems, say when we add many biological tracers, the memory use may force me onto the IBM.

Loupe Performance of the Cray X1: Part I

[ Thanks to Lee Higbie for this series of articles. ]

You can always make one machine run slower than another -- Levesque's Law

I've undertaken a thorough analysis of simple loops on Klondike, louped them, if you will. Actually, I've thoroughly analyzed four of the 224,711 dimensions to the problem. And even that work has received important support from Tom Baring and Ed Kornkven. Any tale of perfidy, treachery and attacks by aliens needs to be spread over several newsletters, so stay tuned. The tale began...

It was a dark and stormy beagle, at least that's how I imagined the sender of the email, who asked why:

        do i=1, n1
           do j=1, n2
              do k=1, n3
                 ind = indexSet(i,j,k)
                 u3(ind) = x3(ind)/d3(ind)
                 v3(ind) = y3(ind)/d3(ind)
                 w3(ind) = z3(ind)/d3(ind)
              end do
           end do
        end do

        ! where indexSet is defined as:
        indexSet(i,j,k) = i+(j-1)*n2+(k-1)*n2*n1

ran slower on Klondike than on his PC.
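To make the access pattern concrete, here is a small sketch that evaluates the quoted indexSet formula outside of Fortran. The extents n1=4, n2=3, n3=2 and the function name index_set are assumptions chosen only to keep the output short. It shows two properties of the formula as quoted: successive iterations of the innermost k loop are n2*n1 elements apart, and, because j is scaled by n2 rather than n1, two different (i,j) pairs can map to the same ind when n1 and n2 differ. Whether either property is behind the timings is not something the e-mail says.

```shell
#!/bin/sh
# The quoted index formula, written as a shell function.
# Arguments: i j k n1 n2 (1-based loop indices, then array extents).
# "index_set" is a hypothetical name used only for this illustration.
index_set() {
    echo $(( $1 + ($2 - 1) * $5 + ($3 - 1) * $5 * $4 ))
}

# Hypothetical small extents.
n1=4; n2=3; n3=2

# Successive iterations of the innermost (k) loop for i=1, j=1.
k=1
while [ "$k" -le "$n3" ]; do
    echo "i=1 j=1 k=$k -> ind=$(index_set 1 1 "$k" "$n1" "$n2")"
    k=$((k + 1))
done

# Two different (i,j) pairs that produce the same ind because n1 != n2.
echo "i=4 j=1 k=1 -> ind=$(index_set 4 1 1 "$n1" "$n2")"
echo "i=1 j=2 k=1 -> ind=$(index_set 1 2 1 "$n1" "$n2")"
```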

The initial snooping around, rewriting the code with three subscripts instead of one, lowered the run time to 1% of the original's. This prompted the investigation I will describe over the next few weeks. Seventy-two different versions of this computation were timed, each with multiple array sizes and compilation options. If you think I have leekemia and are trying to confirm your diagnosis, consider this: a more complex loop code structure was also tested in dozens of variations. This second boucle nest, as the French may call them, vaguely resembled some common Klondike codes.

Thousands of timings were made, some because of the large number of variations in the loop structure and so on, and some to verify that the timings were accurate. If this last option dogs your good sense, you'll be interested to know that the time for some specific loop variants varied by more than 15% from one test to the next. This test-to-test variability is enough to make criminalists shudder and Cochran rejoice.

There's morale in this story. There's also a moral that will give a hint of the ending without ruining the plot. The moral is actually in many parts and I will only give a part of it here:

Trust only profiling -- don't believe only in vectorization.

X1: Submitting Pre/Post Processing Jobs in PBS

You can run long builds, backup operations, and other non-application work using PBS scripts on the X1. If the task requires only system commands (make, ftn, cp, tar, gzip, etc.), it will run entirely on the command nodes, and your script should request zero application nodes:

  #PBS -l mppe=0

A sample script might look like this:

  #PBS -l mppe=0
  #PBS -l walltime=2:00:00
  cd $WRKDIR
  tar cf xyz.tar xyz
  cp xyz.tar $ARCHIVE
  rm -r xyz
  rm xyz.tar
  cp -R -p abc $ARCHIVE

One drawback of running such work in batch:

  • Unlike when you run the commands interactively, you might forget that the operation is in progress and tamper with the directories being backed up (or the files being compiled, etc.).

The advantages:

  • Once you submit the job, you can log off and go home!!!
  • Record keeping: the job's output files document what was done.
  • Advanced scripting constructs, such as looping through lists of directories and checking for the existence of files, are available. See Kate Hedstrom's article: /arsc/support/news/hpcnews/hpcnews297/index.xml

And the biggest advantage:

  • Post-processing work using zero application processors can be submitted by big parallel jobs as their last act. This frees the application processors for other users, and you don't get charged for them when you're not actually using them.
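As an illustration of the scripting constructs mentioned above, the sample backup script could be generalized to loop over several directories and check each step before deleting anything. This is a minimal sketch, not a tested ARSC recipe: the function name archive_dirs is hypothetical, and $WRKDIR and $ARCHIVE are taken from the sample script.

```shell
#!/bin/sh
# Hypothetical helper: archive each listed directory into $1 as <name>.tar.
# A directory is removed only after its tar file has been created and
# copied successfully; anything that fails is left in place.
archive_dirs() {
    dest=$1
    shift
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "skipping $dir: not a directory" >&2
            continue
        fi
        if tar cf "$dir.tar" "$dir" && cp "$dir.tar" "$dest"; then
            rm -r "$dir"
            rm "$dir.tar"
        else
            echo "archiving $dir failed; leaving it in place" >&2
        fi
    done
}

# In a PBS script this might follow "#PBS -l mppe=0" and "cd $WRKDIR":
#   archive_dirs "$ARCHIVE" xyz abc
```

Unlike the straight-line sample, this version does not run rm if the tar or cp step fails.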

SC04, Early Registration Deadline: Midnight Oct 8

SC is the premier, annual conference on supercomputing, highly recommended to all in this field. Come see us in the ARSC booth, #951. Early registration ends Midnight EDT, Friday, October 8.

Adieu to Chilkoot and Yukon

Yukon, a 272-processor T3E, was ARSC's workhorse for 7 impressive years, and was almost 100% utilized to the end.

Chilkoot, a 32-CPU SV1ex, was, early in its life, binary compatible with systems all the way back to the Y-MP. It kept vector processing alive and well at ARSC until the introduction of the X1, Cray's scalable vector system, which is the upgrade for both the T3E and SV1ex.

Both Yukon and Chilkoot were retired from service late last Thursday.

ARSC Newsletter Adds New Co-Editor: Don Bahls

Don Bahls will be joining Tom Baring as co-editor of this newsletter, commencing with issue #302.

Don started at ARSC three years ago as a student assistant, while finishing up his BS in computer science at UAF, and has been a full-time User Consultant for one year. Don's primary focus at ARSC is the IBM systems.

Quick-Tip Q & A

A: [[ I frequently want to delete a bunch of files from a large
   directory... but with a catch.  A simple wild-card expression
   describes the files I want to RETAIN, but there's no simple
   expression for the files I need to remove.

   E.g., I want to delete everything except:  "*.f90"

   Is there an "rm" option like grep "-v", which says "select the files
   not matching the expression"? ]]

# Editor's note:

Alert for the Uninitiated:  

You might stick with "rm -i".  It asks before deleting a file, unlike
plain "rm", which takes no prisoners.

As the first three responses demonstrate, you can replace "rm" with
"echo" or "ls" in find or xargs commands, thus performing a dry run to
display the imperiled files, before actually deleting them with "rm".

# From Bob Clark:

Here is one way:

 /bin/ls | grep -v '\.f90$' | xargs rm

I used /bin/ls because I have my ls aliased to 'ls -F' (which can append
extra characters to the file names).  grep -v does the desired selection
and xargs runs rm with the resulting arguments.  To be safe, you could
test first with echo:

 /bin/ls | grep -v '\.f90$' | xargs echo

# Liam Forbes

Short answer:
  % find . -type f ! -name \*.f90 -exec rm -i {} \;

Long answer:
Find is your friend when searching for and matching file names; just be
careful not to be too expansive.  In fact, before doing the "rm", you
should exec an "ls" and review the listing, especially if there are so
many files that you don't want to remove them interactively.

[ngc6397-e0:~/test] lforbes% ls -l
total 0
drwx------  13 lforbes  staff   442 27 Sep 07:47 ./
drwx------  40 lforbes  staff  1360 27 Sep 07:46 ../
-rw-------   1 lforbes  staff     0 27 Sep 07:46 1.f90
-rw-------   1 lforbes  staff     0 27 Sep 07:47 123556
-rw-------   1 lforbes  staff     0 27 Sep 07:46 2.f90
-rw-------   1 lforbes  staff     0 27 Sep 07:46 3.f90
-rw-------   1 lforbes  staff     0 27 Sep 07:46 4.f90
-rw-------   1 lforbes  staff     0 27 Sep 07:46 5.f90
-rw-------   1 lforbes  staff     0 27 Sep 07:47 abc.txt
-rw-------   1 lforbes  staff     0 27 Sep 07:46 apple
-rw-------   1 lforbes  staff     0 27 Sep 07:47 find.c
-rw-------   1 lforbes  staff     0 27 Sep 07:47
-rw-------   1 lforbes  staff     0 27 Sep 07:46 zebra

[ngc6397-e0:~/test] lforbes% find . -type f ! -name \*.f90 -exec ls -ld {} \;
-rw-------  1 lforbes  staff  0 27 Sep 07:47 ./123556
-rw-------  1 lforbes  staff  0 27 Sep 07:47 ./abc.txt
-rw-------  1 lforbes  staff  0 27 Sep 07:46 ./apple
-rw-------  1 lforbes  staff  0 27 Sep 07:47 ./find.c
-rw-------  1 lforbes  staff  0 27 Sep 07:47 ./
-rw-------  1 lforbes  staff  0 27 Sep 07:46 ./zebra

[ngc6397-e0:~/test] lforbes% find . -type f ! -name \*.f90 -exec rm -i {} \;
remove ./123556? y
remove ./abc.txt? y
remove ./apple? y
remove ./find.c? y
remove ./ y
remove ./zebra? y

[ngc6397-e0:~/test] lforbes% ls
./      ../     1.f90   2.f90   3.f90   4.f90   5.f90

# Martin Luthi

You can use (with caution!!)

rm `ls -d * | grep -v .f90`

ls -d          builds first a list of the entries in your directory
grep -v .f90   filters out all entries (filenames) containing ".f90"
               (you could also use a more elaborate regexp like 'f90$'
               to find f90 at the end of the file name)

This list of files is then passed to the rm command

To see the list of files (and to make sure you get the list of files you
want) you can use

echo `ls -d * | grep -v .f90`

# Hank Happ


  rm -f `ls -1 | grep -v '.f90'`

This will also keep files with a '.f90' anywhere in the name, but I'm
presuming that's not a problem.

# Lee Higbie

Solution #1:
  mkdir Save ; mv <keep-file-templ> Save ; rm * ;  mv Save/* . ;  rmdir Save

Solution #2:
The following is more hazardous in that it will descend the entire file
hierarchy deleting files.  Also, all special characters, like the ! near
the beginning and the semicolon on the end and any *s in file templates,
must be escaped.

  find . \! -name  <keep-file-templ> -exec rm {} \;

You can easily check which files it will delete with the following:

  find . \! -name  <keep-file-templ> -print


"find" searches an entire directory tree starting at its first argument
and executes commands or applies conditions based on the subsequent
arguments:

  find <starting directory> <commands and conditions>

In the "finds" above, the starting directory is where you are (denoted
by the period).  The remaining arguments work as follows:

o the -name is a condition that "find" checks.  In this
  case it is negated by the ! so that only names that do not match
  the template are selected.
o the -exec command says to execute the subsequent
  command, up to the semicolon (that has to be escaped).  It passes the
  file name in place of the braces, {}.
o the -print command prints out the file path.

Another use for the "find" command is to look for files containing a
specific string.  For example, suppose you want to search all files
except tar files, down the directory tree from <start here path>, for
the strings "neat stuff" or "great stuff".  Try this:

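  find <start here path> \! -name \*.tar \( -exec grep -l "neat stuff" {} \; -o -exec grep -l "great stuff" {} \; \)

In this command the back slashes are escape characters that prevent the
shell from interpreting the characters that follow them, so that they
are handed on to the find command unchanged.  The exclamation mark
before -name says "not with this name." "-o" means "or", so that even if
the first grep fails, "find" will execute the second grep.  The escaped
parentheses group the two greps; without them, "-o" would bind more
loosely than the implied "and", and the second grep would also be run on
the tar files.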

Q: I know all about redirecting files in Unix.  Like, I can do this:

     %    cat f.newinfo >> my.big.file

   which puts the contents of "f.newinfo" at the end of "my.big.file".
   What I really want, though, is to put the contents of "f.newinfo" at
   the TOP, not the bottom, of "my.big.file".  I tried this:

     %    cat f.newinfo ^^ my.big.file

   but it didn't work.  How do you prepend files? 

[[ Answers, Questions, and Tips Graciously Accepted ]]

Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.