ARSC HPC Users' Newsletter 412, April 30, 2010

ARSC Included in $45 Million DoD Award for New Supercomputers

[ Editors’ note: Following is a press release announcing ARSC’s newest platform, to be named "Chugach". Thanks to ARSC Communications Group Lead Debra Damron. ]

Fairbanks, Alaska – The Arctic Region Supercomputing Center at the University of Alaska Fairbanks has acquired new supercomputing resources under a $45 million award from the U.S. Department of Defense to Seattle-based supercomputer manufacturer Cray Inc. ARSC is the sole provider of open research computing capabilities for DoD’s High Performance Computing Modernization Program. ARSC will receive an 11,648 compute core Cray supercomputer. ARSC also operates a 3,456 core Cray XT5 named Pingo, and a 2,312 processor Sun Opteron cluster called Midnight. In addition to ARSC, the $45 million award to Cray provides for the purchase of new high-performance computing (HPC) machines at DoD centers in Mississippi and Ohio.

The new Cray for ARSC will be installed later this year in the National Petascale Computing Facility (NPCF) at the University of Illinois and remotely operated by ARSC staff from Fairbanks. NPCF, now in its final phase of construction, is run by the National Center for Supercomputing Applications (NCSA). The new ARSC supercomputer is considered by Cray to be one of its next generation supercomputing systems, code-named “Baker.” It will feature a new interconnect chipset known as “Gemini” as well as enhanced system software to boost performance and productivity.

In addition to the ARSC supercomputer, NPCF will house Blue Waters, a massive supercomputer funded by the National Science Foundation that will be capable of performing quadrillions of calculations every second. “We think this partnership with one of the largest academic supercomputing centers in the U.S. provides HPCMP and its users with more opportunities for collaboration and shared use of technologies co-located at NCSA,” said ARSC Director Frank Williams. NCSA Director Thom Dunning is also supportive of the new partnership. “This cooperative arrangement with ARSC opens up exciting opportunities for further collaboration between our two centers, and potentially between NCSA and the Department of Defense,” he said. “We look forward to working with ARSC to use our expertise and experience to help the scientists and engineers supported by DoD meet their research goals,” Dunning said.

The partnership with NCSA is also expected to provide DoD’s HPC Modernization Program with a venue conducive to developing strategic opportunities with other federal agencies, and especially the National Science Foundation, according to ARSC Director Williams. “ARSC continues to take a leadership role in connecting academic, research, computing and defense communities with the computers, data storage systems, high-speed networks and next-generation experimental systems necessary for discovery in engineering and science,” he said.

“It also provides significant growth and sustainment opportunities for ARSC,” Williams said. “Running the new Cray remotely frees up much needed space and high-demand power supplies so that we can install and operate new academic supercomputing systems here in Fairbanks.” Earlier this year, ARSC announced that in partnership and collaboration with researchers at UAF and the University of Hawaii, a new supercomputer and a new data storage system funded by a National Science Foundation cyberinfrastructure grant will be installed in the Butrovich Building this spring. Those and related systems support more than 280 UAF researchers whose work would not be possible without the use of ARSC supercomputing resources.

Supercomputing resources at ARSC are used by a global community of researchers within the U.S. Defense Department, the University of Alaska and other locations to advance scientific discovery for national competitiveness, global security and economic success. Projects include studies of the world’s oceans, such as models that predict the force and direction of tsunami waves, the impact of changes to the marine ecosystem on the Alaska fishing industry, and the potential for ice-free summers in the Arctic.

-30-

Links:

National Center for Supercomputing Applications:      http://www.ncsa.illinois.edu

NCSA Petascale Facility:      http://www.ncsa.illinois.edu/AboutUs/Facilities/npcf.html

Blue Waters:      http://www.ncsa.illinois.edu/BlueWaters/

DoD High Performance Computing Modernization Program:      http://www.hpcmo.hpc.mil

Cray Inc.:      http://www.cray.com

How I Caught a Heisenbug

[ By Patrick Webb ]

This is an interesting "bug" that manifested only in certain situations when using the C standard library’s random number generator, rand(), and its seeding function, srand().

The code in question set up a series of dummy particles for testing out an animation path. Each particle received a random start point, and random end point. Running the test application resulted in the strange bug of every particle having the exact same start point and end point. Using Totalview, I was able to determine that the animation start point and end points were receiving the same value somehow. The curious part about debugging was that the problem only manifested when the code was run straight through. If I stepped through the code line-by-line and watched the values in the variables being assigned then everything appeared to work as it should. I had found a Heisenbug! ( http://en.wikipedia.org/wiki/Heisenbug#Heisenbug )

What was really going on had to do with the way the seed value was being passed to srand(). The way the code was written, I was generating random numbers in only one function, so naturally I placed all random-number-related operations in that function, including the seeding. Each time that function was called, it called srand() to seed the generator with the current system time.

In the main body of code, the animation start points called the random function, which seeded the generator via srand(). Almost immediately afterward, the animation end points called the same function, which RE-seeded the generator with the system time, which had not yet had a chance to increment between the two calls. Because rand() is a pseudo-random number generator, it produces the same sequence of pseudo-random numbers whenever it is given the same seed. The bug did not manifest in Totalview because stepping through line-by-line allowed the clock to advance, so the seed was different!

The solution was simple: Either delay the calls so the system clock has time to advance, or only do the seeding once! By placing srand() seeding at the beginning of the code, say in the main() function, I was able to eliminate this strange and frustrating bug.
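
The same pattern can be reproduced without a C compiler. The following is a minimal bash sketch, not the original C code: bash's RANDOM variable stands in for rand(), and assigning a value to RANDOM stands in for srand(). The names random_point_buggy, random_point_fixed, and POINT are hypothetical, made up for this illustration.

```shell
#!/bin/bash
# Bash analogue of the bug. Assigning to RANDOM seeds bash's generator,
# much as srand() seeds rand() in C. Results are returned in the global
# POINT (a hypothetical name) so that no subshell is involved.

# Buggy pattern: reseed from the clock on every call.
# An optional argument overrides the clock value, for experimentation.
random_point_buggy() {
    RANDOM=${1:-$(date +%s)}     # like calling srand(time) inside the helper
    POINT=$(( RANDOM % 1000 ))
}

# Fixed pattern: only draw a number; seeding happens once, elsewhere.
random_point_fixed() {
    POINT=$(( RANDOM % 1000 ))
}

now=$(date +%s)

# Two calls with the same clock value produce the same "random" point.
random_point_buggy "$now"; start=$POINT
random_point_buggy "$now"; end=$POINT
echo "reseeding every call: start=$start end=$end"

# Seed once, as one would at the top of main(), then draw twice.
RANDOM=$now
random_point_fixed; start=$POINT
random_point_fixed; end=$POINT
echo "seeding once:         start=$start end=$end"
```

Run as written, the first line of output should show identical start and end values, while the second will usually show two different values.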

Octave/Matlab Compatibility

[ By Craig Stephenson ]

I was blown away when I first discovered the OpenOffice.org application suite. How could an entire office suite, including applications for word processing, spreadsheets, and presentations, among others, be completely free? I wanted to know if there was a catch, so I made a point of using OpenOffice.org instead of Microsoft Office for my remaining three years of college. I learned to use its word processor for essays and technical papers; its spreadsheet to better visualize comma-separated files and to create graphs; and its presentation application to make slide shows. I even invested a bit of time into learning its equation editor for physics lab reports. Throughout these three years of school, nobody seemed to have the slightest idea I was using an alternative to Microsoft Office. This is the paradox of open source software. How can something that’s free be anywhere near as good as something that costs a lot of money?

But this article is not about OpenOffice.org. It’s about another open source package, GNU Octave, which is installed on pingo and midnight in the following location:


  $PET_HOME/bin/octave

I decided to try Octave after hearing its name mentioned once or twice in the office as a potential alternative to Matlab. The official Octave website gives the following description:

  • GNU Octave is a high-level language, primarily intended for numerical computations. It provides a convenient command line interface for solving linear and nonlinear problems numerically, and for performing other numerical experiments using a language that is mostly compatible with Matlab. It may also be used as a batch-oriented language.

This raises the question, "Just how compatible is Octave with Matlab?" The following Wikibook delineates some of the differences between the two, but the list is not exhaustive:

     http://en.wikibooks.org/wiki/MATLAB_Programming/Differences_between_Octave_and_MATLAB

What about the similarities? With as little experience in Matlab as I have, I figured the odds of coming up with a Matlab script that confused Octave were minimal. Thus, I scoured the web for various Matlab scripts I could run through Octave just to see what would happen. I will be using a few example Matlab scripts written by John Burkardt from the following website to demonstrate:

     http://people.sc.fsu.edu/~burkardt/m_src/m_src.html

Dr. Burkardt has kindly allowed me to host the source code of his examples along with this newsletter. I encourage motivated readers to download the examples to follow along or experiment. All three examples mentioned in this article are included in the following tar file:

     http://www.arsc.edu/files/arsc/news/HPCnews/misc/misc412/matlab_examples.tar.gz

arpack (script #2)

For a description of this script, please see the following web page:

     http://people.sc.fsu.edu/~burkardt/m_src/arpack/arpack.html

I attempted to run arpack (script #2) as my first test. Since the arpack_test.m script was written as a function, I first launched Octave and then typed the name of the function:


  % $PET_HOME/bin/octave
  ...
  octave:1> arpack_test

(Note: The next two examples were also run this way.)

The arpack script ran into compatibility problems with the delsq, numgrid, and eigs functions. Comparing the Matlab and Octave manuals suggested that each of these Matlab functions has a comparable function in Octave. However, after half an hour of digging through the manuals and experimenting, I was unable to port this script to Octave. Launching Octave using the "--traditional" flag for better Matlab compatibility did not help. Things were looking bleak at this point.

asa005 (script #3)

For a description of this script, please see the following web page:

     http://people.sc.fsu.edu/~burkardt/m_src/asa005/asa005.html

Next, I moved on to asa005 (script #3), which ran through Octave with no modifications. I decided to run the same script through Matlab and use the "diff" command to compare the output between the two. Of the 30 lines of numerical output produced by each program, only two lines differed, due to minuscule floating-point precision differences, shown here:


    Octave:                         Matlab:
    8.7740117993789823e-01          8.7740117993789846e-01
    9.9811304880974128e-01          9.9811304880974105e-01
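
Because "diff" compares text, it flags last-digit differences like these even though the values agree to roughly 15 significant digits. One way to compare numerical output with a tolerance instead is sketched below in bash and awk. The helper name numdiff_tol, the one-number-per-line file layout, and the default relative tolerance of 1e-12 are all assumptions for illustration, not part of the original test.

```shell
#!/bin/bash
# Sketch only: numdiff_tol is a hypothetical helper.
# Usage: numdiff_tol FILE_A FILE_B [RELATIVE_TOLERANCE]
# Assumes each file holds one number per line. Returns 0 if every pair of
# numbers agrees within the tolerance, 1 if any pair differs, and 2 if a
# file cannot be read.
numdiff_tol() {
    local tol=${3:-1e-12}
    if [ ! -r "$1" ] || [ ! -r "$2" ]; then
        echo "numdiff_tol: cannot read input files" >&2
        return 2
    fi
    # paste puts the two values side by side; awk compares them numerically.
    paste "$1" "$2" | awk -v tol="$tol" '
        function abs(x) { return x < 0 ? -x : x }
        BEGIN { bad = 0 }
        {
            a = $1 + 0
            b = $2 + 0
            scale = (abs(a) > abs(b)) ? abs(a) : abs(b)
            if (abs(a - b) > tol * scale) {
                printf "line %d differs: %s vs %s\n", NR, $1, $2
                bad = 1
            }
        }
        END { exit bad }'
}
```

With the two outputs saved to files (the names here are made up), a call such as numdiff_tol octave.out matlab.out would report only differences larger than the tolerance. A missing line in either file is treated as zero and so is normally reported as a difference.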

ball_and_stick_display (script #32)

For a description of this script, please see the following web page:

     http://people.sc.fsu.edu/~burkardt/m_src/ball_and_stick_display/ball_and_stick_display.html

And finally, although the thought alone seemed absurd to me, I attempted to run ball_and_stick_display (script #32) through Octave. It seemed absurd because this was the first script I found on the website that produced a graphical plot. Remember, Octave "provides a convenient command line interface," not a GUI like Matlab. But maybe Octave would write plots to image files, I wondered.

When I ran script #32 through Octave, despite a series of warnings I was amazed to discover that the plot popped up in a window automatically. Octave invokes gnuplot to provide its plotting functionality. The plots generated by Octave vs. Matlab are shown below for comparison.

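Octave:

    [ Image not available: plot generated by Octave ]

Matlab:

    [ Image not available: plot generated by Matlab ]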

(Note: The plot generated by Octave is not completely compatible, as the lines are much thinner and the balls at the ends of lines are missing -- features that were deliberately added to the Matlab script by the author.)

Unfortunately, most of the more complex plotting scripts from that particular website failed to display at all in Octave. But many Matlab plotting scripts I found on other websites displayed through Octave without a hitch. Furthermore, despite Octave’s problems with the arpack script, the vast majority of non-plotting Matlab scripts I tried ran through Octave without any modifications. But in most cases I did not compare the Octave output with Matlab output to verify its accuracy.

More evidence is needed, but so far Octave seems like a potentially viable alternative for very basic Matlab usage. More ambitious projects will likely require frequent trips to the Octave manual. If plotting is required, anything more complex than the most rudimentary plots may be a limiting factor. It would also be wise to verify the accuracy of output produced by Octave before assuming a ported script is fit to go into production.

If you have any Octave experiences of your own that you would like to share with the newsletter, we would love to hear them.

Quick-Tip Q & A


A:[[ Occasionally I use the "tee" command to save the output from the make
  [[ command to a file.  Unfortunately, the "tee" seems to mess up my error
  [[ checking.  Here is an example script:
  [[
  [[  #!/bin/bash
  [[
  [[  make 2>&1 | tee make.eo
  [[  if [ $? -ne 0 ]; then
  [[   echo "Error running make.";
  [[  else
  [[   echo "make was successful.";
  [[  fi
  [[
  [[ A "make" error yields this disappointing result:
  [[
  [[  mg56 % ./build.bash
  [[  make: *** No rule to make target `fred.c', needed by `fred.o'.  Stop.
  [[  make was successful.
  [[
  [[ It would be really nice to have my script exit if the "make" fails!  How
  [[ can I fix this?  I can use another language to do this if that's what I
  [[ have to do.
  [[

#
# Scott Kajihara points out that bash has already anticipated this problem:
#

So, traditionally, the exit status of a pipe has been the exit status of
the last command executed, in this case, the tee(1) command which is
[unfortunately] successful. The only simple solution is to use bash(1)
[which is Bourne sh, Korn ksh on Linux-derived systems], and to set the
pipefail option with ``set -o pipefail''. With this option set, the exit
status of the pipeline is that of the last [rightmost] command to exit
with a non-zero [unsuccessful] status, or zero if every command succeeds.

#
# Your editors add that bash and ksh also have an array variable called 
# PIPESTATUS into which are saved the exit status values of a pipeline.  
#

For example, running this script:
    #!/usr/bin/sh
    make -f bad_Makefile 2>&1 | tee make.eo
    echo "PIPESTATUS = ${PIPESTATUS} but rc = $?"

produces this output:
    make: *** No rule to make target `m.c', needed by `small.o'.  Stop.
    PIPESTATUS = 2 but rc = 0

If we add Scott's suggestion, as in:
    #!/usr/bin/sh
    set -o pipefail
    make -f bad_Makefile 2>&1 | tee make.eo
    echo "PIPESTATUS = ${PIPESTATUS} but rc = $?"

we get:
    make: *** No rule to make target `m.c', needed by `small.o'.  Stop.
    PIPESTATUS = 2 but rc = 2
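
Applied to the original build script, either approach can be wrapped up so that the script stops when make fails. The following bash sketch uses PIPESTATUS. The helper names run_logged and build are hypothetical and not part of the original script; the log file name make.eo is taken from the question.

```shell
#!/bin/bash
# Sketch only: run_logged and build are hypothetical helper names.

# run_logged LOGFILE COMMAND [ARGS...]
# Runs COMMAND, copies its stdout and stderr to LOGFILE through tee, and
# returns COMMAND's own exit status instead of tee's.
run_logged() {
    local log=$1
    shift
    "$@" 2>&1 | tee "$log"
    return "${PIPESTATUS[0]}"
}

# build [COMMAND [ARGS...]]
# Runs make (or the given command, for experimentation) and reports the
# result using the messages from the original script.
build() {
    local cmd=("$@")
    if [ "${#cmd[@]}" -eq 0 ]; then
        cmd=(make)
    fi
    if run_logged make.eo "${cmd[@]}"; then
        echo "make was successful."
    else
        echo "Error running make."
        return 1
    fi
}

# A real build script would end with:
#     build || exit 1
```

Because run_logged reads PIPESTATUS immediately after the pipeline, it does not depend on the pipefail option being set.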

#
# And ARSC User Consultant Lawrence Murakami suggested that this problem 
# might be avoided by using nested scripts, or perhaps writing your own 
# tee-like script to intercept the exit code.
#



Q: Every so often I find myself needing to verify that a file is the same 
 on two different hosts.  Usually what I end up doing is performing an 
 "md5sum" command on the file on each of the two hosts, then visually 
 skimming the md5 checksums to make sure they look similar.

 This seems sloppy to me though.  It's possible that the two checksums 
 could look similar enough that a hasty glance would lead me to believe 
 they are identical when they are not.  Is there a way to perform some kind 
 of remote "diff" command between the two hosts instead?
 

[[ Answers, Questions, and Tips Graciously Accepted ]]


Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.