ARSC HPC Users' Newsletter 284, January 9, 2004



Scientific Computing Lecture, Next Tuesday

Title: Creating and Maintaining Effective Large-Scale Scientific Computing Applications. Balancing Ease-of-Use, Extensibility, and Performance Requirements
When: Tuesday, January 13, 2004, 2:00-4:00pm
Location: Butrovich 109, UAF Campus
Speaker: Richard Barrett, Los Alamos National Laboratory

On October 2, 1992, President Bush signed into law the FY1993 Energy and Water Authorization Bill that established a moratorium on U.S. nuclear testing. President Clinton extended the moratorium on July 3, 1993. These decisions ushered in a new era by which the U.S. ensures confidence in the safety, performance, and reliability of its nuclear stockpile. The Advanced Simulation and Computing Program (ASCI) is an integral and vital element of our nation's Stockpile Stewardship Program. ASCI provides the integrating simulation and modeling capabilities and technologies needed to combine new and old experimental data, past nuclear test data, and past design and engineering experience into a powerful tool for future design assessment and certification of nuclear weapons and their components.

These computational physics simulations consist of hundreds of thousands of lines of code, written using multiple programming languages. These codes must execute accurately, consistently, and efficiently on a variety of computing platforms throughout their multiple-decade lifetimes. They must withstand the participation of many code developers, each of whom brings different skill sets to the project. They must adapt to dynamic user requirements, and thus be amenable to the inclusion of new algorithms and other improvements.

Barrett's focus is on abstracting the necessary complexities of the distributed memory parallel processing environment in a way that is natural to the code developer, yet enables the incorporation of sophisticated computer science ideas under the hood.

Barrett will illustrate how these requirements have been managed by describing a variety of specific applications and computational kernels. These applications include hydrodynamic algorithms operating on unstructured and semi-structured dynamic meshes, various radiation transport approaches (Sn and Monte Carlo), and an approach to solving linear systems when the system properties are poorly understood.

About The Speaker:

Richard Barrett is co-author of "Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods" and a contributing programmer for the ASCI Stockpile Stewardship Program.

Barrett has been a technical staff member at Los Alamos National Laboratory for the past ten years, contributing to a variety of physics simulation code development projects, mainly within the ASCI program. Prior to going to Los Alamos he was a charter member of the Innovative Computing Laboratory at the University of Tennessee.

A Tale of Slicing NetCDF Files

[ Thanks to Kate Hedstrom for this article. ]

I am in the process of setting up a large model run using the ROMS model. One of the initialization steps is to interpolate a friend's model fields onto my grid. He supplied the Matlab routines for doing the job, but my grid is so large that Matlab ran out of memory, due to the 2 GB limit for 32-bit executables. (Let's not get into how I feel about depending on Closed Source tools, but concentrate instead on my solution.)

All of our grid, forcing, and initial conditions files are in the NetCDF format (references below). The model grid is Arakawa-C, meaning that variables are not all co-located on the grid, but the grid is a structured rectangle. A look at the header (ncdump -h) will show the dimensions:

netcdf NPAC_grid_10 {
        xi_psi = 2113 ;
        xi_rho = 2114 ;
        xi_u = 2113 ;
        xi_v = 2114 ;
        eta_psi = 1345 ;
        eta_rho = 1346 ;
        eta_u = 1346 ;
        eta_v = 1345 ;

Here, the four xi dimensions are all in the i direction, while the four eta dimensions are in the j direction. I figured that I could create smaller grids and process each one in turn, then merge them all together at the end.

My first attempt was to split it in half in both the i and j directions, making each a quarter of the original in size. It was still too big! I would have to try smaller pieces. Because the grid is staggered, I realized it would be easier to cut in just one direction. My favorite cutting tool is "ncks" from the NetCDF operators package. Here's the command to create one grid strip (I made 11 subgrids, from a to k).

  ncks -d xi_rho,600,801 -d xi_psi,600,800 -d xi_u,600,800  \
       -d xi_v,600,801

The last two arguments, not shown above, are the original and output NetCDF files. I'm splitting on all four of the i dimensions.
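The index bookkeeping for the strips can be sketched in Python. This is only an illustration, not the script used for the article: the stride of 200 points and the labels a to k are inferred from the command above and from the gluing code later in the article, and the function names are made up. File names are left out on purpose.

```python
from string import ascii_lowercase

XI_RHO = 2114  # rho points in the i direction, from the ncdump header
STRIDE = 200   # strip width in u points; inferred, not stated in the article


def strip_bounds(k, n_rho=XI_RHO, stride=STRIDE):
    """Return the inclusive (first, last) xi_rho indices of strip k (0-based)."""
    first = k * stride
    # One extra rho point gives neighboring strips a shared column.
    last = min(first + stride + 1, n_rho - 1)
    return first, last


def ncks_dim_args(first, last):
    """Build the -d hyperslab arguments for one strip.

    xi_rho and xi_v use the full range; xi_psi and xi_u have one
    point fewer, matching the staggered dimensions in the header.
    """
    ranges = {
        "xi_rho": (first, last),
        "xi_psi": (first, last - 1),
        "xi_u": (first, last - 1),
        "xi_v": (first, last),
    }
    return " ".join(f"-d {dim},{lo},{hi}" for dim, (lo, hi) in ranges.items())


def all_strips(n_strips=11):
    """Map strip label (a, b, ...) to its ncks -d arguments."""
    return {
        ascii_lowercase[k]: ncks_dim_args(*strip_bounds(k))
        for k in range(n_strips)
    }


if __name__ == "__main__":
    for label, args in all_strips().items():
        print(label, args)
```

Under these assumptions, strip "d" reproduces the arguments in the command above, and the last strip ends at the final rho index, 2113.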

The 11 subgrids thus created were finally small enough, and I processed them successfully with Matlab. This created 11 subgrids of the initial conditions, which then needed to be glued back together.

NCO can concatenate multiple records, but I don't think it can handle our case of patching these slightly overlapping grids. Instead, I turned to one of the many other packages with NetCDF support, the NCAR Command Language (NCL). First, I let Matlab create the large file with the right dimensions, although it didn't write the values into it. Here is part of the NCL code to glue "ubar", a variable on the xi_u points:

; Create the file handles
    ininame = ""
    ncout = addfile(ininame, "rw")
    in_1 = ""
    ncin_1 = addfile(in_1, "r")
    in_11 = ""
    ncin_11 = addfile(in_11, "r")

; Read in the fields, including the empty big one
    ubar = ncout->ubar
    tmp_a = ncin_1->ubar
    tmp_k = ncin_11->ubar

; Copy the subgrids into the big grid
    ubar(:,:,0:200) = tmp_a
    ubar(:,:,200:400) = tmp_b
    ubar(:,:,2000:) = tmp_k

; Write out the resulting ubar
    ncout->ubar = ubar

This needed to be repeated for each variable.
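For readers who don't use NCL, here is a rough Python sketch of the same gluing idea, reduced to one dimension and plain lists. The function name and the toy data are hypothetical. Note that NCL subscript ranges are inclusive (0:200 covers 201 points), while Python slices exclude the end index.

```python
def glue_strips(strips, starts, total):
    """Copy 1-D strips into one list of length `total`.

    strips[n] is placed at index starts[n]. Neighboring strips may
    overlap; a later strip overwrites the shared points, just as the
    successive NCL assignments do.
    """
    big = [None] * total
    for strip, start in zip(strips, starts):
        stop = start + len(strip)
        if stop > total:
            raise ValueError("strip extends past the end of the big grid")
        big[start:stop] = strip
    if any(value is None for value in big):
        raise ValueError("strips leave a gap in the big grid")
    return big


# Toy example: two 5-point strips sharing one point, glued into 9 points.
left = [0, 1, 2, 3, 4]
right = [4, 5, 6, 7, 8]
glued = glue_strips([left, right], starts=[0, 4], total=9)
```

The gap check is an addition of this sketch; it is a cheap way to notice a strip that was skipped or given the wrong offset.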

Finally, to make sure that the glue worked, I tried plotting the resulting fields with ncview. It failed to allocate enough memory for the 3-D fields, since it too was compiled 32-bit. The 2-D fields look reasonable, although they are too large for the whole grid to be plotted on my monitor. People working with small grids are so lucky!

In conclusion, NetCDF is a wonderful file format and there are many tools which can not only read and write these files, but also slice and dice them.


(Note, the tools mentioned in this article are not available on all ARSC platforms. Contact if you can't find what you need.)


X1 Interlanguage Procedure Calls

Beginning with the X1, Cray has adopted the generally accepted mechanisms for interlanguage communication (i.e., Fortran calling C functions or C calling Fortran routines). If you're porting to the X1 from another vendor's system, you may get away without making changes, but you'll have to modify your code if migrating from another Cray. Interlanguage calls are a common porting issue, regardless of platform.

If porting from an earlier Cray to the X1, here are some specific concerns:

  1. On the X1, the format for C function names called from Fortran is different. (C names must now be lower-case with a trailing underscore.)
  2. The way in which Fortran CHARACTER variables and C strings are passed is different.
  3. The way numeric variables are exchanged has not changed.

    (However, unlike earlier Cray vector platforms, the X1 supports 32- and 64-bit data types so you must verify that C and Fortran types are compatible. If you casually switch between different ftn "-s <size>" options, which modify the size of default variables, you might get into trouble. E.g., a C "float" always matches a Fortran "REAL(KIND=4)", but it won't match a "REAL" if the "-sdefault64" ftn option is used.)
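The naming rule and the size check can be written down as a tiny Python sketch. It is only a restatement of the points above, not Cray tooling: the routine name MYSUB is hypothetical, the byte widths come from the C types on whatever machine runs the script (not necessarily the X1), and treating a default REAL under "-sdefault64" as 8 bytes is this sketch's reading of the option.

```python
import ctypes


def c_name_for(fortran_name):
    """C symbol a Fortran call binds to: lower-case plus a trailing underscore."""
    return fortran_name.lower() + "_"


def sizes_match(c_type, fortran_bytes):
    """True when the C type and the Fortran kind occupy the same number of bytes."""
    return ctypes.sizeof(c_type) == fortran_bytes


# A Fortran "call MYSUB(...)" would look for a C function named mysub_.
symbol = c_name_for("MYSUB")

# C float against REAL(KIND=4) (4 bytes) and against an 8-byte default REAL.
float_vs_real4 = sizes_match(ctypes.c_float, 4)
float_vs_real8 = sizes_match(ctypes.c_float, 8)
double_vs_real8 = sizes_match(ctypes.c_double, 8)
```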

For details, see Cray's on-line manual:

Migrating Applications to the Cray X1 System - S-2378-51 Chapter 4. Interlanguage Communications

(Note, the Cray manuals are available to current ARSC users only. Read "news documents" on klondike for the current URL, login, and password.)


Pre-defined Macros

It's handy to know what macros are predefined on a given system. Using pre-processor logic and cpp, you can use them to compile different bits of code based on operating system, architecture, vendor, etc.

Typically, predefined macros are tested using pre-processor #ifdef's. For instance, the code might print a message identifying the system on which it's running:

#ifdef _UNICOSMP
      write (*,'(A)') "Good morning. I'm a Cray X1."
#endif

There are many pre-defined cpp macros, but the most important is probably that which simply identifies the system. For ARSC's current systems, here they are:

   Macro Name     Defined as "1" on These Systems
  ============    ===============================
    _AIX            IBM systems running AIX
    _UNICOSMP       Cray X1
    _CRAYSV1        Cray SV1 series
    _CRAYT3E        Cray T3E series
    _SX             Cray SX series
    __sgi           SGI systems

Exhaustive lists of predefined macros are available as follows:


Execute this command: $ cpp -dM /dev/null

Cray (including the SX-6):

Search the on-line manuals for the term "predefined macros".

(Note, these manuals are available to current ARSC users, only. Read "news documents" on klondike, chilkoot, yukon, or rimegate for the current URL, login, and password.)

For more, see Kate Hedstrom's recent 2-part series of articles, "Conditional Compilation", in issues #274 and #275.
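The output of "cpp -dM /dev/null" is a list of "#define NAME VALUE" lines, which is easy to scan with a short script. The Python sketch below assumes you have captured that output as text; the sample text in it is made up for illustration, and the function names are hypothetical. The lookup table simply repeats the macro table above.

```python
# Macro names and descriptions copied from the table in this article.
SYSTEM_MACROS = {
    "_AIX": "IBM systems running AIX",
    "_UNICOSMP": "Cray X1",
    "_CRAYSV1": "Cray SV1 series",
    "_CRAYT3E": "Cray T3E series",
    "_SX": "Cray SX series",
    "__sgi": "SGI systems",
}


def parse_defines(text):
    """Map macro name -> value for each '#define NAME VALUE' line."""
    macros = {}
    for line in text.splitlines():
        parts = line.split(None, 2)
        if len(parts) >= 2 and parts[0] == "#define":
            macros[parts[1]] = parts[2] if len(parts) == 3 else ""
    return macros


def identify(macros):
    """Return descriptions of the systems whose identifying macro is defined."""
    return [desc for name, desc in SYSTEM_MACROS.items() if name in macros]


# Made-up sample, standing in for real "cpp -dM /dev/null" output.
sample = "#define __STDC__ 1\n#define _AIX 1\n"
found = identify(parse_defines(sample))
```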


Oh, Come On!

No, this is true. A user called last week complaining about a buffer overflow that crashed his MPI performance analysis tool. The sys-admins have been debugging the core file, and they're starting to suspect it was the work of Buffer, the VAMPIR slayer.


Quick-Tip Q & A

A:[[ Are data written from a Fortran "implied do" incompatible with a
  [[ regular "read"?  If so, is there a way to make them compatible,
  [[ without rewriting the code?  
  [[ I just want to read data elements one item at a time from a
  [[ previously written file.  Here's a test program which attempts 
  [[ to show the problem:
  [[ iceflyer 56% cat unformatted_io.f
  [[       program unformatted_io
  [[       implicit none
  [[       integer, parameter :: SZ=10000, NF=111
  [[       real, dimension (SZ) :: z
  [[       real :: z_item, zsum
  [[       integer :: k
  [[       zsum = 0.0
  [[       do k=1,SZ
  [[         call random_number (z(k))
  [[         zsum = zsum + z(k)
  [[       enddo
  [[       print*,"SUM BEFORE: ", zsum
  [[       open(NF,file='test.out',form='unformatted',status='new')
  [[       write(NF) (z(k),k=1,SZ)
  [[       close (NF)
  [[       zsum=0.0
  [[       print*,"SUM DURING: ", zsum
  [[       open(NF,file='test.out',form='unformatted',status='old')
  [[       do k=1,SZ
  [[         read(NF) z_item
  [[         zsum = zsum + z_item
  [[       enddo
  [[       close (NF)
  [[       print*,"SUM AFTER: ", zsum
  [[       end
  [[ iceflyer 57% xlf90 unformatted_io.f -o unformatted_io
  [[  ** unformatted_io   === End of Compilation 1 ===
  [[  1501-510  Compilation successful for file unformatted_io.f.
  [[ iceflyer 58% ./unformatted_io  
  [[   SUM BEFORE:  5018.278320
  [[   SUM DURING:  0.0000000000E+00
  [[  1525-001 The READ statement on the file test.out cannot be completed 
  [[  because the end of the file was reached.  The program will stop.
  [[ iceflyer 59%

  # Thanks to Jim Ott:

  They are incompatible. When the write statement is used with the
  implied do, the values are written out one after the other, as one
  line (a single record).

    write(NF) (z(k),k=1,SZ) : one line of data

  When the read statement is within a do loop, each read starts from a
  new line. The file test.out has 1 line of data with SZ entries, not SZ
  lines with 1 entry each. I am not sure how to correct the problem
  without rewriting the code. The simplest method would be to read them
  in as one line, taking the read outside the do loop.
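  The record behavior can be imitated in a few lines of Python. This is a sketch under an assumption: it frames each unformatted record with a 4-byte length marker before and after the data, which is a common layout but is compiler-dependent. The function names are made up, and an in-memory buffer stands in for test.out.

```python
import io
import struct


def write_record(f, values):
    """Write one record: length marker, 4-byte reals, length marker."""
    payload = struct.pack("<%df" % len(values), *values)
    marker = struct.pack("<i", len(payload))
    f.write(marker + payload + marker)


def read_record(f):
    """Read one whole record; raise EOFError when no record is left."""
    head = f.read(4)
    if len(head) < 4:
        raise EOFError("end of file reached")
    (nbytes,) = struct.unpack("<i", head)
    payload = f.read(nbytes)
    f.read(4)  # skip the trailing length marker
    return list(struct.unpack("<%df" % (nbytes // 4), payload))


def demo():
    z = [0.5, 0.25, 0.125]
    f = io.BytesIO()
    write_record(f, z)          # like: write(NF) (z(k),k=1,SZ)
    f.seek(0)
    first = read_record(f)[0]   # like: read(NF) z_item, rest of record skipped
    try:
        read_record(f)          # the second read finds no second record
        hit_eof = False
    except EOFError:
        hit_eof = True
    f.seek(0)
    whole = read_record(f)      # reading the record in one go works
    return first, hit_eof, whole
```

  The first item-by-item read succeeds and consumes the whole record, so the second one runs off the end of the file, which matches the error in the question.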

  #    ... BONUS ANSWER ... 
  # Here's yet another slick solution to the grep+find question.
  # Thanks to Dale Clark: 

  find . -name "*.f" -exec grep -i flush6 {} \; -print

  # The trick is putting "-print" last...
  # When "grep" matches something, it returns 0.  When the value 
  # returned to "-exec" is 0, then "-exec" returns TRUE and find must
  # evaluate its next expression, "-print".  Thus, the file name is 
  # only printed when the file contains a match.
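  For comparison, the same "report the file only if it matched" logic can be sketched in Python. This is not a replacement for the one-liner, just the control flow spelled out; the function names and sample files are hypothetical.

```python
import os
import re
import tempfile


def grep_tree(root, pattern="flush6", suffix=".f"):
    """Return [(path, matching_lines)] for files under root ending in suffix."""
    regex = re.compile(pattern, re.IGNORECASE)  # like grep -i
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(suffix):       # like -name "*.f"
                continue
            path = os.path.join(dirpath, name)
            with open(path, errors="replace") as fh:
                lines = [line.rstrip("\n") for line in fh if regex.search(line)]
            if lines:  # plays the role of grep's zero exit status
                hits.append((path, lines))
    return hits


def demo():
    """Build a scratch directory with three small files and search it."""
    samples = {
        "a.f": "      call FLUSH6()\n",
        "b.f": "      x = 1\n",
        "c.txt": "flush6\n",
    }
    with tempfile.TemporaryDirectory() as root:
        for name, text in samples.items():
            with open(os.path.join(root, name), "w") as fh:
                fh.write(text)
        return [(os.path.basename(p), lines) for p, lines in grep_tree(root)]
```

  In the scratch directory, only a.f is reported: b.f has no match, and c.txt is never searched because its name does not end in ".f".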

Q: My overworked nephew, the elementary school teacher, emailed yet 
   another arithmetic assignment to his 6th graders without knowing the
   answers first. I'd like to help him out. Is there a quick way I can
   compute the answers for him?  The problems all have the same basic
   format, as shown in this sample from his ASCII email:

   myworkstation$ tail -n 12 mathproblems.txt

    14)  Division problems:
        56 / 2.34 = ________
        2.34 / 56 = ________
    15)  Multiplication: 
        33.3 * 12.3 = ________
        12 * 22 * 8 = ________
        33.3 * 12.3 * 12 * 22 * 8 = ________
    Extra Credit:
        1 + 2 + 3 + (4) * 2 = ________
        1 + 2 + (3 + 4) * 2 = ________
        1 + (2 + 3 + 4) * 2 = ________

[[ Answers, Questions, and Tips Graciously Accepted ]]

Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.