ARSC HPC Users' Newsletter 399, January 16, 2009

Pingo is Now in Production

The new ARSC Cray XT5, Pingo, went into production on Thursday, January 8th, 2009. Documentation for Pingo is available on the ARSC website.

Common Wisdom and Modular Programming

[ By Lee Higbie ]

Many programs are still written in Fortran 77 with constructs that are relatively difficult to maintain compared to those available in Fortran 90/95. This article encourages you to take advantage of some of the newer constructs to improve the software engineering of your Fortran. I will describe modules and how they lead to more understandable and safer code than common blocks.

Some Terminology

The "heap" is a section of memory that is managed separately from other system memory; on most HPC machines, and all ARSC machines, it occupies the largest addresses. On the AMD x86_64 architecture, statically allocated data and instructions are limited by default to the first 2 GB of address space. This is known as the small memory model (the "-mcmodel=small" compiler option). Currently Pingo and Ognip support only this small memory model, which is insufficient for very large programs. One way for Fortran codes to take advantage of the great wasteland of memory at large addresses is to allocate memory dynamically: all allocated data go on the heap, starting at the largest available address and working down toward 2 GB.
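As a sketch of the idea (the array name and dimensions here are hypothetical), a statically declared array counts against the 2 GB static data area, while an allocatable array places only its small descriptor there and puts the data on the heap:

```fortran
! Static: the compiler reserves all ~6.4 GB in the 2 GB-limited static area,
! so this declaration will not link under the small memory model:
!   real (kind = 8) :: bigArray (2000, 2000, 200)

! Dynamic: only the descriptor is static; the data go on the heap.
real (kind = 8), allocatable, dimension(:, :, :) :: bigArray

allocate (bigArray (2000, 2000, 200))  ! 8 * 2000 * 2000 * 200 bytes at run time
! ... use bigArray ...
deallocate (bigArray)
```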

I use the non-standard term "parameters" for the items in a call or function statement and "arguments" for those in the subroutine or function statement itself. You give parameters (to your children) and get arguments (from your parent).

Out with the Old

So you have a program, maybe old, maybe new. Maybe it looks nice, with good structure, no goto statements, free-format source, and features that did not enter the standard until Fortran 90/95. It still may not be a Fortran 90/95 program. In my terminology, a program is not Fortran 90/95 if it, or any of its subroutines, uses any of the following:

  1. fixed form or column-oriented source code
  2. common blocks
  3. data statements
  4. multiple declaration statements applying to a single variable
  5. statement functions

This list is hardly comprehensive. It includes some common Fortran 77 structures that I think should be banned.

First, a description of modules for Fortran 77 users. Conceptually, the data part of a module is like a common block that is referenced by name instead of by location. Referencing by name means that each name is the same in all parts of the program and there are no aliases for the data it references.

In addition to providing program-wide names that refer to specific arrays, for example, modules allow you to

  1. Describe all the characteristics of the data in one place (type, rank, volatility, ...).
  2. Select only those data items you want (without a dummy variable to skip over them, as you need in common).

The Fortran standards groups have tried to make modules more like OO objects, with functions and subroutines allowed in the module. One important consequence of putting subprograms in modules is that compilers verify that parameter lists match argument lists, so some of the most common interface problems are caught at compile time.

Compilers have to check parameter type and shape because of subroutine and function name overloading: several similar subprograms can share a name but differ in the type or shape of their arguments.
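A small sketch of such overloading (the module and routine names here are hypothetical): a module can declare one generic name for several specific routines, and the compiler selects among them by argument kind:

```fortran
module temperatureConvert
  implicit none
  interface toCelsius                  ! generic name
    module procedure toCelsius_r4, toCelsius_r8
  end interface
contains
  function toCelsius_r4 (fahr) result (cels)
    real (kind = 4), intent(in) :: fahr
    real (kind = 4) :: cels
    cels = (fahr - 32.0) / 1.8
  end function toCelsius_r4

  function toCelsius_r8 (fahr) result (cels)
    real (kind = 8), intent(in) :: fahr
    real (kind = 8) :: cels
    cels = (fahr - 32.0d0) / 1.8d0
  end function toCelsius_r8
end module temperatureConvert
```

A caller that uses this module simply writes toCelsius(x); the compiler checks the parameter list against the argument lists, picks the kind-4 or kind-8 version, and rejects a call with a type or shape that matches neither.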

Suppose, probably for reasons of saving memory space, a legacy program uses the same memory locations for two arrays. In Fortran 77, it might have:


double precision absTmp (lonDm, latDm, hgtDm)
common /tempbl/ absTmp
!  uses 8 * (lonDm * latDm * hgtDm) bytes
and in the output routines

real celTmp (lonDm, latDm, hgtDm), fahTmp (lonDm, latDm, hgtDm)
common /tempbl/ celTmp, fahTmp
!  uses 4 * (lonDm * latDm * hgtDm + lonDm * latDm * hgtDm) bytes, the same as before

The problems I see with this approach: the compiler has no way to verify much of anything about any of the temperature variables. If a new programmer sees one of these variables and uses it at a time when the data in it are of the other type, the results will most assuredly be interesting. A bizarre outcome is also likely if the relative memory allocation of the two data types varies, as can happen when porting to a new machine. And if the declarations are not inserted by an include statement, all sorts of other mistakes that the compiler cannot detect become likely.

In with the New

In Fortran 90/95, one might write something like:


module temperatureData
  real (kind = 8), allocatable, dimension(:, :, :) :: absTemperature
  real (kind = 4), allocatable, dimension(:, :, :) :: celsTemperature, &
                                    fahrTemperature
  ..... functions relating to temperatures ....
end module temperatureData

Notice here that the longer names are self-descriptive. There is much less chance of a new programming team member thinking celsTemperature is a cell-temporary variable (as the F77 name celTmp invites), improving safety.

With the module you need to


allocate (absTemperature (longitudeDim, latitudeDim, heightDim) )
...
deallocate (absTemperature)

and


allocate (celsTemperature (longitudeDim, latitudeDim, heightDim), &
fahrTemperature (longitudeDim, latitudeDim, heightDim) )
....
deallocate (celsTemperature, fahrTemperature)

With this structure, an attempt to use the data when they are not allocated is an error that most compilers can trap at run time, instead of silently reading whatever happens to occupy the memory.
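You can also test allocation status explicitly with the allocated() intrinsic; a minimal sketch, using the array names from the module above:

```fortran
if (allocated (absTemperature)) then
  print *, 'mean temperature:', sum (absTemperature) / size (absTemperature)
else
  print *, 'absTemperature is not yet allocated'
end if
```

There is no equivalent check for a common block: its storage always exists, whether or not anything meaningful has been stored there.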

Summarizing, if your programs include common, you should start a project to rewrite them to use modules instead. Fortran modules provide the data sharing visibility of common blocks, but in a manner that provides for much more maintainable and safer code, and they provide other features that improve code clarity.

Resolving Linker Errors - Part IV

[ By Don Bahls ]

Modern compilers often hide gory linking details from the user. Under normal circumstances you wouldn't want to see everything the compiler is doing, but on occasion it can be useful to know what the compiler (or linker) is doing under the covers.

Most compilers have a verbose option (e.g., "-v") that will display the libraries being used under the covers.

Here are a few instances where I've found the verbose option to be useful:

  1. Determining which libraries the compiler is using by default. e.g.
    
       pingo5 % cc -v
       /opt/xt-asyncpe/1.0c/bin/cc: INFO: linux target is being used
    
       /usr/bin/ld /usr/lib64/crt1.o ...  -lsci_quadcore -lsci -lsma \
       -lmpichf90 -lmpich -lrt ...
        
    
    This might help you determine which BLAS or other library the compiler is using on your behalf.
  2. Determining which libraries should be used when linking a mixed language application (e.g. Fortran and C). Oftentimes the Fortran compiler may have library dependencies that aren't included by the C compiler. The verbose option can be a handy starting point if you don't have compiler documentation right in front of you.
  3. Determining the order in which libraries are passed to ld. For most, if not all, compilers, the order in which libraries appear on the command line is important. For example, on the Cray XT5 the PGI libsci library has unresolved symbols that are present in the PGI Fortran libraries, so the Fortran libraries need to appear after libsci for the symbols to be resolved. The verbose flag can show the order in which libraries are passed to ld.

Quick-Tip Q & A


A:[[ I am developing an MPI program and would like to display some
  [[ basic information about the systems the MPI tasks are running on.
  [[ Some of the nodes I'm using have more than one task running on 
  [[ each.  Is there a way display this information only once per node? 
  [[ Is there a way to determine the number of tasks running on each
  [[ node?
  [[
  [[ If my job was running on the following nodes:
  [[
  [[   nid00008
  [[   nid00008
  [[   nid00008
  [[   nid00008
  [[   nid00009
  [[   nid00009
  [[   nid00010
  [[
  [[ I might want to see something like this:
  [[
  [[   node        tasks
  [[   ----------  --------
  [[   nid00008    4
  [[   nid00009    2
  [[   nid00010    1
  [[
  [[  The machine I'm using doesn't seem to have an MPI hostfile, so it
  [[  looks like I will have to do this with MPI calls alone.
  [[

#
# Here's a C++ solution from one of the editors.
#

Here's a solution that uses the C++ STL map class.  

pingo5 % less get_counts.cpp
#include <mpi.h>
#include <map>
#include <iostream>
#include <string>

void print_tasks_per_node();

int main(int argc, char ** argv)
{
    MPI_Init(&argc, &argv);
    print_tasks_per_node();
    //
    // Everything else...
    //
    MPI_Finalize();
    return 0;
}


void print_tasks_per_node()
{
    char * recvbuf = 0;
    char sendbuf[256];
    int length=256;
    int size=0;
    int rank=0;

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);     //get the rank
    MPI_Comm_size(MPI_COMM_WORLD, &size);     //get the task count

    if ( rank == 0 )
    {
        recvbuf=new char[256 * size];         //allocate a gather buffer 
    }

    MPI_Get_processor_name(sendbuf, &length); //get processor name; length returns its actual length
    MPI_Gather( sendbuf, 256, MPI_CHAR, recvbuf, 256, MPI_CHAR, 0, MPI_COMM_WORLD);

    if ( rank == 0 )                         //summarize node names with rank 0
    {
        std::map < std::string, int > counts;  
        for(int ii=0; ii< size; ++ii)    
        {
            std::string v(&recvbuf[256 * ii]);
            counts[v]++;                    //count occurrences of each node name
        }
        //print a table of the results.
        std::cout << "node    \ttasks" << std::endl;
        std::cout << "------- \t-----------" << std::endl;
        std::map < std::string, int >::iterator itr;
        for(itr=counts.begin(); itr != counts.end(); ++itr)
        {
            std::cout << (*itr).first << "\t" << (*itr).second << std::endl;
        }
        delete [] recvbuf;                  //deallocate the buffer.  
    }
}


pingo3 % aprun -n 32 ./get_counts
node            tasks
-------         -----------
nid00262        8
nid00263        8
nid00264        8
nid00265        8
Application 304142 resources: utime 0, stime 0


Q: For some reason the "make clean" rule in my Makefile is no longer
   working.  It's a really simple Makefile. I'm baffled as to why it 
   suddenly quit working.

   Here it is:
   
   bash-3.2$ more Makefile
   CXX=CC

   %.o : %.cpp
          $(CXX) $< -c

   a.out: main.o lirp.o
          $(CXX) main.o lirp.o

   clean:
          rm -f a.out main.o lirp.o


   When I run "make clean", make reports:

   bash-3.2$ make clean
   make: `clean' is up to date.

   An "ls" shows that it didn't work!

   -bash-3.2$ ls
   Makefile        clean           lirp.cpp        lirp.o          
   main.cpp        main.o          a.out

   What's going on here?

[[ Answers, Questions, and Tips Graciously Accepted ]]


Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
E-mail Subscriptions, Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.