ARSC HPC Users' Newsletter 359, April 6, 2007

Introduction to C++ Templates

[ by Alec Bennett ]

One of the powerful features of the C++ language is its support for template classes and functions. Templates allow for the abstraction of data types from the operations being performed. Rather than writing near carbon copies of functions, templates make it possible to write one function that applies to many different data types.

One of the common uses for templates is to implement a generic stack class. Instead of defining a separate stack for integers, floating point values, etc., you may define a single stack class and let the template parameter supply the data type.

For non-CS readers, a stack is a data structure with two basic operations: push and pop. A push adds a new item to the top of the stack, while a pop removes the item on top of the stack and returns its value.

An integer stack class might be implemented as follows:


    // Note: The examples shown here are a simple implementation and
    // lack proper error checking!  
    //

    class IntStack{
            private:
                    int top;
                    int stack[100];
            public:
                    IntStack() : top(-1) {}
                    ~IntStack() {}
                    void push(int i){ stack[++top] = i; }
                    int pop(){ return stack[top--]; }
    };

In order to use this class, you may do something like:


    IntStack intstack;
    intstack.push(25);
    intstack.push(10);
    ...

The problem with this design is that in order to use the stack for a different data type (e.g. double, int, bool, etc.), a separate class definition must be written for each data type. Each of these classes would be fairly similar to the original IntStack. By changing just a few parts of this code, however, you can create a class which is adaptable to a variety of data types. Compare the initial example to that of a template class:


    template <class T> class Stack {
            private:
                    int top;
                    T stack[100];
            public:
                    Stack() : top(-1) {}
                    ~Stack() {}
                    void push(T i){ stack[++top] = i; }
                    T pop(){ return stack[top--]; }
    };

The only changes are the addition of "template <class T>" before the class declaration, and the conversion of the "int" type to "T" in several sections of the code. Here "T" designates a particular class or data type. When creating instances of this class, one more minor change is necessary, as follows:


    Stack<int> intstack;
    intstack.push(25);
    intstack.push(10);
    ...

Similarly a stack of doubles could be declared as follows:


    Stack<double> dblstack;
    dblstack.push(1.25);
    dblstack.push(2.125);
    ...

While this still requires specifying a data type for each instance of the class, maintaining one template is usually much simpler than maintaining several near-identical classes. Instead of changing separate stack classes for ints, doubles, and bools, you merely change the template. This limits the chance of errors and makes code more concise, leaving less time spent on tedious changes and more on the important sections of programs.

Templates can also be used to generalize functions. Here's a simple example which adds the elements of an array.


    nelchina % cat template.h 
       
    template <class T> T addArray(T * arr, int elements)
    {
        T sum=0;
        for(int ii=0; ii< elements;++ii)
        {
            sum += arr[ii]; 
        }
        return sum; 
    }

NOTE: This template code is included in a header file rather than a .cpp file. In general, if you want to use a template function or class in multiple source files, implement the routines in a header file. Most, if not all, C++ compilers cannot instantiate a template from code that has already been compiled into object (.o) format, so the full template definition must be visible in each source file that uses it.

Here's a simple example using the template function above:


    nelchina % cat example.cpp
    #include "template.h"
    #include <iostream>
    
    int main()
    {
        int int_array[]={1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    
        double dbl_array[]={2.1, 1.25, 2.3, 7.1, 10.01, 12.0, 14.0, 16.0, 18.0, 20.0};
    
        int isum=addArray<int>( int_array, 10 );
        std::cout << isum << std::endl;
    
        double dsum=addArray<double>( dbl_array, 10 );
        std::cout << dsum << std::endl;
        
    }
 
    mg56 % ./template_ex 
    55
    102.76

While this has been a brief introduction to templates, be aware there are many predefined templates in C++. The Standard Template Library (STL) has a wide variety of lists, stacks, vectors, and other templates designed to simplify coding of more complex structures. Watch for more articles on the C++ Standard Template Library in future issues of HPC Users' Newsletter.

NetCDF I/O Faster Than Binary I/O

[ by Lee Higbie ]

I've been advocating netCDF for some time without having done an analysis to determine the size of its overhead. While I don't believe facts should ruin a story, I thought you might disagree on that point so I've collected some "facts."

On the IBMs, netCDF was nearly always much faster than a binary write from Fortran. On Midnight (Sun AMD Opteron, Linux) the speeds were close: most netCDF writes took 1.0 to 1.7 times as long as binary writes. On Nelchina (Cray XD1, Linux) the netCDF writes took 1.5 to 2.5 times as long. I gather that most of this slowdown is due to byte swapping the data array into netCDF's standard format (based on a note from Russ Rew at UCAR). From what Russ says, much of this overhead will go away in netCDF-4.

I wrote a short program to test the speed of writes for both types of I/O. Here is an excerpt from one case of that code:


    ...
    call system_clock(start)
    write(1, iostat = iosb) x ! binary write of 25+ MB of data
    call system_clock(now)
    wrint = now - start ! time to write x
    ...
    print *, "Time for ", size, "MB binary write = ", wrint
    ! size = number of megabytes in x
    iret = NF90_CREATE(ncdfilenm, NF90_CLOBBER, ncid) ! Open file
    iret = NF90_DEF_DIM(ncid, "1", prsz1, x1_did) ! define x
    iret = NF90_DEF_DIM(ncid, "2", prsz2, x2_did) ! prsz* = x's dims
    iret = NF90_DEF_DIM(ncid, "3", prsz3, x3_did)
    iret = NF90_DEF_DIM(ncid, "4", prsz4, x4_did)
    dimids = (/ x1_did, x2_did, x3_did, x4_did /) ! vector of dimensions
    iret = NF90_DEF_VAR(ncid, "data", NF90_REAL, dimids, varid)
    iret = NF90_ENDDEF(ncid) ! definition done
    
    call system_clock(start)
    iret = NF90_PUT_VAR(ncid, varid, x) ! write data
    call system_clock(now)
    ncint = now - start
    ...
    print *, "Time for ", size, "MB netCDF write = ", ncint

In the ellipses are checks that all the return codes are zero. The middle group of NF90_* calls is shown so you can see the programming overhead of netCDF I/O. This code is copied from "The NetCDF Tutorial," http://www.unidata.ucar.edu/software/netcdf/docs/

The binary file is inscrutable and non-portable. The netCDF file is portable, and there are many netCDF visualization utilities. For example, the file written above in the 100 MB case was named NetCDFfile; ncdump prints the contents of netCDF files:


    > ncdump NetCDFfile | head -n13
    netcdf NetCDFfile {
    dimensions:
    1 = 100 ;
    2 = 50 ;
    3 = 100 ;
    4 = 50 ;
    variables:
    float data(4, 3, 2, 1) ;
    data:
    
    data =
    0.04449632, 0.06998526, 0.09545948, 0.1209116, 0.1463343, 0.1717201,

To me the astounding part of this experiment was the result. In nearly all Iceberg (IBM Power4, AIX) and Iceflyer (IBM Power4, AIX) cases netCDF write was more than twice as fast as binary write.

I also tested a second timing code snippet. It was similar to the one above, but with ten writes in a loop, all to the same file. For the binary writes, I inserted rewind(1) just before each write. NetCDF has no rewind (it is newer than tapes), so I coded the top block of NF90_* functions before each NF90_PUT_VAR. In other words, for the repeated netCDF writes, I closed the file, reopened it, redefined the header information, and then wrote the data. The timing loops included the rewind or the close-open-define operations. The timing ratios for this case were similar to the first case.

I ran the program with array sizes ranging from 25 to 200 MB (400 MB on Nelchina and Midnight) in batch (normal) mode and interactively.

ScicomP 13 Call for Abstracts

We thought we'd pass along this ScicomP announcement. ARSC IBM users might be interested in attending this meeting or submitting an abstract.


>            Announcing the 13th annual meeting of ScicomP
>                        
http://www.spscicomp.org/

>                 the IBM Scientific System User Group
> 
>                        July 16 to 20, 2007
>                   Computer Center Garching (RZG)
>                        Garching, Germany
>            Meeting web site: 
http://scicomp.rzg.mpg.de/

> 
> IMPORTANT DATES:
> 
> 23 April 2007 (Monday)      - Deadline for ScicomP Abstracts
> 15 May 2007 (Tuesday)       - ScicomP Author Notification
> 15 May 2007 (Tuesday)       - Registration Opens
> 29 June 2007 (Friday)       - Early Registration Closes
> 16 July 2007 (Monday)       - ScicomP Tutorial(s)
> 17-20 July 2007 (Tue-Fri)   - ScicomP Meeting
> 
> CALL FOR ABSTRACTS:
> 
> We seek original presentations by HPC application and software
> infrastructure developers about recent discoveries, experiences,
> and lessons learned on IBM HPC products.  Speakers will be chosen
> based on 1-page abstracts submitted to the ScicomP13 website
> (<URL:
http://scicomp.rzg.mpg.de/
>) by 23 April.
> 
> ABOUT SCICOMP:
> 
> ScicomP meetings target computational scientists and engineers
> interested in achieving maximum performance and scalability on
> all IBM HPC platforms, including the POWER, Linux, BlueGene, and
> Cell platforms.  Presentations will be given by both HPC users
> and IBM staff.  The focus of user presentations is on real world
> experiences in porting, tuning, and running codes on large-scale IBM
> systems.  Topics may include (but are not limited to) code migration,
> scalable algorithms, hybrid programming models, exploiting system
> architectures, and scheduling and execution frameworks.  IBM staff
> presentations include hardware and software road maps for large
> systems; application development support software tools; performance
> programming, measurement, analysis, and tuning techniques.
> 
> For more information see the conference website:
>    
http://scicomp.rzg.mpg.de/

> 

CUG 2007

The 2007 meeting of the Cray User Group will be held in Seattle, Washington in May.

Cray User Group Meeting

Location : Seattle, Washington
Dates : May 7-10, 2007
Hosted by : Boeing

More information: http://www.cug.org/1-conferences/CUG2007/index.php

Quick-Tip Q & A



A: [[ Sometimes when I "cp" a file, it changes the group ownership.  What's
   [[ up with that?


  #
  # Thanks to Barbara Herron
  #

  The cp command follows the Unix rules for creation of a new file to
  set the modes, timestamp, and ownership.  Use the "cp -p" option and
  it should preserve ownership (including group ownership), timestamps,
  and modes.


  # 
  # Thanks to Brad Havel
  #

  From the Solaris man pages for "cp":
    -----
    If target_file does not exist, cp creates a new file named
    target_file that has the same mode as source_file except that
    the sticky bit is not set unless the user is super-user. In this
    case, the owner and group of target_file are those of the user,
    unless the setgid bit is set on the directory containing the newly
    created file.  If the directory's setgid bit is set, the newly
    created file has the group of the containing directory rather than
    of the creating user.
    -----

  Translated:
    If the setgid bit is set on the directory, the copy will inherit
    the group of the directory; otherwise it defaults to the user's
    primary group.  A user's primary group can be identified with the
    "id" command on most *nix systems.



Q: # Thanks to Brad Chamberlain for this question.

If I have an array of double-precision floating point values in C
and I want to write each value out to a file in an ASCII format so
that when they are read back in I have no loss of precision, how
would I do it?  I'd like the solution to avoid writing characters
that don't affect the precision, like trailing 0's.  (This question
might be of interest to Fortran users as well).


  [[ Answers, Questions, and Tips Graciously Accepted ]]




++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Editors:
--------
   Tom Baring, ARSC HPC Specialist, baring@arsc.edu, 907-450-8619
   Don Bahls, ARSC HPC Specialist, bahls@arsc.edu, 907-450-8674

Subscription Information:
-------------------------
   Subscribing: send this message to: "majordomo@arsc.edu":
     subscribe hpc_users
     end
   Unsubscribing: send this:
     unsubscribe hpc_users
     end
   For help with majordomo, send this:
     help
     end
   In all cases, leave the "subject" line of your message blank.

   Messages sent to "owner-hpc_users@arsc.edu" will be forwarded to 
   the editors.  Let us know if you have problems with majordomo.

Back Issues are Available:
--------------------------
   - Web edition:   http://www.arsc.edu/support/news/HPCnews.shtml
   - E-mail edition archive:
                    ftp://ftp.arsc.edu/pub/publications/newsletters/

-----------------------------------------------------------------------
Arctic Region Supercomputing Center          ARSC HPC Users' Newsletter
-----------------------------------------------------------------------



Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.