ARSC HPC Users' Newsletter 392, August 08, 2008



Using atexit for Debugging

[ By: Don Bahls ]

On a few occasions I've had to debug codes which exited prematurely due to an error, but didn't generate a core file. I ran into another instance of this class of problem last week. Fortunately the code did produce an error message just before it exited. Using the error message along with find and grep I eventually tracked down the bug. This was a time-consuming process, and it wasn't fun until the problem was fixed.

This agony made me wonder if there might be a better way to handle this class of problem. It would be nice if the code produced a core dump when it exits. A nice solution I found, after all was said and done, is the C "atexit" function. The "atexit" function lets a user register functions that will be called by "exit" (the C standard guarantees that at least 32 can be registered). If you register a function which calls abort, you will get a core dump (assuming core dumps are enabled) which can be used to get a stack trace.

Here's the prototype for atexit:

  #include <stdlib.h>

  int atexit( void (*func) (void) );

For the uninitiated, "void (*func)(void)" is what's called a function pointer. This means the routine expects a pointer to a function that takes no arguments and returns nothing, i.e. a subroutine of the form:

  void trace(void);

You can tell "atexit" to call "trace" with the following code segment:

  atexit(&trace);
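
For readers who want to try the mechanism on its own, here is a minimal, self-contained C sketch. The handler names and the helper register_demo_handlers are hypothetical and are used only for illustration. The sketch checks atexit's return value, where nonzero means the registration failed. It also relies on the C standard's rule that handlers run in the reverse order of their registration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical handlers used only to illustrate atexit. */
static void first_handler(void)
{
    fprintf(stderr, "first_handler: registered first, runs last\n");
}

static void second_handler(void)
{
    fprintf(stderr, "second_handler: registered second, runs first\n");
}

/* Register both handlers.  A function name used as a value decays to a
   pointer, so passing first_handler is equivalent to &first_handler.
   Returns 0 on success, -1 if either atexit call failed. */
int register_demo_handlers(void)
{
    if (atexit(first_handler) != 0)
        return -1;
    if (atexit(second_handler) != 0)
        return -1;
    return 0;
}
```

Suppose a program calls register_demo_handlers() and then exits normally. The message from second_handler should then appear before the message from first_handler. This ordering matters for the debugging trick. A handler that calls abort, like trace, ends exit processing on the spot, so any handlers registered before it never run.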


In the case of the code I was debugging, the majority of the code was written in Fortran, so I had to wrap the "atexit" code so that it could be called from Fortran. Here's the simple wrapper:

  mg56 % cat custom_exit.c
  #include <stdio.h>
  #include <stdlib.h>

  void trace(void)
  {
      abort();   /* raises SIGABRT, which produces a core file */
  }

  void register_custom_exit_()
  {/* The following doesn't check to see if atexit successfully
      registered the exit handler.  If it isn't successful we 
      won't get the core file(s).  */
      atexit(&trace);
  }

  void register_custom_exit__()
  {
      register_custom_exit_();
  }

The functions "register_custom_exit_" and "register_custom_exit__" cover the two common Fortran name-mangling conventions (one trailing underscore and two), so the C routines can be linked with most Fortran compilers. The custom_exit.c code can be compiled with the C compiler of your choice:


  mg56 % gcc -g custom_exit.c -c 

Next, we need to add a call to "register_custom_exit" in the file containing the main program:


  program myapp
  ! ... variable declarations ...

  ! call the "atexit" wrapper early in myapp, preferably right
  ! after variable declarations.
  call register_custom_exit()

  ! ... rest of the program ...
  end program myapp

Then relink the application, including the custom_exit.o object file. When the code runs and exits, it will produce a core file from each task, provided the core file size limit allows it (for example, after "ulimit -c unlimited").
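
Whether abort actually leaves a core file also depends on the process's core file size limit. It can also depend on system-wide settings, such as where core files are written. Below is a minimal sketch that assumes a POSIX system. The hypothetical helper enable_core_dumps raises the soft RLIMIT_CORE limit to the hard limit using getrlimit and setrlimit. It could be called from register_custom_exit_ before atexit. It cannot go beyond the hard limit, which only a privileged process may raise.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: raise the soft core-file size limit to the hard
   limit so that abort() is allowed to write a core file.
   Returns 0 on success, -1 if the limit could not be read or set. */
int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("getrlimit");
        return -1;
    }
    /* An unprivileged process may raise its soft limit up to,
       but not beyond, the hard limit. */
    rl.rlim_cur = rl.rlim_max;
    if (setrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("setrlimit");
        return -1;
    }
    return 0;
}
```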

If you do try this technique, be sure to remove the call to "register_custom_exit" when you are done debugging, so you don't get a core file each time you run your code.

A possible improvement to this idea would be to leave the "register_custom_exit" call in the code, but only enable core dumps if an environment variable is set.


   void register_custom_exit_()
      if ( getenv("CORE_ON_EXIT") ) { atexit(&trace); }

When the environment variable CORE_ON_EXIT is set, a core file will be produced on exit; otherwise the application behaves as it did before. On midnight, the mpirun command would look like this:

   mpirun -np 8 CORE_ON_EXIT=1 ./a.out
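
The environment-variable version can also report a failed registration. As the comment in custom_exit.c notes, that failure is otherwise silent. What follows is a minimal sketch, not the author's code. The helper register_if_env_set is a hypothetical name. It takes the handler as a function pointer, so it could register trace or any other void (void) function. Note that getenv returns a non-NULL pointer whenever the variable is defined, even with an empty value. Setting CORE_ON_EXIT= with no value would therefore also enable core files.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: register `handler` with atexit only when the
   environment variable `varname` is defined.
   Returns 1 if registered, 0 if the variable is not set, and -1 if
   atexit reported a failure. */
int register_if_env_set(const char *varname, void (*handler)(void))
{
    if (getenv(varname) == NULL)
        return 0;                      /* not set: behave as before */

    if (atexit(handler) != 0) {
        fprintf(stderr, "warning: could not register exit handler; "
                        "no core file will be produced\n");
        return -1;
    }
    return 1;
}

/* Exit handler: abort() raises SIGABRT, which produces a core file
   when the core size limit allows it. */
void trace(void)
{
    abort();
}

/* Fortran-callable entry points (one and two trailing underscores). */
void register_custom_exit_(void)
{
    register_if_env_set("CORE_ON_EXIT", trace);
}

void register_custom_exit__(void)
{
    register_custom_exit_();
}
```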

Summer Intern Final Presentations

Since early June, ARSC has hosted a number of interns from across the United States. This summer the interns have been looking at topics including climate modeling, weather modeling, computational chemistry, large scale data analysis and acceleration technologies such as GPUs, multi-core systems and Cell processors. During their last week at ARSC, the interns will be presenting the outcome of their summer research.

Topic: Summer Intern Final Presentations
Location: West Ridge Research Building - Room 010
Date: Thursday, August 14th, 2008
Time: 10:00 AM - 1:00 PM

On Wednesday, August 20th, several students from George Washington University will discuss their summer work with acceleration technologies.

Topic: GWU Student Final Presentations
Location: West Ridge Research Building - Room 010
Date: Wednesday, August 20th, 2008
Time: 10:00 AM - noon

Quick-Tip Q & A

A:[[ I have a directory (with subdirectories) that has a bunch of
  [[ duplicate files.  These files could have the same contents, but
  [[ not have the same filename.
  [[ Is there a good way to identify files with identical contents
  [[ without doing a diff on each pair of files?

# Thanks to Lorin Hochstein and Greg Newby who each suggested using a 
# checksum program to handle this.  Lorin's complete suggestion is 
# below.  

One way to do this is to calculate MD5 checksums on all of the files,
and then manipulate those with tools like sort and uniq to identify
the duplicates. If two files have the same MD5 checksum, they are very
likely to be identical. You'll need a program called "md5sum" to do
this.

Here's an example bash script that takes the root directory as an
argument and identifies all identical files (as a side effect, it
creates a file called md5sums.txt that contains the checksums).

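#!/bin/bash
# Report groups of files under directory $1 with identical MD5 checksums.
dups=`find "$1" -type f -exec md5sum {} \; | tee md5sums.txt | cut -c1-32 | sort | uniq -d`
echo "-------------"
for hash in $dups
do
    grep $hash md5sums.txt | cut -c35-
    echo "-------------"
done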

Use it like this:
$ ./ ~/Downloads
./RegExhibit/RegExhibit Read me 1.2.rtfd/Pasted Graphic.tiff
./RegExhibit/Version history.rtfd/Pasted Graphic.tiff
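
The key step in the script is sort followed by uniq -d. Once the checksums are sorted, identical files land on adjacent lines, and uniq -d keeps one copy of each repeated checksum. The C sketch below shows the same idea. It assumes the md5sum output has already been parsed into an array, and the struct and function names are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record: one line of md5sum output split into its
   32-character hex digest and the file name. */
struct sum_entry {
    char digest[33];
    const char *path;
};

/* qsort comparator: order entries by digest, like `sort`. */
static int by_digest(const void *a, const void *b)
{
    const struct sum_entry *x = a, *y = b;
    return strcmp(x->digest, y->digest);
}

/* Sort the entries, then print each run of two or more entries that
   share a digest, followed by a dashed line.  Returns the number of
   duplicate groups found. */
size_t print_duplicate_groups(struct sum_entry *e, size_t n)
{
    size_t groups = 0, i = 0;

    qsort(e, n, sizeof *e, by_digest);
    while (i < n) {
        size_t j = i + 1;
        while (j < n && strcmp(e[j].digest, e[i].digest) == 0)
            j++;                        /* extend the run of equal digests */
        if (j - i > 1) {                /* like `uniq -d`: repeated keys only */
            for (size_t k = i; k < j; k++)
                puts(e[k].path);
            puts("-------------");
            groups++;
        }
        i = j;
    }
    return groups;
}
```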

# Thanks to Martin Luethi who suggested fslint and also provided a 
# solution using the md5sum command and python.

fslint is often a good choice.

Of course I use my homegrown script that basically does the following:

* find all files in a directory tree (using shell wildcards)
* make a hash table keyed on file size
* traverse the hash table and calculate an MD5 checksum on all files
  with the same size
* do something with the duplicates.

Grouping by file size first avoids calculating MD5 hashes for every
file: only files that share their size with another file need to be
checksummed.
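
The size-first filter is not specific to Python. The C sketch below assumes the candidate paths have already been collected into an array, for example by a directory walk, which is omitted here. It uses the POSIX lstat call, and report_same_size is a hypothetical name. Skipping symbolic links mirrors the os.path.islink test in the Python version.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>

/* One candidate file and its size in bytes. */
struct size_entry {
    off_t size;
    const char *path;
};

/* qsort comparator: ascending size, without subtraction overflow. */
static int by_size(const void *a, const void *b)
{
    const struct size_entry *x = a, *y = b;
    return (x->size > y->size) - (x->size < y->size);
}

/* Print every group of two or more regular files with the same size and
   return how many files fall in such groups (the only ones worth
   checksumming).  Symbolic links and paths that cannot be stat'ed are
   skipped.  Returns -1 if memory allocation fails. */
long report_same_size(const char **paths, size_t n)
{
    struct size_entry *e;
    struct stat st;
    size_t m = 0, i = 0;
    long candidates = 0;

    if (n == 0)
        return 0;
    e = malloc(n * sizeof *e);
    if (e == NULL)
        return -1;

    for (size_t k = 0; k < n; k++) {
        /* lstat, unlike stat, does not follow symbolic links */
        if (lstat(paths[k], &st) == 0 && S_ISREG(st.st_mode)) {
            e[m].size = st.st_size;
            e[m].path = paths[k];
            m++;
        }
    }

    qsort(e, m, sizeof *e, by_size);
    while (i < m) {
        size_t j = i + 1;
        while (j < m && e[j].size == e[i].size)
            j++;                          /* run of equal sizes */
        if (j - i > 1) {
            for (size_t k = i; k < j; k++)
                printf("%lld bytes: %s\n", (long long)e[k].size, e[k].path);
            candidates += (long)(j - i);
        }
        i = j;
    }
    free(e);
    return candidates;
}
```

Only the files this prints would then need an MD5 checksum. The savings are largest on big trees with many distinct file sizes.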

The basic structure of such a program, implemented in Python, is:

#!/usr/bin/env python

import os

datapath = '/home/tinu/images/2008'

def getFilesize(allfiles, dirname, fnames):
    """create a hash of all filesizes"""
    for fname in fnames:
        filename = os.path.join(dirname,fname)
        if os.path.isfile(filename) and not os.path.islink(filename):
            allfiles.setdefault(os.path.getsize(filename), []).append(filename)

allfiles = {}
os.path.walk(datapath, getFilesize, allfiles)
# look for duplicates among files of equal size
for files in allfiles.values():
    if len(files) > 1:
        md5sums = {}
        for filename in files:
            md5sums.setdefault(os.popen('md5sum "%s"' %(filename) ).read()[:32],[]).append(filename)

        for msums in md5sums.values():
            if len(msums) > 1:
               # do something here, like deleting or moving duplicates
               # for this example we just print the file names
               print msums

// Editor's note: Python version 2.5 includes an md5 module which you
// could use instead of the os.popen('md5sum ...') call.

Q: I was trying to get a general idea how long my job was running by
   issuing a date command before and after the "mpirun" commands in
   my script.
   date
   mpirun ./preprocess
   mpirun ./analyze
   mpirun ./postprocess
   date
   While it's nice to have the start and end time, I would also like
   to know how long all three commands took cumulatively to run,
   so I can specify a more accurate walltime in my PBS script.
   How can I get the time it took to run all three commands?

[[ Answers, Questions, and Tips Graciously Accepted ]]

Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.