ARSC HPC Users' Newsletter 394, September 05, 2008

Viewing the Stack with gdb

[ By: Don Bahls ]

The gdb debugger is great for debugging serial applications and for inspecting core files generated by parallel applications. In this article we'll take a look at some stack-related gdb commands. These basic commands can go a long way toward helping you track down bugs in your code.

Before you begin debugging, it helps to compile your code with "-g" so that the executable carries debugging information (source file names, line numbers, and variable names).

e.g. midnight% pathf90 mycode.f90 -g -o mycode

If you would like to look at core files, you may have to increase the core file limit so that the system can generate an output file.

e.g. midnight% limit coredumpsize unlimited    (tcsh/csh)
or   midnight% ulimit -Sc unlimited            (sh/ksh/bash)

MPI jobs on midnight have unusual requirements for making limits work properly; see: http://www.arsc.edu/support/howtos/usingsun.html#mpi
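
The shell commands above change the limit for the shell and for the programs it launches. As a rough sketch of what happens underneath, a program can also raise its own soft core-file limit with the POSIX getrlimit()/setrlimit() calls. The function name raise_core_limit is made up for this illustration, and the sketch raises the soft limit only as far as the hard limit, which may be lower than "unlimited" on a given system.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Raise the soft core-file size limit to the hard limit for this process.
   Hypothetical helper for illustration only.
   Returns 0 on success, -1 on failure (errno is set by the failing call). */
int raise_core_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("getrlimit");
        return -1;
    }

    /* The soft limit may not exceed the hard limit. */
    rl.rlim_cur = rl.rlim_max;

    if (setrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("setrlimit");
        return -1;
    }
    return 0;
}
```

Calling raise_core_limit() early in main() would affect the calling process and any children it starts afterwards; it does not change the limit of the parent shell.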

Now that we have covered the basics, let's take a look at some examples. Most of these examples use the code from the Quick-Tip question in issue 393.

  1. Compile the code with the following: midnight% gcc -g realloc_bug.c -o realloc_bug
  2. Then run the code under "gdb":
    
       midnight% gdb ./realloc_bug
       ...
       # Create a breakpoint for the try_realloc routine
       (gdb) break try_realloc
       Breakpoint 1 at 0x400668: file realloc_bug.c, line 9.
    
       # Run the code
       (gdb) run
       Starting program: /home/bahls/realloc_bug 
    
       Breakpoint 1, try_realloc (str=0x501010 "", addstr=0x40081c "first string,") at realloc_bug.c:9
       9         size_t len    = strlen( str );
    
       # Continue running the code past the first call of "try_realloc".
       (gdb) continue
       Continuing.
       first string,
    
       Breakpoint 1, try_realloc (str=0x501010 "first string,", addstr=0x40082e " second string.") at realloc_bug.c:9
       9         size_t len    = strlen( str );
       
       # Move ahead two lines in the code.
       (gdb) step 2
       12        str = realloc( str, ( len + addlen + 1 ) * sizeof( char ) );
    
    Now we can try out some stack related commands.
  3. "backtrace" - This command shows a trace for all subroutines that are currently on the call stack along with the values of any arguments passed to the subroutines. e.g.
    
       
       (gdb) backtrace     
       #0  try_realloc (str=0x501010 "first string,", addstr=0x40082e " second string.") at sample.c:12
       #1  0x00000000004006fb in main () at sample.c:23
       
    
    The "bt" and "where" commands are shortcuts for "backtrace" if you prefer not to type so much.
  4. "backtrace full" - Shows the same information as a standard "backtrace" but also lists the names and values of all locally defined variables. e.g.
    
       (gdb) backtrace full
       #0  try_realloc (str=0x501010 "first string,", addstr=0x40082e " second string.") at sample.c:12
               len = 13
               addlen = 15
       #1  0x00000000004006fb in main () at sample.c:23
               str = 0x501010 "first string,"
    
  5. "up" - This command lets you move up the stack one frame to the subroutine that called the current subroutine. Here the "up" command moves us from "try_realloc" to the "main" routine. e.g.
    
       (gdb) up
       #1  0x00000000004006fb in main () at sample.c:23
       
    
    If you would like to move up multiple stack frames, you can specify an argument to indicate how many frames to move up (e.g. "up 10" will move up 10 frames). To move down the call stack, use the "down" command.
  6. "info args" - Displays the values that were passed to the subroutine. This is similar information to what is shown by the "backtrace" output, but is in a more readable format. e.g.
    
       (gdb) info args
       str = 0x501010 "first string,"
       addstr = 0x40082e " second string."
    
  7. "info locals" - Displays any local variables on the stack. The "backtrace full" command also shows this information. e.g.
    
       (gdb) info locals
       len = 13
       addlen = 15
    

Here are a few other commands that can be useful with more complicated codes.

  • "backtrace N" - This command shows the N innermost call frames on the stack. Likewise, "backtrace -N" shows the N outermost call frames. The latter can be useful when you run into a stack overflow or an error in a recursive routine. Here's a stack trace from an incorrect implementation of a recursive Fibonacci function, showing the 4 outermost stack frames:
    
       (gdb) backtrace -4 
       #2620 0x000000000040051e in fib (v=-2) at fib.c:7
       #2621 0x000000000040051e in fib (v=-1) at fib.c:7
       #2622 0x000000000040052b in fib (v=1) at fib.c:7
       #2623 0x0000000000400553 in main () at fib.c:15
       
    
    From this trace it looks like frame #2622 is where the problem started, because frame #2621 shows that it called the fib routine with a negative value.
  • "select-frame N" - Lets you jump to an arbitrary stack frame. For example, you might want to look at frame #2622:
    
       (gdb) select-frame 2622
       (gdb) info args
    

You can find great documentation for gdb on the Free Software Foundation website:

    http://www.gnu.org/software/gdb/documentation/

New "support" Command

If you have questions about your account, obtaining a Kerberos ticket, compiling your code, debugging, etc., help is available!

ARSC hosts accounts for two sets of users: HPCMP and academic. To receive the most prompt support, HPCMP users should contact the Consolidated Customer Assistance Center (CCAC). The ARSC Help Desk should be the point of contact for academic users.

If you are unsure which classification your account falls under, the new "support" command on midnight displays the support email address and phone number appropriate for your particular account.


midnight% support
...
...

Quick-Tip Q & A


A:[[ The following C program starts with an empty string. It then tries
  [[ to append some text to it using the function try_realloc.  That
  [[ function simply reallocates memory to increase the size of the 
  [[ target string and copies the contents of a second string to its 
  [[ end. However, the code sometimes works and sometimes doesn't. 
  [[ What is wrong and how do I fix it?
  [[ 
  [[ #include <stdio.h>
  [[ #include <string.h>
  [[ #include <stdlib.h>
  [[ 
  [[ /* extends str with the content of addstr, reallocates memory to
  [[    be able to store a longer string */
  [[ void try_realloc( char* str, const char* addstr )
  [[ {
  [[   size_t len    = strlen( str );
  [[   size_t addlen = strlen( addstr );
  [[ 
  [[   str = realloc( str, ( len + addlen + 1 ) * sizeof( char ) );
  [[   strncat( str, addstr, addlen+1 );
  [[ }
  [[ 
  [[ int main()
  [[ {
  [[   char *str = calloc( 1, sizeof(char) ); /* str = "" */
  [[   
  [[   try_realloc( str, "first string," );
  [[   printf( "%s\n", str );
  [[ 
  [[   try_realloc( str, " second string." );
  [[   printf( "%s\n", str );
  [[ 
  [[   free( str );
  [[   return 0;
  [[ }
  [[
  
This question exposed the C programming mastery of readers Brad
Chamberlain, Jed Brown, Sean Ziegeler, Greg Newby, Jim Long and Chris
Young. They all pointed out that the occasional errors in this code
are due to a combination of the semantics of C parameter passing
(call by value) and the way the realloc() function works.

As Jed Brown put it:
   There are a couple of things which are wrong.  Most seriously,
   if realloc moves the memory pointed to by str, it will not be
   reflected in the str in main.  Thus main's str will be a hanging
   pointer and the memory moved by realloc will be lost.  To solve
   this, we must write our function to take the address of main's str.

   Another point is that if realloc fails, it returns NULL but leaves
   the existing memory intact.  With the construction shown, this
   memory will be lost since str becomes a NULL pointer.  Then strncat
   will get NULL as its first argument which will segfault.  To fix
   this, we need a temporary pointer so that we can keep track of
   this potentially lost memory and fail gracefully.
   ...

   Note that the original program does not fail on my machine for the
   given strings because calloc gives a sufficiently large starting
   buffer, hence realloc does not move it.
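
Jed's code is elided above. The following is a minimal sketch of the approach he describes, not his actual fix: the function takes the address of the caller's pointer and keeps the result of realloc() in a temporary until it is known to be valid. The name append_realloc and the int return convention are assumptions made for this illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Appends addstr to the string *strp, growing the buffer as needed.
   Returns 0 on success. On failure returns -1 and leaves *strp pointing
   at the original, still valid, string. */
int append_realloc(char **strp, const char *addstr)
{
    size_t len    = strlen(*strp);
    size_t addlen = strlen(addstr);

    /* Use a temporary so the old block is not lost if realloc fails. */
    char *tmp = realloc(*strp, len + addlen + 1);
    if (tmp == NULL)
        return -1;

    strcat(tmp, addstr);   /* safe: the buffer was sized for both strings */
    *strp = tmp;           /* make the (possibly moved) block visible to the caller */
    return 0;
}
```

A caller would write append_realloc( &str, "first string," ) and check the return value before using str.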


Brad Chamberlain had another possible code fix:
   If you're using a sufficiently C++-compatible C compiler, you
   might be able to change the argument to try_realloc into a char*
   reference:

   void try_realloc(char*& str, ...)

   which would require no changes at the callsite or in the
   try_realloc() definition, and would cause changes to the variable
   to be reflected at the callsite.  One downside to this approach
   is that C programmers are not accustomed to having their function
   arguments change if they don't see a &(...) at the callsite,
   so this can lead to confusion.


Several Quick Tip solvers pointed out that some defensive programming
might avoid future problems, and Greg Newby kindly offered a solution
with error checking:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <errno.h> /*  gbn: for errno, passed to strerror()  */

/* extends str with the content of addstr, reallocates memory to
  be able to store a longer string */
/*  gbn: return the (new) value of *str  */
char* try_realloc( char* str, const char* addstr )
{
 size_t len    = strlen( str );
 size_t addlen = strlen( addstr );

 /*  gbn: added cast to char* for realloc; added check for success  */
 str = (char*) realloc( str, ( len + addlen + 1 ) * sizeof( char ) );
 if (str == NULL) {
   printf("realloc failed! %s\n", strerror(errno)); exit(1);
 }

 strncat( str, addstr, addlen+1 );
 return(str);  /*  gbn: return the (new) value of *str  */
}

int main()
{
 /*  gbn: added cast to char* for calloc; added check for success */
 char *str = (char*) calloc(1, sizeof(char)); /* str = "" */
 if (str == NULL) {
   printf("calloc failed! %s\n", strerror(errno)); exit(1);
 }

 str=try_realloc( str, "first string," ); /* gbn: get new value for str */
 printf( "%s\n", str );

 str=try_realloc( str, " second string." ); /* gbn: new value for str */
 printf( "%s\n", str );

 free( str );
 return 0;
} 


Furthermore, Jed Brown also pointed out a problem with strncat():
   Since the length of the strings has been checked, we don't
   even need to use strncat (strcat would be okay) but it would be
   an off-by-one error if the code was reorganized later.  This is
   because strncat(pre,suf,n) appends at most n bytes of suf to pre
   and then appends a '\0'.
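
To make the off-by-one concrete, here is a small sketch of a bounded append built on strncat(). The helper name append_bounded is hypothetical. It assumes dst already holds a NUL-terminated string shorter than dstsize.

```c
#include <string.h>

/* Appends as much of src as fits into dst, a buffer of dstsize bytes that
   already holds a NUL-terminated string. Returns the resulting length. */
size_t append_bounded(char *dst, size_t dstsize, const char *src)
{
    size_t used = strlen(dst);

    /* strncat writes up to n bytes of src AND THEN a terminating '\0',
       so n must leave one byte spare: free space minus 1. */
    if (used + 1 < dstsize)
        strncat(dst, src, dstsize - used - 1);

    return strlen(dst);
}
```

Passing the full remaining space (dstsize - used) as n would let strncat write one byte past the end of the buffer whenever src is long enough to fill it.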


Finally, puzzle author Anton Kulchitsky had these parting words of wisdom:
   There is an important improvement which can be made to the code.
   The functions strcpy or memcpy are much faster options than
   strcat/strncat.  In Greg's program for example, replace the
   strncat() line with

       strcpy( str+len, addstr );

   [Editor's note: Anton pointed me to the GNU C Library documentation
   which boldly declares that "Programmers using the strcat or wcscat
   function (or the following strncat or wcsncat functions for that
   matter) can easily be recognized as lazy and reckless." However, that
   statement seemed a little too inflammatory to include in this 
   newsletter.]

   A good tool to check for many memory problems is valgrind which
   is available only on Linux machines. It shows that there is
   a problem in this code. Unfortunately, it is very hard to find
   where the problem occurs from its output.  To make the output more
   informative, the compiler option -g needs to be used to produce
   debugging information.  In this case, valgrind says which lines
   of the code cause an error.  [Editor's note: Watch for an article
   on valgrind in a future issue.]
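
As a sketch of why Anton's suggestion is faster: strcat() and strncat() must first scan the destination to find its end, while a copy to a known offset skips that scan. The helper below uses memcpy() with lengths the caller already knows. The name append_at is hypothetical, and the caller is assumed to guarantee that dst has room for dstlen + srclen + 1 bytes.

```c
#include <string.h>

/* Copies srclen bytes of src to offset dstlen in dst, terminates the
   result, and returns the new length. No scan of dst is needed because
   the caller supplies its current length. */
size_t append_at(char *dst, size_t dstlen, const char *src, size_t srclen)
{
    memcpy(dst + dstlen, src, srclen);
    dst[dstlen + srclen] = '\0';
    return dstlen + srclen;
}
```

Keeping the returned length and passing it to the next call means repeated appends never rescan the growing string.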

Thank you, C wizards, for a very thorough analysis of this little
program.  I suspect the Fortran programmers in the audience are
feeling pretty good right now.


Q: My advisor just asked me to run a hundred different simulations.
   Is there an easy way to generate a PBS script for each of the input
   files he gave me?  If it helps, the input file is passed on the 
   command line to the program.

   e.g.
   ./a.out input000.nc

   And each input file is in a separate directory.

   input001/input001.nc
   input002/input002.nc
   ...
   input100/input100.nc
   
   So how do I make this happen without wearing out my keyboard?

[[ Answers, Questions, and Tips Graciously Accepted ]]


Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.