ARSC HPC Users' Newsletter 244, April 24, 2002

Malloc Debugging with Electric Fence

[ Jim Long of ARSC contributed this article. ]

This article looks at a method of debugging memory problems with malloc-ed memory using a library called "Electric Fence", available free from: http://perens.com/FreeSoftware

From the README for Electric Fence:

Electric Fence is a different kind of malloc() debugger. It uses the virtual memory hardware of your system to detect when software overruns the boundaries of a malloc() buffer. It will also detect any accesses of memory that have been released by free(). Because it uses the VM hardware for detection, Electric Fence stops your program on the first instruction that causes a bounds violation. It's then trivial to use a debugger to display the offending statement.

This version will run on:

  • Linux kernel version 1.1.83 and above. Earlier kernels have problems with the memory protection implementation.
  • All System V Revision 4 platforms (and possibly earlier revisions) including:
    • Every 386 System V I've heard of.
    • Solaris 2.x
    • SGI IRIX 5.0 (but not 4.x)
  • IBM AIX on the RS/6000.
  • SunOS 4.X (using an ANSI C compiler and probably static linking).
  • HP/UX 9.01, and possibly earlier versions.
  • OSF 1.3 (and possibly earlier versions) on a DECalpha.

On icehawk, ARSC's IBM SP system (RS/6000), libefence.a compiled successfully with both gcc version 2.95.2 and IBM's xlc version 5. For gcc, the only change to the Makefile is CC=gcc. For xlc, set CC=xlc and uncomment the additional CFLAGS in the Makefile; the same flags must also be used when compiling your own code.

Let's look at a simple sample of how it works (this assumes the user, "username," has stored libefence.a to his or her /u1/uaf/username/lib directory):


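  /* e.c - to test Electric Fence */
  #include <stdio.h>
  #include <stdlib.h>        /* declares malloc */
  int main()
  {
     int i;
     int * ptr = (int *) malloc(10*sizeof(int));
     for(i=0; i<11; i++)     /* deliberate bug: i == 10 is one past the end */
       ptr[i] = 2*i;
     return 0;
  }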

  icehawk1:> gcc -g e.c -L/u1/uaf/username/lib -lefence
  icehawk1:> ./a.out

     Electric Fence 2.0.5 Copyright (C) 1987-1995 Bruce Perens.
  Segmentation fault
  icehawk1:>
        

Electric Fence put a "fence" of inaccessible memory right after the memory allocated by malloc, so the write to ptr[10], one element past the end of the 10-element buffer, caused a segmentation fault and a core dump. Invoking dbx on the core file reveals the offending statement and the value of i:


  icehawk1:> dbx a.out core
  Type 'help' for help.
  warning: The core file is truncated.  You may need to increase the 
  ulimit for file and coredump, or free some space on the filesystem.
  reading symbolic information ...internal error: unexpected value 120
  at line 4005 in file stabstring.c

  [using memory image in core]

  Segmentation fault in main at line 9 in file "e.c"
       9       ptr[i] = 2*i;
  (dbx) print i
  10
  (dbx)
        

Important environment variables available with Electric Fence:

EF_PROTECT_BELOW
  By default, Electric Fence places the inaccessible page immediately after the allocated memory, which catches overruns. To place the inaccessible page immediately before the allocated memory instead, and so catch underruns, set the environment variable EF_PROTECT_BELOW to 1.

EF_PROTECT_FREE
  If you suspect that freed memory is being accessed, set EF_PROTECT_FREE to 1. Electric Fence then never re-allocates memory once it has been freed, so any later access to that memory causes a segmentation fault.

EF_ALLOW_MALLOC_0
  Electric Fence normally traps calls to malloc with size zero. To allow malloc calls of size 0, set EF_ALLOW_MALLOC_0 to a nonzero value.

EF_ALIGNMENT
  EF_ALIGNMENT sets the alignment of the memory that malloc returns. See the man page for details.

A few caveats apply when using Electric Fence. It can create huge core dumps, so it may be better to run your code under the debugger rather than rely on a core file. Electric Fence can also consume a large amount of memory for big dynamic structures, so you may have to add swap space. All in all, though, Electric Fence is a great free tool for those occasional nasty debugging sessions.

"llmap": Handy IBM SP Loadleveler Utility

Jeff McAllister of ARSC has written a Perl script, called "llmap," to display LoadLeveler work on the IBM SP. This is similar to "grmap" for the Cray T3E. "llmap" is available in /usr/local/bin.

Here's sample output (names have been changed, of course) from a run on icehawk. It's pretty self-explanatory:


  ICEHAWK1$ llmap
  NONRUNNING JOBS:
    JID   uname    jname           usage      min max status
    4121  bigbob   icehawk1.4121   not_shared 32  32  Idle
  
  RUNNING JOBS:
    JID   uname    jname           usage      #n ld  minMB maxMB
  A 4104  honey    myjob.x         not_shared 12 4.1  1556  1717
  B 4105  punkin   icehawk2.4105   not_shared 13 4.1   164   365
  C 4116  bigbob   icehawk1.4116   not_shared 8  4.1   163   357
  D 4127  slim     myjob           not_shared 8  0.2    97   291
  
             free load               free load               free load
  node   job  MB  avg     node   job  MB  avg     node   job  MB  avg 
  --------------------------------------------------------------------
  i1n1     B 1575 4.1     i2n17    B 1401 4.0     i3n33    B 1504 4.1     
  i1n2     D 1449 0.9     i2n18    A 23   4.1     i3n34    C 1459 4.2     
  i1n3                    i2n19    A 27   4.1     i3n35    C 1389 4.2     
  i1n4     B 1420 4.0     i2n20                   i3n36    C 1400 4.1     
  i1n5     A 23   4.1     i2n21    A 23   4.2     i3n37    C 1402 4.1     
  i1n6     B 1489 4.0     i2n22    A 101  4.0     i3n38    C 1383 4.1     
  i1n7                    i2n23    A 50   4.1     i3n39    D 1495 0.1     
  i1n8     B 1510 4.1     i2n24    A 23   4.2     i3n40    D 1519 0.1     
  i1n9     B 1410 4.0     i2n25    B 1576 4.0     i3n41    C 1492 4.1     
  i1n10    C 1561 4.0     i2n26                   i3n42    C 1577 4.0     
  i1n11    A 184  4.0     i2n27    B 1401 4.0     i3n43    D 1643 0.1     
  i1n12    B 1375 4.0     i2n28    A 23   4.1     i3n44    D 1623 0.1     
  i1n13    B 1413 4.0     i2n29    A 105  4.1     i3n45    D 1532 0.1     
  i1n14    D 1629 0.1     i2n30    B 1419 4.4     i3n46    D 1622 0.2     
  i1n15                   i2n31    A 23   4.0     
  i1n16    A 23   4.1     i2n32    B 1405 4.1     
  --------------------------------------------------------------------
  46 nodes
  5 free, 41 working, 0 sharing

If you've submitted a LoadLeveler job, you might run llmap to see whether it has started, how much memory it's using, and so on. If it hasn't started, llmap might give you clues as to why, but this can be tricky. For example, LoadLeveler might be draining nodes in order to start a big, high-priority job, so the appearance of idle nodes doesn't necessarily mean that your job is just about to start. (As a reminder, specifying the time and minimum number of nodes your job will need can help it start sooner under LoadLeveler's backfill algorithm.)

Please send feedback on llmap.

Any additional features you need or want?

Cray Support for HDF and HDF5 May Be Dropped

The latest HDF newsletter from NCSA, available at:

http://hdf.ncsa.uiuc.edu/newsletters/newsletter67.html#cray

contained this note:

  Cray Support:

    The Cray computers that we use for building HDF/HDF5 are being
    retired.  We are therefore considering dropping support for the
    Crays. If you are a Cray user and you use HDF or HDF5, please let us
    know as soon as possible.

Contact information is given on the above page. ARSC users concerned about this should also let us know by e-mail at consult@arsc.edu.

Rob Bell Talk Rescheduled to Monday, April 29

This talk has been rescheduled to Monday when, conveniently, the speaker will be in town:

Our NEC of the Woods: Oz, CSIRO, HPCC and NEC

  Dr. Robert Bell
  Deputy Manager, CSIRO High Performance Computing and Communications Centre
  Monday, April 29th, 2:30pm - 4:00pm
  109 Butrovich, UAF Campus

Details:

http://www.arsc.edu/pubs/bulletins/HPCCCTalk2002.shtml

Irregular "Schedule" of this Newsletter

This "Bi-Weekly on Fridays" newsletter has been a bit irregular lately.

We hope this isn't too troublesome to you!

Travel by the editors is often the cause, as it is this week. (We attend CUG, SPSCICOMP, SC, and DoD User Group meetings, among others, and even take vacations.)

Sometimes, we confess... we just miss the "deadline." Articles submitted by readers are a big help, and you're ALWAYS invited to share your work, knowledge, discoveries, and, of course, tips.

Quick-Tip Q & A


A:[[ Am I going nuts? 
  [[
  [[  $ ls -d DATA
  [[    DATA
  [[  $ ls DATA
  [[    D2001  index.txt
  [[  $ cd DATA
  [[    sh-56 ksh: DATA: not found.
  [[ 
  [[ Why is it doing this to me?


  # 
  # Thanks to Richard Griswold:
  # 

  Something similar to this happens to me when I do not have execute
  permission to the directory.  However, on Linux, Solaris, and HP-UX I
  get the message "permission denied" instead of "not found".  The message
  may be different on other OSes.  See if you have execute permission for DATA
  using
  
    ls -ld DATA

  #
  # From editors:
  #

    $ ls -ld DATA
    drw-------   3 nutso    thegroup    4096 Apr 11 11:19 DATA
  
  Given read permission on a directory, ls can open the directory inode
  and read the names of the files within. Lacking execute permission on
  the directory, however, ls can't open the files' inodes, and thus,
  can't access the extra information about the files provided by, for
  instance,  "ls -l".  In addition, cd is prohibited from making a
  non-executable directory the present working directory.

  Note, the error messages reported in this question are different in 
  detail on different systems.



Q:From time to time I'm instructed to append some path to "PATH", 
  some file to "LM_LICENSE_FILE", or some such thing.  For instance,
  in the last issue, regarding Totalview:

      "To use it, add this: 
  
        /usr/local/adm/pkg/flexlm/license.dat 
  
      to the settings of your LM_LICENSE_FILE environment variable."

  How specifically, would you suggest I do that?

[[ Answers, Questions, and Tips Graciously Accepted ]]


Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.