ARSC HPC Users' Newsletter 398, December 19, 2008

Pingo Moving to Production in January

ARSC's new Cray XT5 supercomputer, called Pingo, is currently in pre-production operation. Pingo is scheduled to open for general use on January 8, 2009, and offers 3,456 compute cores. For more information, contact the ARSC consultants or see the Pingo web page.

Accessing the Command Line and Environment in Fortran

[By: Lee Higbie]

It is often convenient to pass parameters to a program on the command line or via environment variables instead of using an I/O statement. All the ARSC Fortran 90 compilers (that is, every compiler on Midnight and Pingo except g77 on Midnight) provide standard subroutines for accessing them. Each of these subroutines returns the information its name implies as a character string, truncated or blank-padded to fit the character variable that receives it (cmd or arg in the program below):

  • get_command() returns the entire command line
  • get_command_argument() returns the Nth command line argument
  • get_environment_variable() returns the value of a specified environment variable

The program I used to verify these routines illustrates their use:

Program fnTest
   implicit none
   integer              :: st , length
   character (len = 24) :: cmd, arg

   call get_command(cmd, length, st)
   print *, 'get_command:', length, st, trim(cmd), '.'
! output: 1st 24 characters of cmd (the command line), its length, and 
!         status.  status = -1 if > 24 characters in cmd

   call get_command_argument(0, cmd, length, st)
   print *, 'get_command_argument 0:', length, st, cmd, '.'
! output: 1st 24 characters of the executable name (argument 0)

   call get_command_argument(3, arg, length, st)
   print *, 'get_command_argument 3:', length, st, trim(arg)
! output: 1st 24 characters of 3rd argument

   call get_environment_variable('WORKDIR', arg, length, st)
   print *, 'get_environment_argument WORKDIR:', length, st, trim(arg)
! output: 1st 24 characters of $WORKDIR.  Note no $ in var name above
end program fnTest

"Length" in the above calls is set by the subroutine to the number of characters in the actual result, which may be greater or less than the length of cmd or arg. "st" is the return status, 0 if the call worked. If the argument was too long to fit in the result variable (cmd or arg above), st will be returned negative.

The first argument in the get_command_argument subroutine is the position number of the argument to the command that started execution of the program. If you specify argument number 0, the command itself is returned.

The first argument to the get_environment_variable subroutine is the name of the environment variable whose value is desired (note that it does not have the $ and is case sensitive).

If you use the command

    ./a.out This is a,funny argument

or, in a PBS script have the line

     mpirun -n 1 ./a.out This is a,funny argument

(or aprun for Pingo) the output will be something like:

 get_command: 32 -1 ./a.out This is a,funny .
 get_command_argument 0: 7 0 ./a.out               .
 get_command_argument 3: 7 0 a,funny
 get_environment_argument WORKDIR: 14 0 /wrkdir/higbie

If you request a non-existent argument, the routines return a positive st, and arg is usually returned blank (this is compiler dependent).

These subroutines require a Fortran 90 compiler that supports them. Most of their arguments are optional. The specified syntax is:

    call get_command_argument(number, value, str_length, status)
    call get_command(value, str_length, status)
    call get_environment_variable(name, value, str_length, status)

In these calls str_length is set to the number of characters in the actual value (the command line, the argument, or the environment variable). If value is declared too short (len = strLen), only the first strLen characters are returned and status is set to -1.

The three subroutines are intrinsics specified in the Fortran 2003 standard; all of ARSC's Fortran 90 compilers support them now.

The differences I saw among the compilers when I tried them are:

  1. The error codes for missing arguments varied.
  2. The PGI compiler returned some junk for missing parameters, but it returned a status code of 1 indicating an error.

Valgrind's Memcheck Memory Checking Tool

[By: Craig Stephenson]

This article is the first of a multi-part series describing the various components of Valgrind, "a suite of tools for debugging and profiling" C/C++ codes. These tools include:

  • Memcheck: a heavyweight memory checker
  • Cachegrind: a cache and branch profiler
  • Callgrind: a call graph profiler
  • Helgrind: a thread error detector
  • Massif: a heap profiler

We will begin by covering Memcheck, Valgrind's default tool, to provide basic usage examples and show how subtle programming errors can be detected. Memcheck is a powerful tool that, in addition to pinpointing lines of code responsible for segmentation faults or inappropriate memory operations, will often consider the operation's context in the code to report concise, unambiguous findings.

As with other debuggers, to use Valgrind one must first compile their code with the -g option to enable debugging information. Also, to avoid confusing and/or misleading information, make certain to explicitly disable compiler optimizations by using the -O0 option. E.g.,

   pathcc -o myprogram myprogram.c -g -O0

Then, running diagnostics on the executable with Valgrind's Memcheck tool is as simple as the following:

   valgrind --tool=memcheck ./myprogram

Since Memcheck is Valgrind's default tool, you may skip the --tool=memcheck option. E.g.:

   valgrind ./myprogram

The following are several example C codes to demonstrate the types of errors Memcheck can detect.

Example #1

What's wrong with the following code?

#include <stdio.h>
#include <stdlib.h>   /* malloc, free */
int main()
{
  int *myArray = (int *) malloc(10*sizeof(int));

  int i;
  for(i=0; i <= 10; i++)
  {
    myArray[i] = i;
  }

  for(i=0; i < 10; i++)
  {
    printf("%d\n", myArray[i]);
  }

  free(myArray);

  return 0;
}

Running this code without Valgrind will not necessarily produce an error. The executable may crash intermittently with a segmentation fault, depending on the system, but otherwise appear to work properly by printing values from 0 to 9. Running the executable through Valgrind's Memcheck produces the following output:

> valgrind ./example1
==9131== Invalid write of size 4
==9131==    at 0x400604: main (example1.c:10)
==9131==  Address 0x4d46058 is 0 bytes after a block of size 40 alloc'd
==9131==    at 0x4A1BB69: malloc (vg_replace_malloc.c:207)
==9131==    by 0x4005DA: main (example1.c:5)

The code has an off-by-one error. It attempts to write an integer to an unallocated memory address located immediately after the end of the allocated array because the first loop executes 11 times, not 10.
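
One way to repair the loop is sketched below, assuming the intent was to fill exactly the allocated elements. The helper name fill_sequence is hypothetical and does not appear in the original program. It keeps the element count in a single variable so the allocation and the loop bound cannot disagree.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not from the article): allocate n ints and
 * fill them with 0..n-1. Returns NULL if malloc fails; the caller
 * must free() the result. */
int *fill_sequence(size_t n)
{
    assert(n > 0);                    /* precondition: at least one element */

    int *a = malloc(n * sizeof *a);
    if (a == NULL)
        return NULL;

    for (size_t i = 0; i < n; i++)    /* i < n: the last write is a[n-1] */
        a[i] = (int) i;

    return a;
}
```

Calling fill_sequence(10) and printing the ten elements should give the same visible output as Example #1, without the write past the end of the block.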

Example #2

#include <stdio.h>

int main()
{
  int memalloc = 512*sizeof(int);
  int forward[memalloc];

  int i;
  for(i=0; i < 512; i++)
  {
    forward[i] = i;
  }

  int backward[512];
  for(i=0; i < sizeof(forward); i++)
  {
    backward[511-i] = forward[i];
  }

  return 0;
}

Notice anything wrong with this code? Running it produces a segmentation fault. What does Memcheck find?

> valgrind ./example2
==16461== Invalid write of size 4
==16461==    at 0x4005A5: main (example2.c:17)
==16461==  Address 0x7feffcfbc is just below the stack ptr.  
             To suppress, use: --workaround-gcc296-bugs=yes
==16461== Process terminating with default action of signal 11 
            (SIGSEGV): dumping core
==16461==  Access not within mapped region at address 0x7FEFFBFFC
==16461==    at 0x4005A5: main (example2.c:17)
==16461== ERROR SUMMARY: 1009 errors from 1 contexts (suppressed: 8 from 5)
==16461== malloc/free: in use at exit: 0 bytes in 0 blocks.
==16461== malloc/free: 0 allocs, 0 frees, 0 bytes allocated.
==16461== For counts of detected errors, rerun with: -v
==16461== All heap blocks were freed -- no leaks are possible.
Segmentation fault

In this case, the problematic memory address is below the program's current stack pointer. Beyond simply attempting to write to unallocated memory, as in the last example, this code is overwriting memory used for the program itself. The following lines are to blame:

int memalloc = 512*sizeof(int);
int forward[memalloc];

Multiplying the number of elements in the array by the size of an integer would be appropriate if memory were being allocated dynamically with the malloc() function, like this:

int *forward = (int *) malloc(512*sizeof(int));

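However, the bracket syntax takes an element count, not a byte count, so "forward" was declared four times larger than intended (with 4-byte integers): 2048 integers rather than 512. The second loop compounds the error by using sizeof(forward), which is the array's size in bytes, as its bound. The index i therefore runs far past 511, 511-i goes negative, and the writes run off the low end of the 512-integer "backward" array.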
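
A minimal sketch of a safer pattern follows, assuming the goal is simply to copy one array into another in reverse order. The macro ARRAY_LEN and the function reverse_copy are hypothetical names introduced for illustration. The point is that every size passed around is an element count, and the count is derived from the array itself rather than typed twice.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements in a true array (does not work on a pointer). */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical helper: copy src into dst in reverse order.
 * n is an element count, never a byte count. */
void reverse_copy(int *dst, const int *src, size_t n)
{
    assert(dst != NULL && src != NULL);

    for (size_t i = 0; i < n; i++)
        dst[n - 1 - i] = src[i];      /* index stays within 0..n-1 */
}
```

With "int forward[512], backward[512];" the call would be reverse_copy(backward, forward, ARRAY_LEN(forward)).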

Example #3

#include <stdio.h>
#include <stdlib.h>   /* malloc, free */
int main()
{
  int original[5] = { 11, 22, 33, 44, 55 };
  int *temp = (int *) malloc(10*sizeof(int));

  int i;
  for(i=0; i < sizeof(original) / sizeof(int); i++)
  {
    *(temp++) = original[i];
  }

  for(i=0; i < sizeof(original) / sizeof(int); i++)
  {
    printf("%d\n", temp[i]);
  }

  free(temp);

  return 0;
}

Presumably, this program copies the 5 elements from the statically allocated "original" array to the first 5 elements of the dynamically allocated "temp" array, then prints these same 5 elements from the "temp" array. Yet, the output actually looks like this:

> ./example3

This does not look anything like the "original" array. Why not?

> valgrind ./example3
==10454== Conditional jump or move depends on uninitialised value(s)
==10454==    at 0x4B60159: vfprintf (in /lib64/tls/
==10454==    by 0x4B66C99: printf (in /lib64/tls/
==10454==    by 0x40067A: main (example3.c:16)
==10454== Conditional jump or move depends on uninitialised value(s)
==10454==    at 0x4B601C5: vfprintf (in /lib64/tls/
==10454==    by 0x4B66C99: printf (in /lib64/tls/
==10454==    by 0x40067A: main (example3.c:16)
==10454== Conditional jump or move depends on uninitialised value(s)
==10454==    at 0x4B60FD9: vfprintf (in /lib64/tls/
==10454==    by 0x4B66C99: printf (in /lib64/tls/
==10454==    by 0x40067A: main (example3.c:16)
==10454== Conditional jump or move depends on uninitialised value(s)
==10454==    at 0x4B6022C: vfprintf (in /lib64/tls/
==10454==    by 0x4B66C99: printf (in /lib64/tls/
==10454==    by 0x40067A: main (example3.c:16)
==10454== Conditional jump or move depends on uninitialised value(s)
==10454==    at 0x4B60255: vfprintf (in /lib64/tls/
==10454==    by 0x4B66C99: printf (in /lib64/tls/
==10454==    by 0x40067A: main (example3.c:16)
==10454== Invalid free() / delete / delete[]
==10454==    at 0x4A1AD6E: free (vg_replace_malloc.c:323)
==10454==    by 0x400699: main (example3.c:19)
==10454==  Address 0x4d46044 is 20 bytes inside a block of size 40 alloc'd
==10454==    at 0x4A1BB69: malloc (vg_replace_malloc.c:207)
==10454==    by 0x4005FB: main (example3.c:6)
==10454== ERROR SUMMARY: 36 errors from 8 contexts (suppressed: 8 from 5)
==10454== malloc/free: in use at exit: 40 bytes in 1 blocks.
==10454== malloc/free: 1 allocs, 1 frees, 40 bytes allocated.
==10454== For counts of detected errors, rerun with: -v
==10454== searching for pointers to 1 not-freed blocks.
==10454== checked 165,808 bytes.
==10454== LEAK SUMMARY:
==10454==    definitely lost: 40 bytes in 1 blocks.
==10454==      possibly lost: 0 bytes in 0 blocks.
==10454==    still reachable: 0 bytes in 0 blocks.
==10454==         suppressed: 0 bytes in 0 blocks.
==10454== Rerun with --leak-check=full to see details of leaked memory.

In C, the following syntax will indeed access the next element of the "temp" array:

   *(temp++)

But remember, using the "++" operator makes a persistent incremental change to the variable on which it is used. Using the "++" operator worked as intended when the program set the first five elements of the "temp" array, but when the program later attempted to print those five elements, it was actually printing the five elements immediately following the five elements that had been set. I.e., the program printed temp[5] through temp[9] instead of temp[0] through temp[4]. Since temp[5] through temp[9] were never initialized, Valgrind's Memcheck reported this as a possible problem.

Memcheck also reported an invalid free() and, as a consequence, a memory leak. The program called free() on "temp", but by then "temp" no longer held the address returned by malloc(); it was pointing "20 bytes inside a block of size 40 alloc'd". The block was therefore never released, and Memcheck counts its 40 bytes as definitely lost.
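
One possible fix is sketched below, under the assumption that the program only needs a heap copy of the array. The function name copy_ints and the variable names base and cursor are hypothetical. A separate cursor pointer is advanced while the pointer returned by malloc() is left alone, so it remains valid both for indexing and for free().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: return a heap copy of the first n ints of src.
 * Returns NULL if malloc fails; the caller must free() the result. */
int *copy_ints(const int *src, size_t n)
{
    assert(src != NULL && n > 0);

    int *base = malloc(n * sizeof *base);
    if (base == NULL)
        return NULL;

    int *cursor = base;               /* only the cursor is advanced */
    for (size_t i = 0; i < n; i++)
        *cursor++ = src[i];

    return base;                      /* still points at element 0 */
}
```

Indexing with temp[i] in both loops, and never incrementing "temp" at all, would be an equally reasonable repair.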

These examples demonstrate several of the ways Valgrind's Memcheck can help debugging efforts. In addition to the problems covered in this article, Memcheck can also detect:

  • Reading/writing memory after it has been free'd
  • Mismatched use of malloc/new/new [] vs free/delete/delete []
  • Overlapping src and dst pointers in memcpy() and related functions
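
As an illustration of the last item, here is a small sketch of a copy whose source and destination overlap. The function name shift_right_by_one is hypothetical. Calling memcpy() on these two regions is undefined behavior and is the kind of call Memcheck flags; memmove() is the standard routine that permits overlap.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: within an n-element buffer, move elements
 * 0..n-2 one slot to the right, in place. buf[0] is left unchanged. */
void shift_right_by_one(int *buf, size_t n)
{
    assert(buf != NULL);

    if (n < 2)
        return;                       /* nothing to move */

    /* Source (buf) and destination (buf + 1) overlap, so memmove()
     * is required here; memcpy() would be an error. */
    memmove(buf + 1, buf, (n - 1) * sizeof *buf);
}
```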

For more information, refer to Valgrind's Memcheck manual.

Quick-Tip Q & A

A:[[ Is there a command, like "seq," but which increments or decrements
  [[ dates in the standard, YYYYMMDD format?  My wish would be for
  [[ something like this:

     computer:~ % for dd in $(seq --yyyymmdd  20081029 20081102)
     > do
     >   echo /var/local/log/somelog.${dd}
     > done

# Bill Homer offers this Perl solution

Here is a script that uses the Perl Time::Local module. It expects
start and end dates as arguments, and in the interest of brevity,
does no argument checking.

#!/usr/bin/env perl

use Time::Local "timelocal_nocheck";

($start, $end) = @ARGV;
($sy, $sm, $sd) = ($start =~ /(\d\d\d\d)(\d\d)(\d\d)/);
$sm--;   # timelocal counts months from 0

while ($start <= $end) {
 print $start, "\n";
 # timelocal_nocheck accepts an out-of-range day and rolls it forward
 @next = localtime timelocal_nocheck 0,0,0,++$sd,$sm,$sy;
 $start = sprintf("%4d%02d%02d", 1900+$next[5], 1+$next[4], $next[3]);
}

# Ryan Czerwiec took the shell script route

Here's a way you'll have to modify on the fly, but it looks like the
"wishlist" solution would too.  This could be a little more difficult
than the "wishlist" format if the desired time span is quite large.  I
can envision having to pull out a calendar and add up days in each month
to set the sequence limits correctly, but with any luck that will be
rare.

computer:~ % for dd in $(seq $a $b)
> do
>   echo /var/local/log/somelog.`date -d "$baseday + $dd days" +%Y%m%d`
> done

will work where you define or hardwire a and b to be relative to
baseday, i.e. -1 for the day before, 7 for a week after, etc.  If you
want dates far from today, set baseday to, say, April 1:
baseday="2008-04-01" or leave it blank or unset for baseday to be the
current day.  You could use a structure like this instead (this is in
csh):

set a = 0
set day = 0
while ($day != $end)
  set day = `date -d "$baseday + $a days" +%Y%m%d`
  echo /var/local/log/somelog.$day
  @ a++
end

Just set baseday in 2008-04-01 format and end in 20080501 format.  Of
course, the seq approach assures a finite number of trips through the
loop, but the while loop could go on forever if the user isn't careful.
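
For readers who would rather build this into a C code, the same idea can be sketched with the standard mktime() function, which normalizes an out-of-range day of the month much as timelocal_nocheck does in the Perl script above. The function name add_days is hypothetical, and no validation of the input date is attempted.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper: return the date n days after yyyymmdd, in the
 * same YYYYMMDD integer form (n may be negative). Returns -1 if
 * mktime() cannot represent the result. */
long add_days(long yyyymmdd, int n)
{
    struct tm t = {0};

    assert(yyyymmdd >= 19700101L);    /* stay within a portable time_t range */

    t.tm_year  = (int) (yyyymmdd / 10000) - 1900;
    t.tm_mon   = (int) (yyyymmdd / 100 % 100) - 1;   /* months are 0-based */
    t.tm_mday  = (int) (yyyymmdd % 100) + n;         /* may be out of range */
    t.tm_hour  = 12;                  /* noon avoids daylight-saving edges */
    t.tm_isdst = -1;                  /* let the library decide */

    if (mktime(&t) == (time_t) -1)    /* mktime() normalizes t in place */
        return -1;

    return (t.tm_year + 1900) * 10000L + (t.tm_mon + 1) * 100L + t.tm_mday;
}
```

Calling add_days(d, 1) repeatedly, starting from the first date and stopping once the result exceeds the last date, reproduces the requested sequence.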

Q: I am developing an MPI program and would like to display some
   basic information about the systems the MPI tasks are running on.
   Some of the nodes I'm using have more than one task running on each.
   Is there a way to display this information only once per node?  Is there
   a way to determine the number of tasks running on each node?

   If my job was running on the following nodes:


   I might want to see something like this:

   node        tasks
   ----------  --------
   nid00008    4
   nid00009    2
   nid00010    1

   The machine I'm using doesn't seem to have an MPI hostfile, so it
   looks like I will have to do this with MPI calls alone.

[[ Answers, Questions, and Tips Graciously Accepted ]]

Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.