ARSC T3E Users' Newsletter 143, May 29, 1998

Goodbye "denali"

ARSC's first supercomputer, denali, an 8-CPU, 1-GWord, CRAY Y-MP, will be removed from service this Sunday, the 31st. Denali has been replaced by chilkoot, a 12-CPU, 1-GWord, CRAY J932se.

Debugging Parallel Programs with Totalview

Moving from a single-process, GUI debugger to totalview for MPP programs should not be too difficult. When you get tired of adding unnecessary print statements to help troubleshoot, it's probably worth an hour or so to learn MPP totalview.

Here's a quick startup guide:

  1. Exporting the debugger display: Try totalview's X Window System GUI (the default). You must first export the display from your supercomputing center's T3E to your local workstation. (ARSC SGI users: read news xauth for instructions.) If your connection is too slow for the GUI, totalview's -L option launches the ASCII interface instead.
  2. Compiling for the debugger: You must recompile the source files that you wish to debug, using the -g or -G compiler option. (You don't have to recompile every program unit! This is a useful feature for users with huge codes.) With either the Fortran or the C/C++ compiler, -g selects the default debug level. Under Cray's programming environment 3.0, the -G option gives finer control by specifying the debug level explicitly. Read the man pages for details, and remember that these options eliminate optimizations, so recompile without -g or -G before attempting any production runs.
  3. Specifying the number of PEs your program will use in the debugger: Two points here. First, totalview is an interactive program, so you may not run it (and the program to be debugged) on more PEs than are provided to you for interactive use. At ARSC, the interactive limit for all users is set to 8. The command udbsee | grep jpelimit will show your PE limits (jpelimit[i] is the interactive limit, jpelimit[b] is the batch limit). Second, you must invoke totalview with the -X option to specify the number of PEs. For instance, totalview -X 3 will launch totalview on 3 PEs. You can check this using grmview, which will show a job, owned by you and called ttsession, running (or waiting to run) on 3 PEs.
  4. Specifying your executable to debug: Name your executable file on the shell command line: totalview -X 3 ./a.out or, after starting totalview, use the file pull-down menu's load new program... option.
  5. Setting search paths, when your source code and executable are in different directories: For symbolic debugging, totalview needs to find your source files. Specify the paths for it to search using the Set Search Directory option under the Source menu item.
  6. Starting the program with or without command line arguments: Once you've loaded the program, you'll get a familiar-looking debugger window with oval checkboxes, adjacent to your lines of source code, for setting breakpoints. Set a breakpoint at the top of your code. Run the program. If you need to run the program with command line arguments, use the run (args)... option under the Control menu; otherwise, just click the run button.
  7. Choosing the current PE: Use the PE text-box to choose the PE to view. Process states are usually different on different PEs. The PE setting allows you to see the current values of variables and execute lines on different PEs.
  8. Stepping and setting breakpoints on multiple PEs: Use the PSet menu option to switch between PSet SINGLE and ALL. PSet stands for "process set" (as in, "set of processes"), and the current PSet value is displayed (not accessed) in the PSet text-box. PSet affects (among other things) the way you step through the program. When PSet is set to SINGLE, you may choose a PE, as described above, and step through its process without advancing the other processes. When PSet is set to ALL, you take a step in all processes whenever you click "step." You may switch back and forth between ALL and SINGLE. Next behaves like Step in this respect. Breakpoints, however, are established without reference to PSet.
  9. Help: Most totalview controls and concepts will be familiar if you have used debuggers before. In any case, xhelp is a great reference.
  10. Notes:
    1. Many programs try to minimize explicit synchronization events, assuming, instead, that the processes on different PEs will proceed at the same pace. You can certainly tamper with this lock-step timing when you single step through a program, one PE at a time. By manually changing the expected order of events, you can cause a functioning program to deadlock in ways that would not occur naturally.
    2. The debugger flags, -g and -G, disable compiler optimizations. (Read the compiler man pages for details.) This can change the order of operations, meaning that bugs (created by optimization) might "go away." It also means that the program will run slower, sometimes considerably slower.

Again, be sure to recompile with optimization enabled before doing any production runs!
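
The command-line side of steps 2 through 4 can be sketched as a small shell function. This is only a sketch under stated assumptions: the compiler command f90, the source file name prog.f, and the function name debug_commands are placeholders that do not come from the steps above, and the default limit of 8 PEs is ARSC's interactive limit (yours may differ). The function only prints the commands, so it is safe to try on any machine:

```shell
#!/bin/sh
# debug_commands NPES SRC EXE [LIMIT]
# Prints (does not run) the commands for one totalview session.
# Assumptions: "f90" is the Fortran compiler command, compiling SRC
# yields EXE, and LIMIT defaults to 8, the ARSC interactive PE limit
# quoted in step 3.
debug_commands() {
    npes=$1
    src=$2
    exe=$3
    limit=${4:-8}

    # totalview is interactive, so refuse more PEs than the interactive limit.
    if [ "$npes" -gt "$limit" ]; then
        echo "error: $npes PEs exceeds interactive limit of $limit" >&2
        return 1
    fi

    echo "f90 -g $src"               # step 2: recompile the unit(s) to debug
    echo "udbsee | grep jpelimit"    # step 3: confirm your PE limits
    echo "totalview -X $npes $exe"   # steps 3-4: launch on NPES PEs
}

debug_commands 3 prog.f ./a.out
```

While totalview is up, running grmview from a second shell should show the ttsession job on the requested number of PEs.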

CUG Final Program

Next month will see the annual Cray User Group conference in Stuttgart, Germany, June 15-19. More details can be found at , where the latest program can be downloaded. ARSC will be presenting a number of items:

  • Guy Robinson will present a paper, 'Pushing/Dragging Users Towards Better Utilization'.
  • Barbara Horner-Miller will present the results of a survey of user support activity at Cray sites.
  • ARSC users have contributed several of their latest visualizations, to be shown at various points throughout the conference.

A review of the conference will appear in a future newsletter.

Review of Parallel Tools for MPI Development

The following document:

gives a nice comparison and review of the following parallel tools for MPI development:

  • AIMS - instrumentors, monitoring library, and analysis tools
  • MPE logging library and Nupshot performance visualization tool
  • Pablo - monitoring library and analysis tools
  • Paradyn - dynamic instrumentation and run-time analysis tool
  • SvPablo - integrated instrumentor, monitoring library, and analysis tool
  • VAMPIRtrace monitoring library and VAMPIR performance visualization tool
  • VT - monitoring library and performance analysis and visualization tool for the IBM SP

VAMPIR is available to ARSC users. See newsletter #129 for details on using this tool.


We strongly encourage heavy MPI users to analyze their code! Impressive speedup is sometimes possible.


Quick-Tip Q & A

A: {{ How do you Unix wizards trick e-mail into sending those
      automatic, "Don't bother me, I'm at Eurodisney," messages? }}

First, create a ".forward" file in your home directory--this is a good
idea, anyway.  Normally, the .forward file will contain the e-mail
address of your workstation, or whatever other address you would like
your mail forwarded to.

Use the .forward file to save one copy of each message and pipe another
into the "vacation" utility, which sends automatic replies.  Read "man
vacation" for details.
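
As a rough sketch, a .forward file of the kind described above can be
written with a small shell function.  The login name jdoe, the function
name make_forward, and the path /usr/bin/vacation are assumptions for
illustration only; check where vacation lives on your system.  The demo
writes into a scratch directory rather than your real home directory:

```shell
#!/bin/sh
# make_forward USER DIR
# Writes DIR/.forward with two comma-separated recipients:
#   \USER                      keep a copy in USER's own mailbox
#   "|/usr/bin/vacation USER"  pipe a copy to vacation for the auto-reply
# The vacation path is an assumption; it varies between systems.
make_forward() {
    user=$1
    dir=$2
    printf '\\%s, "|/usr/bin/vacation %s"\n' "$user" "$user" > "$dir/.forward"
}

scratch=$(mktemp -d)         # scratch directory for the demo
make_forward jdoe "$scratch"
cat "$scratch/.forward"
```

For real use, pass your own login name and "$HOME".  The vacation
utility also needs a one-time setup and a reply message; "man vacation"
describes both for your system.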

Here is one reader response to this question:

This week's query about an automated reply to incoming e-mail while
away from the office is actually a touchy issue. The Berkeley Unix
'vacation' program is available with all SunOS (incl. Solaris)
machines, and some others. However, its use has been discouraged for at
least two reasons. It has always been a problem for generating excess
traffic in response to bulk e-mail. Also it was a small component in a
common 'sendmail' security vulnerability a couple years back (the
problem was with sendmail, but a common fix included disabling

Q: The following snippet of Fortran code fails on the "write" because
   it would output 40*7, or 280 characters per line.  It turns out 
   that the maximum allowed is 267 per line, and that there is a
   corresponding restriction on "reads."

        open (unit=5,file="test.out")
        do j=1,5
          write (5,100) (out(j,i),i=1,40)
 100      format (40(i6,' '))
        enddo
        close (unit=5)

   Here's the system info provided when the code crashes:

    yukon$ ./a.out
    lib-1211 ./a.out: UNRECOVERABLE library error 
      A WRITE operation tried to write a record that was too long.
    Encountered during a sequential formatted WRITE to unit 5
    Fortran unit 5 is connected to a sequential formatted text file: "test.out"
     Current format:   100 FORMAT(40(i6,' '))

   And here's the question:

   What if you *really* needed to read or write lines in excess of
   267 characters?  What could you do?

[ Answers, questions, and tips graciously accepted. ]

Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.