ARSC HPC Users' Newsletter 209, December 1, 2000



UAF Colloquium Series: Jon Genetti, Dec. 7

The UAF Department of Mathematical Sciences and ARSC are jointly sponsoring a Mathematical Modelling, Computational Science, and Supercomputing Colloquium Series.

The schedule and abstracts for the '00-'01 academic year are available at:

The next presentation:

Bringing Space Into Focus
Dr. Jon Genetti
Computer Scientist
San Diego Supercomputer Center

Date: Thursday, December 7, 2000
Time: 2:00-3:00 PM
Location: Natural Sciences 202


The San Diego Supercomputer Center collaborated with the American Museum of Natural History to produce a visualization of the Orion Nebula for the new Hayden Planetarium.

During the Space Show, viewers are transported 1500 light years to the heart of the nebula on an 87-foot digital dome consisting of 9 million pixels. An alternate version was produced for flat displays and was recently shown at SIGGRAPH in the Electronic Theater.

The talk will cover the following topics:

  • overview of the project and technical challenges
  • description of the visualization process that began with Hubble Space Telescope imagery and ended with simulated Orion Nebula images from any viewpoint
  • development of the rendering software
  • content generation for alternate (non-flat) displays
  • current work of visualizing cells for cancer research
  • future work of visualizing the Aurora Borealis

Jon Genetti is a Computer Scientist in the SDSC Scientific Visualization group. His current interests are designing and implementing algorithms for out-of-core visualization, volume rendering, and medical imaging.

Jon received his Ph.D. degree in Computer Science from Texas A&M University in 1993. Prior to arriving at SDSC in 1996, Jon spent three years as a Visiting Assistant Professor at the University of Alaska Fairbanks.


Book Review: Parallel Programming in OpenMP

[Review by Tom Baring.]

Parallel Programming in OpenMP
Chandra, Dagum, Kohr, Maydan, McDonald, Menon
ISBN 1-55860-671-8
Academic Press
Copyright 2001
230 pages

This is a great introduction to parallel programming; it provides both introductory and in-depth material on the OpenMP API. It's well-written, enjoyable, and succeeds in the goal it sets in the Preface (p xiii):

"...the main information available about OpenMP [was] the OpenMP specification (available from the OpenMP Web site). Although this is appropriate as a formal and complete specification, it is not a very accessible format for programmers wishing to use OpenMP for developing parallel applications. This book tries to fulfill the needs of these programmers."

Unlike many programming texts, this one doesn't bury the reader in code. There are examples throughout, but the big sections of code required by dumb machines yet annoying to smart people are generally left out. In Chapter 2, we meet our first parallelized loop (p23):

       subroutine saxpy(z, a, x, y, n)
       integer i, n
       real z(n), a, x(n), y

!$omp parallel do
       do i = 1, n
           z(i) = a * x(i) + y
       enddo

       return
       end

And we are treated to a clear, informative description (p24):

"These ... threads divide the iterations of the do loop among themselves, with each thread executing a subset of the total number of iterations. There is an implicit barrier at the end of the parallel do construct."

Did you know about the implicit barrier? The OpenMP standard is small, but there are important details like this on perhaps every topic.

There are chapters on loop-level parallelism, parallel regions, and synchronization. Each chapter describes the generic concepts (applicable in any programming model) and then focuses on OpenMP solutions for the applications programmer.

The final chapter, of interest to those with production codes, covers performance issues. It starts with Amdahl's law and factors (like load balance) that affect performance, and proceeds to issues specific to different architectures (cache vs vector, for instance). To whet your appetite (p203):

"We have given reasons why using more threads than the number of processors can lead to performance problems. Do the same problems occur when each parallel application uses less threads than processors, but the set of applications running together in aggregate uses more threads than processors?"

Perhaps 10% of the examples are in C/C++, and attention is paid to particular C++ issues. The Fortran concepts translate readily to C, however, so this shouldn't be a problem.

My biggest complaint is that the book ignores the existence of Fortran 90/95. I noted a total of one (1) example of Fortran 90 array syntax, nothing on modules or Fortran 90 intrinsics, and no discussion of Fortran 90 issues.

I give this book a strong recommendation. Co-editor Guy Robinson must like it too, as he's assigned it as a required text for his graduate course, "Parallel Programming for Scientists," next semester at UAF.


Updated Reading List on High Performance Parallel Processing

[ Many thanks to Guy Robinson for periodically sharing his reading list via this newsletter. Guy requests additional suggestions and reviews. ]

MPI information sources.

MPI: The Complete Reference. Snir, Otto, Huss-Lederman, Walker and Dongarra.
MIT Press.
ISBN 0-262-69184-1

(*) MPI: The Complete Reference, Volume 2. Gropp et al. MIT Press.
ISBN 0-262-57123-4

Using MPI. Gropp, Lusk, Skjellum. MIT Press.
ISBN 0-262-57104-8

OpenMP information sources.

Parallel Programming in OpenMP. Chandra, Kohr, Menon, Dagum, Maydan, McDonald.
Morgan Kaufmann.
ISBN 1-55860-671-8

Parallel Programming Skills/Examples.

Practical Parallel Programming. Gregory V. Wilson. MIT Press.
ISBN 0-262-23186-7

Designing and Building Parallel Programs. Ian Foster. Addison-Wesley.
ISBN 0-201-57594-9

Parallel Computing Works! Roy D. Williams, Paul C. Messina (Editor),
Geoffrey Fox (Editor), Mark Fox. Morgan Kaufmann.
ISBN 1-55860-253-4

An interesting review of programming languages can be found at

Fortran, C, HPF, and other languages.

Fortran 90/95 Explained. Metcalf and Reid. Oxford Science Publications.
ISBN 0-19-851888-9

Fortran 90 Programming. Ellis, Philips, Lahey. Addison-Wesley.
ISBN 0-201-54446-6

Programmer's Guide to Fortran 90. Brainerd, Goldberg, Adams. Unicomp.
ISBN 0-07-000248-7

The High Performance Fortran Handbook. Koelbel, Loveman, Schreiber, Steele,
Zosel. ISBN 0-262-11185-3 / 0-262-61094-9

Parallel Programming Using C++. G. V. Wilson and P. Lu. MIT Press.
ISBN 0-262-73118-5

A Programmer's Guide to ZPL. Snyder. MIT Press.
ISBN 0-262-69217-1


Scientific Visualization: Overviews, Methodologies and Techniques.
Nielson, Hagen and Muller.
ISBN 0-8186-7777-5, IEEE order number BP07777.

Visual Explanations: Images and Quantities, Evidence and Narrative.
Edward R. Tufte. ISBN 0-9613921-2-6

Envisioning Information. Edward R. Tufte. ISBN 0-9613921-1-8

The Visual Display of Quantitative Information. Edward R. Tufte.
ISBN 096139210


Numerical Recipes in Fortran 77 and Fortran 90: The Art of Scientific
and Parallel Computing. William H. Press, Saul A. Teukolsky, William T.
Vetterling, Brian P. Flannery.
Cambridge Univ Pr; ISBN 0-521-57440-4

Numerical Recipes Example Book (Fortran).
William T. Vetterling, Saul A. Teukolsky, William H. Press.
Cambridge Univ Pr; ISBN 0-521-43721-0

Numerical Recipes in C: The Art of Scientific Computing.
William H. Press, Saul A. Teukolsky, William T. Vetterling, Brian P. Flannery.
Cambridge Univ Pr; ISBN 0-521-43108-5

Numerical Recipes Example Book (C).
William T. Vetterling, Saul A. Teukolsky, William H. Press.
Cambridge Univ Pr; ISBN 0-521-43720-2

Numerical Recipes in Fortran 90: The Art of Parallel Scientific Computing,
Volume 2 of Fortran Numerical Recipes. Press, Teukolsky, Vetterling and
Flannery. Cambridge Univ Pr, 1996; ISBN 0-521-57439-0.
Code can be downloaded (purchased) from

See also

for browsable web versions of the Numerical Recipes series of books.


A two-volume set:
(1) High Performance Cluster Computing: Architectures and Systems,
(2) High Performance Cluster Computing: Programming and Applications.
(R. Buyya, editor). Prentice Hall PTR, Upper Saddle River, 1998.

In Search of Clusters, 2nd Edition. Gregory F. Pfister. Prentice Hall
PTR, Upper Saddle River, 1998. ISBN 0-13-899709-8.

How to Build a Beowulf. Sterling, Salmon, Becker and Savarese. The
MIT Press, 1999. ISBN 0-262-69218-X.

Parallel Programming: Techniques and Applications Using Networked
Workstations and Parallel Computers. Wilkinson and Allen. Prentice Hall,
Upper Saddle River, 1999. ISBN 0-13-671710-1.

Building Linux Clusters. Spector. O'Reilly.
ISBN 1-56592-625-0.

Programming Skills.

Debugging and Performance Tuning for Parallel Computing Systems.
Simmons et al.

Foundations of Parallel Programming: A Machine-Independent Approach.

The Clockwork Muse. Zerubavel. Harvard University Press.
ISBN 0-674-13586-5

Background information/Fun Reading.

HAL's Legacy: 2001's Computer as Dream and Reality.
ISBN 0-262-19378-7

High Performance Compilers for Parallel Computing. Michael Wolfe.
ISBN 0-8053-2730-4

The Supermen. C. J. Murray. Wiley.
ISBN 0-471-04885-2

The Victorian Internet: The Remarkable Story of the Telegraph and the
Nineteenth Century's On-Line Pioneers. Tom Standage. Berkley Pub Group;
ISBN 0-425-17169-8.

Recreational Recommendations.

Chocolat, Bean Trees, Candide, Children of God, and Troublesome
Offspring of Cardinal Guzman. (Guzman and Children both have
prerequisite reading.)


CUG SUMMIT 2001 Call For Papers: Deadline Dec. 8

Final Call for Papers:

CUG SUMMIT 2001
May 21-25, Indian Wells, California
Deadline: December 8, 2000

The form for submitting an abstract is at:


Quick-Tip Q & A

A:[[ According to hpm, my SV1 code gets 30 million cache hits/second.  How
  [[ can I tell if this is really improving its performance?

Disable caching, do another run, and compare.  

The "/etc/cpu" command can be used to disable data and/or instruction
caching for a run of any program (see "man cpu").  For example, if your
executable were named, "a.out", the following SV1 command would run it
with data caching disabled:

  /etc/cpu -m ecdoff ./a.out

You can still use hpm to measure performance:

  hpm /etc/cpu -m ecdoff ./a.out

Q: My MPI code doesn't know until runtime how many messages the PEs
   will be exchanging.  Given this problem, how can I match a receive to
   every send, as required by MPI?

[[ Answers, Questions, and Tips Graciously Accepted ]]

Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
E-mail Subscriptions: Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.