ARSC T3E Users' Newsletter 122, July 25, 1997


ZPL: New Parallel Language Available

[ This article contributed by Brad Chamberlain of the ZPL Group at the University of Washington. Brad and his colleagues have been quite active on ARSC's MPP platforms, and we wish to congratulate them on this release. ]

ZPL is a portable array-based data parallel language that has been developed at the University of Washington over the past several years.

The official release of ZPL was announced earlier this month, making the ZPL compiler and runtime libraries publicly available for the first time. It runs on a number of parallel and sequential platforms, including the Cray T3D and T3E. Most of ZPL's Cray implementation was done at ARSC, and ARSC users are encouraged to try the ZPL installations on denali and yukon. See the information at the end of this article.

The goal of the ZPL group is to provide an elegant, high-level programming language without sacrificing performance, scalability, or portability. This is achieved by supplying users with a number of intuitive high-level operators and constructs that make it easy to evaluate how well programs will run in parallel. These features allow programmers to code at a natural level and to qualitatively compare the performance of different implementation choices. The language also has the benefit of unambiguously identifying parallel operations to the compiler, allowing it to make informed optimization decisions.

Although most programmers don't relish the idea of learning a new language, those with scientific computing experience typically become comfortable with ZPL's Modula-based syntax within a few hours, and find that it provides a very expressive means for coding their applications. During its development, ZPL has been used for applications in Astronomy, Biology, Civil Engineering, Statistics, and Molecular Dynamics.

The following code fragment is taken from a ZPL program that implements a simple 4-point Jacobi iteration. Some declarations and initializations are omitted for the sake of brevity -- see the ZPL web page for the full example.

       region R = [1..N,1..N];      -- declare the problem size
       var A,Temp:[R] double;       -- declare two double arrays
       direction north = [-1, 0];   -- declare vector offsets
                 south = [ 1, 0];
                 east  = [ 0, 1];
                 west  = [ 0,-1];

       [R] repeat
               Temp := (A@north + A@south + A@east + A@west)/4;
               diff := max<< fabs(A-Temp);
               A := Temp;
           until (diff < epsilon);

The first lines declare a region R that describes the problem size, two arrays of doubles whose sizes are defined by R, and four offset vectors. The program then repeatedly computes over the region, replacing each array value with the average of its four nearest neighbors. When the largest change in values is less than the constant epsilon, the computation ends.
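
For readers who do not know ZPL, the same computation can be sketched in plain Python. This is only an illustration, not output of the ZPL compiler. The names jacobi and make_grid are invented here, and the fixed boundary values are an assumption standing in for the initializations that the fragment above omits.

```python
def jacobi(grid, epsilon):
    """Relax the interior of a square grid until the largest change is < epsilon.

    The outermost rows and columns of `grid` are treated as fixed boundary
    values (an assumption; the ZPL fragment does not show its boundary setup).
    """
    n = len(grid) - 2                    # interior size, playing the role of N
    a = [row[:] for row in grid]
    while True:                          # repeat ... until: body runs at least once
        temp = [row[:] for row in a]
        diff = 0.0
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                # Temp := (A@north + A@south + A@east + A@west)/4
                temp[i][j] = (a[i - 1][j] + a[i + 1][j]
                              + a[i][j + 1] + a[i][j - 1]) / 4.0
                # diff := max<< fabs(A-Temp)
                diff = max(diff, abs(a[i][j] - temp[i][j]))
        a = temp                         # A := Temp
        if diff < epsilon:
            return a


def make_grid(n, top=1.0):
    """Hypothetical setup: zeros everywhere except a top boundary row at `top`."""
    grid = [[0.0] * (n + 2) for _ in range(n + 2)]
    grid[0] = [top] * (n + 2)
    return grid
```

Calling jacobi(make_grid(4), 1e-6) relaxes a 4x4 interior. The two explicit loops are what the single ZPL array statement expresses implicitly, which is why the compiler can see the parallelism without further analysis.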

ZPL achieves its portability by compiling programs to an ANSI C representation. This code is then compiled using the target machine's native C compiler and linked with machine-specific runtime libraries. Porting ZPL to a new platform therefore consists of implementing the runtime libraries for that machine. At this point, ZPL has been ported to Cray's SHMEM interface as well as the Intel Paragon, the IBM SP2, the SGI Power Challenge, sequential UNIX workstations, and platforms supporting the PVM 3.3 and MPI 1.1 standards. In addition, a multithreaded implementation of ZPL is currently in the works.

ZPL's performance has been demonstrated to be competitive with hand-coded parallel C programs, yet ZPL programs are typically much easier to write and maintain. To aid in this effort, users can develop and debug their ZPL programs on sequential workstations and then run them on a parallel platform with just a single recompilation. Since ZPL is compiled to C as an intermediate form, sequential C and Fortran routines can be linked in and called effortlessly. ZPL generally outperforms HPF, and has the advantage of having a directive-free syntax and an unambiguous performance model.

ZPL owes its good performance both to machine-independent optimizations and to its sensitivity to specific machine characteristics. Many of its machine-independent optimizations are devoted to reducing the cost of interprocessor communication by eliminating redundant communications, combining compatible communications into a single message, and overlapping communication with computation. A paper evaluating the effects of these optimizations in ZPL will be presented at ICPP this August.

Another of ZPL's most important optimizations is its aggressive use of array contraction -- a transformation that reduces the memory requirements of the temporary array variables that are often involved when programming in array languages. This benefits a program's temporal and spatial locality, reducing contention in the cache.
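
The effect of array contraction can be shown by hand in a few lines of Python. This is a sketch of the general transformation, not of the ZPL compiler's own algorithm, and both function names are invented for the illustration. It assumes each element of the temporary is used only at the index where it was produced, which is what makes the contraction legal.

```python
def square_plus_one_uncontracted(a):
    """Array-style evaluation: the intermediate lives in a full-size temporary."""
    temp = [x * x for x in a]          # whole temporary array is materialized
    return [t + 1.0 for t in temp]     # a second pass re-reads the temporary


def square_plus_one_contracted(a):
    """Same result after fusing the two passes and contracting the temporary."""
    out = []
    for x in a:
        t = x * x                      # a scalar stands in for temp[i]
        out.append(t + 1.0)
    return out
```

Both functions return the same values, but the second never allocates the temporary array, so each input element is touched once while it is still in cache.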

ZPL's sensitivity to machine-specific characteristics is achieved using the Ironman principle, which is the topic of a paper to be presented at LCPC this August. The Ironman philosophy was actually developed at ARSC out of necessity when ZPL was first being ported to denali in 1994. At that time, the ZPL compiler implemented all interprocessor communication in terms of a simple Send/Receive message passing model. When the port to denali began, its implementors realized that to get the best performance possible, they would need to use Cray's SHMEM routines to implement the communication routines. However, due to the paradigm skew between message passing and an optimal usage of SHMEM, the implementation could not be done without significant performance degradation.

This led to the development of Ironman, a scheme in which the compiler no longer relies on a specific data transfer paradigm. Instead, it annotates its output code with calls that indicate where data transfer is legal and may be required. On each platform, these calls are either ignored or implemented in terms of the machine's optimal data transfer paradigm, allowing the compiler to achieve good communication performance without making assumptions about the target machine. This allowed the ZPL compiler to make good use of the SHMEM library, and led to improvements in execution time of up to 65% for pure communication-oriented microbenchmarks and up to 14% in benchmark applications (as compared with communication done using PVM or MPI).
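
The idea can be sketched as a toy model in Python. Everything below is hypothetical: the four hook names, the two binding classes, and the single-process "network" are assumptions made for illustration and are not the actual Ironman calls or the ZPL runtime. The point is only that one compiler-emitted call sequence can be bound to different transfer styles, with some calls ignored on each platform.

```python
class MessagePassingBinding:
    """Two-sided binding: only the send and receive points do real work."""

    def __init__(self):
        self.mailbox = []                  # stands in for the network
        self.log = []

    def dest_ready(self, dest):
        pass                               # ignored: a receive needs no advance notice

    def src_ready(self, src):
        self.mailbox.append(list(src))     # "send" a copy of the data
        self.log.append("send")

    def dest_needed(self, dest):
        dest[:] = self.mailbox.pop(0)      # "receive" into the destination
        self.log.append("recv")

    def src_volatile(self, src):
        pass                               # ignored: the send already copied the data


class OneSidedBinding:
    """Put-style binding: the source writes straight into the destination."""

    def __init__(self):
        self.target = None
        self.log = []

    def dest_ready(self, dest):
        self.target = dest                 # destination may now be overwritten
        self.log.append("expose")

    def src_ready(self, src):
        self.target[:] = src               # "put" directly into the remote buffer
        self.log.append("put")

    def dest_needed(self, dest):
        self.log.append("sync")            # real code would wait for the put to land

    def src_volatile(self, src):
        pass                               # the toy put completed synchronously


def exchange(binding, src, dest):
    """The call sequence a compiler might emit around one data transfer."""
    binding.dest_ready(dest)               # old contents of dest are dead
    binding.src_ready(src)                 # src holds the values to transfer
    binding.dest_needed(dest)              # dest is about to be read
    binding.src_volatile(src)              # src is about to be overwritten
    return dest
```

Running exchange with either binding delivers the same data, while the logs differ: the message-passing binding records a send and a receive, and the one-sided binding records an expose, a put, and a sync. On a real machine the destination-side and source-side calls would run on different processors.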

Users are invited to visit the ZPL home page to learn more about the language, to find out how to use the installations at ARSC, or to install it on their personal workstations. Further questions about the language should be directed to:

The ZPL group would like to thank ARSC for the use of their resources, and for their continually helpful technical support.


Dr. Ramesh Agarwal to Lecture at UAF

ARSC, UAF (The University of Alaska, Fairbanks), and the IEEE Electron Devices Society are hosting Dr. Ramesh Agarwal for the following series of three seminars, to be held in Fairbanks on August 13th and 14th:

  • "Towards Teraflop Architectures, Algorithms and Applications"
  • "General Aviation - Past, Present, and Future"
  • "Application of CFD Based Approach to Electromagnetics and Semiconductor Device Simulation"

Dr. Agarwal is the Bloomfield Distinguished Professor in the Department of Aerospace Engineering and Executive Director of the National Institute for Aviation Research at Wichita State University. He is also the Director of the Aircraft Design and Manufacturing Research Center and a Senior Fellow at the National Institute for Aviation Research at Wichita State University.

From 1978 to 1994, he was the Program Director and McDonnell Douglas Fellow at McDonnell Douglas Aerospace in St. Louis, Missouri.

Dr. Agarwal has done pioneering work in Computational Fluid Dynamics, Computational Acoustics, Computational Electromagnetics, and Multidisciplinary Design and Optimization. He has a Ph.D. from Stanford University. He is a Fellow of the American Institute of Aeronautics and Astronautics, American Society of Mechanical Engineers, and the American Association for the Advancement of Science. He has been the recipient of many awards for his technical contributions.

The final schedule will be available from the ARSC web site and will be announced in this newsletter. Synopses of Dr. Agarwal's three seminars follow:


Towards Teraflop Architectures, Algorithms and Applications

During the past decade, a large number of parallel computers have been commercially built worldwide using SIMD or MIMD (both shared and distributed memory) architectural routes to parallelism. However, they have been limited to a peak performance of at most a few gigaflops.

Very recently, new systems have been built which offer the promise of teraflop scalability to solve the so-called "Grand Challenge Problems." The Cray T3E, IBM SP2, Convex Exemplar SPP1000, Intel Paragon XP/S, KSR-2, and CM5 are some of the machines which fall into this category. This lecture will review these architectures and their potential for solving large scale problems.


General Aviation - Past, Present, and Future

A brief history of General Aviation in the U.S. will be presented. Its present status and future prospects will be reviewed. The particular focus of this presentation will be on the recently created government/industry/university partnership to revitalize general aviation known as AGATE (Advanced General Aviation Transport Experiments).

The consortium is designed to accelerate the processes for bringing new technologies from the experimental research stage to fully certified production status. Successes have already been demonstrated in the use of advanced satellite GPS-based "highway in the sky navigation" for air traffic control at the 1996 International Olympic Games, and in demonstrations of greatly simplified single power level control for piston engines.


Application of CFD Based Approach to Electromagnetics and Semiconductor Device Simulation

It is shown that many of the well-known formulations for semiconductor device simulation, namely the drift-diffusion model, the hydrodynamic model, and the energy transport model, which are the moments of the Boltzmann equation for a Maxwellian or non-Maxwellian distribution, can be recast as a set of first-order partial differential equations in conservation law form. As a consequence, the well-developed CFD grid-generation techniques and solution algorithms can be applied for the numerical solution of these equations. The author has pioneered this emerging application of CFD technology to device simulation.


ARSC T3E queue structure adjusted slightly

The m_64pe_4h queue on yukon now starts at 18:00 and stops at 03:00. We are gradually tuning the queues, and comments by e-mail are welcome.


MPT Upgrade on ARSC T3E

MPT ("message passing toolkit") version is now available on yukon. It fixes known problems with MPI_CHARACTER and MPI_SHORT.

MPI users are encouraged to try this release.


Switching Modules -- Quick Review

Use the UNICOS module command to add the latest compiler, toolkit, or library release to your programming environment. You must do this manually, as ARSC installs upgrades frequently (for your use) but rarely adds them to the default environment as soon as installed.

The following statement changes the MPT software accessed when you execute the f90 command from mpt (the default) to mpt. module switch mpt mpt.

  To restore the default version:
    "module switch mpt. mpt". 
  To use only the latest version of all PE components:
    "module switch PrgEnv".
  To determine which versions of all packages you are currently using:
    "module list".
  To list all versions available at ARSC, and determine which versions
  are defined as defaults, ARSC provides the script:
    "/usr/local/bin/PEvers".

Here is part of the output of PEvers, as run on July 25 on yukon:

  yukon$ /usr/local/bin/PEvers

  The following Programming Environment Packages are installed:
  The current default version is //opt/ctl/cf90/
  The current default version is

As shown, the MPT default points to version To maintain a stable default computing environment, ARSC does not adjust the defaults frequently. However, if you wish, we encourage you to test the latest and greatest versions, which is a simple matter using the module command.

For more information, see:


HPF_CRAFT included in PGHPF 2.3 for the T3E

[ NAS Parallel Benchmark results for PGHPF available at:]

The following is a portion of the PGI press release from:
        Cray Research and The Portland Group (PGI) Release
 HPF_CRAFT Programming Model As Part Of New PGHPF™ 2.3 Compiler

Eagan, MN, July 15, 1997 - Cray Research and The Portland Group (PGI)
announced today that HPF_CRAFT will be included in the new PGHPF 2.3
compiler version designed for the CRAY T3E line of supercomputers.

HPF_CRAFT is an open, standards-based, parallel-programming model that
provides more sophisticated capabilities for the CRAY T3E system. It
builds on CRAFT, an earlier proprietary programming model that was used
on the CRAY T3D supercomputer, the predecessor to the CRAY T3E series.
HPF_CRAFT allows programmers to take advantage of the unique
scalability of the CRAY T3E system, a massively parallel supercomputer
which scales to thousands of processors.

As part of a development agreement, Cray Research and PGI defined
HPF_CRAFT and secured its acceptance as a recognized extension of the
industry-standard High-Performance Fortran (HPF-2) language. This
approach delivers the unique features of CRAFT within the context of a
portable, standard language.


Quick-Tip Q & A

A: {{ When "pasting" into a vi window, why might you get
        an additional
                indentation on each 
                        subsequent line 
                                (like this)?  }}

  # Thanks to several people who responded. Here is one response:

    You may have set "autoindent". 
      You can disable it with ":set noai"
      You can re-set it with ":set ai".

  # Personally, I like vi's autoindent feature (except when pasting
  # text).  I have defined two maps in my .exrc file that let me switch
  # it on and off using '=' and '_':

      map _ :set noautoindent ^M
      map = :set autoindent ^M

  # (the ^M represents RETURN and is created by hitting CTRL-V
  # and then <ENTER>).
  #
  # Here is another reader response, attacking a more general problem:

    I have seen this problem occurring on the command line due to
    telnetd assuming that certain external processing is being carried
    out. This is sometimes not the case and you have to explicitly
    define this to avoid the indentation problems that are seen. Use the

    stty -extproc

    command in your .profile to resolve this. Mainly see this problem
    in UNICOS windows, not sure if this is relevant if copying between
    other types.

Q: If you have a few heavily used directories and "cd" back and forth
   between them often, what shortcut will save you from typing the
   entire path each time?

   User Armadillov, for instance, wants to simplify the following:

     yukon$   cd /tmp/armadill/meltdown/data/1997_06/1
     yukon$   run_em
     yukon$   cd ~/meltdown/results/1997
     yukon$   store_em
     yukon$   cd /tmp/armadill/meltdown/data/1997_06/2
     yukon$   run_em
     yukon$   cd ~/meltdown/programs/src
     yukon$   make -f em
     yukon$   cd /tmp/armadill/meltdown/data/1997_06/2
     yukon$   run_em ; mailx bonnie < etcetera

[ Answers, questions, and tips graciously accepted. ]

Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
E-mail Subscriptions: Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.