| Newsletter Index | Quick-Tip Index | Search Newsletters |
http://www.arsc.edu/misc/jobs.html
Congrats to LJ Evans, Roger Edberg, and Jenn Wagaman, who produced the film, and to the others who contributed. A limited number of copies of the video are available for educational purposes upon request to LJ Evans at ljevans@arsc.edu.
[ This is the first in a two-part series contributed by Brad Chamberlain of the University of Washington. ]
The languages compared on the T3E were F90+MPI, Co-Array Fortran (CAF), High Performance Fortran (HPF), and ZPL. The benchmark used for the study was the MG benchmark from the NAS Parallel Benchmark Suite (NPB) v2.3. This benchmark uses a multigrid computation to obtain an approximate solution to a scalar Poisson problem on a discrete 3D grid with periodic boundary conditions. The ARSC T3E was used to conduct the test runs.
To this end, the F90+MPI code is the original version written by NAS. The CAF version was written by a member of the CAF team and was developed by simply replacing the MPI calls in the F90+MPI version with equivalent CAF statements or subroutines. The HPF version was written at NASA Ames as part of a project to implement the NAS benchmarks as efficiently as possible in HPF. PGI has identified this as the best implementation of MG for the pghpf compiler that they are aware of. The ZPL version was obtained by making a careful translation of the F90+MPI code into ZPL.
Each implementation of the benchmark was evaluated both in terms of its performance and its expressiveness (i.e., its ability to express the MG computation both cleanly and succinctly). The compilers used on the T3E are summarized here:
language  compiler  version   command-line options
--------  --------  -------   --------------------
F90+MPI   f90       3.2.0.1   -O3
CAF       f90       3.2.0.1   -O3 -X 1 -Z <nprocs>
HPF       pghpf     2.4.4     -O3 -Mautopar -Moverlap=size:1 -Msmp
ZPL       zc        1.15a
          cc        6.2.0.1   -O3
Note that to achieve portability across numerous platforms, ZPL
compiles to ANSI C and then uses the machine's native C compiler (in
this case cc) to create the executable.
It should also be noted that the implementations made different assumptions about the execution parameters and when they are bound:
Speedup of MG Class A (256 proc time / CAF 4 proc time)
---------------------
F90+MPI: 30.91
CAF:     50.65
HPF:      1.01
ZPL:     33.28

For Class C, which has 512x512x512 elements at its finest level, a minimum of 16 processors is required to obtain sufficient memory. All speedups are thus computed relative to the fastest 16-processor time (119.20 seconds, obtained by CAF), and a speedup of 16 would be considered ideal.
Speedup of MG Class C (256 proc time / CAF 16 proc time)
---------------------
F90+MPI: 10.91
CAF:     14.98
HPF:     --.--
ZPL:     11.90

Looking at the class C results, these numbers show that CAF achieves the best speedup of ~15. After a sizeable gap, ZPL came in at ~12, and just behind it, the F90+MPI version at ~11. The HPF version was unable to run on 256 processors due to its excessive memory requirements.
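As a rough aid to reading the speedup figures, the sketch below turns them into parallel efficiencies and implied run times. It assumes that speedup means baseline time divided by 256-processor time, and that the ideal for Class A is 256/4 = 64 by the same reasoning that gives 16 for Class C. The function names are made up for illustration, and the implied times are derived from the quoted numbers rather than measured.

```python
# Speedup figures quoted in the article (all from 256-processor runs).
CLASS_A = {"F90+MPI": 30.91, "CAF": 50.65, "HPF": 1.01, "ZPL": 33.28}
CLASS_C = {"F90+MPI": 10.91, "CAF": 14.98, "ZPL": 11.90}  # HPF could not run

# Fastest 16-processor Class C time (CAF), as quoted in the article.
CLASS_C_BASELINE_SECONDS = 119.20


def ideal_speedup(procs, baseline_procs):
    """Perfect scaling: speedup equals the ratio of processor counts."""
    return procs / baseline_procs


def efficiency(speedup, procs, baseline_procs):
    """Fraction of the ideal speedup that was actually achieved."""
    return speedup / ideal_speedup(procs, baseline_procs)


def implied_seconds(speedup, baseline_seconds):
    """Run time implied by a speedup, ASSUMING speedup = baseline / time."""
    return baseline_seconds / speedup


if __name__ == "__main__":
    for name, s in CLASS_C.items():
        eff = efficiency(s, 256, 16)
        secs = implied_seconds(s, CLASS_C_BASELINE_SECONDS)
        print(f"Class C {name:8s} efficiency {eff:6.1%}  implied time {secs:5.2f} s")
    for name, s in CLASS_A.items():
        print(f"Class A {name:8s} efficiency {efficiency(s, 256, 4):6.1%}")
```

Read this way, CAF reaches a little under 94% of the ideal on Class C, while on Class A even the best result is well short of the assumed ideal of 64.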
The differences in performance between the languages were found to be due largely to three factors (in no particular order):
The summary of these three factors is that CAF's performance advantage is due to its use of Fortran as a base language, its optimized stencil computations, and its use of a lower-level SHMEM-style communication. ZPL suffers primarily from its lack of the stencil optimization and its use of C as a base language. F90+MPI suffers due to its reliance on the MPI interface.
language  lines  decls      comp       comm
--------  -----  ---------  ---------  ---------
F90+MPI     992  168 (16%)  237 (23%)  587 (59%)
CAF        1150  243 (21%)  238 (20%)  669 (58%)
HPF         433  129 (29%)  304 (70%)    0 ( 0%)
ZPL         192   90 (46%)  102 (53%)    0 ( 0%)

The first thing to notice is that the languages break into two general camps: those that provide a local view of the computation, in which the programmer is writing per-node code, and therefore explicitly responsible for expressing the data layout and interprocessor communication of their programs; and those that provide a global view of the computation, in which the programmer essentially writes a sequential computation and the compiler is responsible for issues of distribution and communication. F90+MPI and CAF both require a local view, while HPF and ZPL provide a global view. The result is that global-view codes are 2-6 times shorter than local-view codes.
As can be seen from the figures above, the length of local-view implementations is primarily due to the large amount of code required to specify communication (~60% for this benchmark). CAF tends to require slightly more code than F90+MPI because MPI provides high-level communication mechanisms such as reductions, whereas in CAF these operations must be written by hand using its co-array indexing syntax. In contrast, the HPF and ZPL programs require far fewer lines since the programmer can ignore details of distribution and communication. ZPL's computation portion is more succinct than HPF's due to its use of "regions", which eliminate the need for looping and indexing.
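The arithmetic behind the "2-6 times shorter" and "~60%" statements can be checked directly from the line-count table. The following is only a small sketch using the published counts; the names COUNTS, shares, and length_ratios are made up for illustration.

```python
# (total lines, declarations, computation, communication) from the table.
COUNTS = {
    "F90+MPI": (992, 168, 237, 587),
    "CAF": (1150, 243, 238, 669),
    "HPF": (433, 129, 304, 0),
    "ZPL": (192, 90, 102, 0),
}
LOCAL_VIEW = ("F90+MPI", "CAF")
GLOBAL_VIEW = ("HPF", "ZPL")


def shares(name):
    """Fraction of the program spent on decls, comp, and comm, in that order."""
    total, *parts = COUNTS[name]
    return [part / total for part in parts]


def length_ratios():
    """Every local-view length divided by every global-view length, sorted."""
    return sorted(
        COUNTS[loc][0] / COUNTS[glob][0]
        for loc in LOCAL_VIEW
        for glob in GLOBAL_VIEW
    )


if __name__ == "__main__":
    for name in COUNTS:
        decls, comp, comm = shares(name)
        print(f"{name:8s} decls {decls:5.1%}  comp {comp:5.1%}  comm {comm:5.1%}")
    ratios = length_ratios()
    print(f"local/global length ratios: {ratios[0]:.2f} to {ratios[-1]:.2f}")
```

The smallest ratio comes from F90+MPI versus HPF and the largest from CAF versus ZPL, which is where the 2-6 range originates.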
This difference between global and local view is not merely a matter of how much typing one has to do, but also an issue of complexity. The hardest part of coding most parallel algorithms is the correct specification of data distribution and communication while dealing with boundary conditions, race conditions, deadlock situations, etc. Thus, this difference in line counts represents not merely a large portion of the code, but an extremely intricate one that is distracting from the algorithm at hand.
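To make the local-view versus global-view distinction concrete, here is a toy sketch in Python. It is not taken from any of the four MG implementations and uses no real MPI, CAF, HPF, or ZPL facilities: the "processes" are simply list slices, and the stencil is a hypothetical 1D three-point average with periodic boundaries chosen for brevity. The point is only how much of the local-view version is bookkeeping for neighbours and ghost cells.

```python
def global_view_step(grid):
    """Whole-array formulation: one expression, periodic wrap via modulo."""
    n = len(grid)
    return [(grid[(i - 1) % n] + grid[i] + grid[(i + 1) % n]) / 3.0
            for i in range(n)]


def split(grid, p):
    """Block-distribute grid over p simulated processes (p must divide len)."""
    size = len(grid) // p
    return [grid[r * size:(r + 1) * size] for r in range(p)]


def local_view_step(blocks):
    """Per-process formulation: explicit halo exchange, then local compute."""
    p = len(blocks)

    # Communication phase: each process fetches one ghost cell per side.
    # Periodic boundaries mean rank 0 and rank p-1 are neighbours too.
    halos = []
    for rank in range(p):
        left_neighbour = blocks[(rank - 1) % p]
        right_neighbour = blocks[(rank + 1) % p]
        halos.append((left_neighbour[-1], right_neighbour[0]))

    # Computation phase: purely local work on the block plus ghost cells.
    new_blocks = []
    for rank in range(p):
        left_ghost, right_ghost = halos[rank]
        padded = [left_ghost] + blocks[rank] + [right_ghost]
        new_blocks.append(
            [(padded[i - 1] + padded[i] + padded[i + 1]) / 3.0
             for i in range(1, len(padded) - 1)]
        )
    return new_blocks


if __name__ == "__main__":
    grid = [float(i * i % 7) for i in range(12)]
    flat = [x for block in local_view_step(split(grid, 4)) for x in block]
    print("results agree:", flat == global_view_step(grid))
```

Both functions compute the same values, but only the local-view version has to spell out who talks to whom. In a real local-view code that phase also has to contend with message ordering and deadlock, which this sequential simulation sidesteps entirely.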
Naturally, line counts alone are not sufficient to evaluate the expressiveness of the programs, so the code was also scrutinized to get a sense for how clear it was. ZPL was deemed best as a description of the algorithm, as it was not obfuscated by data distribution, communication, looping, indexing, and hand optimizations. The claim is that if you want to understand how the MG benchmark works, the ZPL code will be the easiest to read.
The next issue of the T3E Newsletter will contain code samples from the different implementations.
MPI: http://www-unix.mcs.anl.gov/mpi/index.html
CAF: http://www.co-array.org/
HPF: http://www.crpc.rice.edu/HPFF/home.html
ZPL: http://www.cs.washington.edu/research/zpl/
The 30th International Arctic Workshop:

Thursday through Saturday, March 16-18, 2000
Institute of Arctic and Alpine Research
University of Colorado, Boulder, CO USA

The 30th Arctic Workshop will be held at the Institute of Arctic and Alpine Research, University of Colorado. The meeting will consist of a series of talks and poster sessions covering all aspects of high-latitude environments, past and present. Previous Arctic Workshops have included presentations on arctic and antarctic climate, geomorphology, hydrology, glaciology, soils, ecology, oceanography, and Quaternary history.

For registration and details, see:
http://instaar.colorado.edu/AW2000/
http://www.sdynamix.com

The basic problem is:

Find the optimal initial angle for a trajectory to reach a target at 2000 m to within .5 m.

From the announcement:
The best entry in each language will be posted on our site to serve as a barometer for those pondering what language to choose for their technical computing. Questions regarding the contest should be directed to with Contest 2000 in the "Subject" line.
A: {{ Is it a good idea to compress files which are to be DMF migrated? }}
If your only goal is to save space on tape, it's a bad idea.
All files are automatically compressed by the tape hardware, so if
you "pre-compress" you'll be wasting CPU time and creating extra
work for yourself. (And, interestingly, compressing a compressed
file can actually increase its size.)
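
The parenthetical remark about compressing a compressed file can be tried out with a short Python sketch. Here zlib merely stands in for the tape hardware's compression, which is an assumption, and the sample data is synthetic (seeded pseudo-random hex digits), so the exact sizes say nothing about real DMF-migrated files.

```python
import random
import zlib


def make_sample(n_bytes, seed=2000):
    """Synthetic, moderately compressible data: random hex digits."""
    rng = random.Random(seed)
    return bytes(rng.choice(b"0123456789abcdef") for _ in range(n_bytes))


def compress_twice(data):
    """Return (original size, size after one pass, size after two passes)."""
    once = zlib.compress(data, 9)
    twice = zlib.compress(once, 9)
    return len(data), len(once), len(twice)


if __name__ == "__main__":
    original, once, twice = compress_twice(make_sample(200_000))
    print("original bytes:        ", original)
    print("compressed once:       ", once)
    print("compressed twice:      ", twice)
    print("second pass added bytes:", twice - once)
```

The first pass removes the redundancy, so the second pass has nothing left to squeeze and only adds its own framing overhead.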
Q: I tried to authenticate using Kerberos/SecurID and got this message:
kerberos skew too great
What does this mean?
[ Answers, questions, and tips graciously accepted. ]
Contact:
Donald Bahls
ARSC User Consultant
ph: 907-450-8674

Ed Kornkven
ARSC HPC Specialist
ph: 907-450-8669

Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Arctic Region Supercomputing Center
PO Box 756020, Fairbanks, AK 99775 | voice: 907-450-8600