ARSC T3E Users' Newsletter 153, October 16, 1998
SC98 Attenders...
If you're attending SC98, please tell us about your:
- Poster,
- Booth,
- Talk,
- Panel Session,
- BOF, SIG, MIG, etc.
We're preparing a special issue to promote your work and to help T3E users orient to one another in Orlando.
Co-array Fortran
[ Many thanks to John Reid who provided this introduction to
Co-array Fortran. As discussed in the previous issue, Co-array
Fortran is available in CF90.3.1 for the T3E and documentation is
both online and in "man" pages. ]
Co-Array Fortran
Bob Numrich, Silicon Graphics, Inc. (formerly Cray Research), USA
John Reid, Rutherford Appleton Laboratory, UK
Alan Wallcraft, Naval Research Laboratory, USA
Abstract
Co-array Fortran is a simple parallel extension to Fortran 90/95, based on a second set of subscripts to address different processes. A subset has been implemented in Fortran 3.1 on the T3E. We introduce this subset and provide examples to illustrate how clear, powerful, and flexible it can be. For more about Co-Array Fortran, see: http://www.co-array.org
Basics
Co-Array Fortran uses an SPMD (Single Program, Multiple Data) model. A single program is replicated to a number of 'images', each with its own set of local variables. All images start by executing the main program and mostly execute asynchronously. The number of images may be set at load time and is always fixed during execution. It is available from the intrinsic function num_images(). Images have indices 1, 2, ..., num_images(), and co-subscripts are mapped to image indices by the usual Fortran array-element ordering rule (first co-subscript varying fastest). The index of the executing image is available from the intrinsic function this_image(). Note that the indices start at 1 rather than 0.
Variables declared as co-arrays are accessible through a second set of array subscripts, delimited by square brackets.
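The mapping from co-subscripts to image indices can be tried out with a small model. This is a sketch in ordinary Python, not Co-array Fortran, and the helper name image_index and its argument layout are invented for illustration. It assumes that co-subscripts map to images in Fortran array-element order (first co-subscript varying fastest) and that images are numbered from 1.

```python
def image_index(cosubscripts, lower_bounds, extents):
    """Map a list of co-subscripts to a 1-based image index.

    Hypothetical helper: assumes Fortran array-element order, so the
    first co-subscript varies fastest. `extents` gives the size of every
    co-dimension except the last, which is assumed-size ([..., *]).
    """
    if (len(cosubscripts) != len(lower_bounds)
            or len(extents) != len(cosubscripts) - 1):
        raise ValueError("inconsistent co-rank")
    index = 1      # image indices start at 1, not 0
    stride = 1
    for k, (sub, low) in enumerate(zip(cosubscripts, lower_bounds)):
        offset = sub - low
        if offset < 0 or (k < len(extents) and offset >= extents[k]):
            raise ValueError("co-subscript out of bounds")
        index += offset * stride
        if k < len(extents):
            stride *= extents[k]
    return index


# real :: r[*]            r[3] is on image 3
print(image_index([3], [1], []))
# real :: s[0:*]          s[3] is on image 4 (lower co-bound is 0)
print(image_index([3], [0], []))
# u2(m,n)[np,*], np = 4   u2(i,j)[2,3] is on image 2 + (3-1)*4 = 10
print(image_index([2, 3], [1, 1], [4]))
```

A lower co-bound of 0, as in s[0:*], is handy when the co-subscript should match a PE number that counts from 0.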
Simple examples of co-array syntax
real :: r[*], s[0:*], x(n)[*] ! Co-arrays always have assumed
type(u) :: u2(m,n)[np,*] ! co-size (equal to number of images)
real :: t ! Local variables
integer p, q, index(n)
:
t = s[p] ! Copy to local variable from image p
x(:) = x(:)[p] ! Reference without [] is to local part
x(i)[p] = s[index(i)]
u2(i,j)%b(:) = u2(i,j)[p,q]%b(:)
A non-trivial example: redistribution
Consider forming the array a(1:kx,1:ky)[1:kz] by redistributing b(1:ky,1:kz)[1:kx], where max(kx,kz) <= num_images().
iz = this_image()
if (iz<=kz) then              ! If construct needed so that no action
   do ix = 1, kx              ! is taken on images with no data.
      a(ix,:) = b(:,iz)[ix]
   end do
end if
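To check which element ends up where, the redistribution can be modelled sequentially. The following is a sketch in plain Python, not Co-array Fortran: each image's local array is a nested list, the outer list index stands in for the co-subscript, and the function name redistribute is made up for illustration.

```python
def redistribute(b, kx, ky, kz):
    """Sequential model of  a(ix,:) = b(:,iz)[ix]  on every image iz <= kz.

    b[ix-1][iy-1][iz-1] stands for b(iy,iz)[ix], one ky-by-kz block per image.
    The result a[iz-1][ix-1][iy-1] stands for a(ix,iy)[iz].
    """
    a = [[[None] * ky for _ in range(kx)] for _ in range(kz)]
    for iz in range(1, kz + 1):        # the work done by image iz
        for ix in range(1, kx + 1):    # loop over the source images
            # a(ix,:) = b(:,iz)[ix] : fetch column iz of b from image ix
            a[iz - 1][ix - 1] = [b[ix - 1][iy][iz - 1] for iy in range(ky)]
    return a


kx, ky, kz = 2, 3, 4
# Tag every element of b with its global (ix, iy, iz) position.
b = [[[(ix, iy, iz) for iz in range(1, kz + 1)]
      for iy in range(1, ky + 1)]
     for ix in range(1, kx + 1)]
a = redistribute(b, kx, ky, kz)
print(a[3][1])   # row a(2,:) on image 4
```

In the model, as in the Fortran, every image pulls the data it needs; no image writes into another image's memory.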
Implementation model
The rules are designed to ensure that a co-array occupies the same set of addresses within each image. Therefore, a co-array must have the same set of bounds on all images, and there is an implicit synchronization of all images at an allocate or deallocate statement so that they all perform their allocations and deallocations in the same order. Allocatable co-arrays are the only dynamic form. Automatic co-arrays or co-array-valued functions would require automatic synchronization, which would be awkward.
Synchronization
The images normally execute asynchronously. Most of the time, the compiler can optimize as if the image is on its own. If one image relies on another image having taken an action, explicit synchronization is needed.
For example, suppose we wish to set the variable great to the largest value of the co-array a on all images:
great = maxval(a(:))           ! local max.
call sync_images               ! Wait for all images to have local max.
if (this_image()==1) then      ! Use image 1 to find global max.
   do i = 2, num_images()
      great = max(great,great[i])
   end do
end if
call sync_images               ! Wait for image 1 to have finished,
great = great[1]               ! then get the result
Sync_images with no arguments is a global barrier. Called with an argument, it can also synchronize a subset of images. In the full language, however, sync_images has been replaced by two new intrinsics, sync_all and sync_team; sync_team (available as a module for CF90) is the better way to synchronize a subset of images.
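The role of the two barriers in the global-max example can be seen in a small threaded model. This is a sketch in Python, not Co-array Fortran: each thread plays one image, threading.Barrier stands in for the global barrier, and the function name global_max is invented for illustration.

```python
import threading


def global_max(local_data):
    """Model the two-barrier reduction with one thread per 'image'.

    local_data[i] holds the values owned by image i+1.
    Returns the final value of `great` on every image.
    """
    n = len(local_data)
    barrier = threading.Barrier(n)   # stands in for a global barrier
    great = [None] * n               # great[i-1] models great[i]

    def image(me):                   # me is the 1-based image index
        great[me - 1] = max(local_data[me - 1])   # local max
        barrier.wait()               # every image now has its local max
        if me == 1:                  # image 1 finds the global max
            for i in range(2, n + 1):
                great[0] = max(great[0], great[i - 1])
        barrier.wait()               # image 1 has finished
        great[me - 1] = great[0]     # then get the result

    threads = [threading.Thread(target=image, args=(m,))
               for m in range(1, n + 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return great


print(global_max([[3.0, 1.0], [7.5, 2.0], [4.0, 6.0]]))
```

Without the first barrier, image 1 could read a value of great that another image has not yet set. Without the second, an image could copy great[1] before the reduction is complete.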
Procedures
A co-array subobject including co-subscripts in [ ] is permitted only in an intrinsic operation, intrinsic assignment, or I/O list. However, a subobject of a co-array without [ ] or % may be passed as an actual argument to a co-array dummy argument. The ordinary rules of Fortran 90/95 apply to the local part, and the co-array part is defined afresh in the procedure. Any subscript expressions must have the same value on all images executing the call.
Structure components and pointers
A co-array may be of a derived type with pointer components, which allows the size to vary from image to image. Such pointer components are limited to allocatable behaviour. Co-arrays are symmetric (SHMEM safe) objects, but the pointer components are not.
Input/output
Co-Array Fortran I/O is not in the CF90 subset, so I/O is similar to that with SHMEM or MPI. Each image has its own set of independent I/O units and a file may be opened on one image when it is already open on another (although an assign statement may be needed for this to work). Units identified by * have a single file position on all images.
SHMEM/MPI/PVM
Co-Arrays can interoperate with SHMEM, MPI and PVM on the T3E. Co-arrays are currently limited to 64-bit objects (no REAL*4 or COMPLEX co-arrays), but it is very easy to replace a SHMEM call with a co-array assignment:
common/buffer/ b1(100),b2(100)
call shmem_barrier_all
call shmem_get(b2,b1,100,ipe)
becomes:
common/buffer/ b1(100)[0:*],b2(100)
call sync_images
b2(:) = b1(:)[ipe]
Note that adding [ ] to the declaration will not break any existing code, because b1 without square brackets always refers to the local part of the co-array. This allows gradual conversion of programs to Co-Array Fortran.
Final remarks
Co-array Fortran provides a very clear way to express data access between processes. Any reference to a co-array without [ ] is a reference to the local part. We expect most of the code to be like this. [ ] acts as a flag to the reader that communication is taking place. Only in a procedure call can communication take place without [ ]. Preliminary results are very encouraging, indicating that it allows support of large numbers of processes without degradation.
AAAS 49th Arctic Science Conference and IARC Inauguration
If you haven't visited ARSC lately, the dramatic new conchiform building across the street is nearing completion. It will be home to the International Arctic Research Center, or "IARC."
The inauguration of the IARC occurs as part of the Arctic Science Conference of the AAAS which will be held at UAF on October 25-28.
Twenty ARSC users and two ARSC staff members are contributing to posters in the conference poster sessions (I'm sorry if I missed anyone). Be sure to look for them:
Peter Delamere, Roger Edberg, Doug Goering, Larry Hinzman, Shusun Li, Wei Li, Elizabeth Lilly, Mike Lilly, Wieslaw Maslowski, Julie McClean, Anthony David McGuire, Don Morton, Guy Robinson, Albert Semtner, Knut Stamnes, Hiroshi Tanaka, Xiaozhen Xiong, Yuxia Zhang, Ziya Zhang, Qianlai Zhuang
Here's more on the IARC from the introduction to the conference program:
The International Arctic Research Center (IARC) has been established on the campus of the University of Alaska Fairbanks to serve as a focal point for global change research in the Arctic under the auspices of the governments of the United States and Japan. It is one of the projects under the "Joint Statement of the Japan-United States Framework for a New Economic Partnership," more commonly called the "Common Agenda." The two countries have agreed to cooperate on several specific issues, including global change research.
The primary missions of the IARC are to synthesize existing and new information with creative approaches in understanding the Arctic as a system and to serve as an interface and facilitator for international cooperative research. It is our hope that results from our research will become one of the most crucial factors in determining accurately the rate of allowable release of carbon dioxide. This is because it is generally agreed that the Arctic is the region of the earth where the effects of global warming will most prominently manifest themselves.
And, as noted in issue #149, here's the web address for more on the conference:
http://www.gi.alaska.edu/aaas/index.html
ZPL Pasta
I found the following link while snooping around the ZPL pages (ZPL is a parallelizing compiler developed at the University of Washington--see newsletter #122):
http://www.cs.washington.edu/research/zpl/misc/pasta.html
It points to some intriguing pasta recipes.
Note: the link is to "pasta"--NOT "spaghetti"--and it's clearly stated, the pasta should be home-made. (So, I suppose, if you're writing spaghetti code, at least make it good!) They're accepting recipes, but I guess Tom's "egg-drop Ramen surprise" would be disqualified. Oh well...
Quick-Tip Q & A
A: {{ The "hpm" tool for Cray PVP platforms makes it easy to measure
my vector code's overall performance. How can I get my T3E
code's MFLOP/S rating? Is there a comparable tool? }}
Use "PAT," the "performance analysis tool," which, like "hpm" on the
vector platforms, accesses the hardware performance counters.
There are three steps:
1) Recompile your code with libpat.a and the pat.cld directives file.
E.g.:
f90 -l pat pat.cld -o test test.F
--or--
cc -l pat pat.cld -o test test.c
2) Run the executable. E.g.:
mpprun -n 13 ./test
This will produce a "PDF" file. It will be named "pdf.<NNNNN>",
where <NNNNN> is some fairly arbitrary integer. E.g.: "pdf.31388".
3) Use "pat" to display the performance counter information. The syntax
is "pat -m <EXECUTABLE FILE> <PDF FILE>". E.g.:
pat -m test pdf.31388
This will show MFLOP/S achieved (among other things).
Here's an example...
yukon$ cc -l pat pat.cld -o circle.pat.exe mc.c circle.c
mc.c:
circle.c:
yukon$ mpprun -n2 ./circle.pat.exe 1000 2 1
[0] 1000 1126.165148: avg: 1.126165148
[1] 2000 2257.458945: avg: 1.128729473
yukon$ ls pdf*
pdf.31388
yukon$ pat -m ./circle.pat.exe pdf.31388
Performance counters for FpOps
Values given are in MILLIONS.
 PE(id)      cycles  operations     ops/sec      dcache  misses/sec
                                                 misses
  0( 0)     4414.88      969.62       98.84        0.04        0.00
  1( 0)     4436.35      969.80       98.38        0.28        0.03
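The "ops/sec" column is the MFLOP/S figure. It can be cross-checked from the "cycles" and "operations" columns once the processor clock rate is known. The sketch below (Python, with an invented function name mflops) ASSUMES a 450 MHz clock. That value is inferred from the sample numbers above, not reported by pat, so substitute the clock rate of your own PEs.

```python
CLOCK_MHZ = 450.0   # ASSUMPTION: inferred from the sample output, not from pat


def mflops(operations_millions, cycles_millions, clock_mhz=CLOCK_MHZ):
    """Millions of operations per second from the two counter columns."""
    seconds = cycles_millions / clock_mhz   # (1e6 cycles) / (1e6 cycles/sec)
    return operations_millions / seconds


# (PE, cycles, operations) taken from the sample pat output
for pe, cycles, ops in [(0, 4414.88, 969.62), (1, 4436.35, 969.80)]:
    print(f"PE {pe}: {mflops(ops, cycles):.2f} MFLOP/S")
```

With the assumed clock rate, both rows agree with the reported 98.84 and 98.38 to within about 0.01, the difference being rounding in the printed columns.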
For more information, execute "man pat" on the T3E or read the PAT
tutorial in issue #128:
/arsc/support/news/t3enews/t3enews128/index.xml
Q: Why does Fortran array indexing start at 1 while C starts at 0?
[ Answers, questions, and tips graciously accepted. ]
Current Editors:
Ed Kornkven, ARSC HPC Specialist, ph: 907-450-8669
Kate Hedstrom, ARSC Oceanographic Specialist, ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
E-mail Subscriptions:
- Subscribe to (or unsubscribe from) the e-mail edition of the ARSC HPC Users' Newsletter.
- Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.
