ARSC T3E Users' Newsletter 153, October 16, 1998

SC98 Attendees...

If you're attending SC98, please tell us about your:

  • Poster,
  • Booth,
  • Talk,
  • Panel Session,
  • BOF, SIG, MIG, etc.

We're preparing a special issue to promote your work and to help T3E users find one another in Orlando.

Co-array Fortran


  [ Many thanks to John Reid, who provided this introduction to
    Co-array Fortran.  As discussed in the previous issue, Co-array
    Fortran is available in CF90 3.1 for the T3E and documentation is
    both online and in "man" pages. ]

Co-Array Fortran

Bob Numrich, Silicon Graphics, Inc. (formerly Cray Research), USA
John Reid, Rutherford Appleton Laboratory, UK
Alan Wallcraft, Naval Research Laboratory, USA

Abstract

Co-array Fortran is a simple parallel extension to Fortran 90/95, based on a second set of subscripts to address different processes. A subset has been implemented in CF90 3.1 on the T3E. We introduce this subset and provide examples to illustrate how clear, powerful, and flexible it can be. For more about Co-Array Fortran, see: http://www.co-array.org

Basics

Co-Array Fortran uses an SPMD (Single Program, Multiple Data) model. A single program is replicated to a number of 'images', each with its own set of local variables. All images start by executing the main program and mostly execute asynchronously. The number of images is set at load time and is fixed during execution; it is available from the intrinsic function num_images(). Images have indices 1, 2, ..., num_images(), and co-subscripts are mapped to image indices by the usual rule for array subscripts. The index of the executing image is available from the intrinsic function this_image(). Note that the indices start at 1 rather than 0.
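
As a first illustration, here is a minimal sketch of a complete program; every image executes the same code and reports its own index:

   program hello
      integer :: me, n
      me = this_image()
      n  = num_images()
      write(*,*) 'Image', me, 'of', n
   end program hello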

Variables declared as co-arrays are accessible through a second set of array subscripts, delimited by square brackets.

Simple examples of co-array syntax


   real :: r[*], s[0:*], x(n)[*]  ! Co-arrays always have assumed 
   type(u) :: u2(m,n)[np,*]       ! co-size (equal to number of images)
   real :: t   !  Local variables
   integer p, q, index(n)
     :
   t = s[p]       ! Copy to local variable from image p
   x(:) = x(:)[p] ! Reference without [] is to local part
   x(i)[p] = s[index(i)] 
   u2(i,j)%b(:) = u2(i,j)[p,q]%b(:)

A non-trivial example: redistribution

Consider redistributing the array a(1:kx,1:ky)[1:kz] from b(1:ky,1:kz)[1:kx], where max(kx,kz) <= num_images().


 
  iz = this_image() 
  if (iz<=kz) then   ! If construct needed so that no action
     do ix = 1, kx   ! is taken on images with no data.
        a(ix,:) = b(:,iz)[ix]
     end do
  end if

Implementation model

The rules are designed to ensure that a co-array occupies the same set of addresses within each image. Therefore, a co-array must have the same set of bounds on all images, and there is an implicit synchronization of all images at an allocate or deallocate statement so that they all perform their allocations and deallocations in the same order. Allocatable co-arrays are the only dynamic form: automatic co-arrays or co-array-valued functions would require automatic synchronization, which would be awkward.
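
For example, here is a minimal sketch of an allocatable co-array; the bounds, and the order of allocations, must be the same on every image:

   real, allocatable :: w(:)[:]
     :
   allocate( w(n)[*] )  ! implicit synchronization; n must have the
                        ! same value on every image
     :
   deallocate( w )      ! implicit synchronization again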

Synchronization

The images normally execute asynchronously. Most of the time, the compiler can optimize as if the image is on its own. If one image relies on another image having taken an action, explicit synchronization is needed.

For example, suppose we wish to set the variable great to the largest value of the co-array a on all images:


   great = maxval(a(:)) ! local max.
   call sync_images ! Wait for all images to have local max.
   if(this_image()==1)then       ! Use image 1 to find global max. 
      do i = 2, num_images()
         great = max(great,great[i])
      end do
   end if
   call sync_images ! Wait for image 1 to have finished, 
   great = great[1] ! then get the result 

Sync_images with no arguments is a global barrier; called with an argument, it synchronizes a subset of images. In the full language, sync_images has been replaced by two new intrinsics, sync_all and sync_team, and sync_team (available as a module for CF90) is a better way to synchronize a subset of images.
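
As an illustration of the subset form, here is a sketch of a producer/consumer handshake between images 1 and 2. It assumes that sync_images accepts a list of image indices as its argument; see the man page for the exact interface:

   if (this_image() == 1) then
      ! ... produce data for image 2 ...
      call sync_images( (/ 2 /) )   ! signal image 2
   else if (this_image() == 2) then
      call sync_images( (/ 1 /) )   ! wait for image 1
      ! ... use the data from image 1 ...
   end if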

Procedures

A co-array subobject including co-subscripts in [ ] is permitted only in an intrinsic operation, intrinsic assignment, or I/O list. However, a subobject of a co-array without [ ] or % may be passed to a co-array dummy argument. The ordinary rules of Fortran 90/95 apply to the local part, and the co-array part is defined afresh within the procedure. Any subscript expressions must have the same value on all images executing the call.
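
A sketch of these rules in action (the names scale2, x, and a are hypothetical): the dummy argument is declared as a co-array inside the procedure, and the actual argument is a co-array passed without [ ]:

   subroutine scale2(x, n)
      integer n
      real :: x(n)[*]     ! co-array dummy; its co-array part is
                          ! defined afresh within the procedure
      x(:) = 2.0*x(:)     ! no [ ]: updates the local part only
   end subroutine scale2
     :
   real :: a(100)[*]
   call scale2(a, 100)    ! executed by all images; n is the same
                          ! on every image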

Structure components and pointers

A co-array may be of a derived type with pointer components, which allows the size of the data to vary from image to image. The pointer components are limited to allocatable behaviour. Co-arrays are symmetric (SHMEM-safe) objects, but the pointer components are not.
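
For example (a sketch; the names buffer, v, p, q, nlocal, m, and t are hypothetical), each image can allocate a different amount of local data, and another image's data can then be read through its pointer component:

   type buffer
      real, pointer :: p(:)
   end type buffer
   type(buffer) :: v[*]
     :
   allocate( v%p(nlocal) )  ! nlocal may differ from image to image
   v%p(:) = 0.0
   call sync_images         ! wait until every image has allocated
   t(1:m) = v[q]%p(1:m)     ! read m values from image q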

Input/output

Co-Array Fortran I/O is not in the CF90 subset, so I/O is similar to that with SHMEM or MPI. Each image has its own set of independent I/O units and a file may be opened on one image when it is already open on another (although an assign statement may be needed for this to work). Units identified by * have a single file position on all images.
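
A common pattern (a sketch; the file name and unit number are arbitrary) is for each image to construct a private file name from this_image():

   character(len=16) :: fname
     :
   write(fname,'(a,i3.3)') 'results.', this_image()
   open(unit=10, file=fname)   ! one private file per image
   write(10,*) 'data from image', this_image()
   close(10)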

SHMEM/MPI/PVM

Co-Arrays can interoperate with SHMEM, MPI and PVM on the T3E. Co-arrays are currently limited to 64-bit objects (no REAL*4 or COMPLEX co-arrays), but it is very easy to replace a SHMEM call with a co-array assignment:


    common/buffer/ b1(100),b2(100)
    call shmem_barrier_all
    call shmem_get(b2,b1,100,ipe)

becomes:


    common/buffer/ b1(100)[0:*],b2(100)
    call sync_images
    b2(:) = b1(:)[ipe]

Note that adding [ ] to the declaration will not break any existing code, because b1 without square brackets always refers to the local part of the co-array. This allows gradual conversion of programs to Co-Array Fortran.

Final remarks

Co-array Fortran provides a very clear way to express data access between processes. Any reference to a co-array without [ ] is a reference to the local part, and we expect most of the code to be like this. [ ] acts as a flag to the reader that communication is taking place. Only in a procedure call can communication take place without [ ]. Preliminary results are very encouraging, indicating that it scales to large numbers of processes without performance degradation.

AAAS 49th Arctic Science Conference and IARC Inauguration

If you haven't visited ARSC lately, the dramatic new conchiform building across the street is nearing completion. It will be home to the International Arctic Research Center, or "IARC."

The inauguration of the IARC occurs as part of the Arctic Science Conference of the AAAS which will be held at UAF on October 25-28.

Twenty ARSC users and two ARSC staff members are contributing to posters in the conference poster sessions (I'm sorry if I missed anyone). Be sure to look for them:


  Peter Delamere                  Roger Edberg                    
  Doug Goering                    Larry Hinzman                   
  Shusun Li                       Wei Li                          
  Elizabeth Lilly                 Mike Lilly                      
  Wieslaw Maslowski               Julie McClean                   
  Anthony David McGuire           Don Morton                      
  Guy Robinson                    Albert Semtner                  
  Knut Stamnes                    Hiroshi Tanaka                  
  Xiaozhen Xiong                  Yuxia Zhang                     
  Ziya Zhang                      Qianlai Zhuang                  

Here's more on the IARC from the introduction to the conference program:

The International Arctic Research Center (IARC) has been established on the campus of the University of Alaska Fairbanks to serve as a focal point for global change research in the Arctic under the auspices of the governments of the United States and Japan. It is one of the projects under the "Joint Statement of the Japan-United States Framework for a New Economic Partnership," more commonly called the "Common Agenda." The two countries have agreed to cooperate on several specific issues, including global change research.

The primary missions of the IARC are to synthesize existing and new information with creative approaches in understanding the Arctic as a system and to serve as an interface and facilitator for international cooperative research. It is our hope that results from our research will become one of the most crucial factors in determining accurately the rate of allowable release of carbon dioxide. This is because it is generally agreed that the Arctic is the region of the earth where the effects of global warming will most prominently manifest themselves.

And, as noted in issue #149 , here's the web address for more on the conference:

http://www.gi.alaska.edu/aaas/index.html

ZPL Pasta

I found the following link while snooping around the ZPL pages (ZPL is a parallelizing compiler developed at the University of Washington--see newsletter #122 ):

http://www.cs.washington.edu/research/zpl/misc/pasta.html

It points to some intriguing pasta recipes.

Note: the link is to "pasta"--NOT "spaghetti"--and it's clearly stated, the pasta should be home-made. (So, I suppose, if you're writing spaghetti code, at least make it good!) They're accepting recipes, but I guess Tom's "egg-drop Ramen surprise" would be disqualified. Oh well...

Quick-Tip Q & A



A: {{ The "hpm" tool for Cray PVP platforms makes it easy to measure
      my vector code's overall performance.  How can I get my T3E
      code's MFLOP/S rating?  Is there a comparable tool? }}

Use "PAT," the "performance analysis tool," which, like "hpm" on the
vector platforms, accesses the hardware performance counters.

There are three steps:

1) Recompile your code with libpat.a and the pat.cld directives file.
   E.g.:

     f90 -l pat pat.cld -o test test.F 

       --or--

     cc -l pat pat.cld -o test test.c 


2) Run the executable. E.g.:

     mpprun -n 13 ./test

   This will produce a "PDF" file.  It will be named "pdf.<NNNNN>",
   where <NNNNN> is some fairly arbitrary integer.  E.g.: "pdf.31388".


3) Use "pat" to display the performance counter information. The syntax
   is "pat -m <EXECUTABLE FILE> <PDF FILE>". E.g.:

     pat -m test pdf.31388

   This will show MFLOP/S achieved (among other things). 


Here's an example... 


  yukon$ cc -l pat pat.cld -o circle.pat.exe mc.c circle.c 
  mc.c:
  circle.c:
  yukon$ mpprun -n2 ./circle.pat.exe 1000 2 1
  [0] 1000 1126.165148: avg:  1.126165148
  [1] 2000 2257.458945: avg:  1.128729473
  yukon$ ls pdf*
  pdf.31388
  yukon$ pat -m ./circle.pat.exe pdf.31388

                 Performance counters for FpOps
   
      Values given are in MILLIONS.
   
     PE(id)   cycles    operations    ops/sec    dcache  misses/sec
                                                 misses

     0(   0)  4414.88      969.62      98.84      0.04      0.00
     1(   0)  4436.35      969.80      98.38      0.28      0.03

For more information, execute "man pat" on the T3E or read the PAT
tutorial in issue #128:

/arsc/support/news/t3enews/t3enews128/index.xml





Q: Why does Fortran array indexing start at 1 while C starts at 0?

[ Answers, questions, and tips graciously accepted. ]


Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.