ARSC T3E Users' Newsletter 118, April 25, 1997
ARSC Completes its First T3E Training Course
The two editors of this newsletter conducted a four-day introductory seminar on parallel programming and the CRAY T3D/T3E environment this week. The seminar took place here at ARSC.
We anticipate that one attendee, who runs a simplified model of ocean thermodynamics, will begin porting and parallelizing almost immediately. He had been confounded in attempts to write an efficient vectorizable version of the code, but experienced an immediate (though slight) speedup running on one T3D PE, relative to his HP-Unix desktop workstation. Other attendees had longer-term needs or an academic interest in MPP.
"Right-Hand Rule" in Fortran 90
As mentioned in newsletter 102, there is no Fortran77 on the T3E, only Fortran90. One of the features of Fortran90 is array syntax, and it might be tempting to replace all DO loops with their array syntax versions.
Users should beware, however, that Fortran90 uses a right-hand evaluation rule, i.e., the entire right-hand side array expression is evaluated before anything is stored in the left-hand side array. If the right-hand side refers to elements of the array being assigned, and the two sections overlap, the result will differ from that of the apparently equivalent DO loop, which reads elements it has already overwritten. This is illustrated in the code below.
cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
      program prog2
! test right hand rule.
      integer, parameter :: nsize=8
      integer, dimension(nsize) :: a, b
      do i=1,nsize
        a(i)=i
        b(i)=i
      enddo
! using do loop
      noff=3
      nwork=5
      do i=1,nwork
        a(i+noff-1)=-a(i)
      enddo
! using array syntax
      b(noff:nwork+noff-1) = -b(1:nwork)
! display results
      do i=1,nsize
        write(6,*) ' data ',i,' is ',a(i),b(i)
      enddo
      do i=1,nsize
        if(b(i).ne.a(i)) then
          write(6,*) ' entry ',i,' not equal ',a(i),b(i)
        endif
      enddo
      stop
      end
cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
Here is the output from a run of this program:
yukon% f90 -o prog2 prog2.f
yukon% ./prog2
data 1 is 2*1
data 2 is 2*2
data 3 is 2*-1
data 4 is 2*-2
data 5 is 1, -3
data 6 is 2, -4
data 7 is -1, -5
data 8 is 2*8
entry 5 not equal 1, -3
entry 6 not equal 2, -4
entry 7 not equal -1, -5
STOP (PE 0) executed at line 44 in Fortran routine 'PROG2'
yukon%
This rule is included to allow for optimisations. By using array syntax we have told the compiler that the element operations do not depend on one another and can be carried out in any order. A DO loop should be used if a specific update order is required.
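As a minimal sketch (not taken from the original test program), the array-syntax result can also be reproduced with explicit loops, either by going through a temporary or by reversing the loop direction. The program name rhsrule and the arrays c, d and tmp are made up for this illustration. The reversed loop only works because the shift here is towards higher indices.

```fortran
      program rhsrule
! Sketch: three ways to get the array-syntax result for an
! overlapping, shifted assignment.  The names c, d and tmp are
! illustrative only.
      implicit none
      integer, parameter :: nsize=8, noff=3, nwork=5
      integer, dimension(nsize) :: b, c, d
      integer, dimension(nwork) :: tmp
      integer :: i
      b = (/ (i, i=1,nsize) /)
      c = b
      d = b
! 1. array syntax: the whole right hand side is evaluated first
      b(noff:nwork+noff-1) = -b(1:nwork)
! 2. explicit temporary: save the old values, then store them
      do i=1,nwork
        tmp(i) = -c(i)
      enddo
      do i=1,nwork
        c(i+noff-1) = tmp(i)
      enddo
! 3. reversed loop: for a shift towards higher indices, each
!    element is read before it is overwritten
      do i=nwork,1,-1
        d(i+noff-1) = -d(i)
      enddo
      write(6,*) ' array syntax  ', b
      write(6,*) ' temporary     ', c
      write(6,*) ' reversed loop ', d
      if (all(b == c) .and. all(b == d)) then
        write(6,*) ' all three agree'
      else
        write(6,*) ' results differ'
      endif
      end program rhsrule
```

If the shift were towards lower indices instead, the forward DO loop would be the one that matches the array-syntax result, so the loop direction has to be chosen case by case.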
A good reference on Fortran 90/95 is:
Fortran 90/95 Explained, Metcalf and Reid, Oxford Science Publications. ISBN 0-19-851888-9
Announcement: Third European SGI/Cray MPP Workshop
[ Here is a drastically shortened version of the announcement we received this week. See "more information" if you're interested.]
>
> Third European SGI/Cray MPP Workshop
>
>
> Important dates:
> -----------------
>
>> May, 12, 1997       - Deadline for receipt of contributed abstract <<
> June, 09, 1997       - Notification of acceptance of papers
> August, 11, 1997     - Deadline for receipt final papers
> August, 25, 1997     - Deadline for registration
> September, 11, 1997  - Third European SGI/Cray MPP Workshop
>
>
> Topics of interest will include :
> -----------------------------------
> * Support Tools and Environments
> * Parallel Debugging
> * Parallel Languages
> * Automatic Parallelization and High-Performance Compilers
> * Programming Models and Methods
> * Parallel Numerical Algorithms
> * Scheduling and load balancing
> * Performance Evaluation and Prediction
> * Parallel operating systems
>
> More Information :
> -----------------
> For more information about the Third European SGI/Cray MPP Workshop
> please contact :
>
> mailto : workshop@armoise.saclay.cea.fr
> http://www.cea.fr/workshop/
>
grmview
As mentioned in the previous newsletter, mppview on the T3E can tell you what jobs are running, and give a little info on the system configuration.
grmview provides similar information with greater depth and detail. Here's an example from the ARSC T3E:
yukon$ grmview
PE Map: 96 PEs configured
              Ap. Size   Number Aps.            <<<< Lists >>>>>
  Type   PE   min  max  running  limit  Label  svc  uid  gid  acid
+ APP     0     2   84        1      1      -    -    -    -     -
         79 identical PEs skipped
+ APP    80     2   84        0      1      -    -    -    -     -
          3 identical PEs skipped
+ CMD    84     1    1        1  unlim      -    -    -    -     -
+ CMD    85     1    1        0  unlim      -    -    -    -     -
          5 identical PEs skipped
+ OS     91     0    0        0      0      -    -    -    -     -
+ OS     92     0    0        0      0      -    -    -    -     -
+ CMD    93     1    1        0  unlim      -    -    -    -     -
+ CMD    94     1    1        0  unlim      -    -    -    -     -
+ OS     95     0    0        0      0      -    -    -    -     -
In the above portion of grmview's output, every PE is classified as an application PE, command PE, or operating system PE. Other fields tell whether the PE is in use, and if so, by how many processes. Note that APP PEs have a process limit of 1, just like the T3D PEs. CMD and OS PEs, on the other hand, are multitasked, with no limit to the number of processes, just like the PVP front-end used by the T3D.
Exec Queue: 4 entries total. 3 running, 1 queued
   uid   gid  acid  Label  Size  BasePE  ApId         Command
     0     0     0      -     1      84  11bf21001a6  tpdaemon
   162   882   882      -    64      16  11af4107f89  a.out.64
   299  1302  1302      -    16       0  4db20101106  jjsrt_Vdepth
   uid   gid  acid  Label  Size  Command  Reason
  1436   206   206      -    45  stst_45  Ap. limit
This second segment of output reports status of jobs in the system.
You can get even more detail about each PE, for instance its x,y,z position in the torus, using the -l option. Here's this info on the first few PEs (some field names shortened and spaces deleted):
yukon$ grmview -l
PE Map: 96 PEs configured
              Ap Size   Number Aps.       <<<< Lists >>>>>
  Type   PE   min  max  running  lim  Lbl  svc  uid  gid  acid  x  y  z  Clk  UsrMem  FreMem
+ APP     0     2   84        1    1    -    -    -    -     -  0  0  0  300     117      59
+ APP     1     2   84        1    1    -    -    -    -     -  1  0  0  300     117      59
+ APP     2     2   84        1    1    -    -    -    -     -  0  1  0  300     117      59
Here's a bit of the man page on grmview:
DESCRIPTION
     The grmview command displays information from the Global Resource
     Manager (GRM) about the current PE configuration (PE map),
     applications currently running on the PEs, and applications
     waiting to run on the PEs.
   PE Map Fields
     The PE map fields (-m) are as follows:
     Type
          The PE type specified by the administrator. The first
          character is + or -. If the PE is operational, a +
          appears; otherwise a - appears. Another line
          beginning with ! describes the reason for a
          non-operational PE (see Example 3).
     PE
          The logical PE number.
     Ap. Size min
          The minimum-size application eligible to run on the
          PE.
     Ap. Size max
          The maximum-size application eligible to run on the
          PE.
     Number Aps. running
          The number of applications having reserved global
          resources currently running on the PE.
     Number Aps. limit
          The maximum number of applications having reserved
          global resources allowed to be simultaneously
          allocated to the PE.
     Label
          The label assigned to the PE by the administrator. A
          - means no label has been assigned.
     svc
          If a service list exists, yes appears. At present
          there is no option to view the list.
     uid
          If a user ID list exists, yes appears. At present
          there is no option to view the list.
     gid
          If a group ID list exists, yes appears. At present
          there is no option to view the list.
     acid
          If an account ID list exists, yes appears. At present
          there is no option to view the list.
     The extended PE map (-l) display fields include all the PE map fields
     plus the following fields appended to the end of the line:
     x
          The physical x-coordinate of the PE.
     y
          The physical y-coordinate of the PE.
     z
          The physical z-coordinate of the PE.
     Clock
          The clock speed of the PE (in Mhz).
     UsrMem
          Available user memory (in Mbytes).
     FreMem
          Current free memory (in Mbytes).
     The PE map display fields for non-operational PEs (indicated by a line
     beginning with !) are as follows:
     PM Reg
          If the PM is not registered (NO), it is impossible to run
          applications on the PE.
     Net Avail
          If network routing is not functioning (NO), the PE has no
          communication with the rest of the machine.
T3D/T3E Differences
- Some system header files moved (#117).
- mppview differences (#117).
- No f77 on T3E (#118).
Quick-Tip Q & A
A: {{ How can you capture a "man" page without getting all the
formatting characters? Say you wanted ASCII text for a
newsletter. }}
# Several responses to this one... Thanks!
# Use the command, "col -bx", in a pipeline, e.g.:
man col | col -bx | mailx -s "news" everyone_that_ever_loved_me.list
Q: Can you tell NQS to not restart your job from the beginning after
a failed checkpoint at shutdown or after a crash? You might want
to do this if restarting the job would overwrite work already
completed.
[ Answers, questions, and tips graciously accepted. ]
Current Editors:
Ed Kornkven, ARSC HPC Specialist, ph: 907-450-8669
Kate Hedstrom, ARSC Oceanographic Specialist, ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
E-mail Subscriptions:
Subscribe to (or unsubscribe from) the e-mail edition of the ARSC HPC Users' Newsletter.
Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.
