ARSC T3D Users' Newsletter 106, September 27, 1996
T3D/T3E Timings
[ One of our T3D users, Dr. Alan Wallcraft of Stennis Space Center, contributes this article. ]
I have a couple of ocean model benchmark codes which Cray ran on both the T3D and T3E. The original used Fortran 77 and PVM message passing. Vendors are allowed to modify the code, but I don't know to what extent Cray did so. One change they made was to use the Fortran 90 compiler and REAL*4 throughout, since ocean models typically run with 32-bit REALs. The benchmarks use the same basic code at two sizes: (i) a small 1/2 degree global ocean case, and (ii) a much larger 1/16th degree global case.
Since the T3E typically has twice as much memory per node as the T3D, Cray ran on half as many nodes on the T3E. This makes direct node-for-node comparisons trickier. Even assuming only a 1.8x speedup when doubling the number of nodes (about the minimum expected for this kind of code), the measured wall-time ratios of 1.77x and 1.84x scale to a T3E that is about 3.2x to 3.3x faster per node than the T3D. This is a very good speedup, particularly since the compiler is presumably still better tuned for the T3D than the T3E (i.e., the T3E compiler will probably improve as it matures). Peak T3E hardware performance is 4x the T3D (600 Mflops vs 150 Mflops).
             #T3D    WALL    #T3E    WALL    SPEED-UP    SPEED-UP
             NODES   TIME    NODES   TIME    WALL TIME   PER NODE
OCEANS-02      32     440      16     249      1.77x       3.2x
OCEANS-16     128    1350      64     734      1.84x       3.3x
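The arithmetic behind the two speed-up columns can be sketched in a few lines of Python. This is only an illustration of the calculation described above: the function name `speedups` and the constant `DOUBLING_SPEEDUP` are made up for this sketch, and the 1.8x factor is the assumed (not measured) gain from doubling the node count.

```python
# Wall times copied from the table: (T3D nodes, T3D time, T3E nodes, T3E time).
RUNS = {
    "OCEANS-02": (32, 440.0, 16, 249.0),
    "OCEANS-16": (128, 1350.0, 64, 734.0),
}

# Assumed speedup from doubling the node count (an estimate, not a measurement).
DOUBLING_SPEEDUP = 1.8


def speedups(t3d_nodes, t3d_time, t3e_nodes, t3e_time, doubling=DOUBLING_SPEEDUP):
    """Return (wall-time speedup, estimated per-node speedup)."""
    # This sketch only handles the case in the table: T3E ran on half the nodes.
    if t3d_nodes != 2 * t3e_nodes:
        raise ValueError("expected the T3E run to use half as many nodes")
    wall = t3d_time / t3e_time
    # Project the T3E run onto the same node count as the T3D run.
    per_node = wall * doubling
    return wall, per_node


for name, run in RUNS.items():
    wall, per_node = speedups(*run)
    print(f"{name}: wall time {wall:.2f}x, per node {per_node:.1f}x")
```

With the table's numbers this reproduces the 1.77x/3.2x and 1.84x/3.3x entries.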
Barriers Revisited
I have found that my intuition regarding the TEST_BARRIER, SET_BARRIER, and WAIT_BARRIER functions was off (see the next article). I've done some more work and come up with a better understanding.
By way of introduction, what do you think the following program will do if compiled for four PEs and run on the T3D?
ccccccccccccccccccccccccccccccc
      program wait1
      intrinsic MY_PE
      call WAIT_BARRIER ()
      print*, MY_PE(), " done."
      end
ccccccc
It terminates normally:
denali$ a.out
0 done.
2 done.
3 done.
1 done.
How about this?
ccccccccccccccccccccccccccccccc
      program wait2
      intrinsic MY_PE
      logical TEST_BARRIER
      print*, MY_PE(), " Before: TEST_BARRIER()= ", TEST_BARRIER()
      if (MY_PE() .EQ. 0) then
        call SET_BARRIER ()
      else
        call sleep (5)
        call WAIT_BARRIER ()
      endif
      print*, MY_PE(), " After: TEST_BARRIER()= ", TEST_BARRIER()
      end
ccccccc
It terminates normally, with this output:
denali$ a.out
1 Before: TEST_BARRIER()= T
3 Before: TEST_BARRIER()= T
2 Before: TEST_BARRIER()= T
0 Before: TEST_BARRIER()= T
0 After: TEST_BARRIER()= F
1 After: TEST_BARRIER()= T
3 After: TEST_BARRIER()= T
2 After: TEST_BARRIER()= T
And this?
ccccccccccccccccccccccccccccccc
      program barr1
      intrinsic MY_PE
      if (MY_PE() .EQ. 0) then
        call sleep (5)
        call barrier ()
        call barrier ()
      else
        call SET_BARRIER ()
        call SET_BARRIER ()
        call SET_BARRIER ()
        call SET_BARRIER ()
        call barrier ()
      endif
      print*, MY_PE(), " done."
      end
ccccccc
It hangs on the second barrier call in PE0, and must be interrupted:
denali$ a.out
2 done.
1 done.
3 done.
Interrupt
Beginning of Traceback (PE 0):
Started from address 0x20000c0514 in routine '_sma_deadlock_wait'.
Called from line 78 (address 0x20000c0720) in routine 'barrier'.
Called from line 8 (address 0x20000001a4) in routine 'WAIT'.
Called from line 363 (address 0x2000003e08) in routine '$START$'.
End of Traceback.
---
I wrote these little programs to demonstrate specific features of the barrier calls. Here are some general observations followed by a description of barrier states and transitions.
Observations:
- Setting a barrier on one PE does NOT set it on any other PEs.
- The SET_BARRIER() call sets the local barrier unless all of the other PEs have already set their barriers, in which case, it clears the barriers on all PEs, all at once. On the T3D, this takes place in hardware, and is extremely fast, regardless of the number of PEs involved.
- While a barrier is set, extra calls to SET_BARRIER() have no effect and are not enqueued for later.
- Calling TEST_BARRIER() or WAIT_BARRIER() has no effect on the state of the barrier.
- The basic "BARRIER()" call is probably safer and easier to use than SET, WAIT, and TEST.
STATE DESCRIPTION FOR T3D BARRIER FUNCTIONS FROM THE POINT OF VIEW OF THE LOCAL PE:
-------------------------------------------------------
The local barrier can be in one of two states:
- CLEAR: in which a "WAIT_BARRIER()" call will NOT block and a "TEST_BARRIER()" call will return TRUE.
- SET: in which a "WAIT_BARRIER()" call WILL block and a "TEST_BARRIER()" call will return FALSE.
Start state:
CLEAR: all programs start with all PE processes in the barrier
state, CLEAR.
Transitions:
1) CLEAR ==> CLEAR: any other (i.e., non-local) PE calls
"SET_BARRIER()."
2) CLEAR ==> SET: occurs when the local PE calls "SET_BARRIER()."
3) SET ==> SET: any other PE, except for the last remaining PE, calls
"SET_BARRIER()."
4) SET ==> SET: any other PE which has already called "SET_BARRIER()"
calls it again.
5) SET ==> SET: the local PE calls "SET_BARRIER()" again.
6) SET ==> CLEAR: occurs when the last remaining PE calls "SET_BARRIER()".
This could be the local PE, in which case, the transitions
from CLEAR ==> SET and back from SET ==> CLEAR happen
atomically.
-------------------------------------------------------
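The state description above is small enough to model directly. The following Python sketch is a toy model, not T3D code: the class `BarrierModel` and the two replay functions are invented for illustration, calls are serialized in the order written (standing in for the `sleep` calls in the Fortran), and it assumes that BARRIER() behaves like SET_BARRIER() followed by WAIT_BARRIER().

```python
class BarrierModel:
    """Toy model of the per-PE barrier states described above (not T3D code)."""

    def __init__(self, npes):
        # Start state: every PE's barrier is CLEAR.
        self.is_set = [False] * npes

    def set_barrier(self, pe):
        # Transitions 2 and 5: the local barrier becomes (or stays) SET.
        self.is_set[pe] = True
        # Transition 6: once the last remaining PE has set its barrier,
        # the barriers on all PEs clear at once.
        if all(self.is_set):
            self.is_set = [False] * len(self.is_set)

    def test_barrier(self, pe):
        # True means CLEAR, i.e. WAIT_BARRIER() would not block.
        return not self.is_set[pe]


def replay_wait2():
    """Program wait2: only PE 0 calls SET_BARRIER."""
    model = BarrierModel(4)
    model.set_barrier(0)
    return [model.test_barrier(pe) for pe in range(4)]


def replay_barr1():
    """Program barr1: PEs 1-3 set repeatedly, then PE 0 calls BARRIER twice."""
    model = BarrierModel(4)
    for pe in (1, 2, 3):
        # Four SET_BARRIER calls plus the set assumed inside BARRIER;
        # only the first changes the state (transition 5 for the rest).
        for _ in range(5):
            model.set_barrier(pe)
    model.set_barrier(0)  # PE 0, first BARRIER: completes the barrier
    model.set_barrier(0)  # PE 0, second BARRIER: no other PE sets again
    return [model.test_barrier(pe) for pe in range(4)]


print(replay_wait2())  # [False, True, True, True]
print(replay_barr1())  # [False, True, True, True]
```

In both replays PE 0 ends up SET while the others are CLEAR. In wait2 that is harmless because PE 0 never waits; in barr1 PE 0 goes on to wait inside its second BARRIER, which matches the hang shown in the traceback.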
For anyone who is interested, here is my main testing program. It lets you set barriers on different PEs at different times, and overlap barrier segments, as you wish.
#######################################################
      Program barrier_tests
      implicit none
      integer MY_PE, tnum, nspins
      real dummy
      intrinsic MY_PE
      logical TEST_BARRIER, tb
      character*35 barStat
      dummy = 0
      if (N$PES .NE. 4) then
        stop "NPES must equal 4."
      endif
      call barrier ()
      do tnum = 1, 36
        call spin (dummy, 1)
        if (MY_PE() .EQ. 0) then
          if (tnum .EQ. 8) call SET_BARRIER ()
          if (tnum .EQ. 16) call WAIT_BARRIER ()
          if (tnum .EQ. 28) call SET_BARRIER ()
          write (6, 1010) MY_PE(), tnum, " barrier is: ", barStat ()
          call flush (6)
        else if (MY_PE() .EQ. 1) then
          if (tnum .EQ. 8) call SET_BARRIER ()
          if (tnum .EQ. 16) call WAIT_BARRIER ()
          if (tnum .EQ. 24) call SET_BARRIER ()
          write (6, 1010) MY_PE(), tnum, " barrier is: ", barStat ()
          call flush (6)
        else if (MY_PE() .EQ. 2) then
          if (tnum .EQ. 8) call SET_BARRIER ()
          if (tnum .EQ. 12) call WAIT_BARRIER ()
          if (tnum .EQ. 24) call SET_BARRIER ()
          if (tnum .EQ. 30) call SET_BARRIER ()
          write (6, 1010) MY_PE(), tnum, " barrier is: ", barStat ()
          call flush (6)
        else if (MY_PE() .EQ. 3) then
          if (tnum .EQ. 4) call SET_BARRIER ()
          if (tnum .EQ. 10) call WAIT_BARRIER ()
          if (tnum .EQ. 20) call SET_BARRIER ()
          ! Does not block because set (on this PE) not called first
          if (tnum .EQ. 32) call WAIT_BARRIER ()
          if (tnum .EQ. 34) call SET_BARRIER ()
          ! Blocks because of prior call to set
          if (tnum .EQ. 35) call WAIT_BARRIER ()
          write (6, 1010) MY_PE(), tnum, " barrier is: ", barStat ()
          call flush (6)
        endif
      enddo
      write (6, 1000) "PE", MY_PE(), " DONE"
      call flush (6)
 1000 format (a,i3,a)
 1010 format (i3,i3,a,a)
      end
ccccccccccccccccccccccccccccccccccccc
      character*35 function barStat ()
      implicit none
      logical TEST_BARRIER
      if (TEST_BARRIER()) then
        barStat = "CLEAR "
      else
        barStat = "SET - WAIT_BARRIER() would block"
      endif
      end
ccccccccccccccccccccccccccccccccccccc
      subroutine spin (dummy, nspins)
      implicit none
      integer i, spinnum, nspins
      real dummy, ranf
      intrinsic ranf
      do spinnum = 1, nspins
        do i = 1,100000
          dummy = dummy + ranf ()
        enddo
      enddo
      end
cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
Corrected Barrier Timing Program
In last week's Newsletter, I presented a program to do a timing comparison between eurekas and barriers. As noted, the program occasionally hangs. I speculated: "I think the problem is that the barrier calls are too close together, and once in a while a 'set' goes undetected at some process' 'wait.'" Sure enough, as explained in the preceding article, extra SET_BARRIER() calls have no effect, and thus I had introduced a synchronization bug into the program.
Frank Chism of CRI rightfully called me on this, and sent in an e-mail and a debugged version of the program. Here is a part of Frank's message and the program:
>
> Use of the fully asynchronous barrier routines can introduce very subtle
> bugs if you are not careful to be sure that all branches fully satisfy
> not only the setting of barriers but properly wait for them to complete
> before allowing termination. In your case you made sure that all PEs
> set barriers when they should, but because you did not wait for all of
> them to complete it was possible for some PEs to terminate before the
> last barrier was satisfied.
>
> Frank
>
> -------------------------------------------------------------------
> Here is a diff between the original test you published and the corrected
> one I ran. Also I have included the entire working test.
>
> -------------------------------------------------------------------------
>
> denali% diff eb.f eb.org
> 42,43d41
> < Chism debug: Note the wait barrier() part of this barrier is not
> < Chism        matched in the else branch of this if
> 54,57d51
> < Chism debug: barrier is set, but no matching wait_barrier for full
> < Chism        barrier call in the other branch of the if
> <           call wait_barrier()
> < Chism end debug
> 94,99c88
> < Chism debug: A call to set_barrier without a matching wait_barrier
> < Chism        does not match the other branches set_barrier barrier
> < Chism        combination. I'll match a barrier with a barrier
> < Chism     call set_barrier ()
> <           call barrier()
> < Chism end debug
> ---
> >           call set_barrier ()
> 103,105d91
> < Chism Debug barrier problem at termination
> < Chism stop
> < Chism end debug
> denali% cat eb.f
>       Program barrier_timings
>
>       implicit none
>       integer trigger_PE     ! Which PE will now trigger event
>       integer mc(128)        ! Array to store system info
>       integer MY_PE          ! Intrinsic function to get PE number
>       integer mem_event      ! Shared variable for memory-mode event
>       real t1                ! Temporary storage of start times
>       real t2                ! Temporary storage of end times
>       real junk
>       real delay_start       ! For simulated work, start of spin
>       real irtc              ! Internal function, clock ticks
>       real cp                ! Clock period in secs
>       logical test_event     ! Internal function
>       logical test_barrier   ! Internal function
>       intrinsic MY_PE
> cdir$ shared mem_event
>
>       call gethmc (mc)
>       cp = mc(7) * 1.0e-12   ! convert picosecs to secs.
>
> c
> c Time event propagation when using eureka-mode events
> c
>       if (MY_PE() .EQ. N$PES-1) then
>         write (6,1000) "EUREKA-MODE "
>         call flush (6)
>       endif
>
>       do trigger_PE = 0, N$PES - 1
>         call clear_event ()  ! In Eureka mode, all PEs must clear
>
>         call barrier ()      ! Make sure all PEs ready to watch for event
>         if (MY_PE() .EQ. trigger_PE) then
>
>           ! Kill .1 secs to simulate some work
>           delay_start = irtc ()
>  5        if (irtc() .LT. delay_start + 0.1 / cp) goto 5
>
>           t1 = irtc ()
>           call set_event ()  ! Trigger event
> Chism debug: Note the wait barrier() part of this barrier is not
> Chism        matched in the else branch of this if
>           call barrier ()    ! Wait till all PEs detect event
>           t2 = irtc ()
>
>           write (6, 1010) MY_PE(), (t2-t1) * cp * 1e6
>           call flush (6)
>         else
>  10       if (.NOT. test_event()) goto 10
>
>           ! Inform triggering PE that 1st barrier release was detected
>           call set_barrier ()
> Chism debug: barrier is set, but no matching wait_barrier for full
> Chism        barrier call in the other branch of the if
>           call wait_barrier()
> Chism end debug
>         endif
>       enddo
>
>
> c
> c Now use barrier
> c
>
>
>       if (MY_PE() .EQ. N$PES-1) then
>         write (6,1000) "BARRIER"
>         call flush (6)
>       endif
>
>       do trigger_PE = 0, N$PES - 1
>
>         call barrier ()
>         if (MY_PE() .EQ. trigger_PE) then
>
>           delay_start = irtc ()
>  105      if (irtc() .LT. delay_start + 0.1 / cp) goto 105
>
>           t1 = irtc ()
>           call set_barrier ()  ! Trigger release of barrier
>           call barrier ()      ! Wait till all PEs detect release
>           t2 = irtc ()
>
>           write (6, 1010) MY_PE(), (t2-t1) * cp * 1e6
>           call flush (6)
>         else
>           call set_barrier ()  ! All non-trigger PEs pass barrier
>
>           ! Spin until trigger PE does its set_barrier
>  110      if (.NOT. test_barrier()) goto 110
>
>           ! Inform triggering PE that 1st barrier release was detected
> Chism debug: A call to set_barrier without a matching wait_barrier
> Chism        does not match the other branches set_barrier barrier
> Chism        combination. I'll match a barrier with a barrier
> Chism     call set_barrier ()
>           call barrier()
> Chism end debug
>         endif
>       enddo
>
> Chism Debug barrier problem at termination
> Chism stop
> Chism end debug
>
>  1000 format (a,/,"Event_PE ", " Delay(usecs)")
>  1010 format (i4, " ", f6.2)
>       end
This version produces output similar to that presented last week:
EUREKA-MODE
Event_PE Delay(usecs)
0 9.25
1 9.15
2 10.10
3 9.16
4 9.12
5 9.05
6 9.07
7 9.38
BARRIER
Event_PE Delay(usecs)
0 6.83
1 6.60
2 7.97
3 7.81
4 7.08
5 7.30
6 6.44
7 6.67
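For a quick summary of the two sets of timings, the per-PE delays above can be averaged. The Python below is just a convenience sketch over the numbers printed in this run; the variable names are made up, and a single run on eight PEs is too small a sample to support firm conclusions.

```python
from statistics import mean

# Delays in microseconds, copied from the output above (one value per trigger PE).
eureka_usecs = [9.25, 9.15, 10.10, 9.16, 9.12, 9.05, 9.07, 9.38]
barrier_usecs = [6.83, 6.60, 7.97, 7.81, 7.08, 7.30, 6.44, 6.67]

eureka_mean = mean(eureka_usecs)
barrier_mean = mean(barrier_usecs)

print(f"eureka  mean: {eureka_mean:.1f} usecs")
print(f"barrier mean: {barrier_mean:.1f} usecs")
print(f"difference:   {eureka_mean - barrier_mean:.1f} usecs")
```

In this run the barrier version averages roughly 2 usecs less than the eureka version.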
Quick-Tip Q & A
A: {{ If you work at computers all day (for years), how can you reduce
eye-strain? }}
# When pressed, ARSC staff were happy to respond. Some tips:
- Every 5-60 minutes, look up from the screen and focus (for
several seconds) on something distant.
- Use dark "wallpaper." On text screens, use dark background
with bright foreground. Here's a suggested xterm setting:
WINTERM='xwsh -name winterm -fn
-*-screen-bold-r-normal--18-*-*-*-m-100-iso8859-1
-bg black -fg white -bold cyan'.
- Use large fonts and sit further back.
- Blink often.
- Use eye-drops.
- Roll your eyes.
- Get a monitor lens. This is from a Cornell study (available at
http://www.news.cornell.edu/Chronicles/5.2.96/filters.html):
"After using a glass anti-glare filter, the percentage of daily
or weekly problems related to lethargy/tiredness, tired eyes,
trouble focusing eyes, itching/watery eyes and dry eyes was
half what they were before filter use for people who use
computer monitors all day at work, said ergonomist Alan Hedge,
professor of design and environmental analysis and director of
the Human Factors Laboratory at Cornell."
Q: What's an easy way to remove all the 'core' and 'mppcore' files in
any of your directories (but have 'rm' ask before removing)?
[ Answers, questions, and tips graciously accepted. ]
Current Editors:
Ed Kornkven    ARSC HPC Specialist              ph: 907-450-8669
Kate Hedstrom  ARSC Oceanographic Specialist    ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
E-mail Subscriptions:
- Subscribe to (or unsubscribe from) the e-mail edition of the
ARSC HPC Users' Newsletter.
- Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.
