ARSC T3D Users' Newsletter 14, December 9, 1994
Upcoming T3D Software Upgrade
ARSC will be upgrading the T3D software to CrayLib_M 1.1.1.2, MAX 1.1.0.4 and SCC_M 4.0.2.11 on December 11th.
By now, users should have run their programs and saved the executables and results. After the new products are installed, results from the new software should be compared with the saved ones. If any of the differences look erroneous, please contact Mike Ess.
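As a rough illustration of such a comparison, the sketch below checks two small arrays against a relative tolerance. Everything in it is hypothetical: the program name cmpres, the arrays old and new (stand-ins for results saved before the upgrade and results recomputed after it), and the tolerance tol, which each user would have to choose to suit their own code.

```fortran
      program cmpres
c     Hypothetical sketch: old() and new() stand in for results saved
c     before the upgrade and results recomputed after it.
      integer N, i, nbad
      parameter( N = 4 )
      real old( N ), new( N ), tol, diff, scale
      data old / 1.0, 2.5, -3.0, 1.0e6 /
      data new / 1.0, 2.5000001, -3.0, 1.0e6 /
c     tol is an assumed relative tolerance, not an ARSC recommendation
      tol = 1.0e-5
      nbad = 0
      do 10 i = 1, N
        diff = abs( new( i ) - old( i ) )
c       scale by the larger magnitude, but never by less than 1.0
        scale = max( abs( old( i ) ), abs( new( i ) ), 1.0 )
        if( diff .gt. tol * scale ) then
          nbad = nbad + 1
          write( 6, 600 ) i, old( i ), new( i )
        endif
   10 continue
      write( 6, 601 ) nbad
  600 format( 'differs at ', i6, 2e16.8 )
  601 format( 'number of differences: ', i6 )
      end
```

A tolerance-based check like this separates small floating-point differences, which a library change can legitimately introduce, from differences large enough to report.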
PE Limits
Next week we will be sending out notification by e-mail of PE limit changes. The changes will go into effect one week after the e-mail is sent. To have us hold up on this change or to increase the limits in the future, please contact Mike Ess by e-mail or phone.
At ARSC, most production runs can be handled by the 32 PE NQS queue. The use of large partitions of 64 or 128 PEs seems restricted to research problems of the "what if..." type. Basically, development is done on smaller partitions and then to complete a table of performance or problem sizes, the larger partitions are used.
We have been accommodating this situation by individually changing a user's interactive permissions with the understanding that users would share their results with us. We are in the process of notifying users of the change in this policy. In the future we will reset everyone back to the default settings and then make it easy for users to request changes that will be in effect for a short time period (perhaps a week per request).
Shmem Calls - Corrections and Improvements
Corrections
In the ARSC T3D 12th Newsletter I had some timings and experience with using a spin wait on a shared variable. Frank Chism of CRI modified my original code to use a barrier call instead of my spin waits, and that gave me a chance to go back over the original code. The original code, reproduced below, had at least three mistakes:
- The arrays b and c were not initialized
- The order of the arrays in the shmem call should have been reversed
- The IF test for correctness was never executed
      parameter( NMAX = 100 000 )
      real a( NMAX ), b( NMAX ), c( NMAX )
      INTRINSIC MY_PE
      integer DONE
CDIR$ SHARED DONE
      DONE = 0
      do 10 i = 1, NMAX
        a( i ) = i
   10 continue
      me = MY_PE()
      n = 1
      if( me .eq. 0 ) then
        do 100 i = 1, 6
          t1 = second( )
          call shmem_get( a, b, n, 0 )
          t2 = second( )
          call shmem_get( a, c, n, 1 )
          t3 = second( )
          rmb = 8*n/1000000.0
          write( 6, 600 ) i, n, t2-t1, t3-t2, rmb/(t2-t1), rmb/(t3-t2)
          n = n * 10
  100   continue
        DONE = 1
      else
    1   continue
        if( DONE .eq. 0 ) goto 1
      endif
  600 format( i4, i7, f10.6, f10.6, f10.2, f10.2 )
      do 200 i = 1, NMAX
        if( i .ne. b( i ) ) then
          write( 6, 601 ) i, a( i ), b( i )
          stop
        endif
        if( i .ne. c( i ) ) then
          write( 6, 602 ) i, a( i ), b( i )
          stop
        endif
  200 continue
  601 format( "b is bad ", i10, f10.1, f10.1 )
  602 format( "c is bad ", i10, f10.1, f10.1 )
      end
I could have found mistakes 1 and 2 myself if I hadn't made mistake 3. The problem is that the statement:
if( i .ne. b( i ) ) then
is always false (as originally written) but the code hung before this. By the time I narrowed the problem down to the spin wait, I had removed the check for correct answers. It's a convoluted world.
Improvements
Frank Chism correctly points out that in coding the spin wait I was actually reinventing the wheel of parallel processing because that's what barriers are for. Using his ideas and wishing to time more than just communication on a single node (PE0 and PE1 are on the same node), I expanded the original program to:
      program sh
      parameter( NMAX = 100 000 )
      real a( NMAX ), b( NMAX ), c( NMAX ), d( NMAX ), e( NMAX)
      real f( NMAX ), g( NMAX ), h( NMAX ), p( NMAX )
      INTRINSIC MY_PE
      do 10 i = 1, NMAX
        a( i ) = i
        b( i ) = 0.0
        c( i ) = 0.0
        d( i ) = 0.0
        e( i ) = 0.0
        f( i ) = 0.0
        g( i ) = 0.0
        h( i ) = 0.0
        p( i ) = 0.0
   10 continue
      me = MY_PE()
      n = 1
      if( me .eq. 0 ) then
        do 100 i = 1, 6
          t1 = second( )
          call shmem_get( b, a, n, 0 )
          t2 = second( )
          call shmem_get( c, a, n, 1 )
          t3 = second( )
          call shmem_get( d, a, n, 2 )
          t4 = second( )
          call shmem_get( e, a, n, 3 )
          t5 = second( )
          call shmem_get( f, a, n, 4 )
          t6 = second( )
          call shmem_get( g, a, n, 5 )
          t7 = second( )
          call shmem_get( h, a, n, 6 )
          t8 = second( )
          call shmem_get( p, a, n, 7 )
          t9 = second( )
          rmb = 8*n/1000000.0
          write(6,600)i,n,rmb/(t2-t1),rmb/(t3-t2),rmb/(t4-t3),
     +      rmb/(t5-t4),rmb/(t6-t5),rmb/(t7-t6),
     +      rmb/(t8-t7),rmb/(t9-t8)
          n = n * 10
  100   continue
        do 200 i = 1, NMAX
          if( a( i ) .ne. b( i ) ) then
            write( 6, 601 ) i, a( i ), b( i )
            stop
          endif
          if( a( i ) .ne. c( i ) ) then
            write( 6, 602 ) i, a( i ), c( i )
            stop
          endif
          if( a( i ) .ne. d( i ) ) then
            write( 6, 603 ) i, a( i ), d( i )
            stop
          endif
          if( a( i ) .ne. e( i ) ) then
            write( 6, 604 ) i, a( i ), e( i )
            stop
          endif
          if( a( i ) .ne. f( i ) ) then
            write( 6, 605 ) i, a( i ), f( i )
            stop
          endif
          if( a( i ) .ne. g( i ) ) then
            write( 6, 606 ) i, a( i ), g( i )
            stop
          endif
          if( a( i ) .ne. h( i ) ) then
            write( 6, 607 ) i, a( i ), h( i )
            stop
          endif
          if( a( i ) .ne. p( i ) ) then
            write( 6, 608 ) i, a( i ), p( i )
            stop
          endif
  200   continue
      endif
CDIR$ barrier
  600 format( i4, i7, 8f8.2 )
  601 format( "b is bad ", i10, f10.1, f10.1 )
  602 format( "c is bad ", i10, f10.1, f10.1 )
  603 format( "d is bad ", i10, f10.1, f10.1 )
  604 format( "e is bad ", i10, f10.1, f10.1 )
  605 format( "f is bad ", i10, f10.1, f10.1 )
  606 format( "g is bad ", i10, f10.1, f10.1 )
  607 format( "h is bad ", i10, f10.1, f10.1 )
  608 format( "p is bad ", i10, f10.1, f10.1 )
      end
      real function second( )
      second = float( irtc( ) ) / 150000000.0
      end
The program now times shmem_get calls on PE0 that fetch data from each of PE0 through PE7, uses a barrier to synchronize the PEs, and checks all the answers for correctness.
The results are much better and more consistent than the times for the previous program with the spin wait. Of course the transfers on the same PE are fastest, but there is a significant bump for PE1 to PE0 because they are on the same node. Why transfers from PE4 and PE5 are faster on all three runs below than transfers from PE2, PE3, PE6 and PE7 is a mystery to me. (Notice that optimization is on for this program.)
/mpp/bin/cf77 -Wf" " -c shmemb.f
mppldr shmemb.o
a.out
    Size of   Transfer rate (MB/s) for shmem_get on PE0 from:
   transfer     PE0     PE1     PE2     PE3     PE4     PE5     PE6     PE7
   1      1    0.43    1.24    1.25    1.21    1.23    1.28    1.27    1.25
   2     10    6.56    9.00    8.43    8.80    8.81    8.70    8.69    8.75
   3    100   40.97   29.78   27.62   27.40   28.69   28.97   27.77   27.66
   4   1000  105.95   38.35   34.89   34.86   36.31   36.31   34.88   34.86
   5  10000  114.05   38.85   33.75   35.05   37.01   37.03   35.07   33.97
   6 100000  116.46   38.67   34.83   34.94   36.85   36.89   34.94   34.88
a.out
   1      1    0.45    1.27    1.19    1.21    1.27    1.27    1.23    1.26
   2     10    6.50    9.06    8.58    8.72    8.91    9.02    8.63    8.55
   3    100   41.61   29.84   27.81   27.84   28.65   28.76   27.52   27.70
   4   1000  106.01   38.38   34.84   34.87   36.31   36.24   34.90   34.78
   5  10000  114.21   38.85   33.71   35.03   36.99   37.02   35.07   34.00
   6 100000  116.48   38.66   34.84   34.94   36.85   36.87   34.93   34.94
a.out
   1      1    0.44    1.27    1.22    1.26    1.25    1.27    1.27    1.25
   2     10    6.39    8.94    8.49    8.80    8.75    8.84    8.67    8.75
   3    100   40.97   29.64   27.45   27.50   28.45   28.76   27.56   27.47
   4   1000  106.37   38.28   34.89   34.84   36.24   36.31   34.92   34.85
   5  10000  114.20   37.23   35.05   35.07   37.01   37.04   34.02   35.05
   6 100000  115.01   38.82   34.83   34.94   36.85   36.54   34.92   34.93
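The rates above can be turned back into elapsed time per call by inverting the formula the benchmark uses, rate = (8*n/1000000.0) / seconds. The following is a minimal sketch in standard Fortran, not part of the original benchmark. The program name xfer is hypothetical, and the data values are copied from the PE1 column of the first run.

```fortran
      program xfer
c     Hypothetical helper, not part of the original benchmark.
c     Inverts rate = (8*n/1.0e6) / seconds to recover the elapsed
c     time of each shmem_get call from the reported MB/s.
      integer n( 6 ), i
      real rate( 6 ), rmb, t
c     transfer sizes and PE1 rates (MB/s) from the first run above
      data n / 1, 10, 100, 1000, 10000, 100000 /
      data rate / 1.24, 9.00, 29.78, 38.35, 38.85, 38.67 /
      do 10 i = 1, 6
c       megabytes moved, computed exactly as in the benchmark
        rmb = 8 * n( i ) / 1000000.0
c       elapsed seconds for one call
        t = rmb / rate( i )
c       print size, rate in MB/s, and time in microseconds
        write( 6, 600 ) n( i ), rate( i ), t * 1.0e6
   10 continue
  600 format( i7, f8.2, f12.2 )
      end
```

For the one-word transfer this works out to roughly 6.5 microseconds per call. That suggests small transfers are dominated by a fixed per-call cost rather than by bandwidth, although the measured interval also includes the overhead of calling second( ).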
Reminders
List of Differences Between T3D and Y-MP
The current list of differences between the T3D and the Y-MP is:
- Data type sizes are not the same (Newsletter #5)
- Uninitialized variables are different (Newsletter #6)
- The effect of the -a static compiler switch (Newsletter #7)
- There is no GETENV on the T3D (Newsletter #8)
- Missing routine SMACH on T3D (Newsletter #9)
- Different Arithmetics (Newsletter #9)
- Different clock granularities for gettimeofday (Newsletter #11)
Third Call for Test Programs
In putting together a regression test for upgrades of T3D software, I found two areas where I need more coverage with user supplied programs. I would like to ask again for users to submit test programs on:
- PVM called from Fortran
- Shmem routines called from either Fortran or C
Future Newsletter
While at the past Supercomputing '94 conference I got a lot of information on the T3D. It will take me a while to outline it, but eventually you'll see it in the upcoming newsletters.
Current Editors:
Ed Kornkven, ARSC HPC Specialist, ph: 907-450-8669
Kate Hedstrom, ARSC Oceanographic Specialist, ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
E-mail Subscriptions:
- Subscribe to (or unsubscribe from) the e-mail edition of the ARSC HPC Users' Newsletter.
- Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.
