Last week I showed the verbose output of the "what" command, which lists exactly what is in each T3D library in the directory /mpp/lib. A more succinct report comes from:
what /mpp/lib/*.a | grep version
Currently on denali, this produces:
libcomm_version  1.0.0.0   06/08/94  09:41:28
libf_version     8.2.0.0   06/08/94  09:43:07
libfi_version    8.2.0.0   06/08/94  10:00:14
libm_version     8.1.0.1   06/15/94  16:43:37
libpvm3_version  1.1.0.0   06/08/94  09:04:01
libsci_version   1.1.0.0   06/08/94  09:09:10
libsma_version   1.1.0.0   06/08/94  09:39:24
libu_version     8.2.0.0   06/08/94  10:14:54
The version numbers listed above identify what is in CrayLib_M. The new version of CrayLib_M, for December 4th, produces:
libcomm_version  1.0.0.8   10/12/94  08:39:18
libf_version     8.2.0.9   10/12/94  08:43:15
libfi_version    8.2.0.9   10/14/94  10:37:51
libm_version     8.1.0.2   09/08/94  18:20:35
libpvm3_version  1.1.0.7   09/09/94  10:52:22
libsci_version   1.1.0.9   10/12/94  07:25:20
libsma_version   1.1.0.12  10/12/94  08:33:06
libu_version     8.2.0.9   10/12/94  10:09:48
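Comparing the two lists means comparing version numbers field by field, and the last field is a number, not text: libsma went from 1.1.0.0 to 1.1.0.12, and a plain string comparison such as strcmp would rank "1.1.0.12" below "1.1.0.9". A minimal C sketch of a numeric comparison follows; compare_versions is a hypothetical helper, not part of the Cray tools, and it assumes version strings made only of digits and dots.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: compare dotted version strings such as the
   "1.1.0.9" and "1.1.0.12" fields printed by what. Each dot-separated
   field is compared as an integer; a missing field counts as 0.
   Returns -1, 0, or 1, in the style of strcmp.
   Assumes the strings contain only digits and dots. */
int compare_versions(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    while (*a != '\0' || *b != '\0') {
        char *end_a;
        char *end_b;
        long fa = strtol(a, &end_a, 10);   /* 0 once a is exhausted */
        long fb = strtol(b, &end_b, 10);   /* 0 once b is exhausted */
        if (fa != fb)
            return fa < fb ? -1 : 1;
        /* step past the field and its trailing dot, if any */
        const char *next_a = (*end_a == '.') ? end_a + 1 : end_a;
        const char *next_b = (*end_b == '.') ? end_b + 1 : end_b;
        if (next_a == a && next_b == b)
            break;   /* neither string advanced: stop on malformed input */
        a = next_a;
        b = next_b;
    }
    return 0;
}
```

For example, compare_versions("1.1.0.12", "1.1.0.9") returns 1, and compare_versions("8.2.0.0", "8.2.0.9") returns -1.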
Below is a simple program that times shmem_get calls for messages of increasing size. It times six shmem_get calls of 1, 10, 100, 1000, 10000, and 100000 words, that is, 8, 80, 800, 8000, 80000, and 800000 bytes. From these times and sizes we get the speed of shmem_get as a function of message size. The program does all of its timing on PE0 and times shmem_get calls for data both on PE0 itself and off PE0 (on PE1). (A shmem_get from local memory is better done with a simple copy.)
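      program sh
      parameter( NMAX = 100000 )
      real a( NMAX ), b( NMAX ), c( NMAX )
      INTRINSIC MY_PE
      integer DONE
CDIR$ SHARED DONE
      DONE = 0
c     every PE fills its own copy of the source array a
      do 10 i = 1, NMAX
         a( i ) = i
   10 continue
      me = MY_PE()
      n = 1
      if( me .eq. 0 ) then
c        SHMEM_GET( target, source, len, pe ) copies len words of
c        source on PE pe into the local array target
         do 100 i = 1, 6
            t1 = second( )
c           get from PE0 itself (local)
            call shmem_get( b, a, n, 0 )
            t2 = second( )
c           get from PE1 (remote)
            call shmem_get( c, a, n, 1 )
            t3 = second( )
c           message size in MB (8-byte words, 1 MB = 10**6 bytes)
            rmb = 8*n/1000000.0
            write(6,600)i,n,t2-t1,t3-t2,rmb/(t2-t1),rmb/(t3-t2)
            n = n * 10
  100    continue
         DONE = 1
      else
c        all other PEs spin until PE0 sets DONE
    1    continue
         if( DONE .eq. 0 ) goto 1
      endif
  600 format( i4, i7, f10.6, f10.6, f10.2, f10.2 )
c     only PE0 received data, so only PE0 checks it
      if( me .eq. 0 ) then
         do 200 i = 1, NMAX
            if( i .ne. b( i ) ) then
               write( 6, 601 ) i, a( i ), b( i )
               stop
            endif
            if( i .ne. c( i ) ) then
               write( 6, 602 ) i, a( i ), c( i )
               stop
            endif
  200    continue
      endif
  601 format( "b is bad ", i10, f10.1, f10.1 )
  602 format( "c is bad ", i10, f10.1, f10.1 )
      end

      real function second( )
c     irtc() counts clock ticks; 150000000 ticks per second
      second = float( irtc( ) ) / 150000000.0
      end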
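The output columns are the loop index, the message length n in 8-byte words, the local (PE0) and remote (PE1) shmem_get times in seconds, and the corresponding rates in MB/s. The typical output for running this program is: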
/mpp/bin/cf77 -Wf" " -c shmem.f
mppldr shmem.o
a.out
   1      1  0.000016  0.000009      0.50      0.91
   2     10  0.000011  0.000011      7.13      7.49
   3    100  0.000017  0.000029     47.92     27.27
   4   1000  0.000073  0.000211    109.51     37.93
   5  10000  0.000692  0.002137    115.66     37.44
   6 100000  0.006769  0.020570    118.19     38.89
The major problem with this result is that the program hangs. As written, all PEs other than PE0 spin, waiting for the variable DONE to change from 0 to 1. At the default optimization level the compiler doesn't recognize that DONE is SHARED and might be changed by another PE, so it can keep the value of DONE it loaded first instead of rereading it, and the spin loop has no way of terminating. Compiling with optimization off produces this output:
/mpp/bin/cf77 -Wf" -o off " -c shmem.f
mppldr shmem.o
a.out
   1      1  0.000019  0.000009      0.43      0.88
   2     10  0.000012  0.000011      6.66      7.08
   3    100  0.000020  0.000031     40.40     25.67
   4   1000  0.000092  0.000223     86.58     35.82
   5  10000  0.000860  0.002107     93.00     37.96
   6 100000  0.008549  0.021146     93.57     37.83
This time the program ends properly, but the shmem_gets have taken a nasty performance hit on the local PE transfer: at 100000 words the local rate drops from 118.19 to 93.57 MB/s, while the remote rate is essentially unchanged. It has been suggested that with the CDIR$ SUPPRESS directive the compiler will no longer optimize away the correct behavior but will retain speed elsewhere in the code. I changed the class exercise to:
      .
      .
      .
      else
    1    continue
CDIR$ SUPPRESS DONE
         if( DONE .eq. 0 ) goto 1
      endif
      .
      .
      .
The program now runs correctly (it doesn't hang, and it transfers the data correctly), but the performance is another surprise:
/mpp/bin/cf77 -Wf" " -c shmemq.f
mppldr shmemq.o
a.out
   1      1  0.000020  0.000011      0.39      0.74
   2     10  0.000015  0.000014      5.42      5.81
   3    100  0.000027  0.000046     29.99     17.28
   4   1000  0.000159  0.000375     50.24     21.30
   5  10000  0.001154  0.002900     69.33     27.58
   6 100000  0.011401  0.028644     70.17     27.93
This time even the transfers from PE1 took a performance hit: 27.93 MB/s at 100000 words, versus about 38 MB/s in both earlier runs. This is counterintuitive. CDIR$ SUPPRESS fixed the hang, but the program is slower than when it was simply compiled with optimization off.
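The last two columns of each table are transfer rates in MB/s. In the Fortran program they are rmb/(t2-t1) and rmb/(t3-t2) with rmb = 8*n/1000000.0, and the times come from second(), which divides irtc() by 150000000.0, that is, a 150 MHz clock. As a check on the numbers, here is the same arithmetic as a small C sketch; it assumes the program's constants, and the names ticks_to_seconds and rate_mb_per_s are made up for illustration.

```c
#include <assert.h>

/* Constants assumed from the Fortran program above:
   REAL is 8 bytes, 1 MB = 1,000,000 bytes, and irtc() ticks at 150 MHz. */
#define BYTES_PER_WORD 8.0
#define BYTES_PER_MB   1.0e6
#define TICKS_PER_SEC  150.0e6

/* Same conversion as the Fortran second(): clock ticks to seconds. */
double ticks_to_seconds(long ticks)
{
    return (double)ticks / TICKS_PER_SEC;
}

/* Same as rmb/(t2-t1) in the program: megabytes moved / elapsed seconds. */
double rate_mb_per_s(long nwords, double seconds)
{
    assert(seconds > 0.0);
    double megabytes = BYTES_PER_WORD * (double)nwords / BYTES_PER_MB;
    return megabytes / seconds;
}
```

For example, the 100000-word local get in the first run took 0.006769 s, and 0.8 MB / 0.006769 s is about 118.19 MB/s, matching the table.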
In general on the T3D, I think it is safest to develop code with optimization off and to turn optimization on once the program behaves correctly. If the program then fails with optimization on, the sensitive sections can be isolated in separate compilation units and compiled without optimization. In this one case, CDIR$ SUPPRESS didn't seem to be very useful.
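The hang is a general hazard, not something peculiar to the T3D: a loop that waits on a flag the compiler believes nothing else can change may be compiled into an infinite loop. As an analogy only (POSIX threads on one machine, not SHMEM across PEs), the C11 sketch below declares the flag _Atomic so every spin iteration really rereads it; the names done, payload, producer, wait_for_done, and run_spin_demo are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int done;   /* hypothetical analog of the SHARED flag DONE */
static long payload;      /* data written before done is set */

/* Plays the role of PE0: do the work, then publish done = 1. */
static void *producer(void *arg)
{
    (void)arg;
    payload = 42;
    /* release: the payload write is visible before done reads as 1 */
    atomic_store_explicit(&done, 1, memory_order_release);
    return NULL;
}

/* Plays the role of the other PEs: spin until done changes. */
static long wait_for_done(void)
{
    /* the atomic load is re-executed on every pass, never cached */
    while (atomic_load_explicit(&done, memory_order_acquire) == 0)
        ;
    return payload;
}

/* Starts the producer, spins on the flag, and returns what it saw. */
long run_spin_demo(void)
{
    pthread_t thread;
    atomic_store(&done, 0);
    payload = 0;
    int rc = pthread_create(&thread, NULL, producer, NULL);
    assert(rc == 0);
    long seen = wait_for_done();
    rc = pthread_join(thread, NULL);
    assert(rc == 0);
    return seen;
}
```

A volatile int flag would also stop the compiler from caching the value, which is closer in spirit to what CDIR$ SUPPRESS does for DONE, but in threaded C it does not provide the ordering guarantee that the acquire/release pair gives.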
timer Wallclock Fortran T3D or Granularity Resolution
or CPU timer or C Y-MP T3D Y-MP T3D Y-MP
irtc wallclock Fortran both ~.15um ~.2um
rtc wallclock Fortran both ~.30um ~.3um
tsecnd CPU Fortran both 10000um 3um
gettimeofday wallclock C both ~2500um ~30um
second CPU Fortran Y-MP 1um 5um
notes:
We are asking users to submit their codes to be part of our regression test suite. We are looking for relatively small, short-running, self-contained programs, in source form, that currently run correctly. The advantage of submitting your code is that it will be one of the first programs run on each new T3D release, and if it doesn't work we won't install the new T3D software. Otherwise, your program may fail after the new T3D software has become the default.
We have been accommodating this situation by individually changing users' interactive permissions, with the understanding that users would share their results with us. We are in the process of notifying users of the change in this policy. In the future we will reset everyone to the default settings and make it easy for users to request changes that will be in effect for a short time period (perhaps a week per request).
Contact:
Donald Bahls, ARSC User Consultant, ph: 907-450-8674
Ed Kornkven, ARSC HPC Specialist, ph: 907-450-8669
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Arctic Region Supercomputing Center
PO Box 756020, Fairbanks, AK 99775 | voice: 907-450-8600 | email: