ARSC T3D Users' Newsletter 44, July 14, 1995
More on Fixed and Plastic Executables
In last week's newsletter, I described one Craft Fortran program that did not show much of a speedup when compiled as a fixed executable as opposed to a "plastic" executable. Of course, a possible speed improvement is only one benefit of such a switch. Dr. Ming Jiang, a member of the UAF faculty and the ARSC staff, sends in this further note:
> > ARSC T3D Users' Newsletter Number 43 07/07/95
> >
> > plastic      128  0.011290  0.040066  0.011630
> > with -X128   128  0.011002  0.037600  0.010887
> >
> > With this switch the timings are consistently affected in the
> > right direction but the effect is minimal.
> >
> > If any users have similar experiences in "switchology", I'd be
> > happy to pass them on through this newsletter.
>
> Mike,
>
> -X pes will produce a smaller executable.
> It doesn't require re-load every time, while "plastic" one calls
> mppldr each time you run it.
>
> Dr. Ming Jiang
> Associate Professor
> Dept. of Math Sciences
> University of Alaska     Tel: (907) 474-6666 ext 3744
> Fairbanks, AK 99775      Fax: (907) 474-5394

I also thought that the -X npes switch would make more efficient use of memory, but from the test case below, it seems that the plastic executable is implemented as efficiently as the fixed executable. To check out how much of a size advantage the fixed executable has over the plastic executable, I devised this small program:
      parameter( NMAX = 2048 )
      real a( NMAX, NMAX )
cdir$ shared a( :, :block )
      real pesum( 0:127 )
cdir$ shared pesum( :block )
      intrinsic my_pe
      me = my_pe()
      call barrier()
      if( me .eq. 0 ) t1 = second()
cdir$ doshared( j ) on a( i, j )
      do 10 j = 1, NMAX
         do 10 i = 1, NMAX
            a( i, j ) = i + ( j-1 ) * NMAX
  10  continue
      call barrier()
      if( me .eq. 0 ) t1 = second() - t1
      call barrier()
      if( me .eq. 0 ) t2 = second()
      pesum( me ) = 0.0
cdir$ doshared( j ) on a( i, j )
      do 20 j = 1, NMAX
         do 20 i = 1, NMAX
            pesum( me ) = pesum( me ) + a( i, j )
  20  continue
      call barrier
      if( me .eq. 0 ) then
         t2 = second() - t2
         sum = 0.0
         do 30 i = 0, n$pes-1
            sum = sum + pesum( i )
  30     continue
         w = NMAX * NMAX / 1000000.0
         write(6,600)n$pes,w/t1,w/t2,sum,(NMAX*NMAX+1)*NMAX*NMAX/2
 600     format( i6, f10.3, f10.3, f20.1, i20 )
      endif
      end

This program has one large two-dimensional array of size NMAX*NMAX which contains the integers 1, 2, 3, ... NMAX*NMAX. The program uses as many processors as it can to sum the elements of the two-dimensional array and then checks the result with the identity:
  1 + 2 + 3 + ... + n = ( n + 1 ) * n / 2

By varying the value of NMAX, we can see how large a two-dimensional array is allowed when the program is compiled for a fixed executable. Finding the maximum value for NMAX doesn't require much searching because Craft Fortran requires that the shared array "a" have a leading dimension that is a power of 2. Just for fun, I also computed the speed of the shared initialization loop and the summation loop. The table below summarizes the results:
  Results for plastic and fixed executables on the T3D, sizes and execution speeds

  Number of PEs                     1      2      4      8     16     32     64    128
  Total physical memory             8     16     32     64    128    256    512   1024
   (in millions of 64 bit words)

  Compiled as a plastic executable:

  Maximum value for NMAX         2048   2048   4096   4096   8192   8192  16384  16384
  Size of array "a"               4.2    4.2   16.8   16.8   67.2   67.2  268.4  268.4
   (in millions of 64 bit words)
  Speed of initialization         6.4   12.3   24.7   49.4  103.5  207.1  414.1  828.5
   (millions of results/second)
  Speed of summation              9.8   18.7   37.4   74.9  156.9  313.9  628.0 1255.1
   (millions of adds/second)

  Compiled for a fixed number of PEs:

  Maximum value for NMAX         2048   2048   4096   4096   8192   8192  16384  16384
  Speed of initialization         6.4   14.2   24.7   49.4  103.5  237.9  475.9  951.3
  Speed of summation              9.8   18.7   37.4   74.9  156.8  313.6  627.9 1256.0

Because each doubling of NMAX quadruples the memory required by array "a", the pattern of largest problems that fit makes sense. The runs with the -X npes flag show that the largest problem that physically fits in memory can in fact be solved. For this problem, the current implementation of a "plastic" executable is as efficient in its use of memory as a fixed executable. But, as in the last newsletter, there is a slight speed improvement for executables targeted for a fixed number of PEs, visible here in the initialization loop on 2, 32, 64 and 128 PEs.
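The pattern in the "Maximum value for NMAX" row can be checked with a little arithmetic. The sketch below is in Python rather than Craft Fortran so that it can be run off the T3D. It assumes 8 MW (2**23 words) per PE, as stated in this newsletter, and it assumes that the array must be strictly smaller than the total memory, since the program needs some room for code and other data. The function names are made up for this illustration.

```python
# Assumption: every PE has 8 MW = 2**23 64-bit words of memory.
WORDS_PER_PE = 8 * 2**20


def array_words(nmax):
    """Number of 64-bit words in the NMAX x NMAX shared array "a"."""
    return nmax * nmax


def max_power_of_two_nmax(npes, words_per_pe=WORDS_PER_PE):
    """Largest power-of-two NMAX whose array is strictly smaller than
    the combined memory of npes PEs.

    The strict inequality is an assumption of this sketch: an array that
    takes every word of memory leaves no room for anything else.
    """
    total_words = npes * words_per_pe
    nmax = 1
    # Keep doubling while the next power of two would still fit.
    while array_words(2 * nmax) < total_words:
        nmax *= 2
    return nmax


if __name__ == "__main__":
    for npes in (1, 2, 4, 8, 16, 32, 64, 128):
        nmax = max_power_of_two_nmax(npes)
        print(npes, nmax, array_words(nmax))
```

Under these assumptions the function returns the same maximum NMAX as the table for each of the eight PE counts. This also accounts for the values repeating in pairs: doubling the number of PEs doubles the memory, but the next power-of-two NMAX needs four times as much.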
There is a significant advantage for the fixed executable in that the size of its a.out is much smaller than that of the plastic executable. For this program, the executable sizes are:
   985696 Jul 11 14:21 suma128  <- fixed executable
  2189104 Jul 11 14:23 sumb128  <- plastic executable

and the size of the fixed executable was the same whatever the value of npes in the compilation command cf77 -X npes ...
On the way to producing the above tables I ran into several interesting error messages:
- Operand Range Error
  When a problem is larger than what will physically fit, compiling it produces no error message for either the fixed or the plastic executable (even though CRI sells only 8MW nodes, so for fixed executables the limit is known at compile time). Only at execution time does the user get the catch-all "Operand Range Error". Increasing NMAX until this error message appears is how I determined the largest problem that fits.
- The fixed limit on PE_PRIVATE arrays
  If the array "a" is not declared SHARED then it is a PE_PRIVATE array and will be contained in the memory of each PE. In compiling both the plastic and fixed executable, the compiler gives a good error message for arrays too large for a single PE:
      2      2.       real a( nmax, nmax )
  cft77-424 cf77: WARNING $MAIN, Line = 2, File = suma.f, Line = 2
    Array "A" exceeds CPU targeted memory size of 8388608 words.
- Limit on the size of a shared array
  There is a compiler limit on the largest shared array. If the array "a" is shared and NMAX = 262144 (the array "a" is then 262144 * 262144 words, or about 68.72GW) then the compilation aborts with:
  Total size of memory segment 03 exceeds compiler limit.
  cft77-9 cf77: CFT77 COMPILATION ABORTED

  So the size of a shared array seems to have no practical limit, except perhaps for artificial programs like the "linpeak" benchmark.
- Fixed objects take precedence over plastic objects
  - When linking a mix of object files, some compiled as fixed and some as plastic, the executable that mppldr produces is always fixed. And mppldr believes the user knows what he is doing, because there is no warning message.
  - When the "fixedness" varies among objects, that is, when they were compiled for different numbers of PEs, mppldr knows the user needs help and gives an appropriate error message:
  /mpp/bin/cf77 -X1 second.f
  /mpp/bin/cf77 -X128 suma.f second.o -o suma
  mppldr-302 cf77: WARNING
    The number of PEs compiled into module 'SECOND' (1) differs from the
    number of PEs compiled into a prior module (128).
  mppldr-112 cf77: WARNING
    Because of previous errors, file 'suma' is not executable.
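The two loader behaviors described above can be summarized as a small rule. The sketch below is only a model of what was observed in these experiments, written in Python so it can be run anywhere. It is not how mppldr is implemented, and the names link_pe_count and PLASTIC are invented for the illustration.

```python
# Hypothetical marker for a module compiled without the -X npes switch.
PLASTIC = None


def link_pe_count(modules):
    """Model of the observed behavior when linking fixed and plastic objects.

    modules is a list of (name, npes) pairs in link order, where npes is
    an int for a module compiled with -X npes, or PLASTIC otherwise.
    Returns the PE count of the resulting executable, or PLASTIC if every
    module is plastic. Raises ValueError if two fixed modules disagree.
    """
    fixed_count = PLASTIC
    for name, npes in modules:
        if npes is PLASTIC:
            continue  # plastic objects accept whatever the fixed ones say
        if fixed_count is PLASTIC:
            fixed_count = npes  # first fixed object sets the PE count
        elif npes != fixed_count:
            raise ValueError(
                "number of PEs compiled into module %r (%d) differs from "
                "the number compiled into a prior module (%d)"
                % (name, npes, fixed_count)
            )
    return fixed_count
```

For example, link_pe_count([("SUMA", 128), ("SECOND", PLASTIC)]) gives 128 with no complaint, while link_pe_count([("SUMA", 128), ("SECOND", 1)]) raises an error that corresponds to the mppldr-302 message shown above.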
Release 1.2.2 of CrayLibs
ARSC is planning to make available the 1.2.2 release of CrayLibs as soon as it is released. Watch this newsletter for further details.
List of Differences Between T3D and Y-MP
The current list of differences between the T3D and the Y-MP is:
- Data type sizes are not the same (Newsletter #5)
- Uninitialized variables are different (Newsletter #6)
- The effect of the -a static compiler switch (Newsletter #7)
- There is no GETENV on the T3D (Newsletter #8)
- Missing routine SMACH on T3D (Newsletter #9)
- Different Arithmetics (Newsletter #9)
- Different clock granularities for gettimeofday (Newsletter #11)
- Restrictions on record length for direct I/O files (Newsletter #19)
- Implied DO loop is not "vectorized" on the T3D (Newsletter #20)
- Missing Linpack and Eispack routines in libsci (Newsletter #25)
- F90 manual for Y-MP, no manual for T3D (Newsletter #31)
- RANF() and its manpage differ between machines (Newsletter #37)
- CRAY2IEG is available only on the Y-MP (Newsletter #40)
- Missing sort routines on the T3D (Newsletter #41)
I encourage users to e-mail in differences that they have found, so we all can benefit from each other's experience.
Ed Kornkven, ARSC HPC Specialist, ph: 907-450-8669
Kate Hedstrom, ARSC Oceanographic Specialist, ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.