ARSC T3D Users' Newsletter 37, May 26, 1995
New T3D Batch PE Limits
In the past week all active users of the ARSC T3D had their batch PE limit increased to 128. This allows these users access to the 128-PE 8-hour queues that run on the weekends. If you need your T3D UDB limits changed please contact Mike Ess.
New Fortran Compiler
An upgraded version of the cf77 compiler is available on Denali with the paths:

  /mpp/bin/cft77new
  /mpp/bin/cf77new

For the default versions we have:
  /mpp/bin/cf77 -V
  Cray CF77_M Version 6.0.4.1 (6.59) 05/25/95 13:36:39
  Cray GPP_M Version 6.0.4.1 (6.16) 05/25/95 13:36:39
  Cray CFT77_M Version 6.2.0.4 (227918) 05/25/95 13:36:39

and for this new version:
  /mpp/bin/cf77new -V
  Cray CF77_M Version 6.0.4.1 (6.59) 05/25/95 13:37:26
  Cray GPP_M Version 6.0.4.1 (6.16) 05/25/95 13:37:26
  Cray CFT77_M Version 6.2.0.9 (259228) 05/25/95 13:37:27

This new compiler fixes a potential race condition in shared memory accesses and also fixes an inlining problem with the F90 intrinsics MINLOC and MAXLOC.
This compiler will become the default after we finish testing it and users will be notified before that happens. I encourage users to try this compiler before it becomes the default.
Random Number Generation on the T3D and Y-MP
In newsletter #29 (3/31/95), I announced the availability of benchlib on the ARSC T3D. The sources for these libraries are available on the ARSC ftp server in the file:

  pub/submissions/libbnch.tar.Z

The compiled libraries are also available on Denali in:
  /usr/local/examples/mpp/lib/lib_32.a
  /usr/local/examples/mpp/lib/lib_scalar.a
  /usr/local/examples/mpp/lib/lib_util.a
  /usr/local/examples/mpp/lib/lib_random.a
  /usr/local/examples/mpp/lib/lib_tri.a
  /usr/local/examples/mpp/lib/lib_vect.a

and the sources are available in /usr/local/examples/mpp/src. In previous newsletters, I've described the contents of some of the libraries:
  #30 (4/7/95)  - the "pref" routine of lib_util.a
  #33 (4/28/95) - the fast scalar math routines in lib_scalar.a
  #34 (5/05/95) - the fast vector math routines in lib_vect.a
  #35 (5/12/95) - the tridiagonal solvers in lib_tri.a

In this newsletter, I will describe the routines in lib_random.a and compare them to the other random number generators on the T3D and Y-MP. This is the last library from benchlib. I welcome any user comments or experience with these libraries and will pass them on to readers of the ARSC T3D newsletter.
Random Number Generators
Of course, a random number generator doesn't actually produce random numbers, but a sequence of pseudorandom numbers that have the characteristics of a sequence of random numbers. These sequences are deterministic, and therefore reproducible, so that computer experiments can be run over and over. As in most areas of computing, there is always a tradeoff between speed and quality, and so it is with these pseudorandom number generators (RNGs). Speed is the easiest property to measure, and that is what is presented here. (Analyzing the quality of their random sequences is left for some Ph.D. thesis.)

On the T3D there are three random number generators available:
rand:   rand() is supplied with most implementations of C in
        libc.a. It usually produces a 16-bit integer that
        can be converted to a double in the range 0.0 to 1.0,
        i.e.:

            random_real = rand() / (double)RAND_MAX;

        where RAND_MAX is defined in <stdlib.h>. Because
        only 16 bits can change from call to call, it is
        usually not considered "random" enough. But its
        implementation is probably the same on all machines;
        it is the same on both the Y-MP and the T3D. There
        is a man page on Denali that describes rand(). (The
        division to obtain a random real number is not the
        same on each machine.)
ranf:   RANF is the random number generator on the Y-MP. It
        exists in both scalar and vector versions in libm.a
        and is written in highly optimized assembly language.
        This routine is described in a man page on Denali,
        and that manpage includes a Fortran version that
        mimics the assembly language. That Fortran version
        does not run on the T3D because of differences in
        normalizing floating-point numbers, but the T3D does
        have a version in /mpp/lib/libm.a that produces
        results similar to those on the Y-MP. It's a little
        inconsistent that the manpage shared by the Y-MP and
        the T3D contains an example program that runs only
        on the Y-MP.
rantom: The versions in benchlib's lib_random.a differ from
        both of the above options but have been written for
        FAST execution on the T3D processor. lib_random.a
        contains both Fortran and assembly language versions,
        and a manpage describing the algorithm and its speed
        is in /usr/local/examples/mpp/src/random.
Timing Routines
Below is the program I used to time the T3D routines:
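#include <stdio.h>
#include <stdlib.h>

#define NMAX 1000000

/* The results are doubles; storing rand() / denom into an int would
   truncate nearly every value to 0. Static storage keeps the four
   arrays (32 MB) off the stack. */
static double a[ NMAX ], b[ NMAX ], c[ NMAX ], d[ NMAX ];

double second( void );

int main( void )
{
    int nlog, n, i;
    double t1, t2, t3, t4;
    fortran double RANF();
    fortran double RANTOM();     /* "Fortran" column of the table   */
    fortran double RANTOMS();    /* "Assembler" column of the table */
    double denom;

    denom = (double)RAND_MAX;
    printf( " RAND_MAX = %d %f\n", RAND_MAX, denom );
    n = 1;
    for( nlog = 0; nlog < 7; nlog++ ) {    /* n = 1, 10, ..., 1000000 */
        t1 = second();
        for( i = 0; i < n; i++ ) { a[ i ] = rand() / denom; }
        t1 = second() - t1;
        t2 = second();
        for( i = 0; i < n; i++ ) { b[ i ] = RANF(); }
        t2 = second() - t2;
        t3 = second();
        for( i = 0; i < n; i++ ) { c[ i ] = RANTOM(); }
        t3 = second() - t3;
        t4 = second();
        for( i = 0; i < n; i++ ) { d[ i ] = RANTOMS(); }
        t4 = second() - t4;
        /* elapsed seconds and millions of calls per second for each RNG */
        printf( "%3d %10d %10.6f %4.1f %10.6f %4.1f %10.6f %4.1f %10.6f %4.1f\n",
                nlog, n, t1, n/(t1*1000000), t2, n/(t2*1000000),
                t3, n/(t3*1000000), t4, n/(t4*1000000) );
        n = n * 10;
    }
    return 0;
}

/* Seconds from the T3D real-time clock, which ticks at 150 MHz. */
double second( void )
{
    fortran long irtc();
    return( irtc( ) / 150000000.0 );
}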
The timing program used on the Y-MP was:
#include <stdio.h>
#include <stdlib.h>

#define NMAX 1000000

/* doubles, not ints, so that the ranf()/_ranf() check below compares
   the actual random numbers */
static double a[ NMAX ], b[ NMAX ], c[ NMAX ], d[ NMAX ];

int main( void )
{
    int nlog, n, i;
    double t1, t2, t3, t4;
    fortran double RANFF();     /* ranf() compiled from Fortran ("Fortran" column) */
    fortran double ranf();      /* scalar library version */
    fortran double SECOND();
    double denom;
    int zero = 0;

    denom = (double)RAND_MAX;
    printf( " RAND_MAX = %d %f\n", RAND_MAX, denom );
    n = 1;
    for( nlog = 0; nlog < 7; nlog++ ) {    /* n = 1, 10, ..., 1000000 */
        t1 = SECOND();
        for( i = 0; i < n; i++ ) { a[ i ] = rand() / denom; }
        t1 = SECOND() - t1;
        t2 = SECOND();
        for( i = 0; i < n; i++ ) { b[ i ] = RANFF(); }
        t2 = SECOND() - t2;
        RANSET( &zero );                /* same seed for both library loops */
        t3 = SECOND();
        for( i = 0; i < n; i++ ) { c[ i ] = ranf(); }
        t3 = SECOND() - t3;
        RANSET( &zero );
        t4 = SECOND();
        for( i = 0; i < n; i++ ) { d[ i ] = _ranf(); }   /* vectorized */
        t4 = SECOND() - t4;
        /* the scalar and vector versions must give identical sequences */
        for( i = 0; i < n; i++ ) {
            if( c[ i ] != d[ i ] ) {
                printf( "diff in ranf %d %f %f\n", i, c[ i ], d[ i ] );
            }
        }
        printf( "%3d %10d %10.6f %4.1f %10.6f %4.1f %10.6f %4.1f %10.6f %4.1f\n",
                nlog, n, t1, n/(t1*1000000), t2, n/(t2*1000000),
                t3, n/(t3*1000000), t4, n/(t4*1000000) );
        n = n * 10;
    }
    return 0;
}
Timing Results
Usually the results of an RNG are given in millions of random numbers per second, and that is how I've arranged the tables below. I also like to time a whole loop of n calls and then divide by the loop length n. This gives some feel for the overhead of the loop compared to the work of the loop body and also shows the asymptotic speed for a large number of calls.
The speed of random number generators on the T3D and Y-MP
==========================================================
        (in millions of random numbers per second)

T3D routines:

  loop                               rantom()
  length     rand()   ranf()    Fortran  Assembler
  -------    ------   ------    -------  ---------
        1       0.2      0.2        0.3        0.3
       10       1.1      0.9        1.1        1.5
      100       2.3      1.2        1.5        2.3
     1000       2.6      1.3        1.6        2.4
    10000       2.6      1.3        1.6        2.5
   100000       2.6      1.3        1.6        2.4
  1000000       2.6      1.3        1.6        2.4

Y-MP routines:

  loop                 ranf()     ranf() library routines
  length     rand()    Fortran      Scalar      Vector
  -------    ------    -------      ------      ------
        1       0.2        0.1         0.1         0.2
       10       0.6        0.3         0.6         1.5
      100       0.8        0.3         0.9        10.5
     1000       0.9        0.3         0.9        18.5
    10000       0.9        0.3         0.9        19.3
   100000       0.9        0.3         0.9        19.1
  1000000       0.9        0.3         0.9        19.4
Observations:
- rand() runs about three times faster on a single PE of the T3D (2.6 million numbers per second) than on the Y-MP (0.9 million per second). The price of portability often obscures performance differences.
- The difference between rand() and ranf() on both machines shows that the quality of a random sequence does not come without cost.
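- The difference between the Fortran and assembler versions of rantom() on the T3D (1.6 vs. 2.4 million per second) is much smaller than the difference between the Fortran and assembler versions of ranf() on the Y-MP (0.3 vs. 0.9 million per second, and 19.4 million for the vectorized version). That may be a sign that the days of assembly language programming are on the wane.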
- The last three columns of the Y-MP table show a range of Y-MP performance, all computing the same sequence of random numbers. The performance follows the effort:

      Fortran -> assembler -> vectorized assembler
- The last two loops for the Y-MP timing program show: "What a difference an underscore makes!" (The underscore invokes the vectorized version.)
List of Differences Between T3D and Y-MP
The current list of differences between the T3D and the Y-MP is:

- Data type sizes are not the same (Newsletter #5)
- Uninitialized variables are different (Newsletter #6)
- The effect of the -a static compiler switch (Newsletter #7)
- There is no GETENV on the T3D (Newsletter #8)
- Missing routine SMACH on T3D (Newsletter #9)
- Different Arithmetics (Newsletter #9)
- Different clock granularities for gettimeofday (Newsletter #11)
- Restrictions on record length for direct I/O files (Newsletter #19)
- Implied DO loop is not "vectorized" on the T3D (Newsletter #20)
- Missing Linpack and Eispack routines in libsci (Newsletter #25)
- F90 manual for Y-MP, no manual for T3D (Newsletter #31)
- RANF() and its manpage differ between machines (Newsletter #37)
Current Editors:
Ed Kornkven, ARSC HPC Specialist, ph: 907-450-8669
Kate Hedstrom, ARSC Oceanographic Specialist, ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.
