ARSC T3D Users' Newsletter 25, March 3, 1995
ARSC T3D Future Upgrades
Next week we will be testing the upgrade to the T3D 1.2 Programming Environment (libraries, tools and compilers). If all goes well, it will be installed on the system in three weeks.
We are also planning to install CF90 and C++ for the T3D. This will come after the upgrade to the 1.2 P.E. I am interested in hearing from users who want to use the CF90 and C++ products as soon as they are available.
Upgrade to the T3D Memory
On February 7th, ARSC upgraded the memory on each PE from 2 MW (megawords) to 8 MW. If you have questions about this, please contact Mike Ess. The upgrade exposed a problem: the per-user limit on mppcore file size was set too low for the 8 MW nodes. On March 3rd, every T3D user's mppcore file limit was increased to allow a core file from 8 PEs. If you plan to debug with core files from more than 8 PEs, please contact Mike Ess. If your limit is too low, you will get an error message like:
    mppexec: user UDB core limit reached, mppcore dump terminated
The T3D can create tremendously large mppcore files, and their storage is charged against your service unit allocation. If you're not going to use an mppcore file, delete it as soon as you no longer need it. These files appear in the directory of the executable that aborted, with the name "mppcore". (Y-MP jobs that abort produce a core file called "core".)
The increased memory size has exposed another problem. One user created his own restart files by having all PEs write out large data arrays at the same time. With the 8 MW nodes these arrays can be much larger than before, and at times the write operations hang, taking the whole T3D application with them. This shouldn't happen, but as a work-around the user 'serialized' the writes so that only one PE writes at a time. In Fortran this could be implemented with a shared variable; this user instead sent a PVM message from PE N to PE N+1 to tell PE N+1 that its write could begin.
New SHMEM Manuals from CRI
One of the recipients of the ARSC T3D newsletter alerted me to PostScript copies of the SHMEM manuals that I wrote about in Newsletter #24:
- SN-2516 SHMEM Technical guide for Fortran Users
- SN-2517 SHMEM Technical guide for C Users
I can e-mail these PostScript versions to ARSC users who request them.
FMlib Available on ARSC's T3D
The FM library of communication functions is available and running on the ARSC T3D. The required include file, "fm.h", is in the directory
/usr/local/examples/mpp/include
and the library is:
/usr/local/examples/mpp/lib/libFM.a
These functions come in send/receive pairs similar to PVM's, but have much smaller latencies than the PVM functions on the T3D. One set of functions uses the fetch-and-increment hardware and the other set uses the atomic swap hardware. A brief description of the functions follows:
void FM_initialize(void); /* Setup */
void FM_set_parameter(int param_id, int value);
/* Change maximum message size or number of buffers */
typedef void FM_shandler( );
void FMf_send_4( int node, FM_shandler *fptr, ... );
/* Send a 4-word message; the four words are passed as arguments to
the handler fptr, which is executed on the receiving PE node. */
typedef void FM_lhandler( void *, int);
void FMf_send(int node, FM_lhandler *fptr, void *buf, int byte_count);
/* Send a message of byte_count bytes; the handler fptr is executed
on the receiving PE node with the buffer and its byte count. */
int FMf_extract_1( void );
/* retrieve the next message sent to the current node */
int FMf_extract( void );
/* a more general message retrieve */
The functions with the FMf_ prefix use the fetch-and-increment hardware; a parallel set of functions with the FMi_ prefix uses the atomic swap hardware. Further details are on the WWW page:
http://www-csag.cs.uiuc.edu/projects/communication/t3d-fm.html
The idea of passing a function to be executed on the receiving node is intriguing, but I haven't yet figured out a use for it. Because these functions are designed for low latency, they seem specialized for small (<4Kb) messages. I'll do some timing comparisons for next week's newsletter. Below is an example program and its output when run on 2 PEs:
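#include <stdio.h>
#include <mpp/shmem.h>
#include <mpp/globals.h>
#include <fm.h>

/* Number of messages whose handlers have run on this PE */
int num_incoming_msgs = 0;

/* Handler for 4-word messages (FMf_send_4): runs on the receiving PE
   and prints the four words it was passed. */
void remote_handler_short(int i1, int i2, int i3, int i4) {
    num_incoming_msgs++;
    printf( " %d %d %d %d\n", i1, i2, i3, i4 );
}

/* Handler for variable-length messages (FMf_send): prints the message
   count, the number of ints received (a T3D C int is 8 bytes), and
   the ints themselves. */
void remote_handler_long(void* buf, int byte_count ) {
    int* buffer = (int *)buf;
    int i;
    num_incoming_msgs++;
    printf( " %d %d", num_incoming_msgs, ( byte_count / 8 ) );
    for(i = 0; i < (byte_count/8); i++) printf(" %d",buffer[i]);
    printf( "\n" );
}

int main(void) {
    int i;
    int buf[32];
    int my_pe = _MPP_MY_PE;
    for( i = 0; i < 32; i++ ) buf[ i ] = 0;
    /* Set the maximum message size and buffer size, then start FM */
    FM_set_parameter( MAX_MSG_SIZE_FINC, 1024 );
    FM_set_parameter( MSG_BUFFER_SIZE_FINC, 512 );
    FM_initialize();
    if (my_pe == 0) {
        /* PE 0 sends ten short and ten variable-length messages to PE 1 */
        for( i = 0; i < 32; i++ ) buf[ i ] = i;
        for (i = 0; i < 10; i++) {
            FMf_send_4(1, remote_handler_short, i, i+1, i+2, i+3 );
            FMf_send(1, remote_handler_long, (void *)buf, (i+1)*8 );
        }
    }
    else if (my_pe == 1) {
        /* PE 1 polls until the handlers for all 20 messages have run */
        while ( num_incoming_msgs < 2 * 10 ) {
            FMf_extract_1();
            FMf_extract();
        }
    }
    barrier();
    return 0;
}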
Output from the 2-PE run:
0 1 2 3
2 1 0
1 2 3 4
4 2 0 1
2 3 4 5
6 3 0 1 2
3 4 5 6
8 4 0 1 2 3
4 5 6 7
10 5 0 1 2 3 4
5 6 7 8
12 6 0 1 2 3 4 5
6 7 8 9
14 7 0 1 2 3 4 5 6
7 8 9 10
16 8 0 1 2 3 4 5 6 7
8 9 10 11
18 9 0 1 2 3 4 5 6 7 8
9 10 11 12
20 10 0 1 2 3 4 5 6 7 8 9
Libsci Routines and Improved Speed on the T3D
When "Linpack" was introduced in 1979 by Jack Dongarra, Cleve Moler, J.R. Bunch and Pete Stewart, it was not a benchmark but a collection of linear algebra routines. The "Linpack Benchmark" grew from an appendix of the Linpack Users' Guide that compared various machines solving a 100x100 system of linear equations. The Linpack Benchmark measures the speed of only two of the routines in the package:
- sgefa - factors a dense square matrix
- sgesl - uses the above factorization to solve a system of equations
The naming convention for the Linpack routines is straightforward. For sgefa:
- s - single precision (d for double precision, ...)
- ge - general matrix (pp for packed positive definite, ...)
- fa - factorization (sl for solve, ...)
In 1992, the Lapack package was introduced as a replacement for Linpack and Eispack (a package of eigenvalue routines). Lapack is more comprehensive and faster than the previous packages. It contains replacements for the Linpack routines sgefa and sgesl, and the Linpack benchmark can be modified to run with the Lapack routines by making only the following changes:
c call sgefa(a,lda,n,ipvt,info)
call sgetrf( n, n, a, lda, ipvt, info )
c call sgesl(a,lda,n,ipvt,b,0)
call sgetrs( "n", n, 1, a, lda, ipvt, b, n, info )
The naming conventions and arguments are similar to those of the Linpack routines. With these two changes to the Linpack benchmark, we can compare the single PE results from the last newsletter with these single PE results of "the Linpack benchmark run with the Lapack routines".
Problem  Linpack Mflop rates               Lapack
   size  Fortran BLAS1  Libsci BLAS1  Mflop rates
      1            .19           .13          .04
      2            .45           .30          .15
      3            .90           .30          .03
      4           1.50          1.04          .06
      5           2.21          1.28          .09
     10           5.04          2.92         2.55
     20           8.72          6.07         5.70
     40          11.04         10.72        13.48
     50          10.93         12.29        15.95
     60          11.10         13.49        20.13
     70          11.21         13.38        21.12
     80          11.43         13.31        24.89
     90          11.55         13.46        25.15
    100          11.64         13.75        27.73
    200          12.08         17.30        36.82
    300          12.03         19.66        41.51
    400          11.84         21.20        44.12
    500          11.60         22.26        45.40
    600          11.36         23.03        46.73
    700          11.21         23.61        47.80
    800          11.14         24.14        48.57
    900          10.97         24.40        48.90
   1000          10.83         24.69        49.47
   1500          10.15         25.62        51.04
   2000           9.83         26.18        51.87
   2500           9.70         24.61        52.30
   2600           9.70         26.49        52.42
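The Lapack routines on the T3D are a substantial improvement over the Linpack routines: for problem sizes of 100 and above they run at roughly twice the rate of Linpack with libsci BLAS1 (49.47 versus 24.69 Mflops at size 1000), although they are slower for the smallest problems (size 20 and below). Neither Linpack nor Eispack is in libsci for the T3D; the T3D version of libsci has only Lapack routines. On an older machine like the Y-MP, Linpack and Eispack are provided in libsci for legacy codes, and the Y-MP libsci also contains all of the Lapack routines.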
List of Differences Between T3D and Y-MP
The current list of differences between the T3D and the Y-MP is:
- Data type sizes are not the same (Newsletter #5)
- Uninitialized variables are different (Newsletter #6)
- The effect of the -a static compiler switch (Newsletter #7)
- There is no GETENV on the T3D (Newsletter #8)
- Missing routine SMACH on T3D (Newsletter #9)
- Different Arithmetics (Newsletter #9)
- Different clock granularities for gettimeofday (Newsletter #11)
- Restrictions on record length for direct I/O files (Newsletter #19)
- Implied DO loop is not "vectorized" on the T3D (Newsletter #20)
- Missing Linpack and Eispack routines in libsci (Newsletter #25)
Current Editors:
Ed Kornkven, ARSC HPC Specialist, ph: 907-450-8669
Kate Hedstrom, ARSC Oceanographic Specialist, ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.
