ARSC T3D Users' Newsletter 83, April 19, 1996
T3E Information
> PSC's Cray T3E Is Installed and Running Parallel Applications 04.19.96
> NEWS BRIEFS HPCwire
> =============================================================================
>
> Eagan, Minn. -- The first CRAY T3E scalable parallel system has been
> installed at the Pittsburgh Supercomputing Center (PSC) and is already
> running parallel applications, Cray Research announced earlier this week.
>
> Since installing the early-access system only three weeks ago, six
> applications are now running in parallel mode on the new Cray system, said
> PSC officials, and a number of additional applications are targeted for
> near-term deployment. PSC scientific co-director Michael Levine said that
> all applications running to date on the new system have produced correct
> results and that PSC experts have verified hardware and software stability
> with repeated long overnight runs.
>
> Applications selected for early migration to the CRAY T3E system include
> several developed by PSC staff and users, as well as frequently used off-
> the-shelf packages ported and optimized for scalable parallel computing
> under PSC's Parallel Applications Technology Program partnership with
> Cray. Applications already running include major quantum chemistry and
> biomedical applications (CHARMM, GAMESS and AMBER). Others to be
> added will include software for crystallographic structure determination,
> properties of advanced materials, leading-edge environmental and finite-
> element multiphysics codes and Msearch, PSC's parallel genome sequence
> database searching and alignment package.
>
> According to a Cray representative, volume shipments of the new
> supercomputer are slated to begin in third quarter. PSC's system will be
> upgraded over time and ultimately scale to 512 processors. The center will
> continue operating its prior-generation CRAY T3D system for production
> problems and will replace it when the CRAY T3E reaches 512 processors,
> according to Levine, who made a presentation at an Executive Cray User Group
> meeting April 17 on the status of the CRAY T3E system and PSC's applications
> migration progress.
>
> "On behalf of the national scientific community, we at PSC are looking
> forward to the substantially increased capability of the new CRAY T3E,"
> Levine said. "Its faster processing and communications speeds, coupled with
> a much larger memory, will enhance our ability to attack leading-edge
> problems while maintaining our multi-year investment in highly optimized
> applications programs. This will keep PSC, the NSF supercomputer centers
> program and American researchers in a world leadership position."
>
> Levine said that over the course of the last three years, PSC users have
> run a wide-range of industrial and scientific problems on the current CRAY
> T3D supercomputer, consuming about 4.6 million processing hours. "When we
> get into full production with the CRAY T3E," said Levine, "we expect
> scientific productivity to improve by a factor of three to four."
>
> "We are pleased that this early access system is enabling PSC application
> experts to make such rapid advances in migrating their applications to the
> CRAY T3E environment," said Robert H. Ewald, Cray Research president and
> chief operating officer. "The CRAY T3E preserves the macroarchitecture and
> programming environment of the CRAY T3D. This consistency protects PSC's
> parallel applications investment and contributes to this exceptional progress
> by PSC and Cray personnel. This early applications progress will enable the
> production use of the CRAY T3E at PSC and other customers later this year."
>
> Cray said that it had more than $160 million in advance orders for the
> CRAY T3E system at year-end 1995.
Comparing PVM, MPI and SHMEM
In newsletter #81 (4/5/96), I presented some preliminary results comparing PVM and MPI. I have updated that table with newer results. The changes from that table to this are:
- On the SGI workstations, I am now running the most recent version of PVM, pvm version 3.3 release 10.
- On both the SGI workstations and the T3D, I am now running the most recent version of the Argonne/Mississippi State implementation of MPI, MPICH 1.0.12.
- I've corrected the original source code to count ints on the T3D as 8 bytes not 4 bytes (oops, sorry about that).
- I've extended the table to include SHMEM timings for an operation similar to the sends and receives in the PVM and MPI versions.
Preliminary results for MPI/PVM comparison
                    <--------------T3D--------------->  <----SGI COW---->
                    CRI      CRI      EPCC     Argonne  Oak Ridge  Argonne
                    SHMEM    PVM      MPI      MPI      PVM        MPI
                    PE 1.2   3.3.4    1.4a     1.0.12   3.3.10     1.0.12
                             (2.1.1)
ping test           5        259      81       80       2567       2858
(microseconds)
bandwidth test
(Mbytes/second)
length of message
(bytes)
    ~100            16.00    .44      1.03     1.20     .04        .03
    ~1000           73.84    3.51     7.68     10.66    .24        .28
    ~10000          123.07   14.24    19.91    48.97    .54        .77
    ~100000         126.81   24.24    28.37    95.52    .62        .99
    ~1000000        127.32   25.95    29.44    103.07   .48        .88
Timing SHMEMs
To time shmem_put used in the same way as the PVM and MPI send/receive pairs, I kept the control structure of the MPI timing program from Newsletter #81 but replaced the send of the data and the receive of the acknowledgment with a shmem_put and a shmem_wait. On PE0, where the timing is done, I replaced:
< if( MPI_Send( iarray,numint,MPI_INT,other,tagsend,MPI_COMM_WORLD ) ) {
<    printf( "can't send to bandwidth test\n" );
<    goto bail;
< }
< if( MPI_Recv( iarray,1,MPI_INT,other,tagrecv,MPI_COMM_WORLD,&stat)){
<    printf( "recv error in bandwidth test\n" );
<    goto bail;
< }
with:
> ack[ 0 ] = 0;
> shmem_put( jarray, iarray, numint, 1 );
> if( ack[ 0 ] == 0 ) {
>    shmem_wait( ack, 0 );
> }
and on PE1 where the receive was done and an acknowledgment was sent, I replaced:
< MPI_Recv( iarray, numint, MPI_INT,other,tagsend,MPI_COMM_WORLD,&stat );
< MPI_Send( iarray, 1, MPI_INT, other, tagrecv, MPI_COMM_WORLD );
with:
> testpos = numint - 1;
> if( jarray[ testpos ] == 0 ) {
>    shmem_wait( &jarray[ testpos ], 0 );
> }
> if( jarray[ testpos ] == 1 ) {
>    shmem_put( ack, ack, 1, 0 );
> }
I think this substitution duplicates the functionality of the send/receive pairs in the PVM and MPI versions. The complete program for measuring shmem_puts is given at the end of this newsletter. A sample output is shown below:
RTT Avg uSec 5 RTT Min uSec 5
Message size 96
Avg Byte/uSec 13.714286 Max Byte/uSec 13.714286
Message size 960
Avg Byte/uSec 73.846154 Max Byte/uSec 73.846154
Message size 9600
Avg Byte/uSec 117.073171 Max Byte/uSec 117.073171
Message size 96000
Avg Byte/uSec 125.326371 Max Byte/uSec 126.149803
Message size 960000
Avg Byte/uSec 126.465551 Max Byte/uSec 127.270317
Done on PE0
Done on PE1
All of the timings in Newsletter #81 and in this newsletter were between PE0 and PE1. Because SHMEM has such low latency and high bandwidth, I may be able to use it to detect the number of hops or changes in dimension for messages between PE0 and PEn. We'll see in a future newsletter.
Announcement from PSC
> ---------------------------------------------------------------------------
> Pittsburgh Supercomputing Center
> Supercomputing Techniques: Parallel Processing on CRAY MPP Systems
> May 20-23, 1996
> ---------------------------------------------------------------------------
> REGISTRATION DEADLINE: May 1, 1996
> ---------------------------------------------------------------------------
>
>
> PURPOSE:
>
> The purpose of this four day workshop is to introduce participants to
> parallel processing on the CRAY T3D and explore more advanced topics,
> including performance monitoring and optimization techniques.
>
> AGENDA:
>
> The first two days of this workshop have been designed to introduce
> participants to PSC's supercomputing environment, compiling, debugging,
> job submission, and parallel programming concepts. Participants will
> learn to write parallel code using message passing calls.
>
> The third and fourth days are designed to cover more advanced topics,
> including advanced parallel programming techniques, how to monitor
> code performance and optimization strategies. There will also be
> presentations on scientific applications which have been parallelized.
>
> ==> A working knowledge of FORTRAN or C and UNIX are required.
> ==> Parallel computing experience is not necessary.
>
> REGISTRATION FEES:
>
> Admission to this training workshop is free to the United States
> academic community.
>
> Interested corporate and government applicants, as well as applicants
> from academic institutions outside the United States should contact
> Anne Marie Zellner at (412)268-4960 for information on attendance fees
>
> HOUSING AND TRAVEL:
>
> Housing and travel are the responsibility of participants, but we will
> provide information on local hotels at your request. Group rates for
> local hotels are available on a first-come, first-served basis.
>
> A list of local hotels is included on the Web page referenced below.
>
> REGISTRATION:
>
> To register for this workshop, please complete and return the
> registration form below by May 1, 1996 to:
>
> Workshop Application Committee,
> ATTN: Anne Marie Zellner
> Pittsburgh Supercomputing Center
> 4400 Fifth Avenue,
> Pittsburgh, PA 15213.
>
> You may also apply for this workshop by sending requested information
> via electronic mail to workshop@psc.edu or via fax to (412/268-5832).
>
> All applicants will be notified of acceptance on May 2, 1996.
>
> For additional online information, please visit the workshop's Web page at
> http://www.psc.edu/training/T3D_May_96/welcome.html
>
> ==============================================================================
> Registration Form
> Supercomputing Techniques: Parallel Processing on CRAY MPP Systems
> May 20-23, 1996
>
> Name:
>
> Department:
>
> Univ/Ind/Gov Affiliation:
>
> Address:
>
> Telephone: W ( ) H( )
>
> Electronic Mail Address:
>
> Social Security Number:
>
> Citizenship:
>
> Are you a PSC user (yes/no)?
> If yes, please give your PSC username:
>
> Academic Standing (please check one):
> F - Faculty            UG - Undergraduate                  I - Industrial
> PD - Postdoctorate     UR - University Research Staff      GV - Government
> GS - Graduate Student  UN - University Non-Research Staff  O - Other
>
> Please explain why you are interested in attending this workshop and what
> you hope to gain from it:
>
>
> Briefly describe your computing background (scalar, vector, and parallel
> programming experience; platforms; languages) and research interests:
>
>
> All applicants will be notified of acceptance on May 2, 1996.
>
>
SHMEM Timing Source
/*************** Timing program for SHMEMs by Mike Ess, ARSC ******************/
#include <stdio.h>
#include <time.h>
#include <mpp/shmem.h>

#define MIN( a, b ) ( ( (a) < (b) ) ? (a) : (b) )
#define MAXSIZE 250000

long iarray[ MAXSIZE ];
long jarray[ MAXSIZE ];

main(argc, argv)
int argc;
char *argv[];
{
   double t1, t2, second();
   int reps = 100;              /* number of samples per test */
   struct timeval tv1, tv2;     /* for timing */
   int dt1, dt2;                /* time for one iter */
   int at1, at2;                /* accum. time */
   int mt1, mt2;                /* minimum times */
   int numint;                  /* message length */
   int n;
   int i;
   long ack[ 1 ];               /* acknowledgment signal */
   int size;                    /* number of PEs */
   int rank;                    /* my PE number */
   int other;                   /* the other guy's PE */
   long psync[ 2 ];             /* space for shmem_barrier synchronization */
   int testpos;                 /* last position changed by send */

   rank = pvm_get_PE( pvm_mytid() );                  /* who I am */
   if( rank == 0 ) other = 1;                         /* who he is */
   if( rank == 1 ) other = 0;
   for( i = 0; i < MAXSIZE; i++ ) iarray[ i ] = 1;    /* initialize send buffer */
   psync[ 0 ] = _SHMEM_SYNC_VALUE;
   psync[ 1 ] = _SHMEM_SYNC_VALUE;
   shmem_barrier( 0, 1, 2, psync );                   /* sync the PEs */

   if( rank == 0 ) {                                  /* On PE 0 */
      at1 = 0;
      mt1 = 10000000;
      for (n = 1; n <= reps; n++) {                   /* do rep timings */
         t1 = second( );                              /* latency test */
         ack[ 0 ] = 0;
         shmem_put( jarray, iarray, 1, 1 );           /* send a word */
         if( ack[ 0 ] == 0 ) {
            shmem_wait( ack, 0 );                     /* wait for acknowledge */
         }
         t2 = second( );
         dt1 = ( t2 - t1 ) * 1000000.0;               /* to microseconds */
         at1 += dt1;                                  /* the running sum */
         mt1 = MIN( dt1, mt1 );                       /* best timing */
      }
      printf("RTT Avg uSec %d ", at1 / reps);
      printf("RTT Min uSec %d\n", mt1 );
      for (numint = 100 / sizeof( int ); numint < 1000000; numint *= 10) {
         printf("Message size %d\n", numint * sizeof( int ));
         at2 = 0;                                     /* bandwidth test */
         mt2 = 10000000;                              /* numint = 12, 120, ... */
         for (n = 1; n <= reps; n++) {                /* do rep timings */
            t1 = second();
            ack[ 0 ] = 0;
            shmem_put( jarray, iarray, numint, 1 );   /* send numint ints */
            if( ack[ 0 ] == 0 ) {
               shmem_wait( ack, 0 );                  /* wait for acknowledgment */
            }
            t2 = second();
            dt2 = ( t2 - t1 ) * 1000000.0;            /* to microseconds */
            at2 += dt2;                               /* the running sum */
            mt2 = MIN( mt2, dt2 );                    /* best timing */
         }
         at2 /= reps;
         printf("Avg Byte/uSec %8f ", (numint * sizeof( int )) / (double)at2);
         printf("Max Byte/uSec %8f\n", (numint * sizeof( int )) / (double)mt2);
      }
   } else {                                           /* On PE1 */
      ack[ 0 ] = 1;
      jarray[ 0 ] = 0;
      for ( n = 1; n <= reps; n++ ) {                 /* mimic PE0's control */
         if( jarray[ 0 ] == 0 ) {
            shmem_wait( jarray, 0 );                  /* wait for change */
         }
         if( jarray[ 0 ] == 1 ) {
            shmem_put( ack, ack, 1, 0 );              /* send an ack */
         }
         jarray[ 0 ] = 0;
      }
      for (numint = 100 / sizeof( int ); numint < 1000000; numint *= 10) {
         testpos = numint - 1;
         jarray[ testpos ] = 0;
         for (n = 1; n <= reps; n++) {                /* mimic PE0's control */
            if( jarray[ testpos ] == 0 ) {
               shmem_wait( &jarray[ testpos ], 0 );   /* wait for last element */
            }                                         /* to change */
            if( jarray[ testpos ] == 1 ) {
               shmem_put( ack, ack, 1, 0 );           /* send an ack */
            }
            jarray[ testpos ] = 0;
         }
      }
   }
   printf( "Done on PE%d\n", rank );
   exit( 0 );
bail:
   printf( "Bailing out on PE%d\n", rank );
   exit( -1 );
}

double second()
{
   double junk;
   fortran irtc();
   junk = irtc( ) / 150000000.0;
   return( junk );
}
Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
E-mail Subscriptions:
- Subscribe to (or unsubscribe from) the e-mail edition of the ARSC HPC Users' Newsletter.
- Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.
