ARSC T3D Users' Newsletter 15, December 16, 1994
Upgrade on ARSC T3D Software
ARSC upgraded the T3D software to CrayLib_M 1.1.1.2, MAX 1.1.0.4 and SCC_M 4.0.2.11 on December 11th. There have been no problems detected by ARSC testing or reported by users.
PE Limits
Yesterday, notices were sent out to users having more than the default configuration of:
- 8 PEs for a maximum of 1 hour in interactive mode
- 32 PEs for a maximum of 24 hours in batch mode
Linda on the T3D
From CRI, I received a promotional announcement about the programming language Linda. I can e-mail this on to anyone interested. Is there anyone out there interested in Linda on ARSC's T3D?
New SHMEM Paper
From a user, I received a copy of "SHMEM User's Guide for C" by Ray Barriuso and Allan Knies, Revision 2.2. It seems to be a replacement for the "SHMEM Users' Guide" by the same authors, Revision 2.0. I can e-mail this to anyone who is interested.
Phase II I/O on the T3D
ARSC is evaluating the effort of moving from the current Phase I I/O to Phase II I/O on the T3D. In future newsletters I can summarize the differences, but for now I would like to ask: are any ARSC users interested in this upgrade, or willing to take part in the evaluation?
In C, the Timing Routine rtclock()
By accident I found a new timing routine for the T3D, rtclock(). I hesitate to add a new line to the table produced in Newsletter #12, but for the C programmer I think this function adds real functionality. There is a man page on denali for rtclock, but briefly, it is callable from all CRI platforms and returns the value of the real-time clock (RTC). In this way it is similar to the Fortran routines RTC and IRTC. Because there is no multiprogramming on the T3D PE we have:
    CPU time = Wallclock time
and so we can use rtclock to accurately measure CPU time on the T3D. The Fortran wrapper to access RTC or IRTC from C is no longer necessary and that overhead is gone too. It can be used as:
    long t1, t2, rtclock();
    double cputime;

    t1 = rtclock();
    /* event to measure */
    t2 = rtclock();
    cputime = (t2 - t1) / 150000000.0;  /* time = clockticks/clockrate */
The updated table (with corrected granularities for RTC and IRTC) is now:
Table of timers available on the T3D and Y-MP (um = microseconds)

timer         Wallclock     Fortran  T3D or  Granularity       Resolution
              or CPU timer  or C     Y-MP    T3D      Y-MP     T3D   Y-MP
irtc          wallclock     Fortran  both    ~.187um  ~.133um
rtc           wallclock     Fortran  both    ~.867um  ~.133um
tsecnd        CPU           Fortran  both    10000um  3um
gettimeofday  wallclock     C        both    ~2500um  ~30um
second        CPU           Fortran  Y-MP    1um      5um
rtclock()     wallclock     C        both    ~1 um    ~.2um
              CPU (on T3D)
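As a rough illustration of where a granularity figure like those in the table can come from, the C sketch below reads a clock in a tight loop and reports the smallest nonzero step it sees. It assumes that "granularity" means the smallest observable change in the timer's value, and it uses gettimeofday() only because that routine is widely available. The helper names now_us and timer_granularity_us are made up for this example and are not CRI routines, and this is not necessarily how the figures in the table were obtained.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Current wallclock time in microseconds, read with gettimeofday(). */
static double now_us(void)
{
    struct timeval tv;

    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec * 1.0e6 + (double)tv.tv_usec;
}

/*
 * Estimate the timer granularity in microseconds (hypothetical helper).
 * For each of `samples` trials, spin until the clock value changes and
 * keep the smallest nonzero step that was observed.
 */
double timer_granularity_us(int samples)
{
    double smallest = 0.0;
    int seen;

    assert(samples > 0);
    for (seen = 0; seen < samples; seen++) {
        double t0 = now_us();
        double t1 = now_us();

        while (t1 <= t0)        /* wait for the clock to visibly advance */
            t1 = now_us();
        if (seen == 0 || t1 - t0 < smallest)
            smallest = t1 - t0;
    }
    return smallest;
}
```

A call such as timer_granularity_us(100) returns the estimate in microseconds. The value depends entirely on the machine and operating system it is run on.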
Communication Between the T3D and the Y-MP
In Newsletter #7, I described why communication between the T3D and the Y-MP incurs a large system overhead and therefore should be avoided. One reason for avoiding communication with the Y-MP is that it is slow, and the timings from the example below show this. Once we understand the basic problems, we can go on to the more exotic solutions in future newsletters.
As part of the general distribution of PVM from Oak Ridge National Laboratory there is a collection of example programs. One of these examples does basic timings of PVM sends and receives from one master processor to one slave processor. I have modified that source to time PVM calls between Denali and the T3D.
One C program, timing.c, runs on Denali and initiates the sends. A second program, timing_slave.c, runs on a single PE of the T3D; it receives each send and passes an acknowledgment back to the program running on Denali.
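The master/slave exchange described above follows a "ping-pong" pattern: the master starts a clock, sends a message, waits for the acknowledgment, and stops the clock. The sketch below shows only that pattern. It is not the PVM code from timing.c; it swaps in a POSIX socketpair() and fork() between two processes on one machine so that it can be compiled without PVM. The names round_trip_us, write_all and read_all are invented for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write exactly n bytes to fd; returns 0 on success, -1 on error. */
static int write_all(int fd, const char *buf, size_t n)
{
    while (n > 0) {
        ssize_t k = write(fd, buf, n);
        if (k <= 0)
            return -1;
        buf += k;
        n -= (size_t)k;
    }
    return 0;
}

/* Read exactly n bytes from fd; returns 0 on success, -1 on error or EOF. */
static int read_all(int fd, char *buf, size_t n)
{
    while (n > 0) {
        ssize_t k = read(fd, buf, n);
        if (k <= 0)
            return -1;
        buf += k;
        n -= (size_t)k;
    }
    return 0;
}

/*
 * Send nbytes to a child process ("slave") and wait for its 1-byte
 * acknowledgment. Returns the round-trip time in microseconds,
 * or -1.0 on failure.
 */
double round_trip_us(size_t nbytes)
{
    int sv[2], ok;
    char flag = 0;
    char *msg;
    struct timeval t1, t2;
    pid_t pid;

    assert(nbytes > 0);
    msg = calloc(nbytes, 1);
    if (msg == NULL)
        return -1.0;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) {
        free(msg);
        return -1.0;
    }
    pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        free(msg);
        return -1.0;
    }
    if (pid == 0) {                               /* slave side */
        close(sv[0]);
        if (write_all(sv[1], &flag, 1) == 0 &&    /* report "ready" */
            read_all(sv[1], msg, nbytes) == 0)    /* receive the message */
            write_all(sv[1], &flag, 1);           /* send acknowledgment */
        _exit(0);
    }
    close(sv[1]);                                 /* master side */
    ok = read_all(sv[0], &flag, 1) == 0;          /* wait until slave is ready */
    gettimeofday(&t1, NULL);
    ok = ok && write_all(sv[0], msg, nbytes) == 0
            && read_all(sv[0], &flag, 1) == 0;
    gettimeofday(&t2, NULL);
    close(sv[0]);
    waitpid(pid, NULL, 0);
    free(msg);
    if (!ok)
        return -1.0;
    return (double)(t2.tv_sec - t1.tv_sec) * 1.0e6
         + (double)(t2.tv_usec - t1.tv_usec);
}
```

The "ready" byte is there so that process start-up is not counted in the timing. The numbers from this stand-in say nothing about PVM or the T3D; only the measurement pattern carries over.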
A makefile that makes the two programs and runs them is shown below. (All the source for this example is in /usr/local/examples/mpp/timers on denali.)
ARCH = CRAY
CCY-MP=cc -Tcray-ymp
LDY-MP=segldr
CCT3D=cc -X 1 -Tcray-t3d
LDT3D=/mpp/bin/mppldr
CFLAGS=-O -c
PVMDIR=/u1/uaf/ess/pvm3
#NNN = user's uid
NNN =

all: timing timing_slave run

timing: timing.c
	$(CCY-MP) $(CFLAGS) -I/usr/include/mpp timing.c
	$(LDY-MP) -o timing timing.o -L/usr/lib -lpvm3

timing_slave: timing_slave.c
	$(CCT3D) $(CFLAGS) -I/usr/include/mpp timing_slave.c
	$(LDT3D) -o timing_slave timing_slave.o -lpvm3
	cp timing_slave $(PVMDIR)/bin/$(ARCH)

run:
	-rm /tmp/pvmd.$(NNN) /tmp/pvml.$(NNN)
	pvmd3 &
	sleep 1
	/bin/time timing > results
	echo halt
	pvm

clean:
	-rm -f *.o timing timing_slave core
When run with the environment variable TARGET set to cray-ymp, the makefile will:
- create the programs
  - make the Y-MP executable timing
  - make the T3D executable timing_slave
  - copy timing_slave to the directory from which timing will spawn it
- run the programs
  - remove the pvm log files from previous runs (users must set NNN to their own uid number)
  - initiate the pvm daemon in the background
  - sleep for 1 second to allow the pvm daemon to start up
  - execute the master program with results being saved to a file
  - finally, initiate the pvm console and kill the pvm daemon with a halt command
Results
The timing programs measure two quantities. The first is the time for a minimal message to make the round trip from the Y-MP to the T3D and back to the Y-MP. The second is a series of sends and receives of messages of increasing size, from which we can derive a speed in megabytes per second. For comparison, the following table also gives the times for two other PVM configurations:
                       Y-MP to T3D   T3D to T3D     Indy to Indy
                                     (PE0 to PE1)   (Ethernet)
time for round trip
(in microseconds)       13918          2289           2486

speed for message
size (MB/s)
      100 bytes          .014          .044           .071
     1000 bytes          .125          .451           .558
    10000 bytes          .735         4.444          1.641
   100000 bytes         1.192        33.829          1.910
  1000000 bytes         2.000        89.783          2.000
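For readers who want to relate the MB/s figures to elapsed times, here is a small sketch of the conversion. It assumes that 1 MB means 1,000,000 bytes, so that bytes per microsecond and MB per second are the same number. The source does not say whether the speeds were computed from one-way or round-trip times, so the function simply takes whatever elapsed time it is given. The name mb_per_sec is invented for this example.

```c
#include <assert.h>

/*
 * Transfer speed in MB/s for nbytes moved in elapsed_us microseconds.
 * Assumes 1 MB = 1,000,000 bytes, so bytes/microsecond equals MB/s.
 */
double mb_per_sec(double nbytes, double elapsed_us)
{
    assert(elapsed_us > 0.0);
    return nbytes / elapsed_us;
}
```

Under that assumption, the 2.000 MB/s reported for the 1000000-byte message corresponds to an elapsed time of 500000 microseconds, since mb_per_sec(1000000.0, 500000.0) is 2.0.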
The timings between PE0 and PE1 are special because, for every even N, PE(N) and PE(N+1) are on the same node and share much of the same hardware. In the next newsletter we'll measure more of these PVM timings between PEs.
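The pairing rule can be written down in a couple of lines of C, which may help when choosing PE pairs for future timings. This is only a sketch of the rule as stated above (PE(N) and PE(N+1) share a node when N is even); node_of and same_node are hypothetical helpers, not CRI library routines.

```c
#include <assert.h>

/* Node index for a PE, assuming PEs 2k and 2k+1 share node k. */
int node_of(int pe)
{
    assert(pe >= 0);
    return pe / 2;
}

/* Nonzero if the two PEs are on the same node under that assumption. */
int same_node(int pe_a, int pe_b)
{
    return node_of(pe_a) == node_of(pe_b);
}
```

By this rule PE0 and PE1 share a node, while PE1 and PE2 do not.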
Reminders
List of Differences Between T3D and Y-MP:
The current list of differences between the T3D and the Y-MP is:
- Data type sizes are not the same (Newsletter #5)
- Uninitialized variables are different (Newsletter #6)
- The effect of the -a static compiler switch (Newsletter #7)
- There is no GETENV on the T3D (Newsletter #8)
- Missing routine SMACH on T3D (Newsletter #9)
- Different Arithmetics (Newsletter #9)
- Different clock granularities for gettimeofday (Newsletter #11)
Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
E-mail Subscriptions:
- Subscribe to (or unsubscribe from) the e-mail edition of the ARSC HPC Users' Newsletter.
- Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.
