ARSC T3D Users' Newsletter 15, December 16, 1994
Upgrade on ARSC T3D Software
ARSC upgraded the T3D software to CrayLib_M 22.214.171.124, MAX 126.96.36.199 and SCC_M 188.8.131.52 on December 11th. There have been no problems detected by ARSC testing or reported by users.
PE Limits
Yesterday, notices were sent out to users having more than the default configuration of:
- 8 PEs for a maximum of 1 hour in interactive mode
- 32 PEs for a maximum of 24 hours in batch mode
Linda on the T3D
From CRI, I received a promotional announcement about the programming language Linda. I can e-mail it to anyone interested. Is anyone out there interested in Linda on ARSC's T3D?
New SHMEM Paper
From a user, I received a copy of "SHMEM User's Guide for C" by Ray Barriuso and Allan Knies, Revision 2.2. It seems to be a replacement for the "SHMEM Users' Guide" by the same authors, Revision 2.0. I can e-mail this to anyone who is interested.
Phase II I/O on the T3D
ARSC is evaluating the effort of moving from the current Phase I I/O to Phase II I/O on the T3D. In future newsletters I can summarize the differences, but for now I would like to ask whether any ARSC users are interested in this upgrade or would like to take part in the evaluation.
In C, the Timing Routine rtclock()
By accident I found a new timing routine for the T3D, rtclock(). I hesitate to add a new line to the table produced in Newsletter #12, but for the C programmer I think this function adds real functionality. There is a man page on denali for rtclock, but briefly, it is callable from all CRI platforms and returns the value of the real-time clock (RTC). In this way it is similar to the Fortran routines RTC and IRTC. Because there is no multiprogramming on the T3D PE we have:

    CPU time = Wallclock time

and so we can use rtclock to accurately measure CPU time on the T3D. The Fortran wrapper to access RTC or IRTC from C is no longer necessary and that overhead is gone too. It can be used as:

    long t1, t2, rtclock();
    double cputime;

    t1 = rtclock();
    /* event to measure */
    t2 = rtclock();
    cputime = (t2 - t1) / 150000000.0;   /* time = clockticks/clockrate */

The updated table (with corrected granularities for RTC and IRTC) is now:

Table of timers available on the T3D and Y-MP (um = microseconds)

timer          Wallclock       Fortran   T3D or   Granularity          Resolution
               or CPU timer    or C      Y-MP     T3D       Y-MP       T3D    Y-MP
irtc           wallclock       Fortran   both     ~.187um   ~.133um
rtc            wallclock       Fortran   both     ~.867um   ~.133um
tsecnd         CPU             Fortran   both     10000um   3um
gettimeofday   wallclock       C         both     ~2500um   ~30um
second         CPU             Fortran   Y-MP     1um       5um
rtclock()      wallclock       C         both     ~1 um     ~.2um
               CPU (on T3D)

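The tick-to-seconds conversion in the rtclock() fragment above can be wrapped in a small helper. What follows is only a sketch: the names T3D_CLOCK_HZ, ticks_to_seconds and ticks_to_usec are made up for illustration, and the 150 MHz rate is simply the divisor used in that fragment. On the T3D the two readings would come from rtclock(); the helpers themselves do not call it, so they can be tried on any machine.

```c
#include <assert.h>

/* Assumed clock rate: the 150000000.0 divisor from the rtclock() fragment. */
#define T3D_CLOCK_HZ 150000000.0

/* Hypothetical helper: seconds elapsed between two real-time clock readings. */
double ticks_to_seconds(long t1, long t2)
{
    assert(t2 >= t1);                  /* readings must be in time order */
    return (double)(t2 - t1) / T3D_CLOCK_HZ;
}

/* Hypothetical helper: the same interval expressed in microseconds. */
double ticks_to_usec(long t1, long t2)
{
    return ticks_to_seconds(t1, t2) * 1.0e6;
}
```

At this rate 150 ticks make up one microsecond, so ticks_to_usec(100, 250) is 1.0 and ticks_to_seconds(0, 150000000) is 1.0.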
Communication Between the T3D and the Y-MP
In newsletter #7, I described why communication between the T3D and the Y-MP incurs a large system overhead and should therefore be avoided. One of the reasons for avoiding communication with the Y-MP was that it is slow, and the timings from the example below show this. Once we understand the basic problems we can go on to the more exotic solutions in future newsletters.
As part of the general distribution of PVM from Oak Ridge National Labs there is a collection of example programs. One of these examples does basic timings of PVM sends and receives from one master processor to one slave processor. I have modified that source to time PVM calls between Denali and the T3D.
There is one C program, timing.c, that runs on Denali initiating the sends. On a single PE of the T3D is another program timing_slave.c receiving the send and passing an acknowledgment back to the program running on denali.
A makefile that makes the two programs and runs them is shown below. (All the source for this example is in /usr/local/examples/mpp/timers on denali.)

ARCH = CRAY
CCY-MP=cc -Tcray-ymp
LDY-MP=segldr
CCT3D=cc -X 1 -Tcray-t3d
LDT3D=/mpp/bin/mppldr
CFLAGS=-O -c
PVMDIR=/u1/uaf/ess/pvm3
#NNN = user's uid
NNN =

all: timing timing_slave run

timing: timing.c
	$(CCY-MP) $(CFLAGS) -I/usr/include/mpp timing.c
	$(LDY-MP) -o timing timing.o -L/usr/lib -lpvm3

timing_slave: timing_slave.c
	$(CCT3D) $(CFLAGS) -I/usr/include/mpp timing_slave.c
	$(LDT3D) -o timing_slave timing_slave.o -lpvm3
	cp timing_slave $(PVMDIR)/bin/$(ARCH)

run:
	-rm /tmp/pvmd.$(NNN) /tmp/pvml.$(NNN)
	pvmd3 &
	sleep 1
	/bin/time timing > results
	echo halt | pvm

clean:
	-rm -f *.o timing timing_slave core

When run with the environment variable TARGET set to cray-ymp, the makefile will:
create the programs
- make the Y-MP executable timing
- make the T3D executable timing_slave
- move timing_slave to the directory from which timing will spawn it
run the programs
- remove the pvm log files from previous runs (users must change the NNN to their own uid number)
- initiate the pvm daemon in the background
- sleep for 1 second to give the pvm daemon time to start
- execute the master program with results being saved to a file
- finally, initiate the pvm console and kill the pvm daemon with a halt command
Results
The timing programs measure two quantities: the time for a minimal message to make the round trip from the Y-MP to the T3D and back to the Y-MP, and the times for a series of sends and receives of messages of increasing size. From this series of timings we can derive a speed measurement in megabytes per second. For comparison, the following table also includes the times for two other PVM configurations:

                                 Y-MP to T3D   T3D to T3D     Indy to Indy
                                               (PE0 to PE1)   (Ethernet)

time for round trip
(in microseconds)                13918         2289           2486

speed for message size (MB/s)
      100 bytes                   .014          .044           .071
     1000 bytes                   .125          .451           .558
    10000 bytes                   .735         4.444          1.641
   100000 bytes                  1.192        33.829          1.910
  1000000 bytes                  2.000        89.783          2.000

The timings between PE0 and PE1 are special because for all PEs, PE(N) and PE(N+1) for N even are on the same node and share much of the same hardware. In the next newsletter we'll measure more of these PVM timings between PEs.
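Two pieces of arithmetic behind these results can be sketched in C. The helper names below (mb_per_sec, pe_node, same_node) are invented for illustration and are not taken from timing.c. The sketch assumes 1 MB = 10^6 bytes and leaves open whether the elapsed time fed in is a one-way or a round-trip figure, since the text does not say which one the table uses. The node numbering simply encodes the statement that PE(N) and PE(N+1) share a node when N is even.

```c
#include <assert.h>

/* Hypothetical helper: transfer rate in MB/s for a message of `bytes`
 * bytes that took `usec` microseconds. With 1 MB = 10^6 bytes (an
 * assumption), bytes per microsecond is numerically equal to MB/s. */
double mb_per_sec(double bytes, double usec)
{
    assert(usec > 0.0);
    return bytes / usec;
}

/* Hypothetical helper: node index of a PE, assuming PEs 2k and 2k+1
 * sit on node k. */
int pe_node(int pe)
{
    assert(pe >= 0);
    return pe / 2;
}

/* Nonzero when two PEs share a node under the pairing above. */
int same_node(int pe_a, int pe_b)
{
    return pe_node(pe_a) == pe_node(pe_b);
}
```

Under these assumptions mb_per_sec(1000000, 500000) is 2.0, so a 2.000 MB/s entry for 1,000,000-byte messages corresponds to about half a second per message. same_node(0, 1) is true, while same_node(1, 2) is false, which is why PE0 to PE1 timings may not be representative of other PE pairs.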
Reminders
List of Differences Between T3D and Y-MP:
The current list of differences between the T3D and the Y-MP is:
- Data type sizes are not the same (Newsletter #5)
- Uninitialized variables are different (Newsletter #6)
- The effect of the -a static compiler switch (Newsletter #7)
- There is no GETENV on the T3D (Newsletter #8)
- Missing routine SMACH on T3D (Newsletter #9)
- Different Arithmetics (Newsletter #9)
- Different clock granularities for gettimeofday (Newsletter #11)
Ed Kornkven, ARSC HPC Specialist, ph: 907-450-8669
Kate Hedstrom, ARSC Oceanographic Specialist, ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.