ARSC T3D Users' Newsletter 94, July 3, 1996
T3E: Topic of T3D User Group Meeting
ARSC's T3D User Group will be meeting on Thursday, July 11th from 9:00 AM to 12 noon. The location will be Butrovich, room 109. (This reflects a time change, compared with that announced in T3D Newsletter #92.)
Frank Chism of CRI will give a presentation on the CRAY T3E.
As you may already know, ARSC expects to upgrade its T3D to a T3E. We encourage T3D users to become involved in the discussion of possible configurations, and other issues surrounding the T3E. This meeting provides a step in that direction.
MPICH v1.0.13 Available to ARSC T3D Users
We have installed the latest version of MPICH, Argonne National Lab and Mississippi State's implementation of MPI, for the T3D. It is on denali, in the following location:

  /usr/local/pkg/mpich_1.0.13/mpich

The include files are in:

  /usr/local/pkg/mpich_1.0.13/mpich/include

The archives are in:

  /usr/local/pkg/mpich_1.0.13/mpich/lib/cray_t3d/t3d
Caveat
Version 1.0.13 is available on the MPICH public ftp site but, according to MPICH tech support, has not yet been "officially" released. (There's no mention of it on their WWW site.) However, they also tell me that the T3D code in this release has not changed since May, and that we could "go ahead" and install it. We did, but v1.0.12 is still available.
Linker Warnings
When I link with v1.0.13, mppldr reports:
    Unsatisfied external references
    Entry name                          Modules referencing entry
    execvp (equivalenced to $USX2)
                                        dbxerr$c
    fork (equivalenced to $USX1)
                                        dbxerr$c
The reason is that the object "dbxerr.o" in libmpi.a references two common Unix system functions, fork and execvp, which are not supported in MAX. I reported this to MPICH, but doubt that it will cause problems for T3D users. (If your code does call one of the routines that depends on fork or execvp, you'll have to make changes. If this happens, please satisfy my curiosity and drop me a line.)
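One way to tell whether your own code, and not just dbxerr$c, pulls in fork or execvp is to read the module names in the mppldr listing. The following is a minimal Python sketch of that check. It assumes that entry-name lines start in column 1 and that each referencing module sits on an indented line below its entry, as in the listing above; the helper names (parse_unsatisfied, modules_outside) are hypothetical.

```python
# Hypothetical helpers for reading an mppldr "Unsatisfied external references"
# listing. ASSUMPTION: entry names start in column 1; the modules that
# reference them appear on indented lines underneath.

def parse_unsatisfied(listing):
    """Return {entry_name: [referencing modules]} from the listing text."""
    refs = {}
    current = None
    for line in listing.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith(("Unsatisfied external", "Entry name")):
            continue  # skip the two header lines
        if line[0].isspace():
            if current is not None:
                refs[current].append(stripped)  # a referencing module
        else:
            current = stripped.split()[0]  # drop "(equivalenced to ...)"
            refs.setdefault(current, [])
    return refs


def modules_outside(refs, expected=("dbxerr$c",)):
    """List (entry, module) pairs referenced from any module not in `expected`."""
    return [(entry, mod) for entry, mods in refs.items()
            for mod in mods if mod not in expected]


sample = """Unsatisfied external references
Entry name                          Modules referencing entry
execvp (equivalenced to $USX2)
                                    dbxerr$c
fork (equivalenced to $USX1)
                                    dbxerr$c
"""

refs = parse_unsatisfied(sample)
print(refs)                   # {'execvp': ['dbxerr$c'], 'fork': ['dbxerr$c']}
print(modules_outside(refs))  # []
```

An empty list from modules_outside means only dbxerr$c needs the missing routines; any other module name is worth a closer look.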
My few test programs compile and run correctly, but there is a glitch:
When I use "/mpp/bin/cc -c" to compile and "mppldr" to load (as two separate steps), the "execute bit" is set correctly on the result file. However, when I use "/mpp/bin/cc" without the "-c" option, thus requesting it to invoke "mppldr" for me, the result file does not have the execute bit set, and I have to set it manually (chmod 700 a.out). In both cases, however, the final executable runs correctly.
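If you build with the one-step "/mpp/bin/cc" form inside a script, the manual "chmod 700 a.out" can be automated. Here is a small Python sketch doing the same thing; the function name ensure_owner_rwx is hypothetical, and "a.out" is simply the default output name used above.

```python
import os
import stat

def ensure_owner_rwx(path="a.out"):
    """Set mode 700 (read/write/execute for the owner only), like `chmod 700 a.out`.

    Returns True if the owner execute bit was missing before the call.
    """
    mode = os.stat(path).st_mode
    was_missing = not (mode & stat.S_IXUSR)
    os.chmod(path, stat.S_IRWXU)  # S_IRWXU == 0o700
    return was_missing

# Usage, right after a one-step "/mpp/bin/cc" build:
#     ensure_owner_rwx("a.out")
```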
Bug Fix
As reported in Newsletter #92, previous MPICH versions crashed on some of our test programs. Version 1.0.13 seems to have solved the problems.
Documentation
Look in:

  /usr/local/pkg/mpich_1.0.13/mpich/doc
EPCC/MPI And MPICH: Timings
Using the programs listed and described in Newsletter #92, here is my second pass at some basic MPI timings. In the last two weeks, we have received a new version of MPICH and a "workaround" from EPCC. Finally, all three test programs run successfully with both MPI products.

My results using "ring.simple.f" appear in Table 1. Barbara Herron of Lawrence Livermore sent in the following comment yesterday. She took the words right out of my mouth:
> My observations echo yours, that MPICH seems to be getting slower
> with subsequent versions. However subsequent versions are fixing
> more and more problems, so I suppose it is the price we pay for
> getting an accurate result, or code that executes rather than hangs!
Table 1
Transfer rates (PE to PE, no ACK) in Microseconds Per REAL*4 Value Obtained from /mpp/bin/f90 "ring.simple.f"
Buffer size MPICH MPICH MPICH EPCC/MPI
(Elements) v1.0.11 v1.0.12 v1.0.13 v1.5a
===============================================
1 49.1 56.4 50.8 35.4
2 49.1 56.5 50.6 36.2
3 53.6 57.5 53.3 39.3
4 53.7 57.5 53.7 40.0
7 54.6 59.0 55.1 41.8
8 54.2 58.3 54.0 37.1
15 48.7 52.6 55.9 59.0
16 48.9 52.8 54.8 54.2
31 49.1 53.3 56.2 58.4
32 49.0 53.5 56.1 57.1
63 49.9 54.4 59.6 64.6
64 50.2 54.7 58.3 61.2
127 52.5 57.4 61.2 73.9
128 52.5 57.1 61.5 70.4
255 57.5 61.2 66.4 92.9
256 57.5 61.3 65.7 89.7
511 67.4 71.4 79.2 139.8
512 65.8 73.1 79.1 135.7
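The slowdown described in the quoted comment can be checked against Table 1 directly. This minimal Python sketch, with values transcribed by hand from the v1.0.11 and v1.0.13 columns (variable names are illustrative), lists the buffer sizes at which v1.0.13 is slower than v1.0.11 and the relative change at 512 elements.

```python
# Table 1 ("ring.simple.f") values in microseconds, transcribed from the table.
sizes = [1, 2, 3, 4, 7, 8, 15, 16, 31, 32, 63, 64, 127, 128, 255, 256, 511, 512]
v1_0_11 = [49.1, 49.1, 53.6, 53.7, 54.6, 54.2, 48.7, 48.9, 49.1, 49.0,
           49.9, 50.2, 52.5, 52.5, 57.5, 57.5, 67.4, 65.8]
v1_0_13 = [50.8, 50.6, 53.3, 53.7, 55.1, 54.0, 55.9, 54.8, 56.2, 56.1,
           59.6, 58.3, 61.2, 61.5, 66.4, 65.7, 79.2, 79.1]

def slower_sizes(old, new, sizes):
    """Buffer sizes at which the `new` timing is strictly larger than `old`."""
    return [n for n, a, b in zip(sizes, old, new) if b > a]

slower = slower_sizes(v1_0_11, v1_0_13, sizes)
print(len(slower), "of", len(sizes))                 # 15 of 18
print(slower[:3])                                    # [1, 2, 7]
print(f"{v1_0_13[-1] / v1_0_11[-1] - 1:.0%}")        # 20%
```

v1.0.13 is slower at every size from 15 elements up, and about 20% slower at 512 elements.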
Table 2 presents timings of "ring.f". In order for EPCC/MPI to execute "ring.f" correctly, I have had to set the environment variable "MPI_SM_TRANSFER" to 0 (I left it set for the MPICH run as well, though it doesn't seem to affect MPICH codes). I described this workaround for the EPCC bug in last week's Newsletter.
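In an interactive csh session the workaround is just "setenv MPI_SM_TRANSFER 0" before running. If you drive your runs from a script instead, the following Python sketch applies the same setting to a child process only. It assumes, without confirmation from EPCC, that EPCC/MPI reads MPI_SM_TRANSFER from the environment it starts with. The function names are hypothetical.

```python
import os
import subprocess

def env_with_sm_transfer_off(base=None):
    """Return a copy of `base` (default: os.environ) with MPI_SM_TRANSFER=0."""
    env = dict(os.environ if base is None else base)
    env["MPI_SM_TRANSFER"] = "0"
    return env

def run_with_workaround(cmd):
    """Run `cmd` (a list of strings) with MPI_SM_TRANSFER=0 set; return its stdout."""
    result = subprocess.run(cmd, env=env_with_sm_transfer_off(),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Copying the environment, rather than changing os.environ, keeps the setting away from later MPICH runs in the same script, although MPICH does not seem to be affected by it anyway.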
Table 2
Transfer rates (PE to PE, no ACK) in Microseconds Per REAL*4 Value Obtained from /mpp/bin/f90 "ring.f"
[ MPI_SM_TRANSFER==0 ]
Buffer Size MPICH EPCC/MPI
(Elements) v1.0.13 v1.5a
===============================
1 53.8 88.2
2 53.8 87.6
3 66.5 93.5
4 55.8 88.7
7 67.2 94.2
8 56.6 89.0
15 68.1 95.5
16 57.9 105.1
31 69.9 112.0
32 60.3 107.3
63 73.1 117.0
64 65.8 112.1
127 69.8 127.2
128 69.6 121.5
255 73.9 148.3
256 74.6 140.2
511 85.9 192.7
512 83.7 176.5
1023 104.1 273.3
1024 104.0 240.5
2047 147.8 443.7
2048 141.8 372.0
4095 220.1 794.2
4096 208.5 642.4
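To see how MPICH's advantage in Table 2 changes with buffer size, this short Python sketch computes the ratio of EPCC/MPI time to MPICH time at each listed size. The values are transcribed from the table with each buffer size counted once.

```python
# Table 2 ("ring.f", MPI_SM_TRANSFER=0) values in microseconds.
sizes = [1, 2, 3, 4, 7, 8, 15, 16, 31, 32, 63, 64, 127, 128,
         255, 256, 511, 512, 1023, 1024, 2047, 2048, 4095, 4096]
mpich = [53.8, 53.8, 66.5, 55.8, 67.2, 56.6, 68.1, 57.9, 69.9, 60.3, 73.1, 65.8,
         69.8, 69.6, 73.9, 74.6, 85.9, 83.7, 104.1, 104.0, 147.8, 141.8, 220.1, 208.5]
epcc = [88.2, 87.6, 93.5, 88.7, 94.2, 89.0, 95.5, 105.1, 112.0, 107.3, 117.0, 112.1,
        127.2, 121.5, 148.3, 140.2, 192.7, 176.5, 273.3, 240.5, 443.7, 372.0, 794.2, 642.4]

# EPCC/MPI time divided by MPICH time; a value above 1 means MPICH was faster.
ratios = {n: e / m for n, m, e in zip(sizes, mpich, epcc)}
print(f"{ratios[1]:.2f} {ratios[4096]:.2f}")  # 1.64 3.08
```

The ratio does not grow steadily. It dips to about 1.4 at 3 elements, for example. So the two endpoints are only a rough summary: MPICH is faster at every size here, by roughly a factor of three at 4096 elements.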
With MPICH v1.0.13, I can now run "ring.real8.f" to completion (earlier MPICH releases crashed or hung on the 511-element buffer).
Table 3
Transfer rates (PE to PE, no ACK) in Microseconds Per REAL*8 Value Obtained from /mpp/bin/f90 "ring.real8.f"
Buffer Size MPICH EPCC/MPI
(Elements) v1.0.13 v1.5a
===============================
1 53.5 37.4
2 55.2 41.8
3 55.7 41.5
4 56.2 40.8
7 57.6 43.1
8 57.2 43.1
15 59.9 61.4
16 60.0 61.8
31 63.4 64.7
32 65.3 66.1
63 69.3 73.2
64 69.4 72.4
127 74.7 94.7
128 74.4 93.8
255 84.9 138.8
256 84.2 139.0
511 106.1 205.3
512 105.1 206.5
1023 147.5 382.9
1024 143.7 379.5
2047 215.6 649.3
2048 220.0 646.9
4095 350.9 1202.3
4096 356.1 1197.8
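Table 3 shows EPCC/MPI ahead on very small buffers and MPICH ahead on larger ones. This Python sketch finds the crossover, defined here as the smallest listed size from which MPICH is faster at every larger listed size. That definition and the function name crossover are choices made for this illustration. The values are transcribed from Table 3.

```python
# Table 3 ("ring.real8.f") values in microseconds.
sizes = [1, 2, 3, 4, 7, 8, 15, 16, 31, 32, 63, 64, 127, 128,
         255, 256, 511, 512, 1023, 1024, 2047, 2048, 4095, 4096]
mpich = [53.5, 55.2, 55.7, 56.2, 57.6, 57.2, 59.9, 60.0, 63.4, 65.3, 69.3, 69.4,
         74.7, 74.4, 84.9, 84.2, 106.1, 105.1, 147.5, 143.7, 215.6, 220.0, 350.9, 356.1]
epcc = [37.4, 41.8, 41.5, 40.8, 43.1, 43.1, 61.4, 61.8, 64.7, 66.1, 73.2, 72.4,
        94.7, 93.8, 138.8, 139.0, 205.3, 206.5, 382.9, 379.5, 649.3, 646.9, 1202.3, 1197.8]

def crossover(sizes, a, b):
    """Smallest listed size from which `a` is faster than `b` at every larger size."""
    result = None
    # Walk from the largest buffer down; stop at the first size where `a` is not faster.
    for n, ta, tb in reversed(list(zip(sizes, a, b))):
        if ta >= tb:
            break
        result = n
    return result

print(crossover(sizes, mpich, epcc))  # 15
```

Applied to the v1.0.13 and EPCC/MPI columns of Table 1, the same rule gives 31 rather than 15, because EPCC/MPI is marginally faster at 16 elements (54.2 vs. 54.8).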
Current Editors:
Ed Kornkven, ARSC HPC Specialist, ph: 907-450-8669
Kate Hedstrom, ARSC Oceanographic Specialist, ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020

E-mail Subscriptions:
Subscribe to (or unsubscribe from) the e-mail edition of the ARSC HPC Users' Newsletter.
Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.
