With this newsletter, we have produced one issue per node of the ARSC T3D. I wonder if we will retire yukon after another 104 issues.
At any rate, it's true. The ARSC T3D, which on February 28, 1994 was the 6th T3D brought on-line, is to be disconnected. This will occur during test time on November 18th.
Users are welcome to continue using the T3D through the 18th. If you haven't done so yet, please port to yukon immediately, verify that your code compiles and runs correctly, and compare its results against those from the T3D. ARSC is providing special training to help you (see the next article), and consultants are available at: .
November 5th (1/2 day, 1300-1600), in Fairbanks
This course is targeted at current users of the Cray T3D system who will be migrating codes to the new Cray T3E system. The course will discuss the following:
At the end of this course attendees will be in a better position to migrate software from the Cray T3D to the Cray T3E.
December 10th (1/2 day + 1/2 day hands-on session), in Fairbanks
The purpose of this course is to describe the range of tools currently available on the ARSC T3E system. A number of installed tools can help users develop efficient, portable parallel programs. These include:
The course will cover the following aspects of behaviour:
Several case studies from current users will illustrate these tools and the potential benefits in terms of reduced programming effort and improved performance.
This course is aimed at existing users of the ARSC parallel systems. Users are encouraged to bring problems for discussion and investigation during the afternoon hands-on session, when ARSC consultants will be available to help with individual problems.
To register for these and other ARSC courses, please follow the instructions on our training page:
http://www.arsc.edu/user/Classes.html
[ Don Morton of the University of Montana contributes this article. It shows the performance of one code measured across ARSC's recent T3E upgrade and with streams on and off. ]
Application: Adaptive finite element code for modeling two-phase oil/water flow in porous media.
Using a "heterogeneous" SPMD approach, one processor is responsible for dynamically modifying the mesh at specified intervals (typically every time step) to place smaller elements in regions of current activity and larger elements elsewhere. The mesh modification step also includes a load-balancing algorithm for processor assignments.
The mesh is then partitioned out to other processors for a parallel solution of the next time step. Because the equations are nonlinear, a single time step may involve several iterations for convergence. When the parallel solution has been obtained, it's shipped back to the "mesh modification processor" for another round of mesh modification and solution of the next time step.
The results below show the time (in seconds) required for two of the time steps. The code was run on 16 PEs (1 PE for mesh modification, 15 for the parallel solution), using PVM for message passing. The first time step had 5320 unknowns (degrees of freedom), and the second had 5336.
Config  Time Step  Mesh Mod Time  Mesh Transfer Time  Total Time Step
                   (secs)         (secs)              (secs)
======  =========  =============  ==================  ===============
[A]     1          1.33           0.27                 9.08
        2          1.32           0.26                 7.08
[B]     1          2.52           0.26                11.80
        2          2.36           0.27                10.07
[C]     1          3.02           0.34                14.92
        2          2.86           0.34                11.36
[A] ARSC 450MHz T3E with streams enabled.
[B] ARSC 450MHz T3E without streams enabled.
[C] ARSC T3E before the upgrade (300MHz, without streams enabled).
------------------------------------------------------------------------
The run-times of configurations [A] and [B], relative to [C], and the
corresponding speed-ups are:
Config  Time Step  Relative Run-Time  Speed-Up
======  =========  =================  ========
[A]     1          61%                1.64x
        2          62%                1.61x
[B]     1          79%                1.27x
        2          89%                1.12x
[ Editors' request: If you have similar data for your application, please send it in! ]
Programming Environment 3.0 (PE3.0) for the T3E was made the default during test time on October 14, 1997. By simply recompiling all objects and re-linking using the default, you will obtain the latest compilers and libraries, which correct some known system stability problems.
<< All yukon users requested to recompile and re-link all code. >>
For the purpose of comparison, you may restore the previous PE by executing the command:
module switch PrgEnv PrgEnv.old
Having done so, you can switch back to the PE3.0 default by executing:
module switch PrgEnv.old PrgEnv
Many T3D/T3E users will already be familiar with the features of Apprentice, a tool which provides comprehensive information about the performance of programs through a powerful graphical interface. Apprentice presents low-level information, reporting on each line of the source code and providing a great deal of information on particular aspects of performance, such as memory loads and stores, and calling trees.
The downside of Apprentice is the cost of this information. Programs compiled with Apprentice enabled run 3-4 times longer, and the graphical interface requires a good connection to the host system(*).
PAT is a simpler, less intrusive, text-based alternative which provides information on basic aspects of performance, such as:
PAT can be used with C, C++, and Fortran 90 programs. Users simply re-link with the PAT run-time library and pass the pat.cld loader directive as shown below.
yukon% cc *.o -l pat pat.cld -o a.out
yukon% CC *.o -l pat pat.cld -o a.out
yukon% f90 *.o -l pat pat.cld -o a.out
Running the resulting executable generates a pdf.<nnnn> file, from which the user can extract specific information using either PAT's command-line or interactive mode.
The command line interface allows users to employ PAT within batch jobs to obtain performance information on long runs with realistic data sets. The interactive mode allows general investigation of program performance.
In our experience with a few applications, PAT has little impact on overall run-time compared with normal execution. Note that PAT periodically samples the program counter, so its results are a statistical estimate of the time spent in each subroutine/function. This is the same approach taken by HPM, the hardware performance monitor, on Cray vector machines, and in most cases it is perfectly adequate.
The following output shows two of PAT's capabilities on a 19-PE run of a locally parallelised seismic hazard code.
pat -p gives a profile of the code showing the percent of time spent in each subroutine. The final column gives a measure of the statistical confidence of the sampled measurements.
yukon% pat -p ./a.out pdf.75028
Profile Information:
                                 Percent   90% Conf.
                                                Interval
PSTEP                               55%      0.1
GTRAN                                9%      0.1
MPI_Bcast                            8%      0.1
IGTRAN                               3%      0.0
RADFG                                3%      0.0
RADBG                                3%      0.0
RADF3                                2%      0.0
RADB3                                2%      0.0
RFFTF1                               1%      0.0
D3FFT                                1%      0.0
_T3EMPI_unbalanced_tree              1%      0.0
D2FFT                                1%      0.0
pat -h <subroutine name> gives information on the load balance across the processors for the named subroutine. The following output shows that processors 16-18 have relatively little work to do for PSTEP.
yukon% pat -h PSTEP ./a.out pdf.74322
Load Balance Histogram for PSTEP
--------------------------------------------------
0 |****|****|****|****|****|****|****|****|****|****|
--------------------------------------------------
1 |****|****|****|****|****|****|****|****|****|****|
--------------------------------------------------
2 |****|****|****|****|****|****|****|****|****|****|
--------------------------------------------------
3 |****|****|****|****|****|****|****|****|****|****|
--------------------------------------------------
4 |****|****|****|****|****|****|****|****|****|****|
--------------------------------------------------
5 |****|****|****|****|****|****|****|****|****|****|
--------------------------------------------------
6 |****|****|****|****|****|****|****|****|****|****|
--------------------------------------------------
7 |****|****|****|****|****|****|****|****|****|****|
--------------------------------------------------
8 |****|****|****|****|****|****|****|****|****|****|
--------------------------------------------------
9 |****|****|****|****|****|****|****|****|****|****|
--------------------------------------------------
10 |****|****|****|****|****|****|****|****|****|****|
--------------------------------------------------
11 |****|****|****|****|****|****|****|****|****|****|
--------------------------------------------------
12 |****|****|****|****|****|****|****|****|****|****|
--------------------------------------------------
13 |****|****|****|****|****|****|****|****|****|****|
--------------------------------------------------
14 |****|****|****|****|****|****|****|****|****|****|
--------------------------------------------------
15 |****|****|****|****|****|****|****|****|****|****|
--------------------------------------------------
16 |* | | | | | | | | | |
--------------------------------------------------
17 |* | | | | | | | | | |
--------------------------------------------------
18 |* | | | | | | | | | |
--------------------------------------------------
PAT also provides a number of routines that let users create trace files of specific activity, to meet individual reporting requirements. For details, see: man pat.
* While the graphical interface of Apprentice requires a good
connection to the host system, it does have a -r option which
prints a textual report to standard output (stdout) that summarizes
program performance information. This report gives the total time
of inclusive and exclusive subroutines and breaks down the time
into time spent in overhead, in parallel work, and in I/O.
A: {{ In Unix, how can you list the contents of the current directory,
and the contents of every subdirectory within the current
directory, but not descend any deeper into the depths of the tree
of subdirectories? }}
# This was a tricky one! You might use "find . -prune ..." or
# "ls `ls`", but to be concise, try this:
ls *
Q: Fortran 90 programmers: what does this do, and why?
cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
      program my_other_car_is_a_dog_sled
      integer j(10)
      do 100 i = 1, 10
         j(i) = i
 100  continue
      do 200 i = 1. 10
         j(i) = j(i) + i * 100
 200  continue
      print*, j
      end
cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
[ Answers, questions, and tips graciously accepted. ]
Contact:
Donald Bahls, ARSC User Consultant, ph: 907-450-8674
Ed Kornkven, ARSC HPC Specialist, ph: 907-450-8669
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Arctic Region Supercomputing Center
PO Box 756020, Fairbanks, AK 99775 | voice: 907-450-8600 | email: