ARSC HPC Users' Newsletter 216, March 23, 2001
- SV1e Upgrade
- UAF Colloquium Series: Burton Smith, March 29
- Parallel Programming Course, Next Wednesday
- Using "segldr" to Specify Cray-Optimized Routines
- Quick Tip
SV1e Upgrade
During facility downtime on Mar 31 - Apr 1, 2001, the processors on chilkoot will be upgraded to Cray's new SV1e processors.
Chilkoot will be the first SV1e in production.
The SV1e will have a faster clock and enhancements to cache and scalar processing.
ARSC users: note the extended downtime over the weekend of Mar 31 - Apr 1. See "news downtime" and "news PE3.5" for more details and scheduling.
We encourage you to test your codes under PE3.5 and UNICOS 10.0.1.0 (to be installed Monday) as soon as possible. This will help verify the software upgrade before the processor upgrade takes place.
As noted in "news PE3.5", to use PE3.5 effectively, you MUST set the environment variable TARGET to cray-sv1.
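As a minimal sketch of that setting, the lines below use sh/ksh syntax; the csh/tcsh equivalent is given in a comment. Which shell you run on chilkoot, and where you put the setting (for example a login file), are assumptions here; "news PE3.5" remains the reference.

```shell
# sh/ksh syntax. csh/tcsh users would instead type:
#   setenv TARGET cray-sv1
TARGET=cray-sv1
export TARGET

# Confirm the value that compilers started from this shell will see.
echo "TARGET is set to: $TARGET"
```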
UAF Colloquium Series: Burton Smith, March 29
The UAF Department of Mathematical Sciences and ARSC are jointly sponsoring a Mathematical Modeling, Computational Science, and Supercomputing Colloquium Series. The schedule and abstracts for the '00-'01 academic year are available at:
How Shall We Program High Performance Computers?
Dr. Burton Smith
Chief Scientist
Cray Incorporated

Date: Thursday, March 29, 2001
Time: 1:00-2:00 PM
Location: Butrovich 109

ABSTRACT:
Uniprocessor computer architecture has traditionally been motivated by programming languages and operating systems, with benchmarks written in the usual languages also having some influence. In high performance computing the situation is curiously reversed, with architecture determining the principal characteristics of programming languages, operating systems, and benchmarks. The result has been chaos; a "software crisis" has been declared, and better tools for the development of parallel software have been demanded. The outlook for good tools is bleak without a new approach to the problem, which should include the engineering of computer systems with both system and application software in mind and the development of programming abstractions that are both effective and efficient on hardware we can build.

THE SPEAKER:
Burton Smith is Chief Scientist of Cray Inc. He received the BSEE from the University of New Mexico in 1967 and the Sc.D. from MIT in 1972. From 1985 to 1988 he was Fellow at the Supercomputing Research Center of the Institute for Defense Analyses in Maryland. Before that, he was Vice President, Research and Development at Denelcor, Inc. and was chief architect of the HEP computer system. Dr. Smith is a Fellow of both the ACM and the IEEE, and winner of the IEEE-ACM Eckert-Mauchly award in 1991. His main interest is general purpose parallel computer architecture.
Parallel Programming Course, Next Wednesday
ARSC Training: Parallel Computing Concepts
Wednesday, March 28, 2001, 2-4pm
In this course, Jeff McAllister will introduce parallel computing concepts and message passing algorithms (using MPI) for new and existing codes.
For details and registration:
Using "segldr" to Specify Cray-Optimized Routines
While working with a user code on the SV1, we discovered, by scanning the loader's cautionary messages, that it contained explicit code for several standard scientific subroutines that are also available in Cray's optimized libraries. The subroutines may have been copied from Numerical Recipes, and presumably their inclusion guarantees portability.
The reason people use Crays, however, is performance.
Once you've ported your code and it's running correctly, you can worry about performance. An important step is to realize that generically coded routines just don't beat tuned vendor libraries. We've mentioned this several times in the past. For example, T3D Newsletter issue #96 carried the article:
18:1 Speedup Demonstrated: Free for Using System Libraries
Here's a result from that article, where the times are in seconds for performing an identical matrix multiply on the T3D:
Timings for the code are as follows:
  Time for naive matrix multiplication:         46.09
  Time for structured matrix multiplication:    34.33
  Time for BLAS1 matrix multiplication:         11.18
  Time for BLAS3 SGEMM matrix multiplication:    2.52
Given a large code, the practical questions become:
- How can I determine if subroutines are duplicated?
- If so, how can I tell the loader to use the vendor versions?
The Cray loader, "segldr," cautions you if a module that appears in one of your object (".o") files is duplicated in either another object file or a library (".a" file). Thus, you simply need to review the warnings already issued by the loader.
For a manageable test program, I've used a Fortran77 version of the linpacks benchmark, with the array size increased to 400x400. The source is a single file, and it includes hand-written versions of several common matrix/vector operations.
Here's the compile command:
CHILKOOT$ f90 -o linpacks400 linpacks400.f
And the warnings from segldr:
ldr-290 f90: CAUTION
    Duplicate entry point 'SGEFA' was encountered.
    Entry in module 'SGEFA' from file 'linpacks400.o' has been used.
    Entry in module 'SGEFA' from file '/opt/ctl/craylibs/18.104.22.168/libsci.a' has been ignored.
ldr-290 f90: CAUTION
    Duplicate entry point 'SAXPY' was encountered.
    Entry in module 'SAXPY' from file 'linpacks400.o' has been used.
    Entry in module 'SAXPY' from file '/opt/ctl/craylibs/22.214.171.124/libsci.a' has been ignored.
ldr-290 f90: CAUTION
    Duplicate entry point 'SDOT' was encountered.
    Entry in module 'SDOT' from file 'linpacks400.o' has been used.
    Entry in module 'SDOT' from file '/opt/ctl/craylibs/126.96.36.199/libsci.a' has been ignored.
[[ ...4 similar warnings cut... ]]
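For a larger code, reading the cautions by eye gets tedious. The sketch below pulls the duplicated entry point names out of a saved copy of the loader output. It assumes you have captured that output yourself (this article does not say whether segldr writes it to stdout or stderr), and the function name list_duplicate_entries, the sample text, and the VERSION path component are hypothetical stand-ins, not part of segldr.

```shell
#!/bin/sh
# list_duplicate_entries: read saved loader output on stdin and print each
# entry point named in a "Duplicate entry point '...'" caution, one per
# line, sorted, with repeats removed.
list_duplicate_entries() {
    awk -v q="'" '
    {
        line = $0
        pat = "Duplicate entry point " q "[^" q "]*" q
        # A line may hold more than one caution, so keep matching.
        while (match(line, pat)) {
            # Skip the 23-character prefix and drop the closing quote.
            print substr(line, RSTART + 23, RLENGTH - 24)
            line = substr(line, RSTART + RLENGTH)
        }
    }' | sort -u
}

# Sample input modelled on the cautions above. VERSION is a placeholder
# for the library version directory.
sample_log="ldr-290 f90: CAUTION
    Duplicate entry point 'SGEFA' was encountered.
    Entry in module 'SGEFA' from file 'linpacks400.o' has been used.
    Entry in module 'SGEFA' from file '/opt/ctl/craylibs/VERSION/libsci.a' has been ignored.
ldr-290 f90: CAUTION
    Duplicate entry point 'SAXPY' was encountered.
    Entry in module 'SAXPY' from file 'linpacks400.o' has been used.
    Entry in module 'SAXPY' from file '/opt/ctl/craylibs/VERSION/libsci.a' has been ignored.
ldr-290 f90: CAUTION
    Duplicate entry point 'SDOT' was encountered.
    Entry in module 'SDOT' from file 'linpacks400.o' has been used.
    Entry in module 'SDOT' from file '/opt/ctl/craylibs/VERSION/libsci.a' has been ignored."

printf '%s\n' "$sample_log" | list_duplicate_entries
```

With a real build you would feed the function your own saved log instead of the sample text. The "has been ignored" lines in that log tell you which library holds the vendor copy of each routine.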
To reload with the libsci version, you need to issue directives to segldr. The cleanest method I've found is to create a directives file. It takes two directives, LIB and MODULES.
"LIB=<library file>" seems to be required, even if "-l" is used on the command line. It tells segldr what libraries to load.
"MODULES=<module name>:<library file>" tells segldr to load <module name> from <library file>, even if it encounters duplicate copies of <module name>.
Here's the file of directives I created for the linpacks test:
File: segldr.dir
LIB=/opt/ctl/craylibs/188.8.131.52/libsci.a
MODULES=SDOT:/opt/ctl/craylibs/184.108.40.206/libsci.a
MODULES=SGEFA:/opt/ctl/craylibs/220.127.116.11/libsci.a
MODULES=SGESL:/opt/ctl/craylibs/18.104.22.168/libsci.a
MODULES=SMXPY:/opt/ctl/craylibs/22.214.171.124/libsci.a
MODULES=SSCAL:/opt/ctl/craylibs/126.96.36.199/libsci.a
MODULES=SAXPY:/opt/ctl/craylibs/188.8.131.52/libsci.a
LIB=/opt/ctl/craylibs/184.108.40.206/libu.a
MODULES=ISAMAX:/opt/ctl/craylibs/220.127.116.11/libu.a
Having said this is "clean", I realize it's pretty ugly. When processing the files specified in the MODULES directives, segldr doesn't honor search paths. Thus, complete paths are required.
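Since every line repeats the full library path, one way to cut down on typing mistakes is to generate the directives rather than write them by hand. This is only a sketch: emit_directives and LIBDIR are hypothetical names, and LIBDIR must be replaced with the complete library directory reported in your own ldr-290 cautions.

```shell
#!/bin/sh
# emit_directives LIBRARY MODULE...
# Print a LIB directive for LIBRARY, then one MODULES directive for each
# module that should be taken from that library.
emit_directives() {
    lib=$1
    shift
    echo "LIB=$lib"
    for module in "$@"; do
        echo "MODULES=$module:$lib"
    done
}

# LIBDIR is a placeholder (an assumption, not a real chilkoot path).
LIBDIR=${LIBDIR:-/full/path/to/craylibs}

# The same module-to-library assignments as in segldr.dir above.
emit_directives "$LIBDIR/libsci.a" SDOT SGEFA SGESL SMXPY SSCAL SAXPY
emit_directives "$LIBDIR/libu.a" ISAMAX
```

Redirecting the output of this script to a file such as segldr.dir gives a directives file in the same layout as the hand-written one.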
To use the segldr directive file, you must pass the option, "-i <directives file name>" to segldr. Here's one approach, showing how to invoke segldr directly:
$ f90 -c linpacks400.f
$ segldr -i segldr.dir linpacks400.o -o linpacks400.libsci
Here's another approach, which uses f90's "-Wl..." option to pass options through to segldr:
$ f90 -o linpacks400.libsci linpacks400.f -Wl"-i segldr.dir"
Either approach will work, and you'd probably choose whichever introduces the fewest changes into an existing makefile.
When executed, segldr now issues different warnings:
ldr-162 f90: CAUTION The loader has ignored duplicate module 'SGEFA' from file 'linpacks400.o'.
ldr-162 f90: CAUTION The loader has ignored duplicate module 'SGESL' from file 'linpacks400.o'.
ldr-162 f90: CAUTION The loader has ignored duplicate module 'SAXPY' from file 'linpacks400.o'.
ldr-162 f90: CAUTION The loader has ignored duplicate module 'SDOT' from file 'linpacks400.o'.
ldr-162 f90: CAUTION The loader has ignored duplicate module 'SSCAL' from file 'linpacks400.o'.
ldr-162 f90: CAUTION The loader has ignored duplicate module 'ISAMAX' from file 'linpacks400.o'.
ldr-162 f90: CAUTION The loader has ignored duplicate module 'SMXPY' from file 'linpacks400.o'.
These warnings tell us all we need for this simple program: the hand-coded routines from linpacks400.f have been ignored, which implies that the routines were loaded from the libraries we named explicitly in the directives (libsci and, for ISAMAX, libu).
If you want more information, and to blow your mind, tell segldr to dump a load map. The option "-M loadmap.out,epxrf" will dump entry point cross references to the file "loadmap.out", which can be browsed or grepped for specific modules to find out where they came from.
So... did it speed things up to substitute the Cray libraries? Quite a bit. Here's the hpm output for both versions:

Hand-coded matrix/vector routines (linked without segldr directives):

Group 0:  CPU seconds   :    8.34398      CP executing     :  2503194174
Million inst/sec (MIPS) :      66.21      Instructions     :   552483791
Avg. clock periods/inst :       4.53
% CP holding issue      :      70.92      CP holding issue :  1775227364
Inst.buffer fetches/sec :       0.02M     Inst.buf. fetches:      166592
Floating adds/sec       :      69.10M     F.P. adds        :   576527734
Floating multiplies/sec :      67.73M     F.P. multiplies  :   565107088
Floating reciprocal/sec :       0.00M     F.P. reciprocals :       20792
Cache hits/sec          :      83.61M     Cache hits       :   697681792
CPU mem. references/sec :     217.35M     CPU references   :  1813602060
Floating ops/CPU second :     136.82M

Cray libsci routines (with segldr directives to use libsci):

Group 0:  CPU seconds   :    3.48794      CP executing     :  1046381505
Million inst/sec (MIPS) :      65.88      Instructions     :   229774910
Avg. clock periods/inst :       4.55
% CP holding issue      :      71.06      CP holding issue :   743524588
Inst.buffer fetches/sec :       0.03M     Inst.buf. fetches:      114008
Floating adds/sec       :     165.95M     F.P. adds        :   578839265
Floating multiplies/sec :     161.54M     F.P. multiplies  :   563429095
Floating reciprocal/sec :       0.01M     F.P. reciprocals :       30984
Cache hits/sec          :      10.44M     Cache hits       :    36414633
CPU mem. references/sec :      99.10M     CPU references   :   345654132
Floating ops/CPU second :     327.50M
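To put a number on "quite a bit", the two hpm reports can be compared directly. The small sketch below divides the hand-coded figures by the libsci figures. The helper name ratio is hypothetical, and the values are copied from the two reports above.

```shell
#!/bin/sh
# ratio A B: print A divided by B, rounded to two decimal places.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# Hand-coded value first, libsci value second.
echo "CPU seconds ratio (speedup): $(ratio 8.34398 3.48794)"
echo "CPU memory references ratio: $(ratio 1813602060 345654132)"

# libsci value first here, since a higher rate is better.
echo "Floating ops/CPU second ratio: $(ratio 327.50 136.82)"
```

The first line reports a speedup of about 2.39 in CPU seconds. The floating-point operation counts are nearly the same in both runs, so the gain comes from doing the same arithmetic with far fewer instructions and memory references.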
More information on segldr is available in "man segldr" and in Cray's on-line documentation:
Quick-Tip Q & A
A: [[ What's up with this?
   [[
   [[   chilkoot% rm ldat.199801.*
   [[   Arguments too long.
   [[   chilkoot%
   [[   chilkoot% ls ldat.199801.*
   [[   Arguments too long.

Thanks to Terry Jones:
======================

Check to see how many files with names beginning with ldat.199801: (do an "ls | grep ldat.199801 | wc -l" or something equivalent). If it is a very large number of files, the problem is likely to be that with the wildcard (*), the list of files that this expression matches is simply too large. I cannot say how large is too large, but the shell will expand the wildcard and generate a list of files. The rm and ls commands will get called with this list attached to the end of the command line (e.g., "ls file.*" might expand to "ls file.1 file.2 file.3", and so on). If the list is longer than the shell can handle (I am sure there is a max_list parameter in there somewhere!), then the error message above is what is displayed. I've had this happen before, when manipulating large quantities of files all named the same, located in the same directory.

Thanks to Richard Griswold:
===========================

When ldat.199801.* is expanded, there are too many characters for the shell to handle. Instead you can do something like this and avoid the limitations of the shell:

  \ls | grep "^ldat\.199801\..*"

Since I alias ls to "ls -al", I use a backslash in front of the ls command to bypass the alias. This will get the default behavior instead of the aliased behavior. This command will get you a list of the files. If you want to delete them, simply pipe the output to "xargs rm":

  \ls | grep "^ldat\.199801\..*" | xargs rm

If you want a long listing, you can pipe the output to "ls -l":

  \ls | grep "^ldat\.199801\..*" | xargs ls -l

You can even rename the output:

  \ls | grep "^ldat\.199801\..*" | sed "s/\(.*\)/mv '\1' 'myfiles.\1'/" \
    | sh

Of course, you'll need some additional code inside the sed script to handle single quotes in your file names.
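If you'd like to try the counting and "xargs rm" pipelines before pointing them at real data, the sketch below rehearses them in a throwaway directory. The scratch directory, the 50 stand-in files, and the file keep.me are made up for the demonstration, and mktemp is assumed to be available on your system.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing real is deleted.
scratch=$(mktemp -d)

# Create stand-in files (the real case had far more), plus one file
# that must survive.
i=1
while [ "$i" -le 50 ]; do
    : > "$scratch/ldat.199801.$i"
    i=$((i + 1))
done
: > "$scratch/keep.me"

# Count the matches without asking the shell to expand a wildcard.
before=$(ls "$scratch" | grep '^ldat\.199801\.' | wc -l | tr -d ' ')

# Remove them. xargs hands rm the names in batches that fit within the
# argument limit. Names containing spaces or quotes would need more care.
ls "$scratch" | grep '^ldat\.199801\.' | (cd "$scratch" && xargs rm)

after=$(ls "$scratch" | grep '^ldat\.199801\.' | wc -l | tr -d ' ')
remaining=$(ls "$scratch")
echo "matched before: $before, after: $after, left behind: $remaining"

rm -rf "$scratch"
```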
Q: Is there a way to peek at my NQS job's stdout and stderr files (.o and .e files), while the job is still running? I'm in debug mode, here, and wasting a lot of CPU time because these jobs must run to completion before I can see ANY output.
[[ Answers, Questions, and Tips Graciously Accepted ]]
Ed Kornkven
ARSC HPC Specialist
ph: 907-450-8669

Kate Hedstrom
ARSC Oceanographic Specialist
ph: 907-450-8678

Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Subscribe to (or unsubscribe from) the e-mail edition of the
ARSC HPC Users' Newsletter.
Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.