ARSC T3E Users' Newsletter 191, March 16, 2000
Yukon is busy! Help Us Provide More Cycles!
There are several reasons why application (APP) PEs on a T3E may be idle even when NQS jobs are queued, dying to run. One of them is addressed by a simple addition to NQS scripts. This change is so simple that we ask all ARSC users to make it. Here's the explanation:
When the last "mpprun" in your qsub script returns, the APP PEs that it was using become idle. Since this was the LAST mpprun, your script no longer needs the APP PEs. However, NQS will not release them to other jobs until your script either terminates or explicitly releases them.
Please explicitly release the PEs by adding the following two lines immediately after the last "mpprun" command in your qsub scripts:

    qalter -l mpp_p=0                     # release parallel processors
    echo | qsub -q mpp -eo -o /dev/null   # force NQS to rescan queues

This will release the APP processors and tell NQS to start someone else's job, while allowing your job to complete all of its normal single-processor work.
It's important to add the above lines before ANY "post-mpprun" commands. Even innocent-looking commands, like "mv" or "cp", can take a very long time (if the file system is busy or the files are big, for instance).
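If you have many qsub scripts, a quick check can help find the ones that still need the change. The following is only a rough sketch: "check_release" is a hypothetical helper name (not an NQS command), and it assumes that each "mpprun" starts its own line and that the release command is written exactly as "qalter -l mpp_p=0".

```shell
# check_release FILE
# Exit status 0: the first command after the last "mpprun" line is the
#                qalter release (or there is nothing left to run).
# Exit status 1: some other command comes first, so the PEs stay held.
# Hypothetical helper; assumes "mpprun" begins its own line in FILE.
check_release() {
    awk '
        { line[NR] = $0 }                               # keep every line
        /^[ \t]*mpprun([ \t]|$)/ { last = NR }          # remember the last mpprun
        END {
            if (!last) exit 0                           # no mpprun: nothing to release
            for (i = last + 1; i <= NR; i++) {
                if (line[i] ~ /^[ \t]*(#.*)?$/) continue  # skip blank and comment lines
                if (line[i] ~ /^[ \t]*qalter[ \t]+-l[ \t]+mpp_p=0/) exit 0
                exit 1                                  # another command runs first
            }
            exit 0                                      # mpprun was the last command
        }
    ' "$1"
}
```

Usage might look like "check_release myjob.qsub || echo 'myjob.qsub: add the release lines'". Scripts that wrap mpprun in another command (for example "time mpprun ...") are not recognized by this sketch.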
As an example, this script:

    #QSUB -q mpp
    #QSUB -l mpp_p=50
    #QSUB -l mpp_t=4:00:00
    cd /u1/uaf/morris/progs
    mpprun -n50 ./myprog data1
    mv restart.* ../RESTART/
    cat myprog.log >> run.log

would be modified to this:

    #QSUB -q mpp
    #QSUB -l mpp_p=50
    #QSUB -l mpp_t=4:00:00
    cd /u1/uaf/morris/progs
    mpprun -n50 ./myprog data1
    qalter -l mpp_p=0                     # release parallel processors
    echo | qsub -q mpp -eo -o /dev/null   # force NQS to rescan queues
    mv restart.* ../RESTART/
    cat myprog.log >> run.log

NOTES:
- Here's an explanation of the command:

      echo | qsub -q mpp -eo -o /dev/null   # force NQS to rescan queues

  It is required because "qalter" doesn't do the obvious and tell NQS to check for waiting jobs. The command submits a job which will run, but have no effect except to awaken NQS. Here's what the options do:

  - "-q mpp", in the absence of any "-l" options, runs the job in yukon's single queue.
  - "-eo" causes the script's STDERR to be merged with its STDOUT.
  - "-o /dev/null" sends the STDOUT to /dev/null (instead of creating a ".o" file in your directory).

  The "echo |" pipes a new-line character to the STDIN of "qsub". This is needed because "qsub", if no script file is given, expects input from STDIN. In the absence of the "echo |" the command will hang, waiting for input.
- For anyone who is "chaining" jobs, the above is an alternate method for submitting the required "do_nothing" job we described in newsletter #176 (/arsc/support/news/t3enews/t3enews176/index.xml).

  Also, if you're chaining, please move the above pair of lines from the end of your script (if that's where it is now) to immediately following the last "mpprun".
- Please send us feedback on yukon throughput. We hope to see improvements if everyone honors the above request. Also, let us know if you have other ideas for improving efficiency.
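To see why the "echo |" described in the notes keeps "qsub" from hanging, it may help to try the same pipe with an ordinary command that reads STDIN. This small sketch uses "wc" as a stand-in for "qsub" (our substitution, so it can be run anywhere); "count_stdin_lines" is a name made up for the illustration.

```shell
# Stand-in for any command that reads STDIN until end-of-file.
# tr strips the padding that some versions of wc add around the number.
count_stdin_lines() {
    wc -l | tr -d ' '
}

# "echo" writes a single new-line and exits, which closes the pipe,
# so the reader sees one empty line followed by end-of-file and returns.
echo | count_stdin_lines
```

Run without the "echo |" from a terminal, the same function would sit waiting for keyboard input, which is the hang described in the note above.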
Faster T3E Code
Several ARSC users have recently reported significant improvements by using the split2 option on the f90 compiler. This option splits loops to make better use of the streams into memory hardware. In some cases there is no gain; in others, users report a 10-20% improvement. As with any higher level of optimization, you should run test cases, both to measure the change in performance and to verify that results are still correct.
As discussed in the article "CF90 Optimization Options" in newsletter #127,
/arsc/support/news/t3enews/t3enews127/index.xml
an aggressive set of options to try with the f90 compiler might be:

    -O3,aggress,unroll2,pipeline2,split2
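As a hedged sketch of the testing advice above, one could build a baseline and an aggressive executable from the same source and compare the output of the two runs. Everything named here is an assumption made for illustration: the source file "myprog.f90", the "-O2" baseline, the executable names, and the helper functions "f90_cmd" and "same_results". The helpers only print the compile commands, so nothing is built until you run those commands yourself.

```shell
# Option string from the newsletter text.
AGGRESSIVE="-O3,aggress,unroll2,pipeline2,split2"

# Print the f90 command line for one optimization setting.
#   $1 = optimization option(s), $2 = executable name
# SRC defaults to the hypothetical source file myprog.f90.
f90_cmd() {
    printf 'f90 %s -o %s %s\n' "$1" "$2" "${SRC:-myprog.f90}"
}

# Report whether two result files are byte-for-byte identical.
same_results() {
    if cmp -s "$1" "$2"; then
        echo "results identical"
    else
        echo "results differ"
    fi
}

f90_cmd -O2           myprog.base   # assumed baseline build
f90_cmd "$AGGRESSIVE" myprog.aggr   # aggressive build
```

After running both executables on the same input (timing each), "same_results base.out aggr.out" gives a first check. A byte-for-byte comparison is strict: aggressive optimization can reorder floating-point operations, so small differences may call for a closer look rather than immediate rejection.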
Three good books for spring reading
The Clockwork Muse: A practical guide to writing theses, dissertations, and books. Eviatar Zerubavel. Harvard University Press. ISBN 0-674-13586-5.
The Nature of Mathematical Modeling. Neil Gershenfeld. Cambridge University Press. ISBN 0-521-57095-6.
Structured Adaptive Mesh Refinement (SAMR) Grid Methods. Scott B. Baden, Nikos Chrisochoides, Dennis Gannon, Michael Norman. Springer. ISBN 0-387-98921-8.
Reading the above set should give you ideas on how to improve your algorithms and tell people how you did it in time for the summer conference season.
CUG SUMMIT 2000 Preliminary Program Now On-Line
The complete Preliminary Program for the upcoming CUG SUMMIT 2000 in Noordwijk is on-line. It is accessible:
- from the CUG home page at http://www.cug.org/
- from the European server at http://cug2000.sara.nl/ (Noordwijk CUG SUMMIT 2000 Home Page)
- from the US server at http://www.fpes.com/cug2000/ (CUG Office CUG SUMMIT 2000 Home Page)
Arctic Climate Modelers to Meet in Fairbanks, September 2000
From an announcement we received recently:

> ARC-MIP, the Arctic Regional Climate Model Intercomparison Project will
> hold its first meeting at Fairbanks, Alaska, USA, on September 13-15.
> This meeting will coincide with the WCRP ACSYS Numerical
> Experimentation Group meeting which will be held the same week on
> September 11-12.
>
> In ARC-MIP, models developed by research teams from Europe, Australia,
> USA, and Canada are invited to perform a common set of simulations over
> two common domains: one that covers much of the Arctic Ocean, and a
> second that concentrates at higher grid resolution over the western
> Arctic corresponding to the location of the SHEBA ice camp,
>
> http://sheba.apl.washington.edu/.
>
> Participation in the workshop and the ARC-MIP project is open. Modelers
> and observationalists are invited to attend this workshop. The first
> workshop will be rather exploratory. Difficulties encountered in
> modeling the Arctic (clouds, surface schemes, dynamics, etc.) will be
> discussed as well as observations taken during the SHEBA experiment.
> Issues of funding will be discussed. The common simulations to be
> performed will also be defined during the workshop. Oral presentations
> by workshop participants are welcome.
>
> A preliminary ARC-MIP web site is located at:
>
> http://cires.colorado.edu/lynch/workshop/
Quick-Tip Q & A
A: {{ I want to share some directories with members of my group, giving
{{ them write access. This command:
{{
{{ chmod -R g+rwX ~
{{
{{ would work, but I don't want anyone messing with my dot files.
{{ How can I share a directory with my group?
Make your home directory group-executable (only). Group members
will be able to "cd" into your home directory, but neither see nor
change anything (thus, your dot files will be safe). This command
would do it:
chmod g=x ~
Next, give your group rx or rwx permission on the desired
subdirectories and files. You might create a subdirectory,
"GROUP", and store in it everything needed by the group. The
following command adds group-read and -write permission to
everything below GROUP and adds group-execute to everything below
GROUP that was owner-execute.
chmod -R g+rwX ~/GROUP
With this set-up, group members can cd through the "blind" home
directory into GROUP, assuming they know in advance of its
existence. Once in GROUP they have full access.
(By the way, it's a security policy violation at ARSC to give write
access to your home directory, except to yourself.)
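The two chmod commands in the answer above can be tried safely in a scratch directory before touching a real home directory. In this sketch a temporary directory stands in for "~"; the file names "data.txt" and "run.sh" and the helper "perms" are made up for the illustration.

```shell
umask 077                               # start from owner-only permissions
demo_home=$(mktemp -d)                  # stand-in for your home directory
mkdir "$demo_home/GROUP"
touch "$demo_home/GROUP/data.txt" "$demo_home/GROUP/run.sh"
chmod u+x "$demo_home/GROUP/run.sh"     # one owner-executable file

chmod g=x "$demo_home"                  # group may pass through, nothing more
chmod -R g+rwX "$demo_home/GROUP"       # group read/write; execute only on
                                        # directories and owner-executable files

# Show just the permission string of a file or directory.
perms() {
    ls -ld "$1" | cut -c1-10
}

perms "$demo_home"                      # the "blind" home directory
perms "$demo_home/GROUP"                # the shared directory
perms "$demo_home/GROUP/data.txt"       # plain file: no group execute added
perms "$demo_home/GROUP/run.sh"         # owner-executable: group execute added
```

The capital "X" is what keeps ordinary data files from becoming group-executable. When you are done, remove the scratch directory with: rm -rf "$demo_home"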
Q: I'm in four Unix permission groups:
yukon$ groups
hyprfast bigfoot wulfdown marsbar
Whenever I create a new file, it's in "hyprfast," but I've left
those guys behind. So I'm always running, or forgetting to run,
chgrp:
yukon$ chgrp wulfdown new.file.out
How can I change my default group to "wulfdown"?
[ Answers, questions, and tips graciously accepted. ]
Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
E-mail Subscriptions:
- Subscribe to (or unsubscribe from) the e-mail edition of the ARSC HPC Users' Newsletter.
- Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.
