ARSC HPC Users' Newsletter 224, July 20, 2001
Mark Your Calendar
- August 6-24:
  As part of ARSC Faculty Camp, several presentations on ARSC, supercomputing, and visualization will be open to the wider UAF community.
- August 9:
  Presentations by ARSC Summer Interns.
- September 4-7:
  Members of the IBM Advanced Computing Technology Center (ACTC) will be on site presenting training on compilers, tools, and the latest IBM technologies. This will be open to ARSC SP users and prospective SP users. (If you'd like to see the aspen turn yellow in the sub-Arctic, but get out before the snow flies, now you've got an excuse. :-)
More details as they're available... both in this newsletter and on the ARSC web pages.
ps: We need BEAR stories! (There's still time for a camping trip :-)
Yukon PE Upgrade
Programming Environment 3.4 (PE3.4) on the T3E will be made the default PrgEnv on Wednesday, August 1, 2001. ARSC T3E users are encouraged to test codes under this environment by executing the command:
module switch PrgEnv PrgEnv.new
and recompiling.
At the time of this switch, the current PrgEnv (PE3.3) will be retained as PrgEnv.old, and PrgEnv.3501 will become the new PrgEnv.new.
Switching Craylibs Modules / Testing SV1 FFT Routines
Once again, we find the ability to retain multiple versions of libraries and compilers, and switch between them easily, is a really nice feature.
Unresolved issues in two SV1 user codes were cleared up recently when the users switched back from the default craylibs version to craylibs 3.3.0.2. It is suspected that this is an issue with the FFT routines, but investigation is ongoing.
If you feel the need to try this, use the command:
module switch <CURRENT CRAYLIB> craylib.3.3.0.2
where <CURRENT CRAYLIB> depends on which programming environment you're currently loading. It's easiest to just list the component versions using the "module list" command, and look for "craylibs". For instance:
CHILKOOT$ module list
Currently Loaded Modulefiles:
1) modules 5) cf90.3.3.0.2 9) CCtoollib.3.0.1.0 13) nqe
2) craylibs.3.5.0.1 6) CC.3.3.0.2 10) cal.10.1.0.6 14) PrgEnv.new
3) craytools.3.5.0.1 7) CC_sv1.3.5.0.1 11) craytools_archive
4) cf90_sv1.3.5.0.1 8) CCmathlib.3.0.1.0 12) mpt.1.4.0.0
shows that for this user, "craylibs.3.5.0.1" is loaded. To switch, he/she would do this:
module switch craylibs.3.5.0.1 craylib.3.3.0.2
If you're compiling in an NQS job, note that you'll have to add the switch command to either the job script (above the compile or make commands) or to your .profile or .cshrc files.
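The lookup described above can also be scripted, which is handy inside a job script. The sketch below pulls the loaded craylibs module name out of "module list" output and prints the matching switch command. It runs against the sample listing from this article, so it can be tried anywhere. The helper name current_craylibs is made up for illustration, and the commented-out live version assumes that "module list" writes to stderr, which is worth checking on your system.

```shell
#!/bin/sh
# Sample "module list" output, copied from the CHILKOOT listing above.
sample='Currently Loaded Modulefiles:
1) modules 5) cf90.3.3.0.2 9) CCtoollib.3.0.1.0 13) nqe
2) craylibs.3.5.0.1 6) CC.3.3.0.2 10) cal.10.1.0.6 14) PrgEnv.new
3) craytools.3.5.0.1 7) CC_sv1.3.5.0.1 11) craytools_archive
4) cf90_sv1.3.5.0.1 8) CCmathlib.3.0.1.0 12) mpt.1.4.0.0'

# Hypothetical helper: read "module list" text on stdin and print the
# first module whose name starts with "craylibs.".
current_craylibs() {
    tr -s ' \t' '\n\n' | grep '^craylibs\.' | head -n 1
}

current=$(printf '%s\n' "$sample" | current_craylibs)

# Target name exactly as printed in the article; confirm the spelling
# with "module avail" before relying on it.
target=craylib.3.3.0.2

# Print the command rather than running it, so this is safe to try.
echo "module switch $current $target"

# Against the live environment (assumes "module list" writes to stderr):
#   current=$(module list 2>&1 | current_craylibs)
#   module switch "$current" "$target"
```

In an NQS job script, the two commented lines would go above the compile or make commands.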
Yukon Queues: Policy and Limits
To better support the testing and development of parallel programs, we have readjusted our longstanding queue policy.
You are still limited to one job (queued OR running) in the larger/longer queues. This ensures that different users' jobs will alternate.
However, you may now submit up to 3 simultaneous jobs to "small" (10 PEs x 8 hours) or any of the "Quick" queues (30 minutes). We've also increased the NQS limits on these queues. Thus, if PEs are available, you could have up to 3 "small" jobs actually running at the same time. (Make sure your jobs don't trample each other's files. You can't be guaranteed they'll run to completion in the order submitted.)
If these changes cause unforeseen problems, we may have to revisit them. Stay in touch... we love feedback from users!
Here's the new text of "news queue_policy":
T3E Batch Queue Policy
======================
The T3E is a popular, limited resource. ARSC has found that T3E
users are willing to work together to provide fair access to the
queues. Please:
1) Queue or run(*) no more than one job at a time in any of the
following queues. Also, do not queue or run jobs in more than
two different queues, at the same time, from this list of
queues:
grand gcp_grand
xxlarge gcp_xxlarge
xlarge
large
medium
2) Do not queue or run more than 3 jobs at a time in any of
the queues:
small
Qgrand
Qxxlarge
Qxlarge
Qlarge
Qmedium
Qsmall
3) Do not queue or run more than five jobs at a time in
the queue:
single
(*) "Queue or run" means having a job showing the status "Q"
or "R" in the output from "qstat -a".
As an example, if user "goodman" submitted a job which ended up
in the "large" queue, he/she would not submit another to this
queue until the first had run to completion.
Meanwhile, "goodman" could also have one job in the "medium"
queue, two in the "Qsmall" queue and even a couple in "single."
In general, try to use as few processors as are necessary
and be flexible in the number of processors with which your codes
can run. This tends to increase the overall throughput, scheduling
efficiency, and number of people able to use the system at a given
time.
Contact User Services (consult@arsc.edu or 907-450-8602) if you
have any questions concerning this policy. Also, please contact
us if you feel that the queues are being misused, and we will work
to resolve the situation. ARSC may hold or delete jobs that are
submitted in violation of this policy.
Addendum:
=========
There are 2 ways to work within the existing policy that can
* reduce time spent overseeing your work (i.e. logging in to
execute yet another qsub command.)
* effectively extend the runtime limits of submitted jobs
1) Job chaining:
Submit the next queue-limit-defined section of your job when the
previous one finishes, within the same script. This is the best
option as job order is preserved, wait time is minimized, and you
don't run the risk of violating the 1 job per queue policy.
Job chaining is described in more detail in T3E newsletter #176 at
/arsc/support/news/t3enews/t3enews176/index.xml
2) #QSUB -a option in your NQS script:
If job chaining isn't possible, you can submit several jobs at once,
using #QSUB -a to set the time NQS will begin queuing your requests
so queued/running time doesn't overlap with other jobs belonging to
you. Using qstat -a, you will need to periodically check your job
list to ensure that only one job has a Q, C, H, or R-status at a time.
(Our policy does not limit the number of jobs in W-status, waiting
to be queued.)
To avoid violating the 1 job per queue policy with this method you
will need to specify that job #2 will start quite a long time --
preferably about 24 hours -- after job #1 to reduce the chance that
they will overlap if job #1 is delayed due to system maintenance or
checkpointing. (See news holding_jobs.)
Use of #QSUB -a is described in T3E newsletter #184 at
/arsc/support/news/t3enews/t3enews184/index.xml
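The per-queue limits in the policy text above lend themselves to a quick self-check before submitting more work. The following is only a sketch under stated assumptions: the function name check_policy and the two-column "queue status" input format are hypothetical (real "qstat -a" output has more columns and would need to be reduced first), and rule 2 is read here as a limit of three jobs per queue, which is one interpretation of the wording.

```shell
#!/bin/sh
# Hypothetical input: one "queue status" pair per line, e.g. "large R".
# Exit status is 0 when the job list fits the policy, 1 otherwise.
check_policy() {
    awk '
    BEGIN {
        bad = 0; nbig = 0
        # Rule 1: one job per queue, and at most two of these queues at once.
        n = split("grand gcp_grand xxlarge gcp_xxlarge xlarge large medium", q1, " ")
        for (i = 1; i <= n; i++) { limit[q1[i]] = 1; big[q1[i]] = 1 }
        # Rule 2 (read as per-queue): up to three jobs in each of these.
        n = split("small Qgrand Qxxlarge Qxlarge Qlarge Qmedium Qsmall", q3, " ")
        for (i = 1; i <= n; i++) limit[q3[i]] = 3
        # Rule 3: up to five jobs in "single".
        limit["single"] = 5
    }
    # Only jobs showing status Q or R count, per the policy footnote.
    $2 == "Q" || $2 == "R" { count[$1]++ }
    END {
        for (q in count) {
            if ((q in limit) && count[q] > limit[q]) {
                printf "%s: %d jobs, limit is %d\n", q, count[q], limit[q]
                bad = 1
            }
            if (q in big) nbig++
        }
        if (nbig > 2) {
            printf "%d of the larger queues in use, limit is 2\n", nbig
            bad = 1
        }
        exit bad
    }'
}

# The "goodman" example from the policy text, which should pass.
printf '%s\n' 'large R' 'medium Q' 'Qsmall Q' 'Qsmall R' 'single Q' 'single Q' |
    check_policy && echo "within policy"
```

A second job in "large", or jobs in three of the larger queues at once, makes the function print the offending queue and return a non-zero status.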
Quick-Tip Q & A
A:[[ grep ITSELF shows up whenever I grep the output from ps.
[[ For instance,
[[
[[ ICEHAWK1$ ps -aef | grep xloadl
[[ mortimer 15172 20020 1 11:27:42 pts/0 0:00 grep xloadl
[[ mortimer 19344 20020 0 Jun 26 pts/0 3:09 xloadl
[[
[[ I know I'm running grep, so why should grep tell me (like, would I
[[ grep grep?)? How can I get rid of this. Can you switch it off?
# Derek Bastille, Rich Hickey, and Richard Griswold all gave this
# answer:
ps -aef | grep xloadl | grep -v grep
# Here's Richard's explanation:
Grep is simply returning all lines that match the string "xloadl". Since
the command "grep xloadl" shows up in the ps output, grep will return that
line. Grep doesn't know that you are looking at ps output and that it is
supposed to ignore itself. To do this use the -v flag, which tells grep
to ignore all matching input.
# Kate Hedstrom suggested an alias:
alias psg 'ps -aux | grep \!* | grep -v grep'
# David Gever provides perhaps the cleanest solution:
ps -e | grep xloadl
# with explanation:
On the grep question, I found that "ps -e" will not list the grep
process itself, while the "ps -ef" will. This is true on a SUN
workstation, but may apply in the user's platform too.
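To see the "grep -v grep" answer at work without touching a live process table, the sketch below replays the two ps lines from the question. It also shows one alternative that is not among the answers above: writing the pattern as [x]loadl. The variable names and the second sample (what the ps line would presumably look like with the bracketed pattern) are illustrative assumptions.

```shell
#!/bin/sh
# Two lines of "ps" output, copied from the question above.
sample='mortimer 15172 20020 1 11:27:42 pts/0 0:00 grep xloadl
mortimer 19344 20020 0 Jun 26 pts/0 3:09 xloadl'

# The answer given above: drop every line that mentions "grep".
filtered=$(printf '%s\n' "$sample" | grep xloadl | grep -v grep)
echo "$filtered"

# Alternative (not from the answers above): the pattern [x]loadl matches
# "xloadl" but not the literal text "[x]loadl", so a grep process whose
# command line reads "grep [x]loadl" does not match its own pattern.
sample2='mortimer 15172 20020 1 11:27:42 pts/0 0:00 grep [x]loadl
mortimer 19344 20020 0 Jun 26 pts/0 3:09 xloadl'
bracket=$(printf '%s\n' "$sample2" | grep '[x]loadl')
echo "$bracket"
```

Both commands leave only the xloadl line. Note that "grep -v grep" would also hide any genuine process whose ps line happens to contain the word "grep".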
Q: I had some .lst and .lst.gz files, all of which I wanted to
hide in a separate subdirectory. I used "mv" (the cryptic Unix
alternative, it seems, to click and drag). It seems to have worked,
but "mv" complains, and I don't know if I should worry about this.
termite$ ll
total 10
drwx------ 2 morty groop 4096 Jul 12 17:21 lst_files/
-rw------- 1 morty groop 34129 Jul 12 17:20 ptsv.f
-rw------- 1 morty groop 20011 Jul 12 17:20 ptsv_gen.f
-rw------- 1 morty groop 791991 Jul 12 17:20 ptsv_gen.lst.gz
-rw------- 1 morty groop 10233 Jul 12 17:20 ptsv.lst.gz
-rw------- 1 morty groop 550 Jul 12 11:10 vgen.lst
termite$
termite$ mv *lst* lst_files
lst_files - Invalid argument
termite$
termite$ ll
total 8
drwx------ 2 morty groop 4096 Jul 12 17:21 lst_files/
-rw------- 1 morty groop 34129 Jul 12 17:20 ptsv.f
-rw------- 1 morty groop 20011 Jul 12 17:20 ptsv_gen.f
termite$
termite$ ll lst_files
total 2
-rw------- 1 morty groop 791991 Jul 12 17:20 ptsv_gen.lst.gz
-rw------- 1 morty groop 10233 Jul 12 17:20 ptsv.lst.gz
-rw------- 1 morty groop 550 Jul 12 11:10 vgen.lst
termite$
[[ Answers, Questions, and Tips Graciously Accepted ]]
Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
E-mail Subscriptions:
- Subscribe to (or unsubscribe from) the e-mail edition of the ARSC HPC Users' Newsletter.
- Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.
