ARSC HPC Users' Newsletter 224, July 20, 2001
Mark Your Calendar
- August 6-24 :
- As part of ARSC Faculty Camp, several presentations on ARSC, supercomputing, and visualization will be open to the wider UAF community.
- August 9 :
- Presentations by ARSC Summer Interns.
- September 4-7 :
- Members of the IBM Advanced Computing Technology Center (ACTC) will be on site presenting training on compilers, tools and the latest IBM technologies. This will be open to ARSC SP users and prospective SP users. (If you'd like to see the aspen turn yellow in the sub-Arctic, but get out before the snow flies, now you've got an excuse. :-)
More details as they're available... both in this newsletter and on the ARSC web pages.
ps: We need BEAR stories! (There's still time for a camping trip :-)
Yukon PE Upgrade
Programming Environment 3.4 (PE3.4) on the T3E will be made the default PrgEnv on Wednesday, August 1, 2001. ARSC T3E users are encouraged to test codes under this environment by executing the command:
  module switch PrgEnv PrgEnv.new
and recompiling.
At the time of this switch, the current PrgEnv (PE3.3) will be retained as PrgEnv.old, and PrgEnv.3501 will become the new PrgEnv.new.
Switching Craylibs Modules / Testing SV1 FFT Routines
Once again, we find the ability to retain multiple versions of libraries and compilers, and switch between them easily, is a really nice feature.
Unresolved issues in two SV1 user codes were cleared up recently when the users switched back from the default craylibs version to craylibs 184.108.40.206. It is suspected that this is an issue with the FFT routines, but investigation is ongoing. If you feel the need to try this, use the command:
  module switch <CURRENT CRAYLIB> craylib.220.127.116.11
where <CURRENT CRAYLIB> depends on which programming environment you're currently loading. It's easiest to just list the component versions using the "module list" command, and look for "craylibs". For instance:
  CHILKOOT$ module list
  Currently Loaded Modulefiles:
    1) modules                      5) cf18.104.22.168.2            9) CCtoollib.22.214.171.124    13) nqe
    2) craylibs.126.96.36.199       6) CC.188.8.131.52             10) cal.10.1.0.6                14) PrgEnv.new
    3) craytools.184.108.40.206     7) CC_sv220.127.116.11.1       11) craytools_archive
    4) cf90_sv18.104.22.168.1       8) CCmathlib.22.214.171.124    12) mpt.126.96.36.199
shows that for this user, "craylibs.188.8.131.52" is loaded. To switch, he/she would do this:
  module switch craylibs.184.108.40.206 craylib.220.127.116.11
If you're compiling in an NQS job, note that you'll have to add the switch command to either the job script (above the compile or make commands) or to your .profile or .cshrc files.
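For scripted use, such as inside an NQS job script, the lookup of the currently loaded craylibs module can be automated. What follows is only a minimal sketch, not an ARSC-supplied tool: the function name current_craylibs is hypothetical, the target version in the usage comment is a placeholder, and it assumes that "module list" writes its numbered table to stderr (hence the 2>&1), which may not hold for every version of the modules package.

```shell
#!/bin/sh
# current_craylibs (hypothetical helper): read "module list" output on
# stdin and print the first entry whose name starts with "craylibs.".
current_craylibs() {
    # Put every whitespace-separated token on its own line, then keep
    # only the craylibs entry.
    tr -s ' \t' '\n' | grep '^craylibs\.' | head -n 1
}

# Usage sketch (placeholder target version; substitute the one you need):
#   old=$(module list 2>&1 | current_craylibs)
#   module switch "$old" craylibs.X.Y.Z.W
```

Placing these two usage lines above the compile or make commands in a job script would keep the switch tied to whichever craylibs version happens to be loaded at run time.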
Yukon Queues: Policy and Limits
To better support the testing and development of parallel programs, we have readjusted our longstanding queue policy.
You are still limited to one job (queued OR running) in the larger/longer queues. This ensures that different users' jobs will alternate.
However, you may now submit up to 3 simultaneous jobs to "small" (10 PEs x 8 hours) or any of the "Quick" queues (30 minutes). We've also increased the NQS limits on these queues. Thus, if PEs are available, you could have up to 3 "small" jobs actually running at the same time. (Make sure your jobs don't trample each other's files. You can't be guaranteed they'll run to completion in the order submitted.)
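One way to keep simultaneous jobs from trampling each other's files is to have every job work in its own freshly created directory. The following is a sketch under stated assumptions, not ARSC policy: the function name make_job_dir is hypothetical, and it assumes the mktemp utility is installed on the system where the job runs.

```shell
#!/bin/sh
# make_job_dir (hypothetical helper): create and print a fresh, uniquely
# named working directory under the given root (default: $TMPDIR or /tmp).
# Assumes the mktemp utility is available.
make_job_dir() {
    root=${1:-${TMPDIR:-/tmp}}
    mktemp -d "$root/job.XXXXXX"
}

# Usage sketch inside a job script:
#   workdir=$(make_job_dir) || exit 1
#   cd "$workdir" || exit 1
#   ... run the program, writing its output here ...
```

Because each call creates a new directory, the order in which the jobs start or finish no longer matters for their output files.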
If these changes cause unforeseen problems, we may have to revisit them. Stay in touch... we love feedback from users!
Here's the new text of "news queue_policy":
  T3E Batch Queue Policy
  ======================
  The T3E is a popular, limited resource. ARSC has found that T3E users
  are willing to work together to provide fair access to the queues.
  Please:
  1) Queue or run(*) no more than one job at a time in any of the
     following queues. Also, do not queue or run jobs in more than two
     different queues, at the same time, from this list of queues:
       grand gcp_grand xxlarge gcp_xxlarge xlarge large medium
  2) Do not queue or run more than 3 jobs at a time in any of the
     queues:
       small Qgrand Qxxlarge Qxlarge Qlarge Qmedium Qsmall
  3) Do not queue or run more than five jobs at a time in the queue:
       single
  (*) "Queue or run" means having a job showing the status "Q" or "R"
      in the output from "qstat -a".
  As an example, if user "goodman" submitted a job which ended up in the
  "large" queue, he/she would not submit another to this queue until the
  first had run to completion. Meanwhile, "goodman" could also have one
  job in the "medium" queue, two in the "Qsmall" queue and even a couple
  in "single."
  In general, try to use as few processors as are necessary and be
  flexible in the number of processors with which your codes can run.
  This tends to increase the overall throughput, scheduling efficiency,
  and number of people able to use the system at a given time.
  Contact User Services (firstname.lastname@example.org or 907-450-8602)
  if you have any questions concerning this policy. Also, please contact
  us if you feel that the queues are being misused, and we will work to
  resolve the situation.
  ARSC may hold or delete jobs that are submitted in violation of this
  policy.
  Addendum:
  =========
  There are 2 ways to work within the existing policy that can
    * reduce time spent overseeing your work (i.e. logging in to execute
      yet another qsub command.)
    * effectively extend the runtime limits of submitted jobs
  1) Job chaining:
     Submit the next queue-limit-defined section of your job when the
     previous one finishes, within the same script. This is the best
     option as job order is preserved, wait time is minimized, and you
     don't run the risk of violating the 1 job per queue policy. Job
     chaining is described in more detail in T3E newsletter #176 at
       /arsc/support/news/t3enews/t3enews176/index.xml
  2) #QSUB -a option in your NQS script:
     If job chaining isn't possible, you can submit several jobs at
     once, using #QSUB -a to set the time NQS will begin queuing your
     requests so queued/running time doesn't overlap with other jobs
     belonging to you. Using qstat -a, you will need to periodically
     check your job list to insure that only one job has a Q, C, H, or
     R-status at a time. (Our policy does not limit the number of jobs
     in W-status, waiting to be queued.) To avoid violating the 1 job
     per queue policy with this method you will need to specify that
     job #2 will start quite a long time -- preferably about 24 hours --
     after job #1 to reduce the chance that they will overlap if job #1
     is delayed due to system maintenance or checkpointing. (See news
     holding_jobs.) Use of #QSUB -a is described in T3E newsletter #184
     at
       /arsc/support/news/t3enews/t3enews184/index.xml
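The job-chaining idea in the addendum can be sketched roughly as below. This is an illustration under stated assumptions, not the method from newsletter #176: chain_step, SUBMIT, and the state file are hypothetical names, the "work" is a placeholder echo, and the sketch defaults to a dry run that only prints the qsub command. The segment counter lives in a small state file so the same script can be resubmitted unchanged.

```shell
#!/bin/sh
# Dry run by default: prints the qsub command instead of running it.
# Set SUBMIT=qsub to submit for real.
SUBMIT=${SUBMIT:-echo qsub}

# chain_step STATE_FILE LAST_SEGMENT SCRIPT   (hypothetical helper)
# Runs one segment of the work, then submits SCRIPT again unless this
# was the last segment.
chain_step() {
    state_file=$1
    last=$2
    script=$3

    # The state file holds the number of the segment to run; start at 1.
    seg=$(cat "$state_file" 2>/dev/null)
    seg=${seg:-1}

    echo "running segment $seg of $last"   # placeholder for the real work

    if [ "$seg" -lt "$last" ]; then
        # Record the next segment, then submit the successor job.
        echo $((seg + 1)) > "$state_file"
        $SUBMIT "$script"
    else
        rm -f "$state_file"
        echo "chain complete"
    fi
}

# Usage sketch, as the last line of a job script named chain.sh:
#   chain_step "$HOME/chain.state" 3 chain.sh
```

Since the successor is submitted only after the current segment's work has finished, at most one link of the chain is queued or running at any time.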
Quick-Tip Q & A
A:[[ grep ITSELF shows up whenever I grep the output from ps.
  [[ For instance,
  [[
  [[   ICEHAWK1$ ps -aef | grep xloadl
  [[   mortimer 15172 20020   1 11:27:42  pts/0  0:00 grep xloadl
  [[   mortimer 19344 20020   0   Jun 26  pts/0  3:09 xloadl
  [[
  [[ I know I'm running grep, so why should grep tell me (like, would I
  [[ grep grep?)? How can I get rid of this. Can you switch it off?
  # Derek Bastille, Rich Hickey, and Richard Griswold all gave this
  # answer:
    ps -aef | grep xloadl | grep -v grep
  # Here's Richard's explanation:
    Grep is simply returning all lines that match the string "xloadl".
    Since the command "grep xloadl" shows up in the ps output, grep will
    return that line. Grep doesn't know that you are looking at ps
    output and that it is supposed to ignore itself. To do this use the
    -v flag, which tells grep to ignore all matching input.
  # Kate Hedstrom suggested an alias:
    alias psg 'ps -aux | grep \!* | grep -v grep'
  # David Gever provides perhaps the cleanest solution:
    ps -e | grep xloadl
  # with explanation:
    On the grep question, I found that "ps -e" will not list the grep
    process itself, while the "ps -ef" will. This is true on a SUN
    workstation, but may apply in the user's platform too.
Q: I had some .lst and .lst.gz files, all of which I wanted to hide in
   a separate subdirectory. I used "mv" (the cryptic Unix alternative,
   it seems, to click and drag). It seems to have worked, but "mv"
   complains, and I don't know if I should worry about this.
  termite$ ll
  total 10
  drwx------  2 morty  groop    4096 Jul 12 17:21 lst_files/
  -rw-------  1 morty  groop   34129 Jul 12 17:20 ptsv.f
  -rw-------  1 morty  groop   20011 Jul 12 17:20 ptsv_gen.f
  -rw-------  1 morty  groop  791991 Jul 12 17:20 ptsv_gen.lst.gz
  -rw-------  1 morty  groop   10233 Jul 12 17:20 ptsv.lst.gz
  -rw-------  1 morty  groop     550 Jul 12 11:10 vgen.lst
  termite$
  termite$ mv *lst* lst_files
  lst_files - Invalid argument
  termite$
  termite$ ll
  total 8
  drwx------  2 morty  groop    4096 Jul 12 17:21 lst_files/
  -rw-------  1 morty  groop   34129 Jul 12 17:20 ptsv.f
  -rw-------  1 morty  groop   20011 Jul 12 17:20 ptsv_gen.f
  termite$
  termite$ ll lst_files
  total 2
  -rw-------  1 morty  groop  791991 Jul 12 17:20 ptsv_gen.lst.gz
  -rw-------  1 morty  groop   10233 Jul 12 17:20 ptsv.lst.gz
  -rw-------  1 morty  groop     550 Jul 12 11:10 vgen.lst
  termite$
[[ Answers, Questions, and Tips Graciously Accepted ]]
Ed Kornkven
ARSC HPC Specialist
ph: 907-450-8669
Kate Hedstrom
ARSC Oceanographic Specialist
ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Subscribe to (or unsubscribe from) the e-mail edition of the ARSC HPC Users' Newsletter.
Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.