ARSC HPC Users' Newsletter 224, July 20, 2001

Mark Your Calendar

August 6-24 :
As part of ARSC Faculty Camp, several presentations on ARSC, supercomputing, and visualization will be open to the wider UAF community.
August 9 :
Presentations by ARSC Summer Interns.
September 4-7 :
Members of the IBM Advanced Computing Technology Center (ACTC) will be on site presenting training on compilers, tools and the latest IBM technologies. This will be open to ARSC SP users and prospective SP users. (If you'd like to see the aspen turn yellow in the sub-Arctic, but get out before the snow flies, now you've got an excuse. :-)

More details as they're available... both in this newsletter and on the ARSC web pages.

ps: We need BEAR stories! (There's still time for a camping trip :-)


Yukon PE Upgrade

Programming Environment 3.4 (PE3.4) on the T3E will be made the default PrgEnv on Wednesday, August 1, 2001. ARSC T3E users are encouraged to test their codes under this environment beforehand by executing the command:

module switch PrgEnv PrgEnv.new

and recompiling.

At the time of this switch, the current PrgEnv (PE3.3) will be retained as PrgEnv.old, and PrgEnv.3501 will become the new PrgEnv.new.
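
For instance, a quick test session on yukon might look like the following sketch (the f90 line stands in for whatever build procedure your own code uses):

  YUKON$ module switch PrgEnv PrgEnv.new    # load the PE3.4 environment
  YUKON$ module list                        # confirm the new component versions
  YUKON$ f90 -o mycode mycode.f90           # recompile with your usual options
  YUKON$ module switch PrgEnv.new PrgEnv    # return to the current default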

Switching Craylibs Modules / Testing SV1 FFT Routines

Once again, we find that the ability to retain multiple versions of libraries and compilers, and to switch between them easily, is a really nice feature.

Unresolved issues in two SV1 user codes were cleared up recently when the users switched back from the default craylibs version to craylibs.3.3.0.2. The FFT routines are suspected, but investigation is ongoing.

If you feel the need to try this, use the command:

module switch <CURRENT CRAYLIBS> craylibs.3.3.0.2

where

<CURRENT CRAYLIBS>

depends on which programming environment you currently have loaded. It's easiest to list the loaded component versions with the "module list" command and look for "craylibs". For instance:

  CHILKOOT$ module list
  Currently Loaded Modulefiles:
    1) modules              5) cf90.3.3.0.2         9) CCtoollib.3.0.1.0   13) nqe
    2) craylibs.3.5.0.1     6) CC.3.3.0.2          10) cal.10.1.0.6        14) PrgEnv.new
    3) craytools.3.5.0.1    7) CC_sv1.3.5.0.1      11) craytools_archive  
    4) cf90_sv1.3.5.0.1     8) CCmathlib.3.0.1.0   12) mpt.1.4.0.0        
shows that for this user, "craylibs.3.5.0.1" is loaded. To switch, he/she would do this:

module switch craylibs.3.5.0.1 craylibs.3.3.0.2

If you're compiling in an NQS job, note that you'll have to add the switch command either to the job script (above the compile or make commands) or to your .profile or .cshrc file.
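
Here's a minimal sketch of such a job script (the queue name, time limit, directory, and build commands are all placeholders; adjust them for your own job):

  #QSUB -q small                    # placeholder queue
  #QSUB -lT 1:00:00                 # placeholder CPU time limit

  # If "module" is undefined in your batch shell, source your
  # .profile (or .cshrc) first.
  module switch craylibs.3.5.0.1 craylibs.3.3.0.2   # before any compiles
  cd $HOME/mycode                                   # placeholder directory
  make
  ./a.out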

Yukon Queues: Policy and Limits

To better support the testing and development of parallel programs, we have readjusted our longstanding queue policy.

You are still limited to one job (queued OR running) in the larger/longer queues. This ensures that different users' jobs will alternate.

However, you may now submit up to 3 simultaneous jobs to "small" (10 PEs x 8 hours) or any of the "Quick" queues (30 minutes). We've also increased the NQS limits on these queues. Thus, if PEs are available, you could have up to 3 "small" jobs actually running at the same time. (Make sure your jobs don't trample each other's files. You can't be guaranteed they'll run to completion in the order submitted.)
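
To count how many jobs you currently have queued or running, you can filter the "qstat -a" listing on your username ("goodman" here is just a placeholder):

  YUKON$ qstat -a | grep goodman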

If these changes cause unforeseen problems, we may have to revisit them. Stay in touch... we love feedback from users!

Here's the new text of "news queue_policy":


  T3E Batch Queue Policy
  ======================
  The T3E is a popular, limited resource.  ARSC has found that T3E
  users are willing to work together to provide fair access to the
  queues.  Please:

    1) Queue or run(*) no more than one job at a time in any of the
       following queues.  Also, do not queue or run jobs in more than
       two different queues, at the same time, from this list of
       queues:

         grand          gcp_grand
         xxlarge        gcp_xxlarge
         xlarge
         large
         medium

    2) Do not queue or run more than 3 jobs at a time in any of
       the queues:

         small
         Qgrand
         Qxxlarge
         Qxlarge
         Qlarge
         Qmedium
         Qsmall

    3) Do not queue or run more than five jobs at a time in
       the queue:

         single

  (*) "Queue or run" means having a job showing the status "Q"
      or "R" in the output from "qstat -a".


  As an example, if user "goodman" submitted a job which ended up
  in the "large" queue, he/she would not submit another to this 
  queue until the first had run to completion.  

  Meanwhile, "goodman" could also have one job in the "medium" 
  queue, two in the "Qsmall" queue and even a couple in "single." 
  
  In general, try to use as few processors as necessary, and be
  flexible in the number of processors with which your codes can
  run. This tends to increase overall throughput, scheduling
  efficiency, and the number of people able to use the system at a
  given time.
  
  Contact User Services (consult@arsc.edu or 907-450-8602) if you 
  have any questions concerning this policy. Also, please contact 
  us if you feel that the queues are being misused, and we will work 
  to resolve the situation.  ARSC may hold or delete jobs that are 
  submitted in violation of this policy.



  Addendum:
  =========
 
  There are two ways to work within the existing policy that can:

    * reduce time spent overseeing your work (i.e., logging in to
      execute yet another qsub command)

    * effectively extend the runtime limits of submitted jobs


  1) Job chaining:

  Submit the next queue-limit-sized section of your job when the
  previous one finishes, from within the same script.  This is the
  best option: job order is preserved, wait time is minimized, and
  you don't run the risk of violating the 1-job-per-queue policy.
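
  As a minimal sketch (the queue, file names, and restart test are
  placeholders; your script will differ), a chained script might
  look like this:

    #QSUB -q small                 # placeholder queue

    cd $HOME/mycode                # placeholder directory
    ./a.out                        # run this section of the computation

    # If more work remains (restart logic is up to your code),
    # submit this same script again for the next section:
    if [ -f more_work_to_do ] ; then
        qsub chain_job.nqs         # this script's own name (placeholder)
    fi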

  Job chaining is described in more detail in T3E newsletter #176 at:

    /arsc/support/news/t3enews/t3enews176/index.xml

  2) #QSUB -a option in your NQS script:

  If job chaining isn't possible, you can submit several jobs at once,
  using #QSUB -a to set the time at which NQS will begin queuing each
  request, so that your jobs' queued/running time doesn't overlap.
  Using "qstat -a", you will need to check your job list periodically
  to ensure that only one job has Q, C, H, or R status at a time.
  (Our policy does not limit the number of jobs in W status, waiting
  to be queued.)

  To avoid violating the 1-job-per-queue policy with this method, you
  will need to start job #2 quite a long time -- preferably about 24
  hours -- after job #1, to reduce the chance that they will overlap
  if job #1 is delayed due to system maintenance or checkpointing.
  (See "news holding_jobs".)
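
  As a sketch, the second of two such scripts might delay its own
  queuing like this (the date string is a placeholder; see the qsub
  man page for the formats -a accepts on your system):

    # job2.nqs -- stays in W status, not queued until the given time
    #QSUB -q small
    #QSUB -a "10:00 07/21/2001"

    cd $HOME/mycode                # placeholder directory
    ./a.out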

  Use of #QSUB -a is described in T3E newsletter #184 at:

    /arsc/support/news/t3enews/t3enews184/index.xml

Quick-Tip Q & A


A:[[ grep ITSELF shows up whenever I grep the output from ps.
  [[ For instance,
  [[
  [[ ICEHAWK1$  ps -aef | grep xloadl
  [[   mortimer 15172 20020   1 11:27:42  pts/0  0:00 grep xloadl 
  [[   mortimer 19344 20020   0   Jun 26  pts/0  3:09 xloadl 
  [[
  [[ I know I'm running grep, so why should grep tell me (like, would I
  [[ grep grep?)?  How can I get rid of this?  Can you switch it off?



# Derek Bastille, Rich Hickey, and Richard Griswold all gave this
# answer:

  ps -aef | grep xloadl | grep -v grep


# Here's Richard's explanation:

Grep is simply returning all lines that match the string "xloadl".  Since
the command "grep xloadl" shows up in the ps output, grep will return that
line.  Grep doesn't know that you are looking at ps output and that it is
supposed to ignore itself.  To filter its own line out, pipe through
"grep -v grep"; the -v flag inverts the match, so grep prints only the
lines that do NOT match.


# Kate Hedstrom suggested an alias:

  alias psg 'ps -aux | grep \!* | grep -v grep'


# David Gever provides perhaps the cleanest solution:

  ps -e | grep xloadl

# with explanation:

On the grep question, I found that with "ps -e" the grep process does
not show up in the output, while with "ps -ef" it does. This is true on
a Sun workstation, and may apply on the user's platform too. (Without
-f, ps prints only the command name, "grep", with no arguments, so the
string "xloadl" never appears on grep's own line.)
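
# One more trick along the same lines: put one character of the
# pattern in a character class, and grep's own command line no longer
# matches the pattern, so no second grep is needed:

  ps -aef | grep '[x]loadl'

# The regular expression [x]loadl still matches "xloadl" in the target
# process's line, but the literal text "[x]loadl" in ps's listing of
# the grep command does not match it.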




Q: I had some .lst and .lst.gz files, all of which I wanted to
   hide in a separate subdirectory. I used "mv" (the cryptic Unix
   alternative, it seems, to click and drag).  It seems to have worked,
   but "mv" complains, and I don't know if I should worry about this.

    termite$ ll
      total 10
      drwx------    2 morty    groop      4096 Jul 12 17:21 lst_files/
      -rw-------    1 morty    groop     34129 Jul 12 17:20 ptsv.f
      -rw-------    1 morty    groop     20011 Jul 12 17:20 ptsv_gen.f
      -rw-------    1 morty    groop    791991 Jul 12 17:20 ptsv_gen.lst.gz
      -rw-------    1 morty    groop     10233 Jul 12 17:20 ptsv.lst.gz
      -rw-------    1 morty    groop       550 Jul 12 11:10 vgen.lst
    termite$ 
    termite$ mv *lst* lst_files
      lst_files - Invalid argument
    termite$ 
    termite$ ll
      total 8
      drwx------    2 morty    groop      4096 Jul 12 17:21 lst_files/
      -rw-------    1 morty    groop     34129 Jul 12 17:20 ptsv.f
      -rw-------    1 morty    groop     20011 Jul 12 17:20 ptsv_gen.f
    termite$ 
    termite$ ll lst_files
      total 2
      -rw-------    1 morty    groop    791991 Jul 12 17:20 ptsv_gen.lst.gz
      -rw-------    1 morty    groop     10233 Jul 12 17:20 ptsv.lst.gz
      -rw-------    1 morty    groop       550 Jul 12 11:10 vgen.lst
    termite$ 

[[ Answers, Questions, and Tips Graciously Accepted ]]


Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.