ARSC T3E Users' Newsletter 130, November 14, 1997

Special SC97 Issue

Many ARSC staff members, including Guy and two ARSC Visualization Specialists, will be at SC97. If you're dealing with some thorny problem, they may be able to help--and they'd like to meet you in any case.

Here are some displays of interest:


Booth    Organization
=====    ============================================================
417      SGI/Cray
421      Portland Group (PGHPF)
834      Pallas (VAMPIR)
R211     ARSC

Poster   Presenter
======   ======================================
P35      Don Morton, University of Montana/ARSC

Goodbye ARSC T3D, Next Tuesday

Sad but true, the torch is passed to a new generation.

New Queues on Yukon

As of last week, two new queues are active on yukon. These were created to improve overall throughput and are part of a continual process of improvement which will see further changes in the coming months. Users can see the current queue configuration by reading "news queues" and the current queue policy by reading "news queue_policy."
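
For example, once logged in to yukon, the current configuration and policy can be displayed with:


               news queues
               news queue_policy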

The new queues are:


        - grand:  jobs requiring over 64 PEs.

        - single: jobs requiring only a single PE but not suited to
                  interactive sessions.

Both queues are limited to 4 hours runtime.
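The runtime limit is passed to NQS through the mpp_t limit. As a rough sketch (assuming, as in the examples below, that mpp_t is given in seconds), a job wanting the full 4-hour allowance would request:


               # 4 hours = 14400 seconds (assuming mpp_t is in seconds)
               #QSUB -l mpp_t=14400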

grand queue

The grand queue is accessed, as with other queues, by specifying the number of processors and the runtime required. Note that users should specify only the number of processors the job actually needs, not the number associated with the queue name. For example, the following 80 PE job would be routed to the grand queue:


               #QSUB -q mpp
               #QSUB -l mpp_p=80
               #QSUB -l mpp_t=4000
               #!/bin/csh

               cd my_working_directory
               mpprun -n 80 ./a.out
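
Assuming the script above were saved in a file, say grand_job (the filename is just for illustration), it would be submitted in the usual way:


               qsub grand_job    # submit the batch request to NQS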

Initially, the grand queue is scheduled to be active during specific periods depending on demand and system loading. See "news queues" for the current schedule.

single queue

The single queue is obtained by NOT specifying ANY mpp_t or mpp_p limits. As an example, this request would run in the single queue:


               #QSUB -q mpp
               #!/bin/csh

               echo ' started '
               sleep 60
               echo ' completed '

All serial T3E jobs which run on one PE, whether started interactively, through the single queue, or through a larger queue, and whether or not "mpprun -n 1" is used, will run on command PEs rather than application PEs. The single queue simply makes this explicit: a 1 PE job cannot run on an application PE.

Note that application PEs typically have about 245 MBytes of memory free for the user's programs, whereas command PEs are multi-tasked, responsible for other activity, and have less memory available.

If you do explicitly request 1 PE in a qsub script, NQS will route the request through an 8 PE batch queue, but the scheduler will still run the job on a command PE. This effectively removes 1 PE from the NQS resources without actually using an application PE, and it can block other jobs that could otherwise run. The situation is reminiscent of the T3D problem documented in newsletter 115 ( /arsc/support/news/t3enews/t3enews115/index.xml ), which required a user to request 2 PEs in order to run on 1 PE.
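To illustrate (this is only a sketch, and the program name serial_prog is hypothetical), the first request below explicitly asks for 1 PE and so would be routed through an 8 PE batch queue, while the second specifies no mpp_p or mpp_t limits and so is routed to the single queue:


               #  Avoid: an explicit 1 PE request ties up an NQS slot
               #  ("serial_prog" is a hypothetical program name)
               #QSUB -q mpp
               #QSUB -l mpp_p=1
               #!/bin/csh

               ./serial_prog


               #  Preferred: no mpp_p or mpp_t limits, so the request
               #  is routed to the single queue
               #QSUB -q mpp
               #!/bin/csh

               ./serial_prog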

The single queue is intended to let users run tasks which don't require large numbers of processors but which do need to be scheduled. Activities such as the pre- and post-processing of data or tarfile/archive creation are likely candidates.
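
For instance, a script along the following lines (directory and file names are hypothetical) could be submitted to the single queue to bundle results into a tar file:


               #QSUB -q mpp
               #!/bin/csh

               # bundle a results directory into a single tar file
               # (directory and file names here are only examples)
               cd /tmp/myname
               tar -cf results.tar results
               echo ' archive complete '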

(NOTE: This queue will be revised in the next few weeks to allow users to specify runtime limits.)

Notes on Queue Behaviour

Freeze outs. Due to the nature of the system, jobs which require a large fraction of the machine can be frozen out: they never manage to acquire the resources they need because there are always some smaller jobs running on the system.

Our current solution is to halt the smaller queues until all active work is complete and then start the bigger queues. ARSC performs a similar procedure on Tuesdays before test time, to prevent user jobs from starting if they cannot complete before test time begins. (A useful tip: do interactive work on Tuesdays, since there is often more time available then than during the rest of the week.) This practice will continue, but it will be eased by the installation of roll-out and checkpointing in the next release of the OS (likely installation date: late 1997).

Efficient Utilisation. As a reminder, users are encouraged to use only as many PEs as are necessary to obtain the memory required for their tasks. Using more processors than needed may reduce the runtime of an individual application, but it increases wait times, both for the user who must wait for those resources to become available and for other users who must wait for that job to complete.

On the T3E, jobs are no longer limited to powers of 2. Several ARSC users are already running on non-power-of-two processor counts and achieving reduced turnaround for their jobs, which results in greater throughput for the system overall.
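As a sketch (the PE count, runtime, and executable here are arbitrary), a job needing 12 PEs can simply request 12 rather than rounding up to 16:


               #QSUB -q mpp
               #QSUB -l mpp_p=12
               #QSUB -l mpp_t=3600
               #!/bin/csh

               # 12 PEs: a non-power-of-two request (values are illustrative)
               cd my_working_directory
               mpprun -n 12 ./a.out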

Next Changes. In the coming weeks the queue system will see queue name changes and new queues. Queue names will no longer refer to specific numbers of processors or runtime limits, and new queues with different runtime limits will be created to improve system utilisation. Up-to-date information on the current configuration can be found in "news queues".

Survey. Later this year, all users of the ARSC T3E system will receive a questionnaire regarding current and preferred queue use. ARSC actively encourages feedback from users on the subject of queue structures and behaviour.

Quick-Tip Q & A


A: {{ How do you invent and remember good passwords!? }}

   # Stay tuned... 

Q: What would be a good Quick-Tip question for next week?! 



  ** Apologies to Lennon & McCartney **
  
        Yesterday,
        All those backups seemed a waste of pay.
        Now my database has gone away.
        Oh I believe in yesterday.
        
        Suddenly,
        There're not half the files there used to be,
        And there's a milestone
        hanging over me
        The system crashed so suddenly.
        
        I pushed something wrong
        What it was I could not say.
        
        Now all my data's gone
        and I long for yesterday-ay-ay-ay.
        
        Yesterday,
        The need for back-ups seemed so far away.
        I knew my data was all here to stay,
        Now I believe in yesterday.
    

  (Reminder: there are still no backups on yukon, and there is a
  seven-day purge policy on /tmp.   Work continues on a storage
  solution.)

[ Answers, questions, and tips graciously accepted. ]


Current Editors:
Ed Kornkven, ARSC HPC Specialist, ph: 907-450-8669
Kate Hedstrom, ARSC Oceanographic Specialist, ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.