ARSC T3E Users' Newsletter 168, May 14, 1999

1-PE Jobs

Each T3E processor is configured as one of three types:
  • CMD: command PEs, for user shells and single-PE processes
  • APP: application PEs, for parallel applications
  • OS: operating system PEs, for the operating system
(The UNICOS/mk command, "grmview -l", will show the current configuration.)

Under UNICOS/mk 2.0.3 (the current OS on yukon) and prior versions, all 1-PE jobs run on CMD PEs, not on APP PEs.

If you request 1 PE from NQS, however, NQS treats the job like a parallel application (even though the OS runs the job on a CMD PE). Thus, NQS believes that one fewer APP PE is available, which may lead to another user's job being unnecessarily blocked.

At ARSC, you should use the "single" queue, not the "small" queue, to run 1-PE jobs. "single" is a special queue created to get around this 1-PE problem.

Using "single" prevents NQS from mistakenly crediting an APP PE to your job. It enables you to run two or more 1-PE jobs simultaneously, and it keeps your jobs out of contention with jobs in the "small" queue.

From "news queues" on yukon, here's how to use "single":

  >  To route a job to the "single" queue, do not specify a PE or
  >  runtime limit.  (If either limit is specified, the job will be
  >  routed to another queue.)  For example, given this specification:
  >    #QSUB -q mpp
  >  NQS would route the request to the "single" queue.

Why all the confusion? NQS, the OS, and the applications themselves play different roles, and don't always cooperate.

NQS keeps its own tally of how many PEs it has assigned and how many are available. For instance, on a system with the NQS "global PE limit" set to 128, NQS might "release" a 128-PE request. It would then calculate that 0 PEs remain available. Once "released," the action of the request is handled by the OS, and not NQS.
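The bookkeeping NQS does here is simple subtraction, with no visibility into what released requests actually do with their PEs. A minimal sketch, using the numbers from the example above:

```shell
# NQS bookkeeping, sketched: available = global limit - PEs assigned.
# NQS cannot see whether a released request is actually computing.
global_limit=128
assigned=128          # one released 128-PE request
echo "$(( global_limit - assigned )) PEs available, as far as NQS knows"
```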

In one scenario, the job (qsub script) might sit and compile Fortran code for 30 minutes, thus idling 128 PEs, but NQS wouldn't know about it, and wouldn't be able to assign other waiting jobs to those PEs. NQS must assume that all requested PEs are in use.

On the other hand, if an interactive user launched a 2-PE job while the 128-PE NQS request was compiling, the OS would indeed notice the available processors and start the 2-PE job. Unfortunately, the 128-PE request would then be blocked by the 2-PE job when it finished compiling, and NQS wouldn't be able to run anything else because, to its knowledge, 128-PEs would be in use.

In another scenario, a 1-PE request might appear in the "small" queue. When NQS releases a 1-PE "small" request, it subtracts 1 from its pool of available APP PEs, even though the job runs on a CMD PE, and thus under-counts the number of available APP PEs. This can again lead to jobs being blocked.

This problem with 1-PE jobs occurs from time to time on yukon, which has a global MPP PE limit of 256. Here's an example, as shown by the "qstat -m" command:

  yukon$ qstat -m
  QUEUE NAME                RUN     QUEUE-PE'S    R-PE'S  R-TIME  P-TIME
                          LIM/CNT     LIM/CNT     LIMIT   LIMIT   LIMIT
  ----------------------- --- ---  ------ ------  ------  ------  ------
  System                    6/0        --/0           --      --      --
  gcp_grand                 1/0       256/0          256   57600   57600
  gcp_xxlarge               3/0       200/0          160   28800   28800
  Qxxlarge                  1/0       160/0          160    1800    1800
  xxlarge                   1/0       160/0          160   14400   14400
  grand                     1/0       256/0          256   14400   14400
  single                   10/1         0/0            1      --      --
  Qxlarge                   1/0       100/0          100    1800    1800
  Qlarge                    3/0       120/0           50    1800    1800
  Qmedium                   4/0        60/0           20    1800    1800
  Qsmall                    4/0        30/0           10    1800    1800
  Qgrand                    1/0       256/0          256    1800    1800
  medium                    4/1        60/18          20   28800   28800
  large                     4/1       200/50          50   28800   28800
  small                     4/1        30/1           10   28800   28800
  xlarge                    2/1       132/60         100   14400   14400
  ----------------------- --- ---  ------ ------  ------  ------  ------
  yukon                   100/5       256/129   
  ----------------------- --- ---  ------ ------  ------  ------  ------

In this table, the QUEUE-PE'S/CNT column shows NQS's count of the APP PEs in use, by queue and for the entire machine. There were three actual parallel applications running, with sizes 18, 50, and 60, for a total of 128 PEs.

Examine the row:

 small                     4/1        30/1           10   28800   28800 
which shows how NQS mistakenly counted 1 PE as in use in the "small" queue, bringing its total to 129 PEs in use. At this point, NQS refused to launch a waiting 128-PE request that could have run: from NQS's point of view, launching it would have raised the total to 257 PEs, exceeding the global limit. This is apparent in the row of overall totals:

 yukon                   100/5       256/129 
(The situation is resolved fairly easily by a sysadmin, but only when someone is on duty to notice it.)
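To see where the 129 comes from, sum the QUEUE-PE'S CNT entries from the nonzero rows of the table above. A minimal sketch; the values are copied from the rows shown, and the file name is just for the example:

```shell
# Per-queue QUEUE-PE'S CNT values taken from the "qstat -m" output above.
cat > counts.txt <<'EOF'
medium 18
large  50
small   1
xlarge 60
EOF

# NQS's total is 129.  Drop the phantom "small" entry (a 1-PE job that
# actually ran on a CMD PE) and the sum is the true 128 APP PEs in use.
awk '{ total += $2 } END { print total " PEs counted by NQS" }' counts.txt
```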

The RUN LIM/CNT column shows the count of jobs running, by queue. Note that there was one job in the "single" queue. Following the row:

 single                   10/1         0/0            1      --      -- 
to the QUEUE-PE'S/CNT column, note that this request did not count against NQS's PE total. This is the correct behavior for the "single" queue.

Run 1-PE jobs in "single"!

Debugging Debugging With FLUSH?

The print statement is an ever-popular debugging tool.

This week, a user was trying to diagnose a Fortran code. It launched okay, but immediately hung. He inserted this,

      WRITE (101,*) 'BEGIN'

as the first executable statement in the program, recompiled, and ran. Again, it hung. It never even printed "BEGIN".

The solution was to debug the debugging statement itself, by adding a FLUSH call immediately after it:

      WRITE (101,*) 'BEGIN'
      CALL FLUSH (101)

The write statement had executed successfully the first time, but the write buffer had not filled up before the program reached the problem causing the "hang", and thus the "BEGIN" had never been "flushed" to the user's console. "FLUSH" forces the contents of a write buffer out to the specified unit number (101 is used for standard output), even if the buffer is not yet full.

Quick-Tip Q & A

A:{{ You're not sure if you compiled with Apprentice, PAT, or VAMPIR
      enabled in your current executable. How can you find out?  }}

  Two answers.  The first didn't work for VAMPIR, and in either case
  you may have to scrutinize the output for hints:

    what a.out    | egrep -i "apprentice|pat|vampir"
    strings a.out | egrep -i "apprentice|pat|vampir"

  Three examples:

    yukon$ what a.out.1 | egrep -i "apprentice|pat|vampir"
        apprentice/Lib/apprif.c 30.0    11/20/97 14:50:55
        apprentice/Lib/cal.s    20.3    05/22/97 12:27:01
        apprentice/Lib/comm.c   30.0    11/20/97 14:50:55

    yukon$ strings a.out.1 | egrep -i "apprentice|pat|vampir" | head -5
    WARNING:  The Apprentice Runtime Information File (RIF) is being written
    barrier that not all PE's entered.  The Apprentice

    yukon$ strings a.out.2 | egrep -i "apprentice|pat|vampir" | head -5


Q: "I can't login! I keep on trying... The Kerberos (so-called) server
   accepts my 'kerberos password,' asks for my 'card-code,' which 
   I enter, but then it says:

       Enter Next Token:

   I enter my SecurID PIN into my SecurID card (AGAIN), type the 'next
   token' which appears on the card, but it doesn't work!"

   (What should this person do?)

[ Answers, questions, and tips graciously accepted. ]

Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.