ARSC T3E Users' Newsletter 148, August 7, 1998

The Ever-Popular qstat Command

qstat with the -a option is well-known, as it tells whether your work is moving forward. Other interesting, albeit less urgent, questions are also answered by qstat . For instance:

If you wonder,

  • What are all the queues on the system?
  • Which queues have been stopped?

Type: qstat

  • What are the PE and time limits on the queues?

Type: qstat -m

  • What's the <request-id> of my job?
  • Is my job waiting, queued, running, or held?

Type: qstat -a

  • Has my job (<request-id>) been checkpointed?
  • When did it start?
  • How many CPU-Seconds has it burned?

Type: qstat -f <request-id>

  • In a given queue (<queue-name>) what's the maximum number of PEs?
  • Longest run-time?
  • Maximum concurrent user-jobs?

Type: qstat -f <queue-name>

For more information:

  • See man qstat , which explains all of the columns, rows, and status codes in the various output formats.
  • See newsletter #137 : /arsc/support/news/t3enews/t3enews137/index.xml and the article: "Hold, Checkpoint, and Restart of NQS Jobs on Yukon" which explains the most common "qstat -a" status codes related to checkpointing.
  • ARSC users, see news queues which describes the ARSC queues and basic NQS commands. (The same information is available from qstat -m and man pages).
  • ARSC users, see "news queue_policy" which describes our policies for fair access to the queues. (This information is not available elsewhere.)

Published Work of T3E Newsletter Readers

[ This is an occasional series, designed to increase collaboration between readers of this newsletter. Whenever you publish research done using the T3E or concerning the T3E, let us know. Send us the citation, relevant web sites, abstract, and your e-mail address (if you like), and we'll pass it on. ]

Prefix Computations on Symmetric Multiprocessors

By David R. Helman and Joseph Ja'Ja'. Technical Report Number: CS-TR-3915 and UMIACS-TR-98-38. Institute for Advanced Computer Studies (UMIACS) and the Department of Electrical Engineering, University of Maryland, College Park, July 1998.

-The paper is available in PostScript format via WWW at:


We introduce a new optimal prefix computation algorithm on linked lists which builds upon the sparse ruling set approach of Reid-Miller and Blelloch. Besides being somewhat simpler and requiring nearly half the number of memory accesses, we can bound our complexity with high probability instead of merely on average. Moreover, whereas Reid-Miller and Blelloch targeted their algorithm for implementation on a vector multiprocessor architecture, we develop our algorithm for implementation on the symmetric multiprocessor architecture (SMP). These symmetric multiprocessors dominate the high-end server market and are currently the primary candidate for constructing large scale multiprocessor systems. Our prefix computation algorithm was implemented in C using POSIX threads and run on three symmetric multiprocessors - the DEC AlphaServer, the Silicon Graphics Power Challenge, and the HP-Convex Exemplar. We ran our code using a variety of benchmarks which we identified to examine the dependence of our algorithm on memory access patterns. For some problems, our algorithm actually matched or exceeded the optimal sequential solution using only a single thread. Moreover, in spite of the fact that the processors must compete for access to main memory, our algorithm still resulted in scalable performance up to 16 processors, which was the largest platform available to us.

David R. Helman Institute for Advanced Computer Studies Internet:

The seeds of rich galaxy clusters in the Universe

By F. Governato, C.M. Baugh, C.S. Frenk, S. Cole, C.G. Lacey, T. Quinn, & J. Stadel. Nature 392, 359-361, 1998.

A copy of this and other papers, and images from various simulations run with PKDGRAV, are available at the following URL:

Project description:

Combining large N-body simulations with semy-analytical methods that describe the formation and evolution of galaxies, the authors were able to clarify the link between the recently observed high redshift galaxies and present day galaxies. In fact, the regions of space where clusters are forming at high redshift also host the brightest galaxies at that epoch. Those primeval galaxies will rapidly make stars and merge together to form the massive aggregates of stars called elliptical galaxies that populate present day galaxy clusters. This work was part of an extended collaboration between researchers of two different institutions: University of Washington (WA) and the University of Durham (UK). For the simulations, a parallel treecode named PKDGRAV was used. The code has been developed at UW by Joachim Stadel and Tom Quinn.

Fabio Governato Physics Department, Durham University.

(You Think YOU'VE Got Parallelism?) DNA Computing

The term "parallelism" seems a little awkward when used to describe the simultaneous sprouting of grass seed tossed on a suburban yard, or the dissolving of sodium chloride spooned into a pot of boiling vermicelli. It's used in discussions of DNA computing, however, which is based on similarly biological/chemical processes, quite unlike the bit-twiddling and MPI broadcasting we all know and love.

So, what is DNA computing?

In its August, 1998 issue (v 279, n 2, p 54), Scientific American ran a nice article on it, written by Leonard Adleman, who made made the first DNA computation way back in 1994. To learn more, you might read that article or (of course) do a web search--where you're guaranteed some hits.

For now, here's a brief intro, condensed from the sciam article.

A DNA molecule can be thought of as a long string of A's, T's, G's, and C's (representing adenine, thymine, guanine, and cytosine). This is logically similar to a strings of 0's and 1's, and while nature uses such molecules to encode things like hairiness, we can now use them to encode, say, integers.

We have some operations we can perform on DNA molecules:

  1. duplicate them
  2. create their compliments
  3. bind (or concatenate) them
  4. search them for a predetermined subsequence and cut them at that point
  5. sort a collection of DNA molecules based on length
  6. from a collection, select (and separate) individuals containing a predetermined subsequence
  7. synthesize them with any arbitrary sequence (up to a length of 100)

While this is not exactly an FFT library, people are inventing algorithms using these operations, and solving problems.

"Parallelism" occurs because, rather than performing these operations on individual molecules in serial fashion, we perform them on an entire collection (10^14 molecules in a test tube, for instance) simultaneously.

Adleman showed the potential of DNA computing by solving the Hamiltonian path problem on a graph of seven nodes and 14 paths. Here's his algorithm, as printed in Scientific American:

Given a graph with n vertices,

  1. generate a set of random paths through the graph
  2. for each path in the set:
    1. check whether that path starts at the start vertex and ends at the end vertex. If not, remove that path from the set.
    2. check if that path passes through exactly n vertices. If not, remove that path from the set.
    3. for each vertex, check if that path passes through that vertex. If not, remove that path from the set.
  3. If the set is not empty, then report that there is a Hamiltonian path. If the set is empty report that there is no Hamiltonian path.

The laboratory methods sound pretty clunky (and I assume they've been better automated by now), and it took him a week to actually execute the algorithm on "live" DNA.

Very briefly, the process involved inventing an appropriate encoding for each of the nodes in the graph, using A, G, C, and T; then having a commercial lab synthesize actual DNA molecules (trillions of individual molecules) corresponding to the encodings; then mixing the the molecules together in a test tube, adding enzymes, adjusting the temperature, decanting, etc. in order to execute the various operations; and, finally, "reading" off the answer to verify that it was correct.

Quick-Tip Q & A

A: {{ I am wasting too much time in re-compilation. Is parallel make
      available on the T3E? }}

You can simulate parallel make by setting the NPROC environment for use
by the compiler.  ARSC requests that users keep NPROC set at or below a
value of 4.

Here's an excerpt from "man cc" or "man f90":


   The following environment variables are taken from the execution

   NPROC       Specifies the number of processes used for simultaneous
               compilations.  The default is 1.  When more than one
               source file is specified on the command line,
               compilations may be multiprocessed by setting the
               environment variable NPROC to a value greater than 1.
               You can set NPROC to any value, but large values can
               overload the system.

Q: How do I determine which version of f90 I am using?

[ Answers, questions, and tips graciously accepted. ]

Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
E-mail Subscriptions: Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.
Back to Top