ARSC HPC Users' Newsletter 253, August 30, 2002

Guide to Choosing OpenMP Scope Attributes for Variables

[ Thanks to John Larson for contributing this article. ]

Consider a PARALLEL DO construct, and assume the loop is actually parallel.

      sum = 0.0
C$omp PARALLEL DO  SHARED ( ?? )  PRIVATE ( ?? )  ??
      DO I = 1, N
         A(I) = x
         B(I) = Y(K)
         temp = A(I) * B(I)
         sum = sum + temp
      END DO
      print *, sum

Make a list of all the variables in the loop.

The following rules help to identify the scope requirements of the variables in a parallel loop.

  1. Variables subscripted only by the parallel loop index (I) are SHARED.
    1. See A and B.
    2. The subscript must be exactly the loop index. If A(I) = x were A(I-1) = x, iteration I+1 would write the element A(I) that iteration I reads, a dependence between iterations that may prevent parallelization.
    3. These variables can appear on the left-hand side and/or the right-hand side of assignment statements.
    4. Each OpenMP thread accesses different elements of these arrays based on the iterations assigned to the thread. Two threads cannot access the same element because no two threads are assigned the same iteration.
  2. Variables not subscripted by the parallel loop index fall into three cases.
    1. Variables that appear only on the right-hand side of assignment statements (that is, read-only variables) are SHARED.
      1. See N, x, Y and K.
      2. These variables were assigned values before the parallel loop, and those values are not changed in the parallel loop.
      3. Each OpenMP thread accesses the same variable. Read access does not change the value of the variable, so no conflicts or races can occur.
    2. Variables used in reduction operations are SHARED.
      1. See sum.
      2. The reduction variable is initialized before the parallel loop and is used after the parallel loop.
      3. For each variable specified in a REDUCTION clause, the compiler creates invisible PRIVATE variables for each OpenMP thread in which to compute partial results. These PRIVATE variables are carefully accumulated to the visible SHARED variable before the loop is completed.
    3. Variables that are defined before they are used are PRIVATE.
      1. See temp.
      2. The variable is not referenced before or after the parallel loop.
      3. The compiler creates a variable with the same name but with a different storage location for each OpenMP thread. A thread cannot access the variable of the same name in another thread.
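
To make the reduction rule concrete, here is a hand-written C sketch of roughly what a REDUCTION clause arranges. The function name sum_by_hand and the variable partial are made up for illustration, and a real compiler is free to combine the partial results some other way.

```c
#include <stddef.h>

/* Hand-written stand-in for reduction(+:sum). This is a sketch of the
   idea, not the code any particular compiler generates. */
double sum_by_hand(const double *v, size_t n)
{
    double sum = 0.0;              /* shared: one copy seen by all threads */

    #pragma omp parallel
    {
        double partial = 0.0;      /* declared inside the region: private */

        #pragma omp for
        for (long i = 0; i < (long)n; i++)
            partial += v[i];

        /* One thread at a time folds its partial result into sum. */
        #pragma omp critical
        sum += partial;
    }
    return sum;
}
```

Compiled without OpenMP, the pragmas are ignored and the function simply runs serially, which is a handy way to check the answer.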

The parallel loop index (I) is handled by the compiler, and it should not appear in a PRIVATE or SHARED clause. The compiler will ignore whatever specification is given, and do the right thing.

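The correct scoping specifications for this parallel loop are given below. Note that sum is named only in the REDUCTION clause: a reduction variable is shared in the enclosing scope, but OpenMP allows a name to appear in only one of these clauses.

C$omp  PARALLEL DO SHARED ( A, B, N, x, Y, K )
C$omp&             PRIVATE ( temp ) REDUCTION ( +:sum )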

Or, more simply, using the fact that the default scope is SHARED,

C$omp  PARALLEL DO PRIVATE ( temp ) REDUCTION ( +:sum )

If any variable in the loop cannot be described by the guidelines above, then the loop needs deeper analysis or it is not parallel. If every variable can, then the loop is parallel and you got the scoping right!
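
For C programmers, the same loop might be scoped as in the sketch below. The function name scoped_loop and the lower-case argument names are illustrative only. The default(none) clause is an addition not used in the article: it asks the compiler to complain about any variable left without a scope, which is a useful check on the list you made. In C, sum goes only in the reduction clause.

```c
/* C analogue of the Fortran loop; all names here are illustrative. */
double scoped_loop(double *a, double *b, const double *y,
                   int k, double x, int n)
{
    double sum = 0.0;   /* reduction variable: set before, used after */
    double temp;        /* defined before use in every iteration: private */
    int i;              /* loop index: made private automatically */

    /* Every variable except the loop index gets an explicit scope. */
    #pragma omp parallel for default(none) \
            shared(a, b, y, k, x, n) private(temp) reduction(+:sum)
    for (i = 0; i < n; i++) {
        a[i] = x;
        b[i] = y[k];
        temp = a[i] * b[i];
        sum += temp;
    }
    return sum;
}
```

Dropping default(none) and the shared list leaves the shorter form, since the default scope is shared here too.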

XLF @PROCESS Directive

IBM's XLF Fortran compiler allows you to specify some "-q<option>" compiler options inside the source file itself, using @PROCESS directives.

I investigated @PROCESS in hopes it would let me selectively apply "-qhot" (high order transformations) to some loops but not others, similar to Cray's f90 directives, "VECTOR" and "NOVECTOR." As described in issue #250, "-qhot" can do a great job of rearranging loops and replacing math functions with their vector equivalents.

As it turns out, @PROCESS only works at the level of the subroutine and other compilation units. Also, many "-q" options are completely unavailable via @PROCESS, including, unfortunately, "hot" and "nohot".

Here's a longer introduction to @PROCESS, from the XLF User Guide, p 51:

Specifying Options in the Source File

By putting the @PROCESS compiler directive in the source file, you can specify compiler options to affect an individual compilation unit. The @PROCESS compiler directive can override options specified in the configuration file, in the default settings, or on the command line.

In fixed form, @PROCESS can start in column 1 or after column 6. In free form, the @PROCESS compiler directive can start in any column. You cannot place a statement label or inline comment on the same line as an @PROCESS compiler directive.

By default, option settings you designate with the @PROCESS compiler directive are effective only for the compilation unit in which the statement appears. If the file has more than one compilation unit, the option setting is reset to its original state before the next unit is compiled. Trigger constants specified by the DIRECTIVE option are in effect until the end of the file (or until NODIRECTIVE is processed).

The @PROCESS compiler directive must usually appear before the first statement of a compilation unit. The only exceptions are when specifying SOURCE and NOSOURCE; you can put them in @PROCESS directives anywhere in the compilation unit.

As far as I know, there are two ways to determine if a given xlf "-q" option is available via @PROCESS:

  1. Read the section of the XLF User Guide for the option in question. If there's no mention of @PROCESS, then it's not available. If there is a mention of @PROCESS, it will be to tell you that @PROCESS is available.

    The sections on "hot" and "nohot" don't mention @PROCESS. It is documented as available for a number of other options, including, for example:

    free fixed intsize realsize listopt recur

  2. Try it, and see if the compiler ignores it. For example, with these lines at the top of hello.f:

      @PROCESS free
      @PROCESS nohot
      program hello

    the compile produces:

      $ xlf90  -qfixed -qhot hello.f
      "hello.f", line 2.10: 1520-003 (W) Option NOHOT is ignored.
      "hello.f", 1520-022 (W) The FREE option has been replaced by the option FREE(F90).
      ** hello   === End of Compilation 1 ===
      1501-510  Compilation successful for file hello.f.

    Notice the compiler message:

      "Option NOHOT is ignored"

    Method 2, of course, is how I originally discovered that "@PROCESS nohot" wasn't available.

Queued on the Cray

The SV1ex has been quite busy lately. If you're wondering if your job is still queued, and why, look for its NQS "status", which is shown in the last column, labeled "ST," of the "qstat -a" output.

This is from a "qstat -a" snapshot, taken last week:

$ qstat -a
-------------  ---- ----- ------------------ ---- ---- ------ ------ ---
92800.chilkoot job1 user1 background@chilkoot      599  71680 144000 Qqr
92806.chilkoot job2 user2 large@chilkoot             0 262144 144000 Qgm
92792.chilkoot job3 user3 large@chilkoot           728 137216 120005 Qcu

From "man qstat", the major status code on all three jobs is:

  Major status code: 
     Q    QUEUED
             The request is in a queue and is eligible for routing or

Explaining the minor status codes, one at a time:

Job 1:

92800.chilkoot job1 user1 background@chilkoot      599  71680 144000 Qqr

From "man qstat", the "qr" minor status code indicates:

      qr    The queue run limit was reached.

What's the "queue run limit"? See "qstat -l":

    $ qstat -l
    QUEUE NAME        RUN       MEMORY          QUICKFL     USR GRP
    --------------- --- --- ------- ------- ------- ------- --- ---
    background        1/1   1572864/71680        --/0         2  --

In the "RUN" column, "1/1" shows that the limit is one, and one job is running. I.e., the "queue run limit" had been reached, so this job had to wait.

Job 2:

92806.chilkoot job2 user2 large@chilkoot             0 262144 144000 Qgm

From "man qstat":

       gm    The global memory limit was reached.

What's the "global memory limit"? Again, see "qstat -l", but look at the last line, before the "/" in the "MEMORY" column:

  $ qstat -l

  QUEUE NAME                RUN       MEMORY          QUICKFL     USR GRP
  ----------------------- --- --- ------- ------- ------- ------- --- ---
  chilkoot                100/9   3686400/3495000  640000/0        15  **
  ----------------------- --- --- ------- ------- ------- ------- --- ---

Memory used by all jobs, regardless of queue, counts toward the global total, so this is typically the first memory limit a job runs into.
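
As a rough illustration of the arithmetic, the C sketch below checks whether a new request would push usage past a limit. It rests on two assumptions that the output shown here does not confirm: that the second number on job 2's "qstat -a" line (262144) is its memory request in the same units as the "MEMORY" column, and that both snapshots were taken at the same moment. The function name exceeds_limit is made up, and NQS's real scheduling logic is surely more involved.

```c
/* Returns 1 if starting the job would push usage past the limit.
   'limit' and 'used' are the numbers before and after the "/" in the
   MEMORY column of "qstat -l"; 'request' is the job's assumed
   memory request. */
int exceeds_limit(long limit, long used, long request)
{
    return used + request > limit;
}
```

With the numbers above, exceeds_limit(3686400, 3495000, 262144) is true, since 3495000 + 262144 = 3757144, which is over the 3686400 limit.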

Sometimes a job blocked on one condition becomes blocked on a second condition as soon as the first clears. We commonly see a job status go from Qgm to Qcu or Qqr. NQS seems to check memory early.

Job 3:

92792.chilkoot job3 user3 large@chilkoot           728 137216 120005 Qcu

From "man qstat":

       cu    The complex user run limit was reached.

Queues are grouped in "complexes." To see the complexes and limits associated with them, see "qstat -L":

  $ qstat -L
  NQS QUEUE COMPLEX LIMITS: chilkoot      
  --------------- --- --- ------- ------- ------- ------- --- ---
  userque          40/7   3584000/1144832      --/0         3  40 
  cbkg              6/1   1572864/71680        --/0         2   2 
  cspecial          4/1   2457600/1658880      --/0         2   2 

Use "qstat -f large" to find out which "complex" the "large" queue is in:

  $ qstat -f large

The complex user run limit was reached for this user because the user already had three other jobs running in queues in the same "userque" complex. To list all the queues in "userque", type "qstat -C userque".
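
If you check the "ST" column from a script, a small lookup like the C sketch below may save a trip to the man page. It covers only the three minor codes discussed in this article, and it assumes the status is always one major-code letter followed by a two-letter minor code, as in the three jobs shown. The function name minor_status_reason is hypothetical.

```c
#include <string.h>

/* Maps an ST value such as "Qgm" to the "man qstat" text quoted in
   this article. Only the three codes discussed here are handled. */
const char *minor_status_reason(const char *st)
{
    /* Expect exactly three characters: major code, then minor code. */
    if (st == NULL || strlen(st) != 3)
        return "unrecognized";

    if (strcmp(st + 1, "qr") == 0)
        return "The queue run limit was reached.";
    if (strcmp(st + 1, "gm") == 0)
        return "The global memory limit was reached.";
    if (strcmp(st + 1, "cu") == 0)
        return "The complex user run limit was reached.";

    return "unrecognized";
}
```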

Quick-Tip Q & A

A:[[ I thought OpenMP was supposed to be easy!  It won't even compile. 
  [[ What's wrong???
  [[ Here's the relevant loop, printed with line numbers, followed by the
  [[ error message:
  [[   +32  #pragma omp for reduction(+:overallsum)
  [[   +33      for (n = 0; n != ARRSZ; n++) {
  [[   +34        overallsum += array[n] ;
  [[   +35      }
  [[   +36
  [[ibmsp$  xlc_r -qsmp=omp -o openmp_tester openmp_tester.c
  [[  "openmp_tester.c", line 33.17: 1506-818 (S) Controlling expression
  [[  of the for loop is not in the canonical form.

  The loop test expression, (n != ARRSZ), should be replaced by 
  (n < ARRSZ).

  From the OpenMP C/C++ 1.0 standard:

    "The canonical form allows the number of loop iterations to
    be computed on entry to the loop."

  And the nitty-gritty: 

    The 'for' directive places restrictions on the structure of the
    corresponding 'for' loop.  Specifically, the corresponding 'for'
    loop must have canonical shape:

      for (init-expr; var logical-op b; incr-expr)

      init-expr
        One of the following:
          var = lb
          integer-type var = lb

      incr-expr
        One of the following:
          ++var
          var++
          --var
          var--
          var += incr
          var -= incr
          var = var + incr
          var = incr + var
          var = var - incr

      var
        A signed integer variable. If this variable would otherwise be
        shared, it is implicitly made private for the duration of the
        for.  This variable must not be modified within the body of the
        for statement. Unless the variable is specified lastprivate, its
        value after the loop is indeterminate.

      logical-op
        One of the following:
          <
          <=
          >
          >=

      lb, b, and incr
        Loop invariant integer expressions.  There is no synchronization
        during the evaluation of these expressions.  Thus, any evaluated
        side effects produce indeterminate results.

      This computation is performed with values in the type of var,
      after integral promotions. In particular, if value of b - lb +
      incr cannot be represented in that type, the result is
      indeterminate.
Q:  What's an "ulp"--a typo? a word? an acronym? I noticed it in issue
    #250.  Should I care?

[[ Answers, Questions, and Tips Graciously Accepted ]]

Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.