ARSC T3E Users' Newsletter 134, January 9, 1998

Good F90 Programming Practice

If you're starting a new project in F90, you might benefit from the document:


            European Standards For Writing and Documenting 
                    Exchangeable Fortran 90 Code

Available at:

http://www.met-office.gov.uk/sec5/NWP/NWP_F90Standards.html

It outlines some Fortran 90 code design decisions taken by the European meteorological and climate centres in the development of shared F90 code. Of particular interest are the comments on the use of newer Fortran 90 features. For instance, array notation should be used whenever possible, but the array's shape should be shown in parentheses, e.g.:


     oneDArrayA(:) = oneDArrayB(:) + oneDArrayC(:)

     twoDArray(:, :) = scalar * anotherTwoDArray(:, :)

The document is full of wonderfully helpful hints on code readability, recording developments and modifications, etc. It should perhaps be considered compulsory reading for anybody developing Fortran 90 code or starting to take advantage of F90 features in old F77 codes.
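As a purely illustrative sketch of the style such standards encourage (IMPLICIT NONE, INTENT attributes on arguments, and array shapes shown explicitly; the routine and names here are ours, not taken from the document):

```fortran
SUBROUTINE scale_and_add (alpha, x, y, z)

! Description: computes z = alpha*x + y using array notation,
!              with each array's shape shown in parentheses.
IMPLICIT NONE

! Subroutine arguments
REAL, INTENT(IN)  :: alpha          ! scalar multiplier
REAL, INTENT(IN)  :: x(:), y(:)     ! input vectors
REAL, INTENT(OUT) :: z(:)           ! result vector

z(:) = alpha * x(:) + y(:)

END SUBROUTINE scale_and_add
```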

Memory Management Tips

[ Etienne Gondet of IDRIS/CNRS responded to our previous "Quick-Tip" question with the following discussion. Thanks, Etienne! ]


I maintain a Web page (in French) that summarizes the different tools
available for memory management on the T3E:

http://www.idris.fr/su/Parallele/utilitaires_aleph/memoire/Page_memoire.html

In English, here is a summary of four tools for determining a program's memory requirements:


  1 - static tool : 

        size -m a.out
  
  2 - post-mortem tool (max memory occupation) : Job Accounting

        ja
        mpprun -n 4 a.out
        ja -cfstlSMC
  
  3 - interactive tool to be used during or after execution, a 
      local development based on acctcom (a standard Unix tool) :

        t3esar -r -v
  
  4 - my favorite, a dynamic function to estimate heap usage
      (specific to Cray PVP and MPP systems but fairly useful for debugging) : 

        IHPSTAT(X) 

Detailed Description:

1 - static tool : size -m a.out


Example :
/fft/web/>size -m a.out

        File name: a.out
     Machine type: CRAY-T3E
    Number of PEs: 4
            Label: No label
         Stripped: no
            Besus: 1
        Use Eregs: yes
        Use Gsegs: yes
         Userinfo: 0x0000000000000000
 Transfer address: 0x800000000
 -------------------------------------------------------------------

 Vaddr              Size S RWX D A G   Guess  DiskOff Length  Zeroed
 800000000 code    D3440 N r-x 0 N N   D3440     1000  D3440       0
 100000000 data   D6E0C0 N rw- + N Y  D720C0    D4440  36200   776C0
 200000000 stack   C0000 N rw- - Y Y   C4000        0      0       0
 F20000000 regs        0 N rw- 0 N N       0        0      0       0
 F10000000 besus       0 N rw- 0 N N       0        0      0       0
 F00000000 eregs       0 N rw- 0 N N       0        0      0       0
 300000000 data      400 Y rw- + N Y    4400        0      0       0
 47FFFFFE0 data     2000 Y rw- - Y Y    6000        0      0       0
 Symbols                                       10A640  2D810
  1087040 (decimal) bytes will be initialized from disk.
   186384 (decimal) bytes are required for the symbol table.
     1368 (decimal) bytes are required for the header.
  1274792 (decimal) bytes total.

     - The Size column gives the size of each area in bytes, in hexadecimal.

     - The first of the two data areas contains the Fortran COMMON blocks.

     - For example, here you have D6E0C0 bytes (hexadecimal) = 14082240 bytes.

     - For PVM and MPI codes, the data and stack lines are the important ones. 


  You can also notice : 
     - the number of PEs (N) when you compile with the -X N option (non-malleable executable);

     - the number of barriers (BESUs) used by the code. 

2 - post-mortem tool : Job Accounting


        ja 
        mpprun -n 4 a.out
        ja -cfstlSMC

The classical tool for accounting on Cray computers is "ja" (Job Accounting). It works as a switch: the first call starts accounting, and the second call stops it and produces a summary of all Unix commands launched between the two calls.


        Example : 
               ja
               ./a.out
               ja -cfstlSMC
Output :
-A) The following line indicates the maximum memory used during 
  execution, accumulated over all the PEs:
  
  Maximum memory used              :           41.3711 MWords

-B) The following line indicates the maximum memory used during 
  execution by the PE which used the most memory:
  
  # MPP Usage   PE's= 4, BESU's= 1, PE Memory High Water= 10.34 Mw
  
Remarks : A large difference between A/N$PES and B suggests a poorly 
  load-balanced program.  Here, 41.3711/4 = 10.34 Mw, which matches B, 
  so this run is well balanced.

3- t3esar -r -v [Note: this tool is available at IDRIS/CNRS.]


3.1 - With -r, it behaves like ps:
ARGSPS="-o command,user,apid,jid,pid,ppid,nice,npes,cputime,sctime,
sz,site,vpe,state,stime,tty,himem,vsz,wchan"

Without -r, it is a local development based on acctcom (a standard Unix command):
ARGSACCTCOM="-PJfwMmX"  (add "-pt" for more details)
# X  for the exact start date of each process, not the accounting date
# m  for mean core size
# f  for process status
# w  for wait times
# J  for job IDs
# P  for MPP-specific information
# M  for additional memory usage data, such as HIMEM

3.2 As a post-mortem tool, the HIMEM column gives the maximum memory
used during execution by the PE which used the most memory.

Example 1 : t3esar
                          MPP  START   ELAPSED      CPU      MPP-TIME HIMEM
DATE  EXEC.  USER      JOB-ID PE   TIME   (SECS)   (SECS)    (SECS)   (MW)
May10 #exec  rlab001 B   1736  8 19:36:57 15476.46 123603.58 123794.88 5.25
May13 pg.exe rlab002 I   1734  8 19:35:41 19922.94 159173.03 159366.77 8.00
May16 a.out  rgrp006 B   1803 32 20:03:11 23672.65 440036.72 494676.21 13.06

Example 2 :
     t3esar -r gives the memory in use at a given moment on every PE of
a running process (column SIZE).

     COMMAND    USER    TT   STIME      APID    JID NPES NICE
     a.out     plab001 batch 09:09 196b8103e22  3699   8   26
   PID    USER(SECS)   SYS(SECS) SITE VPE     STATE    SYSCALL   SIZE(Mw)
 59071       6633.10       89.46   0   0    Running           6.90576
 59075       6663.04       91.06   1   1    Running           6.90576
 59076       6663.07       86.82   2   2    Running           6.90576
 59077       6663.07       89.90   3   3    Running           6.90576

4 - IHPSTAT(X) dynamic tool to examine the heap usage. [Editor's Note: see Newsletter #97, /arsc/support/news/t3dnews/t3dnews97/index.xml, for more on IHPSTAT.]

This is an intrinsic integer FORTRAN function on CRAY PVP and MPP systems. The most useful values for X are:


     IHPSTAT(1)  : heap length.

     IHPSTAT(12) : Memory available to expand the heap via allocate, 
                       malloc or new

     IHPSTAT(11) : Memory that the Operating System can recover (if you call 
                       HPSHRINK() after free in C/C++ or deallocate in f90).

Other various values for X (see the man page) :
     4 : Number of allocated blocks.
     10: Size of the largest free block.
     13: Heap beginning address (bytes on T3E, words on PVP).
     14: Heap ending address (bytes on T3E, words on PVP).

See also : call HPDUMP(), call HPSHRINK().

Remarks : When your program aborts with an operand range error (ORE), IHPSTAT(12) can be very useful in locating the bug. For example, a PVM collective operation must be called by every PE; if the code fails to do this, the buffers are not deallocated, and in a do-loop with a big iteration count this can exhaust the heap and lead to an ORE.
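To make this concrete, here is a minimal sketch of watching the heap around an allocation. It is Cray-specific (IHPSTAT and HPSHRINK are not standard Fortran), and the values printed will vary with system and program:

```fortran
PROGRAM heap_watch

! Sketch: query the heap before and after an allocation.
IMPLICIT NONE
INTEGER, EXTERNAL :: IHPSTAT
REAL, ALLOCATABLE :: work(:)

PRINT *, 'Heap length before :', IHPSTAT(1)
PRINT *, 'Room to expand heap:', IHPSTAT(12)

ALLOCATE (work(1000000))
PRINT *, 'Heap length after  :', IHPSTAT(1)

DEALLOCATE (work)
PRINT *, 'Recoverable memory :', IHPSTAT(11)
CALL HPSHRINK()            ! let the OS recover the freed space

END PROGRAM heap_watch
```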

Quick-Tip Q & A


A: {{ On the T3E, how can you determine how much memory your running 
      program uses? }}


  The previous article provides several possibilities.  Here are two
  more: one uses grmview only; the second uses both grmview and mppview.
  
  GRMVIEW ONLY METHOD:
  ==================================================================
  With your program executing, issue the command:
 
    grmview -l

  This will produce two tables: a table of PE data similar to this
  (I've deleted some rows & columns):

                 Ap. Size  Number Aps.
     Type    PE  min  max running limit   x  y  z Clock UsrMem FreMem
    + APP     0    2   90       1     1   0  0  0   450    246    174
    + APP   0x1    2   90       1     1   1  0  0   450    246    174
    .....
    + APP   0x7    2   90       1     1   1  3  0   450    246    174
    + APP   0x8    2   90       0     1   2  0  0   450    246    246
    + APP   0x9    2   90       0     1   3  0  0   450    246    246
    .....

  And a table of jobs, like this:

    Exec Queue: 4 entries total. 3 running, 1 queued
      uid   gid  acid    Label Size  BasePE   ApId Command      Note    
     3395  1393  1393     -      32    0x14  67415 a.out          -    
     3398   882   882     -      36    0x34  67421 a.out          -    
     3362   507   507     -       8       0  86345 a.out          -    
      
  To evaluate, use the second table to determine the PEs on which your
  job is running and match them to PEs in the first table.  User 3362, for
  instance, has a job of size 8 PEs which starts at base PE 0, and thus
  is running on PEs 0, 0x1, 0x2, ... 0x7.  Of the 246 MBytes available per
  PE, he/she is using (246 - 174 = ) 72 MBytes, or about 29%.



  GRMVIEW / MPPVIEW METHOD:
  ==================================================================
  
  Step 1: 
  =======
   Issue the command:
  
    grmview -deq
  
  which will list the PE number ranges (in decimal rather than hex) 
  of running jobs. For example:
  
  
    Exec Queue: 4 entries total. 3 running, 1 queued
      uid   gid  acid    Label Size    PE Range    ApId Command     Note    
     3395  1393  1393     -      32  00020-00051  67415 a.out         -    
     3398   882   882     -      36  00052-00087  67421 a.out         -    
     3362   507   507     -       8  00000-00007  86345 a.out         -    

  
  Step 2: 
  =======
  Issue the command:
  
    mppview -t

  which puts you into the text mode of mppview. Possible queries are
  described in the man page or by typing "help". In particular
  (don't expect a prompt), you may issue this query:

    print_function memused
  
  which will dump (by decimal PE number) the percentage of memory being
  used: 

   
    0 1  32
    1 1  32
    2 1  32
    3 1  32
    4 1  32
    5 1  32
    6 1  32
    7 1  32
    8 1   3
    9 1   3
    .......
  
  For example, grmview shows that user 3362 is still running on PEs
  0-7.  Mppview shows that he/she is still using about 32% of the
  available memory, in rough agreement with grmview's figure.


  [ Editorial Comment: 

  The above example is actual yukon output (except that the ids and
  names have been changed).  Clearly, the job is not using memory fully
  and could presumably run on 3 PEs.

  Its runtime might be longer given 3 rather than 8 PEs (depending on
  its scalability), but NQS would probably schedule it earlier (smaller
  jobs are easier to schedule). It is thus possible that, even with a
  longer runtime, the job would complete sooner. And regardless,
  running on 3 rather than 8 PEs would free 5 PEs for other users.

  In general, please run on no more PEs than necessary. ]



Q: In F90, how can you check whether an ALLOCATE has completed
   correctly or prevent your program from halting if it attempts to
   claim more memory than is available?
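   One possible approach, sketched here using only standard F90 features
   (readers may well have other answers), is the STAT= specifier, which
   returns a nonzero status on failure instead of halting the program:

```fortran
PROGRAM safe_allocate

! Sketch: trap allocation failure with STAT= instead of halting.
IMPLICIT NONE
INTEGER :: ierr
REAL, ALLOCATABLE :: big(:)

ALLOCATE (big(100000000), STAT=ierr)
IF (ierr /= 0) THEN
   PRINT *, 'ALLOCATE failed with status', ierr
   ! recover here: free other arrays, shrink the request, or stop cleanly
END IF

END PROGRAM safe_allocate
```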

[ Answers, questions, and tips graciously accepted. ]


Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.