ARSC HPC Users' Newsletter 262, January 24, 2003

User's Introduction to ARSC Supercomputers

Upcoming training:

  Date:       Weds., Feb 12th, 2pm
  Location:   Butrovich 109
  Instructor: Kate Hedstrom, ARSC

    Architectures and capabilities of the Cray SV1ex, Cray SX-6,
      Cray T3E, IBM SP cluster, IBM Regatta, and Linux cluster
    Programming models
    Programming Environments:
      Performance analysis tools
    Running jobs
      Interactive and batch
      Submitting batch jobs
      Checking job status
    Model output
      Storing files
      Visualizing results

For now, e-mail to register or to ask questions. Our training web pages and regular registration forms are not up yet, but we'll have another announcement later.

A Note On Vectorization and Inlining On The SX-6

[ Thanks to Ed Kornkven of ARSC. ]

Here's a pair of Fortran routines:

      subroutine addcirc(radii, n, colsum)
      implicit none
      integer j, n
      real radii(n), colsum, area

      colsum = 0.0
      do 50 j=1, n
         colsum = colsum + area(radii(j))
   50 continue
      end

      real function area(r)
      implicit none
      real r, pi
      data pi /3.141592653589/

      area = pi * r * r
      end

Compile these on the SX-6 front-end, using the sxf90 cross-compiler, as follows:

       sxf90 -c -Cvopt -Wf,'-pvctl fullmsg' ex2.f

We get these messages from the compiler:

       f90: vec(3): ex2.f, line 8: Unvectorized loop.
       f90: opt(1025): ex2.f, line 9: Reference to this function 
            inhibits optimization.
       f90: vec(10): ex2.f, line 9: Vectorization obstructive 
            procedure reference.  :area

The function call in the middle of the DO loop prevents it from being vectorized. The solution is to have the compiler "inline" the function, thus removing the function call and the obstacle to vectorization. Adding the option "-pi exp=area" requests inline expansion of "area," as follows:

       sxf90 -c -Cvopt -pi exp=area -Wf,'-pvctl fullmsg' ex2.f

And we get:

       f90: vec(3): ex2.f, line 8: Unvectorized loop.
       f90: opt(1025): ex2.f, line 9: Reference to this function 
            inhibits optimization.
       f90: vec(10): ex2.f, line 9: Vectorization obstructive 
            procedure reference.  :area

What happened? It turns out that the presence of the DATA statement inside the function prevents the optimizer from doing the inlining. A simple solution is to replace the DATA statement with an assignment:

      real function area(r)
      implicit none
      real r, pi
!      data pi /3.141592653589/

      pi = 3.141592653589
      area = pi * r * r
      end

Recompiling gives both inlining and vectorization, as desired:

       f90: vec(1): ex2.f, line 8: Vectorized loop.
       f90: vec(24): ex2.f, line 8: Iteration count is assumed. 
                     Iteration count=5000
       f90: opt(1222): ex2.f, line 9: Procedure expanded inline.
       f90: vec(26): ex2.f, line 9: Macro operation Sum/InnerProd.

Don't forget stdlib.h

Declarations for many utility functions used in C programs, including "malloc," appear in stdlib.h. If you use any of them, be sure to include stdlib.h in every source file that calls them.

This example reproduces a problem encountered this week at ARSC while porting to the SX-6 a program that failed to include stdlib.h:

/* Program: ptrsz.c             */

#include <stdio.h>

/* #include <stdlib.h> */     /*** COMMENTED OUT FOR ERROR ***/

main () {
  char *p;

  printf ("sizeof(p): %ld\n", (long) sizeof (p));

  if ((p = (char*) malloc (sizeof(char) * 1)) == NULL) {
    printf ("malloc failed\n");
    exit (1);
  }
  else {
    printf ("malloc succeeded\n");
  }

  *p = 'x';
  printf ("p= \'%c\'\n", *p);
}
/* end of ptrsz.c               */

This won't work on the SX-6:

  rime$ cc -o ptrsz ptrsz.c
    "ptrsz.c", line 9: warning: improper pointer/integer precision : op "CAST"
  rime$ ./ptrsz
    sizeof(p): 8
    malloc succeeded
    core dumping
    Bus error(coredump)

We can create a similar situation on the IBM p690:

  iceflyer 293% cc -q32 -o ptrsz ptrsz.c
  iceflyer 294% ./ptrsz
    sizeof(p): 4
    malloc succeeded
    p= 'x'
  iceflyer 295% cc -q64 -o ptrsz ptrsz.c
  iceflyer 296% ./ptrsz                 
    sizeof(p): 8
    malloc succeeded
    Segmentation fault(coredump)

The p690 can issue a warning like the SX-6, if we ask for it:

  iceflyer 297% cc -qwarn64 -q64 -o ptrsz ptrsz.c
    "ptrsz.c", line 9.12: 1506-745 (I) 64-bit portability: possible
    incorrect pointer through conversion of int type into pointer.

  iceflyer 298% ./ptrsz                 
    sizeof(p): 8
    malloc succeeded
    Segmentation fault(coredump)

We'll take our explanation from CSIRO's HPCCC Users' FAQ:

  What is the meaning of the compiler message: warning: improper
  pointer/integer precision : op "CAST"?

  If you find the warning associated with "calloc" or "malloc"... the
  memory allocation routines are defined in stdlib.h and if this is not
  included, the return type of the alloc routines automatically becomes
  int (default). NEC's cc IS ANSI compliant.  However the sizes of data
  types are NOT prescribed by the standard. See C Programmer's Guide,
  Chapter 1, 1.2.7 Data Types and Sizes, showing (SX5 has IEEE - float0):

              Type        Size(in bits)
              ----        -------------
              char        8
              short       16
              int         32
              long        64
              long long   64
              pointer     64
              float       32
              double      64
              long double 128
              enum        32

  On machines/compilers where sizeof(int) == sizeof(void*), it will of
  course work. But ANSI C does not require this. In fact many references
  on ANSI C specifically warn against this assumption. The NEC warning
  is because it may or may not work, but since cc has found a sizeof
  problem (smaller to larger) it puts out the message. Up to the
  programmer to see if it matters.  ;-(

  The moral: .. add #include <stdlib.h> to the .c files or appropriate

Following this advice, if we restore "#include <stdlib.h>" to the test program, no more core dump! The p690 case is slightly more interesting, so here you go:

  iceflyer 288% cc -q32 -o ptrsz ptrsz.c
  iceflyer 289% ./ptrsz
    sizeof(p): 4
    malloc succeeded
    p= 'x'

  iceflyer 290% cc -qwarn64 -q64 -o ptrsz ptrsz.c
  iceflyer 291% ./ptrsz                 
    sizeof(p): 8
    malloc succeeded
    p= 'x'
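
The arithmetic behind the crash can be modelled in a few lines of C. This is only a sketch under stated assumptions: through_int32 and survives_int32 are hypothetical helpers, and the model (keep the low 32 bits, then sign-extend) is the usual behavior when a 64-bit address is returned through a 32-bit int and converted back to a pointer, not something taken from either compiler's documentation.

```c
#include <stdint.h>

/* Model a 64-bit address passing through a 32-bit int:
   keep the low 32 bits, then sign-extend back to 64 bits. */
uint64_t through_int32(uint64_t addr)
{
    uint64_t low = addr & UINT64_C(0xFFFFFFFF);
    if (low & UINT64_C(0x80000000))            /* bit 31 set: upper half becomes all ones */
        low |= UINT64_C(0xFFFFFFFF00000000);
    return low;
}

/* Returns 1 if the address survives the round trip unchanged, else 0. */
int survives_int32(uint64_t addr)
{
    return through_int32(addr) == addr;
}
```

With 4-byte pointers nothing is lost, which matches the -q32 run. With 8-byte pointers, any address outside the low 2 GB comes back changed, and the store through "p" lands somewhere it should not.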

You'll generally find these routines (at a minimum) declared in stdlib.h:

    abs        labs
    div        ldiv
    atof       atoi
    atol       strtod
    strtol     strtoul
    calloc     malloc
    realloc    free
    abort      exit
    system     getenv        
    bsearch    qsort        
    rand       srand        
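
As a small illustration of using routines from this list with the header in place, here is a sketch (not taken from the program above; parse_and_sort and cmp_long are hypothetical names) that calls strtol and qsort. Because stdlib.h is included, the compiler sees the real prototypes instead of assuming that each function returns int.

```c
#include <stdlib.h>   /* strtol, qsort: prototypes come from here */

/* Comparison callback for qsort: orders longs ascending. */
static int cmp_long(const void *a, const void *b)
{
    long x = *(const long *) a;
    long y = *(const long *) b;
    return (x > y) - (x < y);
}

/* Parse up to max whitespace-separated integers from text into out,
   sort them ascending, and return how many were stored. */
size_t parse_and_sort(const char *text, long *out, size_t max)
{
    size_t n = 0;
    char *end;

    while (n < max) {
        long v = strtol(text, &end, 10);
        if (end == text)          /* no digits consumed: stop */
            break;
        out[n++] = v;
        text = end;
    }
    qsort(out, n, sizeof out[0], cmp_long);
    return n;
}
```

Called with the string "42 -7 19 3" and an array of at least four longs, it stores the four values in ascending order and returns 4.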

9,984 Miles Per Gallon

An ARSC staffer stumbled on this while looking around at CFD examples: gasoline-powered "cars" approaching 10,000 mpg (on short, flat test tracks, but still...).

Come on car buyers... let's demand better efficiency!

Quick-Tip Q & A

A:[[ Perl should make this easy... but it's driving me nuts!
  [[ In this example, I want to use search and replace to eliminate 
  [[ the bold html tags from some lyrics I've been working on, replacing 
  [[ the formerly emboldened text with the same text, prefaced by the 
  [[ word "really".  E.g.:
  [[        The weather is here, I <b>wish</b>
  [[        you were mine, the sky is <b>so 
  [[        cloudy</b>, I sleep and I pine.
  [[ Here's my perl script:
  [[    #!/usr/local/bin/perl -w
  [[    $all = join '', <>;
  [[    $all =~ s/<b>(.*)<\/b>/really $1/gsi;
  [[    print "$all";
  [[ The script puts the entire file into one string so it can search
  [[ across line breaks. The modifiers to "s" are:
  [[    g : match every occurrence (not just the first)
  [[    s : match newline characters with "."
  [[    i : ignore case
  [[ Here's the output:
  [[        The weather is here, I really wish</b> 
  [[        you were mine, the sky is <b>so 
  [[        cloudy, I sleep and I pine.
  [[ You can see for yourself what happened.  Has anyone else ever had 
  [[ this problem?  What can I do?

# Thanks to Olivier Golinelli, Rich Griswold, and Steve Deitz.  Here
# are two of the three explanations.

  The symbol * matches the MAXIMAL number of characters.  To match
  the MINIMAL number of characters, you must replace * with *?

  Then :
    $all =~ s/<b>(.*?)<\/b>/really $1/gsi;


The problem is that .* is greedy.  From the perlre manpage:

  By default, a quantified subpattern is "greedy", that is, it will match
  as many times as possible (given a particular starting location) while 
  still allowing the rest of the pattern to match.  If you want it to
  match the minimum number of times possible, follow the quantifier with
  a "?".  Note that the meanings don't change, just the "greediness":

           *?     Match 0 or more times
           +?     Match 1 or more times
           ??     Match 0 or 1 time
           {n}?   Match exactly n times
           {n,}?  Match at least n times
           {n,m}? Match at least n but not more than m times

Simply replacing .* with .*? in your regex will do the trick.
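
The difference between the two behaviors can also be spelled out by hand. The sketch below is in C and uses plain strstr searches rather than a regex engine, so it only imitates what Perl does: replace_bold is a hypothetical function, the match is case-sensitive (no "i" modifier), and the caller must supply an output buffer of at least one byte. In minimal mode each opening tag is closed by the nearest closing tag, like .*?; in greedy mode the first opening tag is closed by the last closing tag in the text, like .* with the "s" modifier.

```c
#include <stdio.h>
#include <string.h>

/* Copy `in` to `out`, replacing <b>text</b> with "really text".
   minimal != 0: each <b> pairs with the FIRST </b> after it (like .*?)
   minimal == 0: the first <b> pairs with the LAST </b>      (like .*)
   Output is truncated, but still NUL-terminated, if `out` is too small. */
void replace_bold(const char *in, char *out, size_t outsz, int minimal)
{
    size_t used = 0;

    out[0] = '\0';
    for (;;) {
        const char *open = strstr(in, "<b>");
        const char *body = NULL;
        const char *close = NULL;
        int w;

        if (open != NULL) {
            body = open + 3;
            if (minimal) {
                close = strstr(body, "</b>");
            } else {
                /* greedy: remember the last "</b>" after the opening tag */
                const char *p = body;
                while ((p = strstr(p, "</b>")) != NULL) {
                    close = p;
                    p += 4;
                }
            }
        }
        if (close == NULL) {      /* no complete pair left: copy the rest */
            snprintf(out + used, outsz - used, "%s", in);
            return;
        }
        /* text before the tag, then "really " plus the tagged text */
        w = snprintf(out + used, outsz - used, "%.*sreally %.*s",
                     (int) (open - in), in,
                     (int) (close - body), body);
        if (w < 0 || (size_t) w >= outsz - used)
            return;               /* error or buffer full */
        used += (size_t) w;
        in = close + 4;           /* continue after the closing tag */
    }
}
```

Run in greedy mode on a line with two bold phrases, it leaves the inner tags in place just as the original script did; in minimal mode both phrases are rewritten.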

Q: What's an easier way to get the value of pi into my C/C++ or Fortran

[[ Answers, Questions, and Tips Graciously Accepted ]]

Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.