ARSC HPC Users' Newsletter 286, February 13, 2004

ARSC User Training Reminder

As previously announced, you're invited to attend any lectures of PHYS-683 "Core Skills in Computational Physics." ARSC staff and Physics faculty are teaching this course and, although users are always welcome to schedule individual help, it is only source of formal ARSC training this spring.

Next Tuesday is the first in a series of four talks on parallel programming. For the full schedule, see:

http://www.arsc.edu/support/training.html

Vectorizing a Recurrence

The vectorization of a loop can be inhibited by a variety of factors, including recurrence. A recurrence occurs when a calculation in one iteration depends on a result computed in an earlier iteration.

Imagine my confusion yesterday when the SX-6 compiler vectorized an "un-vectorizable" loop, and even produced the expected answer. Before getting into that, here's more background on recurrence.

The following loop creates a table of factorials, and shows a recurrence which should not vectorize:


       F(0) = 1
       do i = 1,N
         F(i) = F(i-1) * i
       enddo

The expected sequence of assignments is what you'd get in scalar mode. This is shown here, where F'(i) indicates that element "i" has been updated by the loop, and F(i) indicates that the element has not been updated and retains its original value:


  F'(1) = F(0) * 1
  F'(2) = F'(1) * 2
  F'(3) = F'(2) * 3
  ...
  F'(N) = F'(N-1) * N
This can be rewritten as:

  F'(1) =   F(0) * 1
  F'(2) =  (F(0) * 1) * 2
  F'(3) = ((F(0) * 1) * 2) * 3
  ...
  F'(N) = F(1) * PRODUCT( 1..N )

If this were processed in vector mode, the updated array elements wouldn't be available when needed. This is because the code calls for sequential elements, and in vector mode, entire chunks of the array are updated in single instructions. (On the X1, the nominal size of these "chunks" is 64, the length of the vector registers. On the SX-6, it's 256.) If you could trick the compiler into vectorizing the loop, the result would look like this:


  F'(1) = F(0) * 1
  F'(2) = F(1) * 2
  F'(3) = F(2) * 3
  ...
  F'(N) = F(N-1) * N

Here's a complete test program followed by output from a run:


       program fact
       implicit none
       integer,parameter :: N=15
       integer(kind=8), dimension(0:N) :: F
       integer :: i

       F(0) = 1
       do i = 1,N
         F(i) = F(i-1) * i
       enddo

       write (*,"(I)") (F(i),i=0,N)
       end

rime$ ./fact
                      1
                      1
                      2
                      6
                     24
                    120
                    720
                   5040
                  40320
                 362880
                3628800
               39916800
              479001600
             6227020800
            87178291200
          1307674368000

When the program is compiled on the X1, the ftn compiler issues the message we'd expect, that the loop is unvectorizable:


  KLONDIKE$ ftn -Onegmsgs,msgs -o fact fact.f

         do i = 1,N
  ftn-6005 ftn: SCALAR FACT, File = fact.f, Line = 8 
    A loop starting at line 8 was unrolled 8 times.
  ftn-6254 ftn: VECTOR FACT, File = fact.f, Line = 8 
    A loop starting at line 8 was not vectorized because a recurrence was
    found on "F" at line 9.
  ftn-6751 ftn: STREAM FACT, File = fact.f, Line = 8 
    A loop starting at line 8 was not multi-streamed because a recurrence
    was found on "F" at line 9.

But, as mentioned, the SX-6 has some magic. Here's the SX-6 compiler message:


  rime$ f90 -Wf"-pvctl fullmsg"  -o fact fact.f           

  fact.f:

  f90: vec(1): fact.f, line 8: Vectorized loop.
  f90: opt(1037): fact.f, line 9: Feedback of array elements.
  f90: vec(26): fact.f, line 9: Macro operation Iteration.
  f90: fact.f, fact: There are 3 diagnoses.

For more information on compiler messages, check the online SX-6 Fortran User's Guide. We find this:


--  1037 : Feedback of array elements. 

      Cause:
        Same array element is referenced/defined on another loop pass. 
      Action:
        Vectorization or parallelization oriented optimization is not
        performed.


    
--  26 : Macro operation /1/.

      Cause:
        Vectorization is performed using /1/ as a macro operation.
      Action:
        Vectorization is performed.

Here's the explanation of "macro operations" and in particular "Iteration" (which is only one of several such operations):


5.1.2.4  MACRO OPERATIONS


    Although patterns like the following do not satisfy the
    vectorization conditions for definitions and references, the
    compiler recognizes them to be special patterns and performs
    vectorization by using proprietary vector instructions. See the
    following examples:

    ...

    
Iteration:


      An iteration shown in the following example is vectorized unless X
      is a complex type.

         X(I)=<exp> +- X(I-1)
         X(I)=<exp> * X(I-1)
         X(I)=<exp1> +- X(I-1) * <exp2>
         X(I)=(<exp1> +- X(I-1)) * <exp2>

      An iteration like the following example consists of multiple
      statements and is vectorized.

         t=<exp1> +- X(I-1)
         X(I)=t * <exp2>

The compiler clearly knows that the recurrence can't be vectorized normally, but it has a special vector instruction available for just this situation.

Here's a very similar loop for which the magic doesn't work. Both the X1 and SX-6 refuse to vectorize this:


       VL = 2
       F(0:VL-1) = 1
       do i = VL,N
         F(i) = F(i-VL) * i
       enddo

This begs the question, if the machine could use a vector length of two instead of 64 (on the X1) or 256 (on the SX-6), couldn't it vectorize the loop and still get the correct answer? This topic deserves a followup article, next issue.

Book Review: "The Cogwheel Brain"

[ Thanks to Guy Robinson for another book review. ]

Those who were interested in the LEO computer project (see review in the last newsletter), and enjoy reading about history, might consider the following two books about an earlier age:

"The Cogwheel Brain", Doron Swade, ISBN 0-349-11239-8 "The Difference Engine", Doron Swade, ISBN 0-670-91020-1

The author, Doron Swade, was the person behind the six year long project to reconstruct the machines designed by Charles Babbage. In these books he presents a very thorough description of the machines themselves, and the original efforts of Babbage in the attempt to build a first complete system. He also chronicles the challenges he faced in the construction of the modern version, currently found in the Science Museum in London.

In response to my review of LEO, some folks commented that British technological efforts always seem to end in failure. In "The Backroom Boys: The Secret Return of the British Boffin," Francis Spufford, ISBN 0-571-21496-7, a number of more successful projects are described.

Starting in the same era as the LEO project, it traces how British science and engineering moved from being a large scale effort of aircraft and rockets to being a computing and biotech based enterprise. The book is a fine mixture of talks of scientists and engineers having ideas and trying to "make a go of them," selling ideas to those with the money, or indeed, just getting support of ideas. It is a collection of tales which shows the huge variety of approaches within industry to research and development.

Lecture Next Tuesday: Collaborative Research at Purdue / UAF

You are invited to attend the following open lecture next Tuesday:

TITLE:

Interdisciplinary Research and Computational Infrastructure at Purdue University and Opportunities for Collaboration with the University of Alaska Fairbanks

WHEN:

1-2 pm Tuesday, February 17, 2004

WHERE:

Elvey Auditorium, 214 Elvey Building

SPEAKER:

Gilbert L. Rochon, Ph.D., MPH Associate Vice President for Collaborative Research & Engagement Information Technology at Purdue University

TOPIC:

Recent advances in pedagogical policy, capital infusion for construction and computing infrastructure, and strategic focus on cluster faculty hiring has transformed the climate for attaining research preeminence at Purdue University. Accordingly, Purdue has won national recognition as one of the top venues for conducting research, surpassed only by Fox Chase Cancer Center in Philadelphia and preceding Yale University, to round out the top three. Rochon's presentation describes key aspects of the collaborative research climate, with an emphasis on specific potential opportunities for a strategic multidisciplinary partnership between the University of Alaska Fairbanks and Purdue University, two of the nation's Land, Sea, and Space Grant Universities. Discussions will include potential collaboration between the Arctic Region Supercomputing Center and Purdue's Rosen Center for High Performance Computing, between the Alaska SAR Facility, Purdue's Laboratory for Applications of Remote Sensing (LARS) and the nascent Purdue Terrestrial Observatory (PTO). Related opportunities for potential coalescence described herein include the NSF Extensible Teragrid Facility (ETF), USGS IndianaView Program, the Envision Center for Data Visualization & Perceptualization, the Purdue Discovery Park, the NanoHub, the Crisis Grid/Disaster Informatics Center of Excellence initiative, the Center for Education and Research in Information Assurance and Security (CERIAS), the Computer Research Institute (CRI), and an array of engagement and outreach initiatives.

About Gilbert Rochon:

Gilbert L. Rochon is the Associate Vice President for Collaborative Research and Engagement at Purdue University. His primary research interests relate to remote sensing, visualization, GIS and GPS applications to urban and regional environmental sustainability under threat by anthropogenic impact and biogenic disasters.

Hosted by the Geophysical Institute and the Arctic Region Supercomputing Center, UAF. Check here for any changes:

http://www.arsc.edu/news/archive/purdue.html

ARSC Spellers Win the BizBee... Again

Last night, ARSC's team took 1st place, for the third time in 9 years, in the annual "Biz-Bee", a spelling-bee fund raiser for the Literacy Council of Alaska. The 2004 "Crayons" consisted of Dale Clark, Kate Hedstrom, and Greg Newby.

The winning words were:


  kathak - an intricate dance of northern India that includes passages
           of narrative pantomime

  manes - the spirits of the dead and gods of the lower world in 
          ancient Roman belief

They also spelled:

  lansquenet
  chevesaile
  terpsichorean
  anaglyph
  eutaxy
  icarian
  odyssey
  tephra
  hardanger

Congrats, Crayons!

Quick-Tip Q & A



A:[[ After many years of abuse, the multiply unit of my favourite 
  [[ calculator finally gave out. 
  [[
  [[ As I've been tasked to compute the squares of all numbers from 
  [[ 1 to 100, I'm really not looking forward to typing in 99 and then
  [[ +99, 98 times or pressing the repeat key. I guess I'd also have
  [[ to repeat the calculation to make sure I'd really got the right
  [[ answer to add to the effort! Any suggestions as to a better way
  [[ to do this with fewer key presses and perhaps less chances of
  [[ making a mistake?
  [[ 
  [[   ps:  My calculator is pretty simple, it doesn't have a log
  [[        function.  ;-)


  #
  # From Rich Griswold: 
  #

  It's a little hard to respond to an email without using a computer! :)


  # 
  # Andrew Markiel, Rich Griswold, Hank Happ, Matt MacLean, and Tom Logan 
  # all responded with some version of the following. This is Tom Logan's:
  #

  Here's my vote for a solution.  I'll be interested to see is anyone
  else comes up with a method for fewer keystrokes =
    1 + 4*2 + 45*3 + 50*4 = 344 total key presses to compute all of the
  squares from 1 to 100.

  ------

  First off, lets find a recurrence relation for our function, SQ():
    We know,
        SQ(i) = i^2 
        SQ(i-1) = (i-1)^2 = i^2 -2i +1 
    Thus,
        SQ(i) - SQ(i-1) = 2i-1, or

  *** SQ(i) = SQ(i-1)+2i-1 *** 
  For initial conditions, we have SQ(1) = 1.

  By noting that the sequence 2i-1 is {3,5,7,...}, we find the
  calculator keystrokes are really quite simple:

    i        keystrokes        value = SQ(i)
    ------------------------------------------
    1        1                1
    2        +3               4
    3        +5               9
    4        +7               16
    5        +9               25
    6        +11              36
    ...       ...             ...
    97       +193             9409
    98       +195             9604
    99       +197             9801
    100      +199             10000
    ------------------------------------------


  # 
  # Jed Brown's answer keeps us on our toes...
  #

  The divide function still works right?  99^2 = 99/(1/99)

  But really, use a computer.  
  % echo '(1:100).^2' 
 octave




Q: A word processor I use on another operating system is always
   changing what I type.  For instance "..." becomes one character which
   looks like three dots spaced a little differently.  Text export of a
   file containing these things doesn't fix them.  How do I get rid of
   these when I ftp the file to my Unix box, where my "..." now looks
   like "\311" ?

[[ Answers, Questions, and Tips Graciously Accepted ]]


Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
E-mail Subscriptions: Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.
Back to Top