ARSC T3D Users' Newsletter 33, April 28, 1995

The scalar_fastmath Routines of benchlib

In newsletter #29 (3/31/95) I announced the availability of benchlib on the ARSC T3D. The sources for these libraries are available on the ARSC ftp server in the file:

The compiled libraries are also available on Denali in:

and the sources are available in: /usr/local/examples/mpp/src.

In Newsletter #30 (4/7/95) I described the "pref" routine of lib_util.a. In this newsletter I cover the contents of lib_scalar.a. This library provides replacements for six of the Fortran intrinsics and an additional three routines that compute useful mathematical functions. The replacement routines are: sqrt, alog, cos, sin, exp and exponentiation (the ** operator). Because each replacement has the same name as the corresponding routine in /mpp/lib/libm.a, the user need only link in this library and the calls to these intrinsics will be satisfied by the versions from the new library instead of those from the default library.

The three additional routines are:

  sqrti    - computes sqrt(x)/x or its mathematical equivalent
  coss_s   - a complex function of a single real variable x,
             that returns a complex number (cos(x),-sin(x))
  cuberoot - computes x ** (1.0/3.0), the cube root of x

These routines have their own subroutine names, so they do not replace any intrinsic.
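
For readers who want to check results from the three additional routines, here is a minimal reference sketch of the mathematical functions described above. It is written in Python rather than Fortran and is not the benchlib implementation. The restriction of cuberoot to x >= 0 is my assumption, since the newsletter does not state the routine's domain.

```python
import math

def sqrti(x):
    """Return sqrt(x)/x, which equals 1/sqrt(x) for x > 0."""
    return math.sqrt(x) / x

def coss_s(x):
    """Return the complex number (cos(x), -sin(x)) for a real x."""
    return complex(math.cos(x), -math.sin(x))

def cuberoot(x):
    """Return x ** (1/3); assumes x >= 0 (an assumption, see above)."""
    if x < 0:
        raise ValueError("this sketch only handles x >= 0")
    return x ** (1.0 / 3.0)
```

Note that coss_s(x) is the same value as exp(-i*x), so its magnitude is always 1. That makes a convenient sanity check on any implementation.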

Timings for both the default version in /mpp/lib/libm.a and lib_scalar.a are in the following table:

  Times in clock ticks per call (smaller is better)

                  libm.a         lib_scalar.a

  sqrt              235              122
  alog              244              159
  cos               369              186
  sin               200              177
  exp               198              152
  **               1400              291

                  equivalent     lib_scalar.a
                  Fortran code

  sqrti             296              134
  coss_s            399              176
  cuberoot         1673              597

I have not done extensive testing of the benchlib routines to see how accurate they are. For the test cases I tried, I didn't see any difference, but I suspect that the default libraries are more accurate.
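
The speedups implied by the timing table above can be worked out with a few lines of arithmetic. The sketch below (Python, with the tick counts copied from the table; the names timings and speedup are only illustrative) divides the libm.a or equivalent-Fortran time by the lib_scalar.a time for each routine.

```python
# Clock ticks per call, copied from the timing table:
# (default libm.a or equivalent Fortran code, lib_scalar.a)
timings = {
    "sqrt": (235, 122),
    "alog": (244, 159),
    "cos": (369, 186),
    "sin": (200, 177),
    "exp": (198, 152),
    "**": (1400, 291),
    "sqrti": (296, 134),
    "coss_s": (399, 176),
    "cuberoot": (1673, 597),
}

def speedup(baseline, replacement):
    """Ratio of baseline ticks to replacement ticks (larger is better)."""
    return baseline / replacement

for name, (baseline, replacement) in timings.items():
    print(f"{name:8s} {speedup(baseline, replacement):5.2f}x")
```

By this measure the gain ranges from roughly 1.1x for sin to roughly 4.8x for the ** operator.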


Most of the routines in libsci for the T3D are single-PE routines only. But there are two extensions to the BLAS (Basic Linear Algebra Subroutines) that will become the basis for parallel linear algebra routines. These two additional libraries are the BLACS (Basic Linear Algebra Communication Subroutines) and PBLAS (Parallel Basic Linear Algebra Subroutines). These two libraries can then be used to implement ScaLAPACK, the parallel (scalable) version of LAPACK (Linear Algebra PACKage). All of these libraries are or will be publicly distributed through Oak Ridge National Labs (ORNL).

I believe that these libraries will be the vehicle that CRI uses to provide parallel linear algebra routines on the T3D/T3E. CRI will probably provide optimized versions of the ScaLAPACK routines in libsci that use the same names and argument lists as the publicly available versions from ORNL. This has been the situation on CRI machines with BLAS, LINPACK, EISPACK, LAPACK, and PVM. For the users, this is a win/win situation: they get optimized routines on the T3D, and their code uses routines for which there are portable, publicly available libraries.

So I think that time invested in ScaLAPACK is time that will pay off in the future. Last week in na-digest there was a request for input on ScaLAPACK, and I'm passing that request on to our T3D users as a glimpse into where parallel linear algebra on the T3D/T3E is headed. There are already preliminary versions of some BLACS and PBLAS routines on the T3D, and I'll describe them in future newsletters.

  > From: Jack Dongarra <>
  > Date: Wed, 19 Apr 1995 11:48:37 -0400
  > Subject: Band Systems Survey
  > Dear Colleagues,
  > As part of the ScaLAPACK project we intend to include software
  > for the solution of banded linear systems. Since algorithm
  > performance is greatly impacted by the characteristics of the
  > system being solved, we solicit your input to help us design our
  > software for the kinds of systems found in practice.
  > ScaLAPACK is a freely available software package for scalably
  > solving problems in numerical linear algebra on distributed
  > memory parallel computers. If you have a need to solve large
  > banded linear systems, now is the time to provide us with your
  > requirements.
  > We ask you to fill out the following short questionnaire.  If
  > you have colleagues (engineers, physicists, or researchers in
  > other disciplines) who may not normally read NA-NET but who may
  > be working with banded matrices, we encourage you to pass this
  > questionnaire on to them.
  > We hope to receive your responses within the next few weeks. A
  > summary of responses will be posted to NA-NET. 
  > Andy Cleary        Jack Dongarra           Xiaobai Sun          
  > Univ. of Tenn.     Univ. of Tenn./ORNL     Duke Univ.          
  > (For more information on ScaLAPACK, see: )
  > ________________________________________________________
  > Questionnaire on the use of Banded Linear Systems
  > --------------------------------------------------------
  > We are concerned with applications in which a requirement is the
  > solution of the linear system A*X = B in which A is an n x n
  > banded matrix. A matrix is banded if there are two parameters bl
  > and bu such that
  >    i > j & (i-j) > bl   ->   A(i,j) = 0, and 
  >    j > i & (j-i) > bu   ->   A(i,j) = 0. 
  > That is, elements in the lower triangle more than bl elements from 
  > the main diagonal are identically zero and likewise for elements in 
  > the upper triangle more than bu elements from the main diagonal.
  > Elements of the main diagonal may or may not be zero depending on
  > other characteristics of the problem. 
  > The matrices X and B are n x k matrices. The j-th column of X,
  > X(:,j), is the solution of the system with respect to the right
  > hand side B(:,j). 
  > If your work involves the solution of such systems, we would
  > like to ask you to fill out the following questionnaire and
  > e-mail to:
  > --------------------------Snip Here-----------------------------
  > Personal Details
  > 1) Your Name:
  > 2) Your Affiliation:
  > 3) The nature of your project/research:
  > Problem characteristics
  > 3) How many equations (n) do your banded linear systems typically have?
  > 4) How many diagonals above/below the main diagonal (bl, bu) are non-zero? 
  > 5) How many right-hand sides (k) do you typically have at once?
  > 6) Are your matrices symmetric/Hermitian? If so, are they
  >    positive definite ? 
  > 7) Are your matrices diagonally dominant ? Any other
  >    characteristics ? 
  > 7) Do you have any preference for iterative algorithms or direct algorithms ? 
  >    If direct methods are used, is partial pivoting required and 
  >    satisfactory for numerical stability?
  > Problem generation
  > 8) How do banded linear systems arise in your application, in
  >    particular, are the banded systems typically generated as stand-alone
  >    problems, or as a step in a longer series of calculations?
  > 10) Are you currently using parallel computers to solve these problems?
  > 11) If you are using a parallel computer for these problems,
  >     where are the matrices A, B and X located on entry into a
  >     system solution routine ? Are they  on disk(s), in a single
  >     processor's (or host's) memory, or distributed amongst the processors?
  >     If the latter, what distribution do you use?
  > ------------------------------
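
The definition of a banded matrix quoted in the questionnaire can be checked directly from its two conditions. Here is a minimal sketch in Python, using plain nested lists and 0-based indices (the index differences i-j and j-i are unaffected by the index base). The function name is_banded and the sample matrix are hypothetical, chosen only for illustration.

```python
def is_banded(a, bl, bu):
    """Return True if the square matrix a has lower bandwidth bl and
    upper bandwidth bu, per the two conditions in the questionnaire."""
    n = len(a)
    for i in range(n):
        for j in range(n):
            # Lower triangle: more than bl below the main diagonal.
            if i > j and (i - j) > bl and a[i][j] != 0:
                return False
            # Upper triangle: more than bu above the main diagonal.
            if j > i and (j - i) > bu and a[i][j] != 0:
                return False
    return True

# A 4 x 4 tridiagonal matrix, i.e. bl = bu = 1.
tridiagonal = [
    [2, -1, 0, 0],
    [-1, 2, -1, 0],
    [0, -1, 2, -1],
    [0, 0, -1, 2],
]

print(is_banded(tridiagonal, 1, 1))
print(is_banded(tridiagonal, 0, 1))
```

The first call tests the matrix against its actual bandwidths. The second claims a lower bandwidth of 0, which the subdiagonal entries violate.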

More on Makefiles

In last week's newsletter I gave an example of a makefile that changes environment variables. I was under the misconception that the shell used by a makefile was always the shell that invoked make. That's not correct. The default shell for make is the Bourne shell (/bin/sh), which is why export works in the example I gave. From within a makefile you can print out environment variables using the following commands (the doubled $$ makes make pass a single $ on to the shell):

      echo $$TARGET
      echo $$SHELL

The Switch to the 1.2 Programming Environment

A new feature of the Apprentice tool is that you are able to save the results of a "report" or "observation" to a text file. If anyone has trouble with this option please contact Mike Ess.

List of Differences Between T3D and Y-MP

The current list of differences between the T3D and the Y-MP is:
  1. Data type sizes are not the same (Newsletter #5)
  2. Uninitialized variables are different (Newsletter #6)
  3. The effect of the -a static compiler switch (Newsletter #7)
  4. There is no GETENV on the T3D (Newsletter #8)
  5. Missing routine SMACH on T3D (Newsletter #9)
  6. Different Arithmetics (Newsletter #9)
  7. Different clock granularities for gettimeofday (Newsletter #11)
  8. Restrictions on record length for direct I/O files (Newsletter #19)
  9. Implied DO loop is not "vectorized" on the T3D (Newsletter #20)
  10. Missing Linpack and Eispack routines in libsci (Newsletter #25)
  11. F90 manual for Y-MP, no manual for T3D (Newsletter #31)

I encourage users to e-mail in differences that they have found, so we all can benefit from each other's experience.

Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.