ARSC T3D Users' Newsletter 4, September 16, 1994

Data Size Types

The combination of the Cray-Y-MP and Cray-T3D has to be a textbook case for 'heterogeneous' computing. And even though both machines are based on '64 bit processors', the differences in data types are very 'hetero'. Here are some differences I know of; maybe users can send in their own:

  1. C data types
  2. Fortran data types
  3. Floating point formats

C Data Types

Rather than search the manuals, it's sometimes easier to make up a test case and just run it. Recently I've tried out:

    #include <stdio.h>

    int main( void )
    {
        short tshort;
        int tint;
        long tlong;
        float tfloat;
        double tdouble;
        tshort = tint = tlong = 1;
        tfloat = tdouble = 1.0;
        /* sizeof yields a size_t, so cast to int to match the %d conversion */
        printf( "sizeof(tshort) is \t%d bytes\n", (int) sizeof( tshort ) );
        printf( "sizeof(tint) is \t%d bytes\n", (int) sizeof( tint ) );
        printf( "sizeof(tlong) is \t%d bytes\n", (int) sizeof( tlong ) );
        printf( "sizeof(tfloat) is \t%d bytes\n", (int) sizeof( tfloat ) );
        printf( "sizeof(tdouble) is \t%d bytes\n", (int) sizeof( tdouble ) );
        return 0;
    }
On the Y-MP we get:

  sizeof(tshort) is       8 bytes
  sizeof(tint) is         8 bytes
  sizeof(tlong) is        8 bytes
  sizeof(tfloat) is       8 bytes
  sizeof(tdouble) is      8 bytes
On the T3D we get:

  sizeof(tshort) is       4 bytes
  sizeof(tint) is         8 bytes
  sizeof(tlong) is        8 bytes
  sizeof(tfloat) is       4 bytes
  sizeof(tdouble) is      8 bytes
The two data types that differ, short and float, have to be handled differently on each machine. One of our users had a C program calling Fortran that worked on the M98 but failed on the T3D. The problem was traced to a float on the C side being passed to the Fortran side, where the Fortran subroutine naturally expected a 64-bit quantity. Copying the float to a temporary double in C and passing that variable solved the problem.

(Looking at the C environment on the Y-MP can take the wind out of any assumptions a C programmer might have about data type sizes.)

Fortran Data Types

Again, I used a program to interrogate the machine itself. On the T3D I used:

  c     real*2  r2         ! not allowed
        real*4  r4
        real*8  r8
  c     real*16 r16        ! not yet implemented
  c     double precision d ! not yet implemented
        real r
        r4 = 2.0
        r8 = 2.0
        r  = 2.0
        do 10 i = 1, 1026
          r4 =  r4 / 2
          r8 =  r8 / 2
          r  =  r  / 2
        print *, i, r4, r8, r
     10 continue
The statements were commented out in response to compiler-generated messages. The output looks like:

  1,  1.,  1.,  1.
  2,  0.5,  0.5,  0.5
  3,  0.25,  0.25,  0.25
  4,  0.125,  0.125,  0.125
  5,  6.25E-2,  6.25E-2,  6.25E-2
  6,  3.125E-2,  3.125E-2,  3.125E-2
  7,  1.5625E-2,  1.5625E-2,  1.5625E-2
  8,  7.8125E-3,  7.8125E-3,  7.8125E-3
  9,  3.90625E-3,  3.90625E-3,  3.90625E-3
  10,  1.953125E-3,  1.953125E-3,  1.953125E-3


  1022, 4.45014771701440277E-308, 4.45014771701440277E-308, 4.45014771701440277E-308
  1023, 2.22507385850720138E-308, 2.22507385850720138E-308, 2.22507385850720138E-308
  1024, 0.,  0.,  0.
It is surprising that real*4 is accepted by the compiler without a warning message, but then implemented the same as real*8 and real. An independent check confirmed that real*4 uses as much storage as real (i.e., 64 bits).

On the M98, the final test program looked like:

  c     real*2  r2         ! not allowed
        real*4  r4
        real*8  r8
        real*16 r16       
        double precision d 
        real r
        r4 = 2.0
        r8 = 2.0
        r  = 2.0
        r16 = 2.d0
        d = 2.d0
        do 10 i = 1, 30000
          r4 =  r4 / 2
          r8 =  r8 / 2
          r  =  r  / 2
          r16 = r16 / 2
          d  =  d  / 2
        print *, i, r4, r8, r, r16, d
     10 continue
and the sample output like:

  1,  3*1.,  2*1.E+0
  2,  3*0.5,  2*5.E-1
  3,  3*0.25,  2*2.5E-1
  4,  3*0.125,  2*1.25E-1
  5,  3*6.25E-2,  2*6.25E-2
  6,  3*3.125E-2,  2*3.125E-2
  7,  3*1.5625E-2,  2*1.5625E-2
  8,  3*7.8125E-3,  2*7.8125E-3
  9,  3*3.90625E-3,  2*3.90625E-3
  10,  3*1.953125E-3,  2*1.953125E-3


  8189,  3*1.4668830940439E-2465,  2*1.4668830940438777324971299136E-2465
  8190,  3*7.3344154702194E-2466,  2*7.3344154702193886624856495682E-2466
  8191,  3*0.,  2*0.
So on the Y-MP both the 64-bit and 128-bit reals are implemented and the real*4 is implemented in 64 bits.

Floating Point Formats

From the behavior of the loops in the above test programs we see that the floating point formats are different. In future newsletters I'll go into the details of these differences.

LAPACK on the T3D

A user of the ARSC T3D asked about the status of LAPACK on the T3D, because not all of the LAPACK routines are in the library /mpp/lib/libsci.a. A good guess at which routines are in /mpp/lib/libsci.a can be made by running:

  ar -t /mpp/lib/libsci.a 
Comparing this list to the list of routines in the public domain version of lapack (source available by anonymous ftp), we get an explicit list of routines that are in the public domain version but missing from /mpp/lib/libsci.a:

  cbdsqr.o cgbtf2.o cgebak.o cgebal.o cgebd2.o cgebrd.o cgees.o  cgeesx.o
  cgeev.o  cgeevx.o cgegs.o  cgegv.o  cgehd2.o cgehrd.o cgelq2.o cgels.o
  cgelss.o cgelsx.o cgeql2.o cgeqpf.o cgeqr2.o cgerq2.o cgesvd.o cgetf2.o
  cggbak.o cggbal.o cggglm.o cgghrd.o cgglse.o cggqrf.o cggrqf.o cggsvd.o
  cggsvp.o chbev.o  chbevx.o chbtrd.o cheev.o  cheevx.o chegs2.o chegst.o
  chegv.o  chetd2.o chetf2.o chetrd.o chgeqz.o chpev.o  chpevx.o chpgst.o
  chpgv.o  chptrd.o chsein.o chseqr.o clabrd.o clacgv.o clacon.o clacpy.o
  clacrt.o cladiv.o claein.o claesy.o claev2.o clags2.o clagtm.o clahef.o
  clahqr.o clahrd.o claic1.o clangb.o clange.o clangt.o clanhb.o clanhe.o
  clanhp.o clanhs.o clanht.o clansb.o clansp.o clansy.o clantb.o clantp.o
  clantr.o clapll.o clapmt.o claqgb.o claqge.o claqsb.o claqsp.o claqsy.o
  clar2v.o clarfx.o clargv.o clartg.o clartv.o clascl.o claset.o clasr.o
  classq.o claswp.o clasyf.o clatbs.o clatps.o clatrd.o clatrs.o clatzm.o
  clauu2.o clauum.o clazro.o cpbtf2.o cpotf2.o cpteqr.o csrscl.o cstein.o
  csteqr.o csytf2.o ctgevc.o ctgsja.o ctrevc.o ctrexc.o ctrsen.o ctrsna.o
  ctrsyl.o ctrti2.o ctzrqf.o cung2l.o cung2r.o cungbr.o cunghr.o cungl2.o
  cungr2.o cungtr.o cunm2l.o cunm2r.o cunmbr.o cunmhr.o cunml2.o cunmr2.o
  cunmtr.o cupgtr.o cupmtr.o icmax1.o
  sbdsqr.o scsum1.o sgbtf2.o sgebak.o sgebal.o sgebd2.o sgebrd.o sgees.o
  sgeesx.o sgeev.o  sgeevx.o sgegs.o  sgegv.o  sgehd2.o sgehrd.o sgelq2.o
  sgels.o  sgelss.o sgelsx.o sgeql2.o sgeqpf.o sgeqr2.o sgerq2.o sgesvd.o
  sgetf2.o sggbak.o sggbal.o sggglm.o sgghrd.o sgglse.o sggqrf.o sggrqf.o
  sggsvd.o sggsvp.o shgeqz.o shsein.o shseqr.o slabad.o slabrd.o slacon.o
  slacpy.o sladiv.o slae2.o  slaebz.o slaein.o slaev2.o slaexc.o slag2.o
  slags2.o slagtf.o slagtm.o slagts.o slahqr.o slahrd.o slaic1.o slaln2.o
  slangb.o slange.o slangt.o slanhs.o slansb.o slansp.o slanst.o slansy.o
  slantb.o slantp.o slantr.o slanv2.o slapll.o slapmt.o slapy2.o slapy3.o
  slaqgb.o slaqge.o slaqsb.o slaqsp.o slaqsy.o slaqtr.o slar2v.o slarfx.o
  slargv.o slartg.o slartv.o slaruv.o slas2.o  slascl.o slaset.o slasr.o
  slassq.o slasv2.o slaswp.o slasy2.o slasyf.o slatbs.o slatps.o slatrd.o
  slatrs.o slatzm.o slauu2.o slauum.o slazro.o sopgtr.o sopmtr.o sorg2l.o
  sorg2r.o sorgbr.o sorghr.o sorgl2.o sorgr2.o sorgtr.o sorm2l.o sorm2r.o
  sormbr.o sormhr.o sorml2.o sormr2.o sormtr.o spbtf2.o spotf2.o spteqr.o
  srscl.o  ssbev.o  ssbevx.o ssbtrd.o sspev.o  sspevx.o sspgst.o sspgv.o
  ssptrd.o sstebz.o sstein.o ssteqr.o ssterf.o sstev.o  sstevx.o ssyev.o
  ssyevx.o ssygs2.o ssygst.o ssygv.o  ssytd2.o ssytf2.o ssytrd.o stgevc.o
  stgsja.o strevc.o strexc.o strsen.o strsna.o strsyl.o strti2.o stzrqf.o
(On denali, the /lib/libsci.a is not in a format that allows "ar" to extract the deck names.)

To provide a work-around for this situation, I compiled the public domain sources of lapack and have placed a T3D version of the lapack library in:

I am running the extensive testing and timing programs provided in the public domain distribution and the results are correct. ...

Users can now link their program as follows:

  mppldr  ...   /mpp/lib/libsci.a /user/local/examples/mpp/lib/lapack.a
This ensures that the available libsci lapack routines are used before the public domain lapack routines. Only the single precision (64 bit) and complex versions are available because the cf77 compiler doesn't support double precision or double complex.

>>> An Error in the Public Domain Source for LAPACK <<<

The /mpp/bin/cf77 compiler discovered an error in the routines SLAGS2 and CLAGS2 that was easy to correct. The cf77 and cft77 compilers by default flag uses of a variable that has not been set. (Neither cf77 nor cft77 on the M98 detected these errors.) I added the obviously missing lines and added the corrected versions to the T3D lapack.a.

These routines are in the list of Lapack routines missing from /mpp/lib/libsci.a.

Next week we'll finish the calling C from Fortran series and start looking at program sizes.

Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.