ARSC T3D Users' Newsletter 105, September 20, 1996

Test of KAI C++ Compiler

[ One of our Research Assistants, Shawn Houston, installed and evaluated the KAI T3D C++ Compiler on ARSC systems. Here's a summary of his report. Send me email, and I'll provide the test programs and makefiles.]

The KAI C++ compiler is actually a set of optimization tools plus a front end to the Cray C compiler. The compiler runs under a script, KCC, that controls the compilation process: the script preprocesses, compiles, and links the source files; next, it decompiles the linked program, reprocesses the result, and relinks. I used the latest release of the compiler, version 2.9, which is not quite a complete implementation of the C++ working paper (the draft standard). I assume that KAI version 3.0 will be a complete implementation.

Uniprocessor Test:

The KAI compiler comes with a test program, "caxpy.C," that performs some complex-valued vector arithmetic and times the operations. I compiled the code with both the KAI compiler and the Cray C++ compiler (CC), then ran both executables on one PE with identical parameters.


For this one vector-oriented code, the KAI executable is faster (most noticeably on caxpy3), but it is also considerably larger, about 2.7 times the size of the Cray executable:

    KAI executable: 622632 bytes
    Cray executable: 228568 bytes

KAI caxpy output times: "caxpy 1000 1000"
    Time for caxpy1 = 0.379962 seconds [21.0547 Mflops]
    Time for caxpy2 = 0.319968 seconds [25.0025 Mflops]
    Time for caxpy3 = 0.440056 seconds [18.1795 Mflops]

Cray CC output times: "caxpy 1000 1000"
    Time for caxpy1 = 0.39996 seconds [20.002 Mflops]
    Time for caxpy2 = 0.39996 seconds [20.002 Mflops]
    Time for caxpy3 = 1.40006 seconds [5.71404 Mflops]
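The bracketed rates are consistent with counting 8 flops per vector element (a complex multiply is 6 flops, a complex add is 2) over 1000 x 1000 = 10^6 element updates, i.e. 8.0 x 10^6 flops per kernel. A minimal Python sketch that checks this arithmetic and computes the KAI speedups follows. Reading "caxpy 1000 1000" as vector length and repetition count is an assumption, and the `caxpy` function shown is a generic y = a*x + y kernel, not the code in caxpy.C.

```python
FLOPS_PER_ELEMENT = 8   # assumed: complex multiply (6) + complex add (2)

def caxpy(a, x, y):
    """Generic caxpy: y[i] = a*x[i] + y[i] for lists of complex numbers."""
    for i in range(len(x)):
        y[i] += a * x[i]

def mflops(n, reps, seconds):
    """Mflops for `reps` passes of caxpy over vectors of length `n`."""
    return FLOPS_PER_ELEMENT * n * reps / (seconds * 1.0e6)

# Times (seconds) copied from the "caxpy 1000 1000" runs above.
kai = {"caxpy1": 0.379962, "caxpy2": 0.319968, "caxpy3": 0.440056}
cray = {"caxpy1": 0.39996, "caxpy2": 0.39996, "caxpy3": 1.40006}

for name in kai:
    print(f"{name}: KAI {mflops(1000, 1000, kai[name]):.4f} Mflops, "
          f"Cray {mflops(1000, 1000, cray[name]):.4f} Mflops, "
          f"KAI speedup {cray[name] / kai[name]:.2f}x")

print(f"executable size ratio (KAI/Cray): {622632 / 228568:.2f}")
```

Under these assumptions the formula reproduces the bracketed Mflops figures, and the KAI speedups come out to about 1.05x, 1.25x, and 3.18x.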

Multiprocessor Test:

How does the KAI compiler fare in a multiprocessor program? I do not know. I obtained a small test program from Tom Baring and converted it from C to C++. I compiled it with the Cray CC compiler and ran it on 2 and 4 PEs; the output agreed with that of the original C code.

After several frustrating days, I still could not get the KAI compiler to compile this one code. It uses the shmem library and some intrinsic functions. In fairness to KAI, I may have installed the compiler incorrectly, or I may not have understood how to get the KAI compiler to recognize the MPP headers and intrinsic functions such as barrier.

KAI C++ Ver 3.0 News Release

Champaign, Illinois - Kuck & Associates, Inc. (KAI(TM)) announces the immediate availability of Version 3.0 of the KAI C++ compiler. This release of KAI C++ includes draft-standard C++ class libraries, support for member templates, and new usability features for large codes. KAI C++ provides the latest C++ features, runs on every major UNIX workstation and the Cray T3D, and delivers significant computational performance enhancements.

KAI C++ Version 3.0 replaces KAI's earlier C++ compiler and now includes the established KAI trademark as part of the name, to ensure that KAI C++ is not confused with any other software presently on the market.

New Features & Improvements

KAI C++ Version 3.0 contains the following new features and improvements:

  • Near draft-standard C++ class libraries, not just STL and iostreams
  • Support for member templates
  • Optimization of template expressions
  • Building of libraries that contain template instantiations

Programmer Productivity

The powerful features of KAI C++ make programmers more efficient. The compiler's advanced optimizations allow programmers to take full advantage of object-oriented design and software reuse without worrying about performance. KAI C++ makes objects almost as efficient as hand-coded C. Programmers will spend less time trying to correct performance problems, and instead deliver more code that is intuitive and easy to maintain.

Bruce Leasure, Vice President of Technology and senior product manager for KAI C++, emphasizes that KAI C++ will enable programmers to port code to all of the supported platforms without having to rewrite it to conform to a different compiler's implementation of C++ features. This makes KAI C++ essential for anyone who wants fast, cross-platform portability.

Updates & New Floating Licenses

Customers can immediately update any earlier versions to KAI C++ Version 3.0. There is no charge for customers who have an existing support service agreement or for customers who purchased Version 2 after April 1, 1996. Also, KAI C++ is now available with a Floating License on the SPARC Solaris and IBM AIX systems. Please visit the KAI C++ web page for additional information: .

Supported Platforms

KAI C++ is the only high-performance C++ compiler that developers can use across all of these development and production systems: Digital Alpha UNIX, HP 9000 UX, IBM RS/6000 AIX, SGI Irix (32 and 64 bit), SPARC-based Solaris 2 and Cray T3D.

About Kuck & Associates, Inc.

Kuck & Associates, Inc. (KAI) is internationally known for its leading-edge optimization software. This software enables developers of C/C++ and FORTRAN programs to exploit the high performance of a broad spectrum of advanced computer architectures. Customers include most of the prominent U.S. computer manufacturers and many compiler companies. These companies either offer our products directly or incorporate our products into their own to bring the latest optimization technology to their end-users.

KAI optimization products are available for personal computers, workstations and supercomputers. Founded in 1979, KAI employs about 35 computer science professionals.

Copyright 1995-1996 by Kuck & Associates, Inc. All rights reserved. KAI and KAI C++ are trademarks of Kuck & Associates, Inc.


I thought I'd try timing the barrier function, using essentially the same program given last week for eurekas. It won't measure the actual barrier-release time, since it makes extra calls that were needed for testing eurekas, but it is useful for comparing eurekas with barriers. A couple of notes about this exercise:

  1. In addition to the well-known BARRIER subroutine, CRI provides finer control over this means of synchronizing your code, via the functions:
        SET_BARRIER()   - Registers the arrival of a task at a barrier
        WAIT_BARRIER()  - Suspends task execution until all tasks 
                          arrive at the barrier
        TEST_BARRIER()  - Tests a barrier to determine its state (set or 
                          cleared)
    The generic BARRIER() call is simply a call to SET_BARRIER() followed immediately by a call to WAIT_BARRIER():
        BARRIER()       - Registers the arrival of a task at a barrier 
                            and suspends task execution until all other 
                            tasks arrive at the barrier
    Using SET, TEST, and WAIT, you might be able to program some PEs to continue doing useful work while waiting for the remaining PEs to reach the barrier:
           call SET_BARRIER() 
           do while (.NOT. TEST_BARRIER()) 
             call do_work ()
           enddo
    There are 'man' pages for these functions.
  2. My test program, given below, hangs a quarter to half the time. I think the problem is that the barrier calls are too close together, and once in a while a "set" goes undetected at some process' "wait." The traceback (after killing the job) indicates that the job did hang in a barrier call:
    Beginning of Traceback (PE 7):
    Started from address 0x20000c0804 in routine '_sma_deadlock_wait'.
    Called from line 78 (address 0x20000c0a20) in routine 'barrier'.
    Called from line 77 (address 0x20000007e0) in routine 'BARRIER_TIMINGS'.
    Called from line 363 (address 0x2000004628) in routine '$START$'.
    End of Traceback.
    Of course, in a real program, you would avoid barriers as much as possible, and be unlikely to call BARRIER right after SET_BARRIER.
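The SET/TEST/WAIT pattern in note 1 can be sketched in Python, with threads standing in for PEs. This is a minimal, hypothetical model, not the CRI library: the class and method names are invented, and because there is no barrier hardware here, `set()` returns a generation token that `test()` and `wait()` check (the real routines take no arguments). The last task to arrive releases the barrier; the others count units of stand-in work while they poll.

```python
import threading

class SplitBarrier:
    """Hypothetical split-phase barrier for ntasks threads, loosely modeled on
    SET_BARRIER / TEST_BARRIER / WAIT_BARRIER (not the CRI library)."""

    def __init__(self, ntasks):
        self.ntasks = ntasks
        self.arrived = 0          # arrivals in the current episode
        self.generation = 0       # bumped each time the barrier releases
        self.cond = threading.Condition()

    def set(self):
        """Register this task's arrival; return a token for test()/wait()."""
        with self.cond:
            token = self.generation
            self.arrived += 1
            if self.arrived == self.ntasks:      # last arrival releases all
                self.arrived = 0
                self.generation += 1
                self.cond.notify_all()
            return token

    def test(self, token):
        """Non-blocking: True once the episode joined with `token` has released."""
        with self.cond:
            return self.generation != token

    def wait(self, token):
        """Block until the episode joined with `token` has released."""
        with self.cond:
            while self.generation == token:
                self.cond.wait()

    def barrier(self):
        """Plain barrier: set() followed immediately by wait()."""
        self.wait(self.set())


def task(bar, work_done, rank):
    token = bar.set()                 # like CALL SET_BARRIER()
    units = 0
    while not bar.test(token):        # like DO WHILE (.NOT. TEST_BARRIER())
        units += 1                    # stand-in for useful work (do_work)
    work_done[rank] = units
    bar.barrier()                     # a second, ordinary barrier episode


N = 4
bar = SplitBarrier(N)
work_done = [None] * N
threads = [threading.Thread(target=task, args=(bar, work_done, r)) for r in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("units of work done while waiting:", work_done)
```

Tying `test()` and `wait()` to a generation token means a release of one episode cannot be mistaken for the release of the next.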

      Program barrier_timings

      implicit none 
      integer trigger_PE         ! Which PE will now trigger event
      integer mc(128)            ! Array to store system info 
      integer MY_PE              ! Intrinsic function to get PE number 
      integer mem_event          ! Shared variable for memory-mode event
      real t1                    ! Temporary storage of start times 
      real t2                    ! Temporary storage of end times 
      real junk
      real delay_start           ! For simulated work, start of spin
      real irtc                  ! Internal function, clock ticks 
      real cp                    ! Clock period in secs
      logical test_event         ! Internal function
      logical test_barrier       ! Internal function
      intrinsic MY_PE
cdir$ shared mem_event

      call gethmc (mc)
      cp = mc(7) * 1.0e-12      ! convert picosecs to secs.

c     Time event propagation when using eureka-mode events
      if (MY_PE() .EQ. N$PES-1) then 
        write (6,1000) "EUREKA-MODE "
        call flush (6)
      endif

      do trigger_PE = 0, N$PES - 1
        call clear_event ()        ! In Eureka mode, all PEs must clear

        call barrier ()  ! Make sure all PEs ready to watch for event
        if (MY_PE() .EQ. trigger_PE) then

          ! Kill .1 secs to simulate some work
          delay_start = irtc ()
5         if (irtc() .LT. delay_start + 0.1 / cp) goto 5

          t1 = irtc ()
          call set_event ()         ! Trigger event
          call barrier ()           ! Wait till all PEs detect event
          t2 = irtc ()

          write (6, 1010) MY_PE(), (t2-t1) * cp * 1e6
          call flush (6)
        else
          ! Spin until the trigger PE sets the event
10        if (.NOT. test_event()) goto 10

          ! Inform triggering PE that the event was detected
          call set_barrier ()
        endif
      enddo

c     Now use barrier

      if (MY_PE() .EQ. N$PES-1) then 
        write (6,1000) "BARRIER"
        call flush (6)
      endif

      do trigger_PE = 0, N$PES - 1

        call barrier ()
        if (MY_PE() .EQ. trigger_PE) then

          ! Kill .1 secs to simulate some work
          delay_start = irtc ()
105       if (irtc() .LT. delay_start + 0.1 / cp) goto 105

          t1 = irtc ()
          call set_barrier ()      ! Trigger release of barrier
          call barrier ()          ! Wait till all PEs detect release
          t2 = irtc ()

          write (6, 1010) MY_PE(), (t2-t1) * cp * 1e6
          call flush (6)
        else
          call set_barrier ()      ! All non-trigger PEs pass barrier

          ! Spin until trigger PE does its set_barrier
110       if (.NOT. test_barrier()) goto 110
          ! Inform triggering PE that 1st barrier release was detected
          call set_barrier ()
        endif
      enddo

1000  format (a,/,"Event_PE ", " Delay(usecs)")
1010  format (i4, "       ", f6.2)
      end
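The timing arithmetic in the program is plain unit conversion: mc(7) from GETHMC holds the clock period in picoseconds, IRTC counts clock ticks, and elapsed ticks times the period gives seconds. A minimal Python sketch of the same steps follows, with `time.perf_counter_ns` standing in for IRTC. The function names and the 6667 ps clock period (roughly 150 MHz) are hypothetical values for illustration only.

```python
import time

def clock_period_seconds(period_ps):
    """Like  cp = mc(7) * 1.0e-12 : clock period in picoseconds -> seconds."""
    return period_ps * 1.0e-12

def elapsed_usecs(t1, t2, cp):
    """Like  (t2-t1) * cp * 1e6 : tick difference -> microseconds."""
    return (t2 - t1) * cp * 1.0e6

def spin(seconds, clock=time.perf_counter_ns, tick_seconds=1.0e-9):
    """Busy-wait like the label-5 loop: burn `seconds` of wall time
    by polling a tick counter instead of sleeping."""
    start = clock()
    while clock() < start + seconds / tick_seconds:
        pass

# Hypothetical clock period; the real program reads it from mc(7).
cp = clock_period_seconds(6667)
print(f"1500 ticks = {elapsed_usecs(0, 1500, cp):.2f} usecs")
spin(0.1)   # simulate 0.1 s of work, as in the Fortran loop
```

Spinning on the clock, rather than sleeping, keeps the PE busy the way real computation would.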

Output from one run on 8 PEs:

EUREKA-MODE
Event_PE  Delay(usecs)
   0        10.04
   1         9.46
   2         9.55
   3        10.17
   4         9.24
   5         9.68
   6        11.28
   7         9.73
BARRIER
Event_PE  Delay(usecs)
   0         6.48
   1         6.37
   2         6.21
   3         6.29
   4         6.30
   5         6.44
   6         6.37
   7         6.53
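Averaging the two columns from this single run puts the eureka-mode delays at about 9.89 usecs and the barrier delays at about 6.37 usecs, a ratio of roughly 1.55. Since both loops contain extra synchronization calls, as noted above, these are relative numbers rather than true propagation times. A short Python sketch of the calculation:

```python
# Per-PE delays (microseconds) copied from the 8-PE run above.
eureka = [10.04, 9.46, 9.55, 10.17, 9.24, 9.68, 11.28, 9.73]
barrier = [6.48, 6.37, 6.21, 6.29, 6.30, 6.44, 6.37, 6.53]

def mean(xs):
    return sum(xs) / len(xs)

print(f"eureka  mean: {mean(eureka):.2f} usecs, max {max(eureka):.2f}")
print(f"barrier mean: {mean(barrier):.2f} usecs, max {max(barrier):.2f}")
print(f"ratio: {mean(eureka) / mean(barrier):.2f}")
```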


Quick-Tip Q & A

A: {{ In ftp, can you "more" a remote file -- before you "get" it?  How? }}

    # Thanks for reader response.

    # To "more" the remote file, 'tst.remote', use either of:
    ftp> get tst.remote -
    ftp> get tst.remote "|more"

    # (Bonus answer!)  To "more" the local file, 'tst.local':
    ftp> ! more tst.local
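    # From a script instead of the interactive client, Python's standard
    # ftplib can do the same: FTP.retrlines() passes each line of a RETR
    # to a callback, so nothing is written to disk. A minimal sketch; the
    # function name, page length, and host below are hypothetical.

```python
from ftplib import FTP

def more_remote(ftp, remote_name, page_lines=22, pause=input):
    """Show a remote text file a page at a time without saving it locally.
    `pause` is called with a prompt after every `page_lines` lines."""
    shown = 0

    def show(line):
        nonlocal shown
        print(line)
        shown += 1
        if shown % page_lines == 0:
            pause("--More--")

    ftp.retrlines("RETR " + remote_name, show)

# Usage (hypothetical host, anonymous login):
#   ftp = FTP("ftp.example.org")
#   ftp.login()
#   more_remote(ftp, "tst.remote")
#   ftp.quit()
```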

Q: If you must look at computers all day (for years), how can you
   reduce eye-strain?

[ Answers, questions, and tips graciously accepted. ]
Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.