ARSC T3D Users' Newsletter 99, August 9, 1996

Timing Programs on Multiple Cray Platforms

[ Thanks to Mark Dalton of CRI in Los Alamos for sending this in. I have made a few changes for the sake of the Newsletter. ]

This is the way to convert irtc readings to Mflop/s, or whatever rate you wish, on any of the current Crays (Y-MP, C90, T3D, J90, T90). The big trick is in the function 'get_cp'.
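
In outline, the conversion the sample program below performs is:

    elapsed seconds = (elapsed irtc ticks) * (clock period in seconds)
    Mflop/s         = (floating point operations) / (elapsed seconds * 1.0e6)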


ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
c Sample timing program
        program main
        implicit none

        integer startT,endT,diff,N,i
        real cp,Mflops,flops,eps,a,x
        integer irtc
        real slamch, get_cp

        parameter (N=1000000)
        eps = slamch('E')  ! machine epsilon

        a = 1.0 - eps
        print*, "machine epsilon (eps): ", eps
        print*, "1.0 - eps: ", a

        x = a

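C irtc() returns the value of the free-running real-time clock, in clock ticks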
        startT = irtc()
C Insert your code here
        do i=1,N
          x = x * a
        enddo
C End of your code to be timed
        endT = irtc()

        print*, "(1.0 - eps) ^ ", N, " = ", x
        print*, ""


        diff=endT - startT
        cp = get_cp()

        print*, "nticks: ", diff, " cp: ", cp

        flops  = N / (cp * diff) 
        Mflops = flops / 1e6
        print *,'Mflops = ',Mflops, ' Time (seconds) = ', cp * diff
        end

ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc

        Real Function get_cp()
        Integer mc(128)
        real nanoseconds, clock_period
        call gethmc(mc)  ! returns clock period in picoseconds into mc(7)
        clock_period = mc(7)*1.0e-12 ! convert to seconds
        nanoseconds = mc(7)*1.0e-3
        print *,'System Clock Period (nanoseconds): ',nanoseconds
        print *,'System Clock Speed (Mhz): ', (1000/nanoseconds)
        get_cp = clock_period
        end
 
ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc

###################### makefile ############################

FF=-O0

all:    t3d ymp

t3d:    prog.f 
        TARGET=cray-t3d MPP_NPES=1 /mpp/bin/cf77 $(FF) prog.f -o t3d.exe

ymp:    prog.f
        TARGET=cray-ymp cf77 $(FF) prog.f -o ymp.exe
############################################################

For these compiles, I turned off optimization. When optimized, however, the Y-MP binary gives an apparent rate of about 150 Mflop/s, and the T3D binary gives an apparent rate of about 200,000 Mflop/s (hundreds of gigaflops!?), suggesting that the compiler figured out that I was computing a**N, and replaced the loop with a single call. (For more on machine epsilon, see Newsletter #9 and 'man slamch.')
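
If you want meaningful numbers with optimization turned on, one common trick (not part of the program above, and sketched here only as an untested illustration) is to pass the running product through an external routine the compiler cannot see through; the loop then cannot be folded into a single power evaluation, though the extra call adds its own overhead to the timing:

ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
c Hypothetical variant of the timed loop.  The call to the empty
c external subroutine 'hide' keeps the optimizer from collapsing the
c loop into x = a**N, at the cost of one call per iteration.
        do i=1,N
          x = x * a
          call hide(x)
        enddo
c ...
        subroutine hide(x)
        real x
c Deliberately does nothing; compile it separately so the compiler
c cannot inline it away.
        return
        end
ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc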

Here are run outputs from the ARSC system:


denali$ t3d.exe
  machine epsilon (eps): 2.22044604925031308E-16
  1.0 - eps: 0.99999999999999978
  (1.0 - eps) ^ 1000000 = 0.99999999977795517
        
  System Clock Period (nanoseconds): 6.6660000000000004
  System Clock Speed (Mhz): 150.01500150015002
  nticks: 53217067 cp: 6.66599999999999993E-9
  Mflops = 2.8189265203238278 Time (seconds) = 0.354744968622

denali$ ymp.exe
  machine epsilon (eps): 1.4210854715202E-14
  1.0 - eps: 0.9999999999999858
  (1.0 - eps) ^ 1000000 = 0.9999999857891311
                
  System Clock Period (nanoseconds): 6.000000000000028
  System Clock Speed (Mhz): 166.6666666666661
  nticks: 129673718 cp: 6.000000000000027E-9
  Mflops = 1.285277149735656 Time (seconds) = 0.7780423080000034
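
As a sanity check, the T3D figures above can be reproduced by hand from the printed tick count and clock period:

    elapsed time = 53217067 ticks * 6.666e-9 s/tick = 0.35474 s
    rate         = 1000000 flops / 0.35474 s        = 2.82e6 flop/s = 2.82 Mflop/s

which agrees with the Mflops and Time values printed by the program.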

Frontiers '96 Advance Program

[ Note: I've shortened this quite a bit, taken out author lists, lunch menus, etc... but left the names of presentations and programs. If you want the original, let me know, and I'll e-mail it to you. ]

Frontiers '96 Advance Program

The Sixth Symposium on The Frontiers of Massively Parallel Computation

October 27 - 31, 1996 Annapolis, Maryland http://www.aero.hq.nasa.gov/hpcc/front96.html

Sponsored by: IEEE Computer Society

In cooperation with: NASA Goddard Space Flight Center USRA/CESDIS

Frontiers '96 is the sixth in a series of meetings on massively parallel computation, focusing on research related to systems scalable to many hundreds of processors. The conference will include 34 original research papers on the central theme of exploiting massive parallelism, covering all aspects of the design, analysis, development, and use of massively parallel computers. The realm of computing considered includes general purpose, domain specific, and special purpose systems and techniques. Other highlights include two panels, invited speakers, two pre-conference workshops, and a conference banquet. Of special note are two sessions discussing interim results of eight Point Design Studies of high-performance computing environments - awardees of proposals solicited by the New Technologies Program in ASC (NSF), the Microelectronics Systems Architecture Program in MIPS (NSF), and the Computer Systems Program in CCR (NSF), in collaboration with DARPA and NASA.

Frontiers '96 features two workshops in areas of rapidly growing interest to the high-performance computing community. The topics of the workshops are 1) The Petaflops Frontier, and 2) Domain Specific Systems. These day-and-a-half workshops are organized to provide a forum for presenting the most recent advances across a broad range of related topics within these interdisciplinary fields. Mixed with the presentations are open discussions on key topics, from emerging technologies that may impact future directions to the policies establishing those directions. For more information contact the Program Chair at frontiers96@cesdis.gsfc.nasa.gov.

Highlights! Invited Speakers:

  • "From ASCI to Teraflops" - John Hopson, Accelerated Strategic Computing Initiative (ASCI)
  • "Parallelism in the Deep Blue Chess Automoton " - Feng-Hsiung Hsu, IBM Research, TJ Watson Laboratory
  • "Independence Day" - Steven Wallach, HP-Convex

Panel Sessions:

  • "How Do We Break the Barrier to the Software Frontier?" - Rick Stevens, Argonne National Laboratory, Panel Chair
  • "Petaflops Alternative Paths" - David Bailey, NASA Ames Research Center, Panel Chair

Workshops:


    ---- Workshop A: The Petaflops Frontier - Part 1 ----
        George Lake,  University of Washington, Chair

The second workshop in this series explores the scaling properties of application algorithms, alternative architecture models, and device technology as they contribute to the feasibility of achieving computing performance in the regime of 10^15 operations per second.


    ---- Workshop B: Domain Specific Systems - Part 1 ----
            Jose Fortes, Purdue University, Chair

This new series of workshops is intended to highlight systems architecture and software that exploit the opportunity of alternative structures and methods to achieve very high performance for possibly narrow ranges of applications.

Topics include special purpose or embedded processors, reconfigurable architectures, SIMD, digital signal processors, image processors, data compression devices, and other application-driven designs.

Technical Program:


    ---- Scheduling 1 ----
  • Gang Scheduling for Highly Efficient Distributed Multiprocessor Systems
  • Integrating Polling, Interrupts, and Thread Management
  • A Practical Processor Design for Multithreading

    ---- Routing ----
  • Deadlock-Free Path-Based Wormhole Multicasting in Meshes*
  • Efficient Multicast in Wormhole-Routed 2D Mesh/Torus Multicomputers: A Network-Partitioning Approach
  • Turn Grouping for Supporting Efficient Multicast in Wormhole Mesh Networks

    ---- Applications & Algorithms ----
  • A3: A Simple and Asymptotically Accurate Model for Parallel Computation
  • Fault Tolerant Matrix Operations Using Checksum and Reverse Computation
  • A Parallel Three-Dimensional Incompressible Flow Solver Package with a Parallel Multigrid Elliptic Kernel
  • A Statistically-Based Multi-Algorithmic Approach for Load-Balancing Sparse Matrix Computations

    ---- Petaflops Computing / Point Design Studies ----
  • Pursuing a Petaflop: Point Designs for 100 TF Computers Using PIM Technologies
  • Hybrid Technology Multi-Threaded Architecture
  • Design Studies on Petaflops Special-Purpose Hardware for Astrophysical Particle Simulations
  • The Illinois Aggressive Cache-Only Memory Architecture Multiprocessor - (I-ACOMA)

    ---- Scheduling 2 ----
  • Largest-Job-First-Scan-All Scheduling Policy for 2D Mesh-Connected Systems
  • Scheduling for Large-Scale Parallel Video Servers
  • Effect of Variation in Compile Time Costs on Scheduling Tasks on Distributed Memory Systems

    ---- SIMD ----
  • Processor Autonomy and Its Effect on Parallel Program Execution
  • Particle-Mesh Techniques on the MasPar
  • Solving Irregular Problems on SIMD Architectures

    ---- I/O Techniques ----
  • Intelligent, Adaptive File System Policy Selection
  • An Abstract-Device Interface for Implementing Portable Parallel-I/O Interfaces
  • PMPIO - A Portable Implementation of MPI-IO
  • Disk Resident Arrays: An Array-Oriented I/O Library for Out-Of-Core Computations

    ---- Memory Management ----
  • Hardware-Controlled Prefetching in Directory-based Cache Coherent Systems
  • Preliminary Insights on Shared Memory PIC Code Performance on the Convex Exemplar SPP1000
  • Scalability of Dynamic Storage Allocation Algorithms
  • An Interprocedural Framework for Determining Efficient Data Redistributions in Distributed Memory Machines

    ---- Synchronization ----
  • A Fair Fast Distributed Concurrent-Reader Exclusive-Writer
  • Locks Improvement Technique for Release Consistency in Distributed Shared Memory Systems
  • A Quasi-Barrier Technique to Improve Performance of An Irregular Application

    ---- Networks ----
  • Performance Analysis and Fault Tolerance of Randomized Routing on Clos Networks
  • Performing BMMC Permutations in Two Passes Through the Expanded Delta Network and MasPar MP-2
  • Macro-Star Networks: Efficient Low-Degree Alternatives to Star Graphs for Large-Scale Parallel Architectures

    ---- Performance Analysis ----
  • Modeling and Identifying Bottlenecks in EOSDIS
  • Tools-Supported HPF and MPI Parallelization of the NAS Parallel Benchmarks
  • A Comparison of Workload Traces from Two Production Parallel Machines
  • Morphological Image Processing on Parallel Machines

    ---- Petaflops Computing / Point Design Studies ----
  • MORPH: A Flexible Architecture for Executing Component Software at 100 TeraOPS
  • Architecture, Algorithms and Applications for Future Generation Supercomputers
  • Hierarchical Processors-and-Memory Architecture for High Performance Computing
  • A Scalable-Feasible Parallel Computer Implementing Electronic and Optical Interconnections for 156 TeraOPS Minimum Performance

Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.