ARSC T3D Users' Newsletter 109, October 25, 1996

Report on The Carolina CUG

Five ARSC staff members spent last week in sunny Charlotte. Dale Clark and I are now back in the land of winter. For this Newsletter, I am going to share excerpts from our CUG reports. A lot of this material was taken from notes, so if we have gotten anything wrong, my apologies to the presenters (corrections, embellishments, and other contributions are, of course, always welcome).

Thanks to Dale for his material, which I'll give first:

Introduction

The 38th Cray User Group (CUG) conference was held in Charlotte, North Carolina during the week of October 14 - 18, 1996. The host site was the North Carolina Supercomputing Center, which was, however, located about 120 miles east at Research Triangle Park. The theme of the conference was "Speeding by Design", inspired by the Charlotte Motor Speedway, "America's premier NASCAR facility and home of the Coca-Cola 600 race event".

Charlotte is a city of some 455,000 people, and was incorporated in 1768. Nicknamed "the Queen City", Charlotte's actual name derives from George III's German queen, Charlotte of Mecklenburg. The weather during the conference was pleasant and sunny, with daily highs in the upper seventies.

Conclusions

This conference, the first since Seymour Cray's death and CRI's merger with SGI, in many respects was classic CUG. Well organized and graciously hosted, its tutorials, BOFs, SICs, MIGs, general sessions and parallel presentations provided useful forums for us users to learn from one another's experiences, and - of at least equal importance - for Cray to instruct us and inform us of their future plans.

Certain problems suggest, however, that future CUGs will be different. Monetary problems, for one thing, have led delegates to approve a move to once-yearly meetings, to be held each Spring, making this CUG 38 the last Fall meeting.

More importantly, the CRI/SGI merger has plunged CUG into an identity crisis. The legal requirement that references to 'CRAY' in CUG bylaws be replaced with references to 'S.G.I./CRAY', for example, potentially opens the door to CUG membership for anyone with an SGI workstation. Does CUG then abandon its focus on high end machines and become a broad-based corporate user group, or should it retain its focus on high performance computing, possibly expanding it to admit users of all high performance computers, whatever the make?

Whatever the resolution, CUG stands to lose its special relationship with Cray. How willingly will Cray share privileged information with a group whose members may include competing supercomputer manufacturers, who could not be kept out under either scenario? More importantly, how willingly will Cray's new parent support CUG? Remarks by SGI's President and CEO, Ed McCracken, seem to indicate that he regards user groups, with their natural focus on current needs and problems, as something of a drag on progress.

Perhaps, as some delegates remarked, the sky will not fall, and CUG is in no real danger. Whether it will be business as usual, and whether CUG can continue its traditions, however, remains to be seen.

================================ reports ===============================

Cray Corporate Report

General session presentation by Bob Ewald, President and COO, CRI

Hardware Report

General session presentation by Steve Johnson, CRI

Software Report

General session presentation by Mike Booth, CRI

Service Report

General session presentation by Mick Dungworth, CRI

Computational Chemistry: Two Success Stories

General session presentation by Lee Bartolotti, NCSC

This was an interesting presentation showing how theoretical chemists, using complex molecular models and massive amounts of computation time, can achieve results that have eluded their experimental chemistry colleagues. The first example was a fine-grained model applied to the small toluene (methyl benzene) molecule, in a search for its decomposition products, a matter of concern in the fragile upper atmosphere. Results pointed to a preponderance of an unexpected decomposition product, a finding that was later confirmed experimentally. The second success story concerned the elucidation of the stereochemistry of a key protein associated with Alzheimer's disease. This involved a coarser-grained model, which resulted in less exact predictions, but still advanced researchers' knowledge of this protein's likely structure.

SGI/Cray Next Generation

General session presentation by Rick Bahr, SGI, and Karl Freund, CRI

A frequently-seen chart showing the expected development of SGI and Cray product lines was again trotted out, the key message seeming to be that both product lines are converging on the SN1 and SN2, scalable node architectures using MIPS processors and key Cray technology. Also discussed were the advantages and disadvantages of current architectures:

The Silicon Valley CUG

General session presentation by David Robertson, Sterling Software

This presentation consisted of a brief but entertaining video of the attractions of the San Jose area, including some sight gags on the conference theme, "seismic computing." From the looks of things, a rattlingly good time is in store for all attendees.

Tom Baring's Material:

I tried to hit every talk or BOF which had "T3E" somewhere in the title. People seem excited about the 'E, and a lot of enthusiasm swirls around it. Early systems were apparently unstable, but they are stabilizing quickly, due to a tremendous and exacting effort on the part of CRI.

T3E Optimization -- Jeff Brooks (CRI)

The T3E uses the DEC Alpha 21164 (EV5) chip. Some specs:

EV5 cache system:

Local Memory System:
Streams allow prefetching from DRAM to fast buffers to speed data loading. There are 6 streams available; they are managed in hardware and are allocated after two secondary cache misses. They achieve >600 MB/sec sustained bandwidth on multiple-RHS code. Optimize for streams by keeping the number of data streams on the RHS to 6 or fewer.

Global Memory Access:
512 "E-Registers" are used for GETs and PUTs to global memory. E-register data can load to the EV5's registers at 600 MB/s. Using low-level directives, you can compute straight out of E-registers; this is a possible optimization for strange strides.

Floating Point Functional Units:
Mult/Add are pipelined and take 4 clock periods (T3D: 6 cp).
Divide is not pipelined and takes 22-60 cp (T3D: 61 cp).
Unroll loops to optimize -- this exposes more parallelism to the compiler.

Some Speed comparisons w/ T3D:

  4th order Horner's rule polynomial--
    not unrolled (as is): 3.6x faster
    unroll by 4:          6.5x 
  libm intrinsic funcs--
    sqrt:                 5.7x
    1.0/sqrt              5.1x
    alog                  2.9x
    exp                   4.5x
    sin                   2.6x
    cos                   3.1x
    a**b                  3.4x
  saxpy--
    no unroll             4.8x
    unroll                6.4x
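
To make the "unroll by 4" idea concrete, here is a minimal sketch in Python. It is not Jeff's benchmark code, and it assumes the loop being unrolled is the one over array elements. The function names and test data are made up for illustration. In Python this shows only the structure of the transformation, not the speedup: the point is that the unrolled body holds four multiply-add chains that do not depend on one another, which is what lets a pipelined functional unit stay busy.

```python
def poly_rolled(c, xs):
    """4th-order Horner's rule, one element per iteration.

    Each multiply-add needs the result of the previous one, so a pipelined
    unit sits partly idle while it waits.
    """
    out = []
    for x in xs:
        y = c[4]
        for k in (3, 2, 1, 0):
            y = y * x + c[k]
        out.append(y)
    return out


def poly_unrolled4(c, xs):
    """Same polynomial, four elements per iteration (unrolled by 4)."""
    n = len(xs)
    out = [0.0] * n
    main = n - n % 4
    for i in range(0, main, 4):
        x0, x1, x2, x3 = xs[i:i + 4]
        y0 = y1 = y2 = y3 = c[4]
        for k in (3, 2, 1, 0):
            # These four updates are independent of one another, so
            # their multiply-adds can overlap in the pipeline.
            y0 = y0 * x0 + c[k]
            y1 = y1 * x1 + c[k]
            y2 = y2 * x2 + c[k]
            y3 = y3 * x3 + c[k]
        out[i:i + 4] = [y0, y1, y2, y3]
    # Remainder loop for lengths that are not a multiple of 4.
    for i in range(main, n):
        y = c[4]
        for k in (3, 2, 1, 0):
            y = y * xs[i] + c[k]
        out[i] = y
    return out


coeffs = [1.0, -2.0, 0.5, 3.0, -1.0]      # c[0] .. c[4], arbitrary values
xs = [0.1 * i for i in range(10)]
same = poly_rolled(coeffs, xs) == poly_unrolled4(coeffs, xs)
print(same)
```

Both versions perform the same operations in the same order for each element, so the results match exactly; only the grouping of the work changes.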

Summary:

Jeff has graciously supplied postscript copies of his overheads, which are now available:

pub/mpp/docs/T3Etutorial.ps.Z
At:
   ftp.arsc.edu

SGI Merger: Impact on Cray Users -- Ed McCracken (SGI Pres & CEO)

In '97, SGI will be a $4B company w/ ~11,000 employees.

Principal markets:
1/3 -- manufacturing (automotive and aircraft)
1/3 -- defense & intelligence (image processing, simulations, HPC)
1/3 -- Science (weather, oil/gas, pharm, etc... Universities)
<15% -- "tele-entertainment"

Product categories:
$2B Low-end (desktop) machines
$2B High-end (on the floor) machines, which can be divided:
  1/4 in graphics supercomputers
  1/4 in database/data warehousing/web servers
  1/2 in pure supercomputing, which can be divided:
    2/3rds CRI
    1/3rd SGI

Five primary groups in SGI makeup:
MTI (MIPS and PowerPC)
Desktop workstations
Scalable Systems
CRI
Silicon interactive (software)

Products:
J90 and T90 new generations
Scalable node product lines (CRI & SGI plans were similar)
SGI SN line is core
CRI responsible for largest configurations

Other comments:
SGI often uses "just in time research." When they need an idea, they go out and find it; they maintain close ties with Universities, in particular Stanford. They are heavier in engineering than research, but compared to competitors, try to stay much higher in R&D -- they want to develop the new ideas, not the clones. They have cut very few CRI employees, and those from redundant departments, like HR.

Installation and configuration of UNICOS/mk on a Cray T3E -- David C. Holst (CRI)

This talk was more for system admins but I have the handout.

A couple things:

Benchmarking the SNL MPI Suite on T3E -- Mike Davis (CRI at SNL)

They moved code to T3E in steps:

See: www.sandia.gov for codes

Comparison of CF77 and CF90

Programmers should be aware of several techniques for getting the best compiler optimization out of CF90. My notes don't have the detail, but look for:

Cray Supercomputing Report -- Irene Qualters (VP, CRI)

T3Es built:

"Birds-of-a-feather" discussion on T3E Status -- Steve Reinhart, CRI

Software in delivery:

Stability of existing T3Es:

Streams:

Installing and Configuring a T3E -- David C. Holst (CRI)

At CRI, 3-5 users per shell (support) PE at a time is typical.

Automated T3D error reporting -- M.W. Brown (EPCC)

They have developed a tool, the "patrol" system, for automatic checking of T3D status. It scans mppsyslog for certain patterns; invokes mppping to check communication status; and checks fsmon (free space monitor).
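
The log-scanning part of such a tool might look roughly like the sketch below. This is not EPCC's "patrol" code: the patterns, the sample log lines, and the function name are all hypothetical, since the talk notes don't record which patterns are actually checked. The mppping and fsmon checks are left out entirely.

```python
import re

# Hypothetical alert patterns; the real tool's list is not in my notes.
ALERT_PATTERNS = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (r"\bpanic\b", r"\bparity error\b", r"\btimeout\b")
]


def scan_log(lines):
    """Return (line_number, text) for every line matching an alert pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in ALERT_PATTERNS):
            hits.append((number, line.rstrip("\n")))
    return hits


# Made-up sample lines standing in for the contents of a system log.
sample = [
    "Oct 14 09:00:01 PE 12: heartbeat ok\n",
    "Oct 14 09:00:07 PE 40: memory parity error\n",
    "Oct 14 09:01:15 PE 03: barrier timeout\n",
]
alerts = scan_log(sample)
for number, text in alerts:
    print(number, text)
```

In practice the lines would come from the log file itself, and the hits would be mailed or paged to an operator rather than printed.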

Chemistry Apps on the T3E -- John Carpenter (CRI)

This authoritative talk covered progress in porting chemistry codes to the T3E. The computational goal is to compute 100-atom molecules in 20 hours; much longer becomes impractical for researchers. Here is the status of some ports:

Gaussian 94 -- being ported to 'E
UniChem     --
GAMESS      -- 'D and 'E versions available
DMOLE       -- 'D version in beta; 'E being ported
Turbomole   -- 'D and 'E versions in beta
QChem       -- 'D and 'E versions in beta
NWChem      -- 'D and 'E versions in beta

HPF-CRAFT "Birds-of-a-feather" discussion -- Doug Miles of PGI

This BOF brought good news, as I had thought that CRAFT on the T3E would be scaled down and that users would be faced with major rewrites. It turns out that T3E-CF90 will support the full CRAFT standard. There will be some minor syntax changes relative to CRAFT 77. Here are three that I got down:

 Directive prefix !DIR$ --> !HPF$
     "     shared       --> DISTRIBUTE
     "     DOSHARED     --> INDEPENDENT

But all CRAFT features will be implemented.

The HPF features in T3E-CF90 will be standard, so if users stick with HPF rather than CRAFT, their code should port to other platforms. The generic HPF uses message passing, but on the T3E, PGI will be working toward an implementation that takes advantage of shmem.

T3E-CF90 will be interoperable w/ totalview and apprentice.

So Optimization Breaks Your Code ... -- R. K. Owen (NAS)

He has developed a tool, "bchop," which helps determine which of many object files has an error under different compiler options. The concept and program are both fairly simple:

  1. create a directory ./good, and put object files produced with known "good" compiler options into it.

  2. create a ./bad directory, and put object files produced with the questionable compiler options into it.

  3. create a script which can tell good program output from bad.

  4. bchop relinks the program over and over with different combinations of good/bad object files and uses the test script to evaluate the executable resulting from each combination. It does a binary search through the possible combinations for the object file(s) which cause the program to produce bad output.

bchop is available from the NAS www site.
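
The search in step 4 can be sketched in a few lines of Python. This is not R. K. Owen's bchop, only an illustration of the idea under two simplifying assumptions: exactly one object file is at fault, and a build using only ./good objects passes. The callable `passes` is a hypothetical stand-in for "relink with these objects taken from ./bad and the rest from ./good, run the program, and apply the test script". The object names are invented.

```python
def find_bad_object(objects, passes):
    """Binary-search for the single object file that breaks the program.

    passes(from_bad) must return True when the program still produces good
    output with the objects in `from_bad` taken from ./bad and all other
    objects taken from ./good.
    """
    candidates = list(objects)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if passes(set(half)):
            # The first half is harmless even when built with the
            # questionable options, so the culprit is in the second half.
            candidates = candidates[len(half):]
        else:
            candidates = half
    return candidates[0]


# Simulation: pretend solver.o is miscompiled under the questionable options.
objects = ["main.o", "grid.o", "solver.o", "io.o", "util.o"]
culprit = find_bad_object(objects, lambda from_bad: "solver.o" not in from_bad)
print(culprit)
```

Each pass through the loop costs one relink and one test run, so the number of builds grows with the logarithm of the number of object files rather than with the number itself.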

Quick-Tip Q & A

A: {{ What mode should you give a directory so that members of your
      unix permission group can create files in it, edit their own and
      other members' files, remove and rename their own files, but not
      remove or rename anyone else's files? }}

   Set the "sticky-bit" (1xxx).

   Mode: 1770   (denies world permissions)
   Mode: 1775   (gives world read/execute permission)

   Example:

   denali$ mkdir Dir_Sticky
   denali$ chmod 1770 Dir_Sticky
   denali$ ls -ld Dir_Sticky
   drwxrwx--T   2 baring   staff     4096 Oct 11 14:49 Dir_Sticky/

   The "T" in the "ls" output indicates that the sticky-bit is set.  
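
   The same check can be done from a script. Below is a small sketch using Python's standard os and stat modules. The scratch directory and variable names are just for illustration. It first decodes the two modes above without touching the filesystem, then applies mode 1770 to a temporary directory and reads it back.

```python
import os
import stat
import tempfile

# Decode the two modes from the tip as "ls -l" style strings.
private_mode = stat.filemode(stat.S_IFDIR | 0o1770)   # no world permissions
shared_mode = stat.filemode(stat.S_IFDIR | 0o1775)    # world read/execute

# Apply mode 1770 to a scratch directory and read the mode back.
with tempfile.TemporaryDirectory() as parent:
    path = os.path.join(parent, "Dir_Sticky")
    os.mkdir(path)
    os.chmod(path, 0o1770)                 # chmod is not affected by the umask
    mode = os.stat(path).st_mode
    sticky_set = bool(mode & stat.S_ISVTX)  # S_ISVTX is the sticky bit
    listed = stat.filemode(mode)

print(private_mode, shared_mode, sticky_set, listed)
```

   With mode 1775 the last character shows as a lowercase "t", because world execute permission is set along with the sticky-bit; the uppercase "T" means the sticky-bit is set but world execute is not.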


Q: You telnet to a remote site, and suddenly your backspace key
   produces funny text instead of spacing back.  How do you fix this?

[ Answers, questions, and tips graciously accepted. ]

 


Current Editors:
Thomas J. Baring ARSC Web Specialist ph: 907-450-8619
Donald Bahls ARSC User Consultant ph: 907-450-8674
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020