ARSC HPC Users' Newsletter 213, February 9, 2001
ARSC Advanced Technology Panel, Abstracts
ARSC is hosting several prominent members of the HPC community for discussions and talks concerning directions in advanced technology. The following talks are open to interested parties:
HPCMP Requirements
Cray Henry
Tuesday, Feb. 13, 2 pm, 401 IARC
Mr. Cray Henry will provide a 45-minute overview of the High Performance Computing Modernization Program (HPCMP), a description of HPCMP requirements, and a summary of the benchmark results submitted by vendors. He will also present a high-level discussion of the Programming Environment and Training (PET) activity, intended to gather and deploy the best ideas, algorithms, and software tools emerging from the national high performance computing infrastructure into the DoD user community.
Strategic Planning at NCAR
Steve Hammond
Wednesday, Feb. 14, 9:15 am, 401 IARC
During 2000, Dr. Steve Hammond chaired a committee at the National Center for Atmospheric Research (NCAR) which developed a strategic plan for high performance scientific simulation. This comprehensive plan addresses the way that high performance computers are deployed at NCAR, the management of code development projects, the need for more formalized software engineering processes, algorithmic research, data management, processing, and visualization, and the need for infrastructure to facilitate research among geographically distributed scientists and resources. This presentation will discuss the strategic plan and new efforts underway at NCAR to implement it.
NERSC: A Supercomputer Facility for the Next Millennium
Bill Kramer
Wednesday, Feb. 14, 10:15 am, 401 IARC
The National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory is one of the nation's most powerful unclassified computing resources and a world leader in accelerating scientific discovery through computation. NERSC's vision is to combine leading-edge facilities with intellectual services to accomplish breakthroughs in computational science. This presentation will be an overview of how this balance is met and how NERSC uses the concepts of "Service Architecture" to complement the traditional "System Architecture" approach to supercomputing facilities. Bill Kramer, Deputy Director of NERSC, will describe the computational resources at NERSC, in particular the new 3.8 Tflop/s IBM SP, as well as storage systems with a 1 petabyte capacity and Linux clusters.
The Accelerated Strategic Computing Initiative (ASCI) Program
Jim McGraw
Thursday, Feb. 15, 9:00 am, 401 IARC
ASCI is the computational portion of the Stockpile Stewardship Program (SSP). The SSP is responsible for creating new plans that will ensure the safety and reliability of the US nuclear stockpile as it ages. The SSP relies heavily on the development of sophisticated computational models to help evaluate this safety and reliability. This talk will give a broad overview of the ASCI objectives and describe how those objectives are translated into specific needs. Issues that will be addressed by Dr. Jim McGraw include: achieving adequate performance on specific application problems, transition plans between system generations, remote access capabilities for other DP labs, post-processing capabilities, power consumption, and floor space management.
Toward Predictions of Arctic Environmental Change
Wieslaw Maslowski
Thursday, Feb. 15, 9:00 am, 401 IARC
Understanding the short-to-long term variability of ice extent and mass in the Arctic Ocean, the fresh water budget, and deep water formation and export is crucial in assessing the sensitivity of this region to global climate change and its role in driving climate variability. Coupled ice-ocean modeling of the pan-Arctic region constitutes an integral part of multi-agency supported efforts to advance arctic science. A major part of the Naval Postgraduate School's Arctic Modeling effort involves high resolution modeling of the Arctic Ocean and sea ice with prescribed realistic atmospheric forcing. Dr. Wieslaw Maslowski will discuss recent results of model comparisons including different spatial resolutions and the role of increased resolution on representation of ocean and ice physics and thermodynamics from small to large scales.
SV1 GFLOP Contest Winner
It ain't easy to beat a GFLOP on a single SV1 processor!
This made judging the contest easy. In the end, I only got one successful entry, and that was from illustrious co-editor, Guy Robinson. Thus, I'll be treating Guy to a cool draught of his choosing, in some local pub of my choosing, most likely overlooking the frozen, yet still mighty, Chena River.
Back to GFLOPS.
From the Benchmarker's Guide to CRAY SV1 Systems: "Each [SV1] CPU has 2 add and 2 multiply functional units, allowing each CPU to deliver 4 floating point results per CPU clock cycle. With the 300 MHz CPU clock the peak floating point rate per CPU is 1.2 Gflops/s."
Here are Guy's comments on the program:
- The dataset in the inner loop is long enough to get a vector going.
- The datasets referenced in the outer iterative loop all fit in cache.
- There are lots of floating-point operations on the same data items, which are all in cache: a high-order polynomial. (Remember, the goal was flops!) The lower the order of the polynomial, the lower the flops rating.
- If we'd made the data size and loop sizes the same, i.e. 120, then we'd get a clash on memory access. Sized to 120, the program gets only 1019.14M floating ops/CPU second.
Here's the timing, using hpm:
CHILKOOT$ f90 -O3 cache_test.f
CHILKOOT$ export NCPUS=1
CHILKOOT$ hpm ./a.out
STOP executed at line 252 in Fortran routine 'CACHE_TEST'
CPU: 810.644s, Wallclock: 812.186s, 3.1% of 32-CPU Machine
Memory HWM: 734439, Stack HWM: 49799, Stack segment expansions: 0
Group 0:  CPU seconds   :  810.64491      CP executing     : 243193473792
Million inst/sec (MIPS) :      20.87      Instructions     :  16915248327
Avg. clock periods/inst :      14.38
% CP holding issue      :      92.60      CP holding issue : 225187690857
Inst.buffer fetches/sec :       0.00M     Inst.buf. fetches:       500539
Floating adds/sec       :     520.57M     F.P. adds        : 421997857775
Floating multiplies/sec :     513.70M     F.P. multiplies  : 416429414904
Floating reciprocal/sec :       3.49M     F.P. reciprocals :   2832200008
Cache hits/sec          :      15.43M     Cache hits       :  12504952732
CPU mem. references/sec :      28.19M     CPU references   :  22850939545
Floating ops/CPU second :    1037.77M
And Guy's program:
program cache_test
implicit none
integer :: idim,jdim
parameter (idim=121,jdim=121)
real, dimension(idim,jdim) :: a,c
integer i,j
integer iblock,jblock
integer iter,niter
niter=20000
iblock=idim-1
jblock=jdim-1
do i=1,iblock
do j=1,jblock
a(i,j)=0.0
c(i,j)=(i+j*idim)/jdim
enddo
enddo
do iter=1,niter
do i=2,idim-1
do j=2,jdim-1
a(i,j)=
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(c(i,j)+1)-1)+1)+1)+1)+1)
! +1)+1)+1)-1)+1)+1)
! +1)-1)+1)
! +1)-1)+1)
! +1)-1)+1)
! +1)-1)+1)
! +1)-1)+1)
! +1)-1)+1)
! +1)-1)+1)
! +1)-1)+1)
! +1)-1)+1)
! +1
a(i,j)=a(i,j)+
! c(i-1,j)*(
! c(i-1,j)*(
! c(i-1,j)*(
! c(i-1,j)*(
! c(i-1,j)*(
! c(i-1,j)*(
! c(i-1,j)*(
! c(i-1,j)*(
! c(i-1,j)*(
! c(i-1,j)*(
! c(i-1,j)*(
! c(i-1,j)*(
! c(i-1,j)*(
! c(i-1,j)*(
! c(i-1,j)*(
! c(i-1,j)*(
! c(i-1,j)*(
! c(i-1,j)*(
! c(i-1,j)*(
! c(i-1,j)*(
! c(i-1,j)*(
! c(i-1,j)*(
! c(i-1,j)*(
! c(i-1,j)*(
! c(i-1,j)*(
! c(i-1,j)*(
! c(i-1,j)*(
! c(i-1,j)*(
! c(i-1,j)*(
! c(i-1,j)*(
! c(i-1,j)*(
! c(i-1,j)*(
! c(i-1,j)*(
! c(i-1,j)*(
! c(i-1,j)*(
! c(i-1,j)*(
! c(i-1,j)*(
! c(i-1,j)*(
! c(i-1,j)*(c(i,j)+1)-1)+1)+1)+1)+1)
! +1)+1)+1)-1)+1)+1)
! +1)-1)+1)
! +1)-1)+1)
! +1)-1)+1)
! +1)-1)+1)
! +1)-1)+1)
! +1)-1)+1)
! +1)-1)+1)
! +1)-1)+1)
! +1)-1)+1)
! +1
a(i,j)=a(i,j)/2.0
a(i,j)=a(i,j)+
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! (c(i,j)*(c(i,j)+1)+1)+1)+1)+1)+1)
! +1)+1)+1)+1)+1)+1)
! +1)-1)+1)
! +1)+1)+1)
! +1)+1)+1)
! +1)+1)+1)
! +1)-1)+1)
! +1)+1)+1)
! +1)-1)+1)
! +1)+1)+1)
! +1)-1)+1)
! +1
a(i,j)=a(i,j)/(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! c(i,j)*(
! (c(i,j)*(c(i,j)+1)+1)+1)+1)+1)+1)
! +1)+1)+1)+1)+1)+1)
! +1)+1)+1)
! +1)+1)+1)
! +1)+1)+1)
! +1)+1)+1)
! +1)+1)+1)
! +1)+1)+1)
! +1)+1)+1)
! +1)+1)+1)
! +1)+1)+1)
! +1)
enddo
enddo
c=a/5.0
enddo
write(20) a
stop
end
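As a quick check on the numbers above, here is a minimal free-form Fortran sketch that recomputes the peak rate from the Benchmarker's Guide figures (4 results per clock at 300 MHz) and the achieved rate from the hpm counters. The program and variable names are made up for illustration, and it assumes that hpm's "Floating ops/CPU second" is the sum of adds, multiplies, and reciprocals divided by CPU seconds, which the report does not state but the counters appear to bear out.

```fortran
program sv1_flop_rate
  implicit none
  integer, parameter :: dp = selected_real_kind(15)
  integer, parameter :: i8 = selected_int_kind(18)

  ! Figures quoted from the Benchmarker's Guide to CRAY SV1 Systems
  real(dp), parameter :: clock_hz = 300.0e6_dp
  integer,  parameter :: results_per_clock = 4

  ! Counter values copied from the hpm report for cache_test
  real(dp),    parameter :: cpu_seconds = 810.64491_dp
  integer(i8), parameter :: fp_adds   = 421997857775_i8
  integer(i8), parameter :: fp_mults  = 416429414904_i8
  integer(i8), parameter :: fp_recips = 2832200008_i8

  real(dp) :: peak_mflops, measured_mflops

  ! Peak: floating point results per clock times clocks per second
  peak_mflops = results_per_clock * clock_hz / 1.0e6_dp

  ! Assumption: hpm counts adds + multiplies + reciprocals as "floating ops"
  measured_mflops = real(fp_adds + fp_mults + fp_recips, dp) &
                    / cpu_seconds / 1.0e6_dp

  print '(a,f8.2)',   'peak MFLOPS per CPU     : ', peak_mflops
  print '(a,f8.2)',   'measured MFLOPS per CPU : ', measured_mflops
  print '(a,f6.1,a)', 'fraction of peak        : ', &
        100 * measured_mflops / peak_mflops, ' %'
end program sv1_flop_rate
```

Under that assumption the sum reproduces hpm's 1037.77M figure, which is roughly 86% of the 1200 MFLOPS peak.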
Other notes on performance:
- I/O and memory bandwidth are often the bottlenecks, not CPU performance.
- The system-wide average performance achieved on Chilkoot by real user jobs has changed over the months, but is typically about 200 MFLOPS per CPU. The best individual users run consistently at 400-500 MFLOPS per CPU.
- We run "hpmflop" on Chilkoot to do passive monitoring of performance and can provide historical data to users concerning their own jobs. Contact consult@arsc.edu. Better yet, we encourage you to monitor your own jobs. See the article on "HPM", in:
- Many codes do not scale well as the number of processors is increased. For discussion, see the article on Amdahl's Law, in:
- Cray provides an excellent manual on performance tuning, titled "Optimizing Code on Cray PVP Systems, SG-2192". This is available on ARSC's dynaweb server, http://www.arsc.edu:40/
- Performance depends on various factors... your mileage may vary. ARSC users are always welcome to contact consult@arsc.edu for help understanding or resolving performance issues.
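To put a number on the scaling point above, here is a small sketch of Amdahl's Law, speedup = 1/((1-p) + p/n), where p is the fraction of the single-CPU run time that can be parallelized and n is the number of CPUs. The 95% parallel fraction is a hypothetical value chosen only for illustration, and the program and function names are likewise made up.

```fortran
program amdahl_demo
  implicit none
  integer, parameter :: dp = selected_real_kind(15)
  real(dp), parameter :: par_frac = 0.95_dp   ! hypothetical parallel fraction
  integer :: k, n

  print '(a)', ' CPUs   speedup'
  do k = 0, 5
    n = 2**k                                  ! 1, 2, 4, ..., 32 CPUs
    print '(i5,f10.2)', n, amdahl_speedup(par_frac, n)
  end do

contains

  ! Amdahl's Law: ideal speedup on n processors when a fraction p
  ! of the single-processor run time can be parallelized.
  pure function amdahl_speedup(p, n) result(s)
    real(dp), intent(in) :: p
    integer,  intent(in) :: n
    real(dp) :: s
    s = 1.0_dp / ((1.0_dp - p) + p / real(n, dp))
  end function amdahl_speedup

end program amdahl_demo
```

With these made-up numbers, 32 CPUs give a speedup of only about 12.5, and no number of CPUs can exceed 1/(1-p) = 20.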
OpenMP MASTER vs SINGLE Follow Up
[[ Thanks to Alan Wallcraft of NRL for sending this in. It is additional discussion of the OpenMP constructs, MASTER and SINGLE. ]]
The MASTER and SINGLE constructs look similar but are significantly different. The MASTER construct is exactly equivalent to:
if (omp_get_thread_num().eq.0) then
...
endif
So only the master thread (thread 0) executes the contents of the construct and all other threads skip it. Note that there is no implied barrier, so whatever the master does to shared variables inside this construct is not necessarily "visible" to other threads until after a subsequent BARRIER (or other synchronization).
SINGLE is a worksharing construct, and all such constructs (and BARRIERs) must be encountered in the same order by all threads, or be encountered by no thread. There is no similar constraint on MASTER. Any one thread (chosen by the implementation) executes the contents of the SINGLE construct and all other threads skip it. By default END SINGLE implies BARRIER, so any changes made to shared variables inside this construct will be "visible" to all threads after END SINGLE. If you don't want a BARRIER, use END SINGLE NOWAIT. It is legal for a compiler to use MASTER to implement SINGLE, and the following would be identical in effect with such a compiler:
!$omp single              !$omp master
...                       ...
!$omp end single          !$omp end master
                          !$omp barrier

!$omp single              !$omp master
...                       ...
!$omp end single nowait   !$omp end master
However, most compilers probably implement SINGLE by letting the first thread to arrive at the construct execute the contents. This will be faster than using the MASTER if the MASTER arrives later than the first thread. However, if the contents are inexpensive to compute the difference in performance between SINGLE and MASTER may be very small.
It is confusing that very similar looking directives have different behaviour in terms of BARRIER. I think OpenMP should always allow WAIT where NOWAIT can now be used, since then at least the default BARRIER could be documented by careful programmers. This was turned down as an addition to the version 2.0 API, but with version 2.0 at least in-line comments will be allowed. So once 2.0 is available I suggest always documenting the barrier with an in-line comment, e.g.:
!$omp end single !wait
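The difference can be seen in a small test program. What follows is only a sketch, not code from Alan: the program and variable names are invented, and it assumes a compiler that accepts in-line comments on directive lines (as version 2.0 allows). One thread sets a shared variable inside MASTER and another inside SINGLE, and each thread then counts whether it saw the new value. The explicit BARRIER after END MASTER is what makes the master's update safe to read; END SINGLE supplies its own.

```fortran
program master_vs_single
!$ use omp_lib
  implicit none
  integer :: shared_m, shared_s   ! each written by one thread, read by all
  integer :: seen_m, seen_s       ! number of threads that saw the new value
  integer :: nthreads

  shared_m = 0
  shared_s = 0
  seen_m = 0
  seen_s = 0
  nthreads = 1                    ! value kept when compiled without OpenMP

!$omp parallel default(shared) reduction(+:seen_m,seen_s)

!$omp master
!$ nthreads = omp_get_num_threads()
  shared_m = 42
!$omp end master
!$omp barrier                     ! MASTER implies no barrier, so add one
  if (shared_m == 42) seen_m = seen_m + 1

!$omp single
  shared_s = 42
!$omp end single                  ! wait (implied barrier)
  if (shared_s == 42) seen_s = seen_s + 1

!$omp end parallel

  print '(a,i0)', 'threads              : ', nthreads
  print '(a,i0)', 'saw MASTER update    : ', seen_m
  print '(a,i0)', 'saw SINGLE update    : ', seen_s

  ! Every thread should have seen both updates.
  if (seen_m /= nthreads .or. seen_s /= nthreads) stop 1
end program master_vs_single
```

Removing the explicit BARRIER, or adding NOWAIT to END SINGLE, turns the corresponding read into a race, so the counts are then no longer guaranteed to match the thread count.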
UAF Colloquium Series: Jonah Lee, Feb 15
The UAF Department of Mathematical Sciences and ARSC are jointly sponsoring a Mathematical Modeling, Computational Science, and Supercomputing Colloquium Series.
The schedule and abstracts for the '00-'01 academic year are available at:
http://www.dms.uaf.edu/dms/Colloquium.html
The next presentation:
Computational Plasticity
Dr. Jonah Lee
Department Head
Mechanical Engineering Department
University of Alaska Fairbanks
Date: Thursday, February 15, 2001
Time: 1:00-2:00 PM
Location: Chapman 106
ABSTRACT
Plasticity is the subfield of mechanics of materials concerned with permanent deformations, which are usually precursors to the progressive damage and failure of materials. A few examples, drawn from recent research, will first be given on how computational plasticity is used at different geometric scales to gain a basic understanding of the behavior of materials. Software, language and platform issues will then be discussed. Finally, possible future directions will be discussed.
THE SPEAKER
Professor Jonah Lee has been with UAF for more than 17 years. He is currently Professor of Mechanical Engineering and an affiliate faculty member with the Arctic Region Supercomputing Center. His research interests are in computational, theoretical and experimental mechanics of materials. He has received funding from the National Science Foundation, DOD, Cray Inc. and other agencies for his projects.
ARSC Training, Next Two Weeks
For details on ARSC's spring 2001 offering of short courses, visit:
http://www.arsc.edu/user/Classes.html
Here are the courses available in the next two weeks:
ARSC Tour for New and Prospective Users
Wednesday, Feb 14, 2001, 3-4pm
Visualization with MAYA
Wednesday, Feb 21, 2001, 2-4pm
Quick-Tip Q & A
A:[[ I'm tired of waiting ages and ages for my code to recompile when
[[ I need to change its parameters. Is there a way to run the code
[[ again with new array sizes, constants, etc. that's any faster?
## Thanks go to Dr. Nic Brummel of the University of Colorado:
Well, in Fortran there is NAMELIST. Since Fortran 90 allows dynamic
memory allocation, you can set array sizes as well as parameters.
Program main
Integer :: nx, ny, nz
Real :: param1
Namelist /problem_size_namelist/ nx, ny, nz, param1
Open(9,file='input',status='old')
Read(9, problem_size_namelist)
Close(9)
...<do your stuff using nx,ny,nz,param1>...
End Program main
<Contents of file "input">
&problem_size_namelist
nx = 128
ny = 128
nz = 350
param1 = 6.
/
<end contents of file "input">
## Editor comments:
Other possibilities include command line arguments and reading
parameters from files, of one's own design, rather than standard
NAMELIST files. Note that with dynamic allocation of arrays, the
compiler might miss opportunities for optimization which it would spot
with static arrays.
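For readers who want to see the NAMELIST read and the dynamic allocation together, here is a minimal sketch along the lines of Nic's example. The array name "field", the default values, and the error handling are assumptions added for illustration; the namelist group and the file name "input" follow the example above.

```fortran
program namelist_alloc
  implicit none
  integer :: nx, ny, nz
  real :: param1
  real, allocatable :: field(:,:,:)   ! hypothetical work array
  integer :: ios
  namelist /problem_size_namelist/ nx, ny, nz, param1

  ! Defaults, used if the file is missing or omits a variable
  nx = 16
  ny = 16
  nz = 16
  param1 = 1.0

  ! Read new sizes and parameters at run time; no recompile needed
  open(9, file='input', status='old', iostat=ios)
  if (ios == 0) then
    read(9, nml=problem_size_namelist, iostat=ios)
    close(9)
    if (ios /= 0) stop 'error reading problem_size_namelist'
  end if

  ! Size the array from the values just read
  allocate(field(nx,ny,nz), stat=ios)
  if (ios /= 0) stop 'allocation failed'

  field = param1
  print *, 'allocated', size(field), 'elements, field(1,1,1) =', field(1,1,1)

  deallocate(field)
end program namelist_alloc
```

Setting defaults before the read means the input file only has to list the values that change from run to run.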
Q: Here's a warm-up exercise Guy gives the students in his class on
parallel programming. We thought it might be fun for this
newsletter:
Give an example of parallelism in the real world, and discuss
briefly with respect to concurrency, scalability, locality.
(For instance, cooking dinner. Yes, multiple cooks can work
concurrently, but since too many would spoil the broth, it's not
very scalable. Locality doesn't matter as long as the results
appear at the same place in the right order.)
[[ Answers, Questions, and Tips Graciously Accepted ]]
Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
E-mail Subscriptions:
- Subscribe to (or unsubscribe from) the e-mail edition of the ARSC HPC Users' Newsletter.
- Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.
