ARSC T3E Users' Newsletter 166, April 15, 1999
ARSC and UAF to Get OC-12 Connection to Internet2 in September
[ This is taken from a University of Washington news release. The
full text is available at:
http://www.washington.edu/newsroom/news/1999archive/04-99archive/k040299.html
]
The University of Washington and WCI Cable Inc. are very pleased to announce that WCI Cable, in support of the cooperative Pacific/Northwest Gigapop and national Internet 2 efforts, is providing to the UW a state-of-the-art fiber optic connection from Seattle to the University of Alaska Statewide System in Fairbanks, Alaska.
This connection will use high-speed "SONET OC-12" technology, and will bring the new Internet 2 and NGI (Next Generation Internet) capabilities and technologies to Alaska. It will link the University of Alaska to the Pacific/Northwest Gigapop, which is the major national (and only high speed) Internet 2 network hub for the region, and is located in Seattle.
This contribution will enable the University of Alaska and other research and education partners to become full and active participants in Internet2/Next-Generation-Internet development and will extend the new generation of Internet2 technologies, capabilities and opportunities to the entire research and education community in Alaska.
ARSC Internet Connection Upgraded Last Week
The internet connection between Seattle and ARSC was upgraded by a factor of four on April 7, expanding from 2 to 8 T1 links. Users connecting to ARSC over the internet from the "lower 48" should notice the improvement.
For the rest of the story, see:
http://www.arsc.edu/pubs/bulletins/QuadAccess.shtml
VAMPIR Images of MPI, SHMEM, and Co-Array Fortran Broadcast
As noted last week, VAMPIR does more than display MPI message passing. Users can select any code activities to inspect, including message or data passing accomplished by packages OTHER than MPI.
A sample code, given below, broadcasts data using MPI_Bcast, SHMEM_BROADCAST, and a hand-coded Co-Array Fortran (CAF) algorithm.
The program times seven broadcasts of different sizes per method. Here's output from a run on 13 PEs on yukon:
(Values are time in seconds per million words (MW) broadcast,
using packets of "N Words" words on "NPES" PEs)
NPES     N Words        MPI     SHBCST    CAFBCST
  13           1  125.05054   39.57748   43.39218
  13          10    7.93934    3.69549    4.11272
  13         100    0.85235    0.49472    1.77145
  13        1000    0.19300    0.15414    0.93031
  13       10000    0.12649    0.12825    0.75670
  13      100000    0.10838    0.10720    0.33942
  13     1000000    0.10481    0.10537    0.29094
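Because every value is normalized to a million words, the table hides how long a single broadcast actually took. The short Python sketch below is not part of the original test program. It assumes the normalization used by the test code's final write statement, 1000000*t/nbcast, and simply inverts it. The function name seconds_per_broadcast is hypothetical, chosen for illustration. Note that each measured interval also includes the sync_images call that follows the broadcast.

```python
# Rows copied from the table above: (words per packet, MPI, SHBCST, CAFBCST),
# each timing in seconds per million words broadcast.
TIMINGS = [
    (1, 125.05054, 39.57748, 43.39218),
    (10, 7.93934, 3.69549, 4.11272),
    (100, 0.85235, 0.49472, 1.77145),
    (1000, 0.19300, 0.15414, 0.93031),
    (10000, 0.12649, 0.12825, 0.75670),
    (100000, 0.10838, 0.10720, 0.33942),
    (1000000, 0.10481, 0.10537, 0.29094),
]


def seconds_per_broadcast(sec_per_mw, n_words):
    """Undo the 1000000/n_words scaling to recover one broadcast's elapsed time."""
    return sec_per_mw * n_words / 1_000_000


# Print the elapsed time of a single broadcast in microseconds.
print(f"{'N Words':>8} {'MPI (us)':>12} {'SHBCST (us)':>12} {'CAFBCST (us)':>12}")
for n_words, *per_mw in TIMINGS:
    micros = [seconds_per_broadcast(t, n_words) * 1e6 for t in per_mw]
    print(f"{n_words:>8} " + " ".join(f"{us:>12.1f}" for us in micros))
```

Read this way, a one-word MPI broadcast took roughly 125 microseconds, while the million-word MPI broadcast took about 0.105 seconds. The large per-MW values in the first rows reflect fixed per-call overhead spread over very little data.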
So we can "see" what's happening, the code also includes calls to the vampir_trace API to trace the SHMEM and CAF methods so that they will appear on the VAMPIR global-timeline display along with the MPI method.
The following graph is VAMPIR's complete global timeline. The legend includes the new "activities" introduced explicitly in the code: SHMEM_BCAST, CAF_BCAST, and PUT. The colors assigned to these, and all the symbols, are under user control.
Time progresses from left to right, but at this scale, many events are hidden. The final broadcasts of 1,000,000 words per method, however, take the most time, and are visible just before the program terminates.
Figure 1:
The next graph is "zoomed in" on the 100,000 word broadcast. To conserve space, the legend and the processor labels have been switched off.
In order to understand the hand-coded CAF method better, the code was instrumented to designate each individual "PUT" used within the overall broadcast operation. The PUT operations are revealed in the VAMPIR timeline. The tree-structure of the data distribution is apparent as the data flows from processor 0, down the branches, and eventually to all processors.
Figure 2:
This example suggests the power of VAMPIR to help you visualize the flow of data in your programs, regardless of the communication method you use.
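The tree structure can also be worked out without a T3E. The Python sketch below is not part of the original test program. It applies the same stage and partner arithmetic as the hand-coded CAF broadcast in the test code to list which image PUTs to which partner at each stage. The function names caf_tree_schedule and images_reached are hypothetical. Images are numbered from 1, as in Co-Array Fortran, so image 1 presumably corresponds to processor 0 in the VAMPIR display.

```python
import math


def caf_tree_schedule(npes):
    """List the (sender, receiver) image pairs for each stage, root = image 1."""
    nstages = math.ceil(math.log2(npes))  # levels in the binary tree
    schedule = []
    for istage in range(nstages, 0, -1):
        puts = []
        for myimg in range(1, npes + 1):
            # Same test as the Fortran code: only images aligned on a
            # 2**istage boundary send during this stage.
            if (myimg - 1) % 2**istage == 0:
                mypartner = myimg + 2 ** (istage - 1)
                if mypartner <= npes:
                    puts.append((myimg, mypartner))
        schedule.append((istage, puts))
    return schedule


def images_reached(npes):
    """Replay the schedule and return the set of images that hold the data."""
    have_data = {1}
    for _, puts in caf_tree_schedule(npes):
        for sender, receiver in puts:
            assert sender in have_data  # a sender must already hold the data
            have_data.add(receiver)
    return have_data


for istage, puts in caf_tree_schedule(13):
    print(f"stage {istage}: " + ", ".join(f"{s}->{r}" for s, r in puts))
```

For the 13-PE run this gives four stages and 12 PUTs in total: image 1 sends to image 9, then images 1 and 9 send to 5 and 13, and so on until every image has received the data.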
Here's the test code:
c**********************************************************************
c Timing program to compare MPI_Bcast, SHMEM_BROADCAST, and
c hand-coded Co-Array Fortran broadcast. Uses vampir trace
c to trace broadcast sections.
c**********************************************************************
      program main
      implicit none
      include 'mpif.h'
      include 'VT.inc'
      include "mpp/shmem.fh"
! processor identification
      integer :: mype, myimg, penum, imagenum, mypartner, npes
      integer :: mpi_root, caf_root, shmem_root
! distribution tree
      integer :: istage, nstages
! loops
      integer :: ido, nbcast, idata,ndata, i
! timing
      real :: start, end, tmpi, tshmem, tcafbcst
! internal
      real :: flagvalue
! ierr is the MPI/VAMPIR error status and must be an integer
      integer :: ierr, methn
! shmem
      integer, dimension(SHMEM_BCAST_SYNC_SIZE) :: pSync
! data for processing
      real, allocatable, dimension(:)[:] :: bcArr
! parameters
      parameter (ndata = 1000000, flagvalue = 2000.0 )
      parameter (mpi_root = 0, shmem_root = 0, caf_root = 1 )
! setup shmem
      data pSync /SHMEM_BCAST_SYNC_SIZE * SHMEM_SYNC_VALUE/
! setup MPI
      call MPI_INIT( ierr )
      call MPI_COMM_RANK( MPI_COMM_WORLD, mype, ierr )
      call MPI_COMM_SIZE( MPI_COMM_WORLD, npes, ierr )
! setup CAF
      myimg = this_image ()
! setup VAMPIR definitions and initial setup
#define SHMEMBCAST 11
#define CAFBCAST 12
#define CAFPUT 13
      call VTSYMDEF (SHMEMBCAST, "SHMEM_BCAST", "SHMEM_BCAST", ierr)
      call VTSYMDEF (CAFBCAST, "CAF_BCAST", "CAF_BCAST", ierr)
      call VTSYMDEF (CAFPUT, "PUT", "PUT", ierr)
! allocate data arrays
      allocate (bcArr(ndata)[*])
! Generate array to broadcast
      if ( myimg .eq. caf_root ) then
         do idata=1,ndata
            call random_number(bcArr(idata))
         enddo
      endif
      call sync_images()
! copy bcArr to all PEs
      if ( myimg .ne. caf_root ) then
         bcArr(1:ndata) = bcArr(1:ndata)[caf_root]
      endif
! print headers for output tables
      if ( mype .eq. mpi_root ) then
         write (6, '(A4,A12,7(A1,A10))')
     &      'NPES',
     &      'N Words',
     &      ' ',
     &      'MPI',
     &      ' ',
     &      'SHBCST' ,
     &      ' ',
     &      'CAFBCST'
      endif
! loop in order to broadcast increasing sized packets.
      do ido = 0, floor (log10 (real (ndata)))
         nbcast = 10**ido
! broadcast data from root using MPI
!        call MPI_Barrier(MPI_COMM_WORLD,ierr)
         methn=1
! all PEs but root reset target array
         if ( mype .ne. mpi_root ) bcArr(1:nbcast) = flagvalue
         call sync_images()
         start=MPI_WTIME()
         call MPI_BCAST(bcArr,nbcast,MPI_REAL,mpi_root,MPI_COMM_WORLD
     &      ,ierr)
         call sync_images()
         end=MPI_WTIME()
         tmpi = end-start
! broadcast same data from root using Shmem broadcast
         methn=2
! all PEs but root reset target array
         if ( mype .ne. mpi_root ) bcArr(1:nbcast) = flagvalue
         call sync_images()
         start=MPI_WTIME()
         call VTBEGIN (SHMEMBCAST, ierr)
         call shmem_broadcast
     &      (bcArr, bcArr, nbcast, shmem_root, 0, 0, npes, pSync)
         call sync_images()
         call VTEND (SHMEMBCAST, ierr)
         end=MPI_WTIME()
         tshmem = end-start
! broadcast same data from root using CAF Tree-structured broadcast
         methn=3
! all PEs but root reset target array
         if ( mype .ne. mpi_root ) bcArr(1:nbcast) = flagvalue
         call sync_images()
         start=MPI_WTIME()
! Total number of stages, or levels, in binary tree is
! the ceiling of log base-2 of the number of images.
         nstages = ceiling (alog (real (npes)) / alog (2.0))
         call VTBEGIN (CAFBCAST, ierr)
         do istage = nstages, 1, -1
            call sync_images ()
            if ( mod (myimg - 1, 2**istage) .eq. 0) then
               mypartner = (myimg) + 2**(istage - 1)
               if (mypartner .le. npes) then
                  call VTEND (CAFBCAST, ierr)
                  call VTBEGIN (CAFPUT, ierr)
!dir$ cache_bypass bcArr
                  do i=1,nbcast
                     bcArr(i)[mypartner] = bcArr(i)
                  enddo
                  call VTEND (CAFPUT, ierr)
                  call VTBEGIN (CAFBCAST, ierr)
               endif
            endif
         enddo
         call sync_images ()
         call VTEND (CAFBCAST, ierr)
         end=MPI_WTIME()
         tcafbcst = end-start
! print timing results
         if ( mype .eq. mpi_root ) then
            write (6, '(I4,I12,3(A1,F10.5))')
     &         npes,
     &         nbcast,
     &         ' ',
     &         1000000*(tmpi)/nbcast,
     &         ' ',
     &         1000000*(tshmem)/nbcast,
     &         ' ',
     &         1000000*(tcafbcst)/nbcast
         endif
      enddo
      call MPI_FINALIZE(ierr)
      end
NACSE Task Force Reports on Software and Tools
The final report of the Northwest Alliance for Computational Science and Engineering (NACSE) task force on requirements for HPC software and tools is now available.
Over the past six months, various groups have been meeting to discuss software and tools on HPC systems. Experiences from activities such as application development, user and system support, and tools and software research have been combined to define a baseline environment that should meet most users' needs for code development and run-time support. However, this wasn't just a group creating a wish list of software dreams for ideal worlds. It was a serious effort to consider both what would be needed now and in the near future, to help vendors/researchers prioritize their efforts, and to help centers acquire productive systems for their users.
Contributions came from a mixed background of national labs, universities, vendors, and other institutions. The power of the systems currently in use varied from small workgroup servers to Teraflop systems. Vendors and users discussed what was needed and could be completed both in the near future and at a realistic cost. The need for Cray compatibility and how the growth of highly powered desktop systems would influence the software development process were also strongly debated.
Some key features recommended as being essential to all HPC systems were:
- Required OS features, such as shells, utilities, and threads.
- Fortran 95, C, and C++ compilers, each with OpenMP support.
- Libraries such as optimized BLAS and MPI-1, plus interoperability between MPI-1 and OpenMP.
- Standard tools, debuggers and basic timers.
- Documentation and examples.
The final report contains details on each of the above topics. Each feature is described as it might be in a typical RFP by a center considering an HPC system or expansion of an existing resource. More details of the series of meetings, the final full report, and some typical examples of software needs for systems to perform specific activities can be found at:
http://www.nacse.org/projects/HPCreqts
Upcoming Conferences in 1999
CUG '99
  Cray User's Group
  Minneapolis, Minnesota; May 24 - 28
  Conference registration and accommodations now available on-line.
  http://www.cug.org/
HUG '99
  High-Performance Fortran User Group
  Redondo Beach, California; Aug 1 - 2 [ Note: new location & date ]
  Submission Deadline: May 7 [ Note: new deadline ]
  http://www.icase.edu/hug99/
EuroPar '99
  Euro-Par is a European conference on parallel computing
  Toulouse, France; Aug 31 - Sept 3
  http://www.enseeiht.fr/europar99/
SC99
  Supercomputing
  Portland, Oregon; Nov 13 - 19
  http://www.sc99.org/
Quick-Tip Q & A
A:{{ What simple change can I make to improve my code's performance on
the T3E? }}
Whether or not these qualify as "simple" or "changes" is obviously
subjective:
o Don't compile with "-g".
o Compile with more optimization, like "-O3,aggress" (but compare
results and performance).
o Use a faster file system (at ARSC this means /tmp).
o Use dmget in advance to retrieve potentially migrated input files.
o Replace hand-coded algorithms with Cray libraries, if possible.
o Test the code on different numbers of PEs. Run on the number
that optimizes "performance," however you define it.
o Use tools (e.g., PAT, apprentice, VAMPIR) to locate inefficient
code segments, and then, if possible, improve them.
Q: The following is legal input to a Unix command:
[la1+dsa*pla10>y]sy
0sa1
lyx
What's the command, and what's the result from this input? (Hint:
the command is an easily mistyped anagram of another, extremely
popular, command.)
[ Answers, questions, and tips graciously accepted. ]
Current Editors:
Ed Kornkven, ARSC HPC Specialist, ph: 907-450-8669
Kate Hedstrom, ARSC Oceanographic Specialist, ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
E-mail Subscriptions:
- Subscribe to (or unsubscribe from) the e-mail edition of the
  ARSC HPC Users' Newsletter.
- Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.
