ARSC T3E Users' Newsletter 169, May 21, 1999
MPI Quiz
Can you spot three mistakes in the code below? It compiles and runs, but hangs... :-(
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
program mpi_quiz
include 'mpif.h'
integer idata
parameter (idata=1000)
! code data
integer irbuf,isbuf
dimension irbuf(idata),isbuf(idata)
integer i
integer ndata
! MPI data
integer myid, numprocs, ierr, rc
integer mroot
call MPI_INIT( ierr )
call MPI_COMM_RANK( MPI_COMM_WORLD, myid, ierr )
call MPI_COMM_SIZE( MPI_COMM_WORLD, numprocs, ierr )
print *, "Process ", myid, " of ", numprocs, " is alive"
irbuf=0
ndata=10
mroot=0
do i=1,ndata
isbuf(i)=i*100
enddo
if(myid.eq.mroot) then
call MPI_REDUCE(isbuf, irbuf, ndata, MPI_INT, MPI_SUM,
! mroot, MPI_COMM_WORLD)
do i=1,ndata
write(6,*) ' data on ',myid,' is ',isbuf(i),irbuf(i)
enddo
endif
call MPI_FINALIZE(rc)
stop
end
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Here's a sample session showing the output:
yukon$ f90 mpiquiz.f
yukon$ mpprun -n2 ./a.out
Process 0 of 2 is alive
Process 1 of 2 is alive
At this point the program is hung--no more output appears. Here is the dump produced when the program is interrupted by CTRL-C:
SIGNAL: Interrupt ( from process 0 )
Beginning of Traceback (PE 0):
Interrupt at address 0x8000d9f0c in routine '???'.
Called from line 1422 (address 0x8000f500c) in routine '_T3DMPI_reduce_zero'.
Called from line 1524 (address 0x8000f6580) in routine 'MPI_Reduce'.
Called from line 2430 (address 0x800171760) in routine 'MPI_REDUCE'.
Called from line 36 (address 0x80000154c) in routine 'MPI_QUIZ'.
Called from line 475 (address 0x800000c98) in routine '$START$'.
End of Traceback.
IJCR: Research on Parallel Computing
> CALL FOR PAPERS
>
> Special Issue of the "International Journal of Computer Research"
> (http://www.softlab.ntua.gr/~mastor/IJCR.htm) on:
>
>
> INDUSTRIAL APPLICATIONS OF PARALLEL COMPUTING
>
> Parallel scientific and engineering computing is becoming of paramount
> importance in several industrial applications, especially when the
> solution of large and complex problems must cope with harder and harder
> time scheduling.
>
> In this special issue, we would like to report on relevant research
> representing the state-of-the-art in the following areas of parallel
> computing:
>
> 1. parallel and distributed combinatorial and global optimization
> methods, as applied to industrial/practical problems
>
> 2. parallel and distributed computing techniques and software systems,
> as applied to industrial/practical problems
>
> The application areas are to be understood very broadly and include,
> but are not limited to: computational fluid mechanics, structural
> engineering, computational chemistry, electronic and electromagnetic
> circuits, signal and image processing, etc.
>
> Submission deadline: September 15th, 1999
[ For submission details, see URL, above. ]
Fortran Information List
The quarterly Fortran Information List submitted to comp.fortran.90 by Mike Metcalf provides, as usual, a tremendous list of information and resources for anyone involved in Fortran. The contents of the May 20, 1999 list include:
- WHAT'S NEW?
  Since 20 April:
  - Update Cray and SGI entries to Fortran 95.
  - Update Lahey's entry.
  - Update Compaq e-addresses.
  - Add Alan Miller's source form converter.
  Since 23 March:
  - Add "The DIGITAL Visual Fortran Programmer's Guide".
  - Add Fortran 95 standard electronic ordering information.
  - Update Compaq (Digital) entry for Linux.
  Since 22 February:
  - Update Wagener's book entry.
  - Update Sun's entry.
  - Delete Meissner's e-book.
  - Add advice on downloading Dubois's lecture notes.
  - Replace IBM's entry - for Fortran 95 compliance etc.
  - Replace Fujitsu's entry - for Fortran 95 compliance etc.
  - Add Lahey/Fujitsu version of f90gl.
- WHERE CAN I OBTAIN A FORTRAN COMPILER?
- OTHER USEFUL PRODUCTS
- WHAT BOOKS ARE AVAILABLE? References provided to books in: Chinese, Dutch, English, Finnish, French, German, Japanese, Russian, and Swedish
- WHERE CAN I OBTAIN COURSES, COURSE MATERIAL OR CONSULTANCY?
- WHERE CAN I FIND THE FORTRAN AND HPF STANDARDS?
We've made the May 20, 1999 list accessible at:
http://www.arsc.edu/support/news/T3Enews/misc/i169FortranList.html
hpff@cs.rice.edu is a mailing list for announcements related to High Performance Fortran. To subscribe to or unsubscribe from this list, send mail to hpff-request@cs.rice.edu. Leave the subject line blank, and in the body put the line:
(un)subscribe <email-address>
MPI Quiz: Answers
Here are the three mistakes:
1. The collective reduction operation (MPI_REDUCE) isn't called on all the processors in the MPI_COMM_WORLD communicator.
2. MPI_INT is specified as the data type in the MPI_REDUCE call. Oops!
   If you work in both C and Fortran, remember, it is MPI_INT in the former and MPI_INTEGER in the latter. Other data types vary also. Fortran's IMPLICIT NONE can help since MPI_INT isn't defined in mpif.h header files.
3. The IERR parameter is missing from the MPI_REDUCE call. Every Fortran MPI subroutine call needs it.
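On the third mistake: in the C binding the error code is the function's return value, while in the Fortran binding it comes back through the trailing IERR argument, so the argument cannot be dropped. The following is a minimal sketch of actually inspecting IERR. It is not from the original quiz: the program name check_ierr and the variables sendval, recvval, msg, msglen and ierr2 are hypothetical, and it assumes free-form source and an MPI library that provides the MPI-2 routine MPI_COMM_SET_ERRHANDLER.

```fortran
program check_ierr
  implicit none
  include 'mpif.h'

  integer :: ierr, ierr2, myid, msglen
  integer :: sendval, recvval
  character(len=MPI_MAX_ERROR_STRING) :: msg

  call MPI_INIT(ierr)

  ! By default MPI aborts on error, so IERR is rarely seen to be nonzero.
  ! Assumption: we ask for error codes to be returned instead.
  call MPI_COMM_SET_ERRHANDLER(MPI_COMM_WORLD, MPI_ERRORS_RETURN, ierr)

  call MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)

  sendval = myid
  recvval = 0

  ! Note the trailing ierr: Fortran MPI routines return status this way.
  call MPI_REDUCE(sendval, recvval, 1, MPI_INTEGER, MPI_SUM, &
                  0, MPI_COMM_WORLD, ierr)

  if (ierr .ne. MPI_SUCCESS) then
     ! Translate the code into text, report it, and give up.
     call MPI_ERROR_STRING(ierr, msg, msglen, ierr2)
     print *, 'MPI_REDUCE failed on rank ', myid, ': ', msg(1:msglen)
     call MPI_ABORT(MPI_COMM_WORLD, 1, ierr2)
  endif

  if (myid .eq. 0) print *, 'sum of ranks = ', recvval

  call MPI_FINALIZE(ierr)
end program check_ierr
```

Whether checking every IERR is worth the clutter is a judgment call; with the default error handler most failures abort the job before the check is reached.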
We can see the different effects of these problems by correcting them one at a time. The error messages all provide clues.
As shown above, problem #1 causes the program to hang. To correct this problem, the "if" statement can be commented out, like this:
!! if(myid.eq.mroot) then
call MPI_REDUCE(isbuf, irbuf, ndata, MPI_INT, MPI_SUM,
! mroot, MPI_COMM_WORLD)
do i=1,ndata
write(6,*) ' data on ',myid,' is ',isbuf(i),irbuf(i)
enddo
!! endif
With this change, MPI_REDUCE will be called on all processors. This version produces this output:
yukon$ mpprun -n4 ./t1.a.out
Process 1 of 4 is alive
Process 3 of 4 is alive
Process 0 of 4 is alive
Process 2 of 4 is alive
-MPI- ERROR FATAL: world rank SIGNAL-MPI- ERROR FATAL: world rank 0, c: 2, comm Operand range erroromm 0 (0 In c [0] In call memory management faultall _T3DMPI_op_sum)_T3DMPI_op_sum, [- , [-490
Beginning of Traceback (PE 1):
490, class Interrupt at address 0x80017175c in routine 'MPI_REDUCE'.
, class MPI_ERR_OP Called from line 36 (address 0x8000014b8) in routine 'MPI_QUIZ'.
MPI_ERR_OP], [ Called from line 475 (address 0x800000c98) in routine '$START$'.
], [Invalid op End of Traceback.
Invalid op] MP] MPOperand range error(coredump)
The messages from several PEs are interleaved, but MPI_ERR_OP and "Invalid op" are still legible. Correcting problem #1 has exposed problem #2. To correct problem #2, we replace MPI_INT with MPI_INTEGER, like this:
call MPI_REDUCE(isbuf, irbuf, ndata, MPI_INTEGER, MPI_SUM,
! mroot, MPI_COMM_WORLD)
The code is recompiled and rerun:
yukon$ mpprun -n4 ./t2.a.out
Process 3 of 4 is alive
Process 1 of 4 is alive
Process 2 of 4 is alive
Process 0 of 4 is alive
SIGNAL: Operand range error ( [0] memory management fault)
Beginning of Traceback (PE 3):
Interrupt at address 0x80017175c in routine 'MPI_REDUCE'.
Called from line 36 (address 0x8000014c0) in routine 'MPI_QUIZ'.
Called from line 475 (address 0x800000c98) in routine '$START$'.
End of Traceback.
Operand range error(coredump)
Correcting problem #2 has exposed problem #3. We add the IERR parameter to the MPI_REDUCE call:
call MPI_REDUCE(isbuf, irbuf, ndata, MPI_INTEGER, MPI_SUM,
! mroot, MPI_COMM_WORLD, ierr)
Again, the code is recompiled and rerun:
yukon$ mpprun -n4 ./t3.a.out
Process 3 of 4 is alive
Process 0 of 4 is alive
Process 1 of 4 is alive
Process 2 of 4 is alive
data on 1 is 100, 0
data on 3 is 100, 0
data on 2 is 100, 0
[... snipped ...]
data on 3 is 1000, 0
data on 0 is 1000, 4000
data on 2 is 1000, 0
STOP (PE 2) executed at line 44 in Fortran routine 'MPI_QUIZ'
STOP (PE 0) executed at line 44 in Fortran routine 'MPI_QUIZ'
STOP (PE 3) executed at line 44 in Fortran routine 'MPI_QUIZ'
STOP (PE 1) executed at line 44 in Fortran routine 'MPI_QUIZ'
This seems to be the final correction, but let us know if you spot anything else.
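For reference, the three corrections can be combined into a single program. The following is a minimal sketch, not the version that produced the output above. It makes several assumptions of its own: free-form source (hence the & continuation), the hypothetical program name mpi_quiz_fixed, an added IMPLICIT NONE, printing restricted to the root, and ierr passed to MPI_FINALIZE in place of rc.

```fortran
program mpi_quiz_fixed
  ! With IMPLICIT NONE, a name that mpif.h does not define
  ! (such as MPI_INT) is rejected at compile time.
  implicit none
  include 'mpif.h'

  integer, parameter :: idata = 1000

  ! code data
  integer :: irbuf(idata), isbuf(idata)
  integer :: i, ndata

  ! MPI data
  integer :: myid, numprocs, ierr
  integer :: mroot

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
  print *, "Process ", myid, " of ", numprocs, " is alive"

  irbuf = 0
  ndata = 10
  mroot = 0
  do i = 1, ndata
     isbuf(i) = i*100
  enddo

  ! Fix 1: the collective is called by every process, not only the root.
  ! Fix 2: the Fortran datatype MPI_INTEGER replaces the C name MPI_INT.
  ! Fix 3: the trailing ierr argument is present.
  call MPI_REDUCE(isbuf, irbuf, ndata, MPI_INTEGER, MPI_SUM, &
                  mroot, MPI_COMM_WORLD, ierr)

  ! Only the root holds the reduced result, so only the root prints it
  ! (an assumption of this sketch; the run above printed on every PE).
  if (myid .eq. mroot) then
     do i = 1, ndata
        write(6,*) ' data on ', myid, ' is ', isbuf(i), irbuf(i)
     enddo
  endif

  call MPI_FINALIZE(ierr)
end program mpi_quiz_fixed
```

Because every process contributes isbuf(i) = i*100, the root's irbuf(i) is that value multiplied by the number of processes, which agrees with the 4000 reported for i = 10 in the 4-PE run above.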
Quick-Tip Q & A
A:{{ "I can't login! I keep on trying... The Kerberos server
accepts my 'kerberos password,' asks for my 'card-code,' which
I enter, but then it says:
Enter Next Token:
I enter my SecurID PIN into my SecurID card (AGAIN), type the 'next
token' which appears on the card, but it doesn't work!"
(What should this person do?) }}
For politically correct brevity, let this person be a "he."
Because he's had too many failed logins, the SecurID Server has put
this user's account into "next token mode."
At the "Enter Next Token:" prompt, he should NOT reenter the PIN into
the card to obtain a new passcode. Instead, he should wait for the
passcode displayed on the card to change naturally (this will take at
least 60 seconds). The new passcode is the "next token" requested. He
should enter it at the "Enter Next Token:" prompt.
If this attempt fails, the user should call his computer center's help
desk. Possible solutions:
o The consultant can cancel "next token mode."
o If the card has lost synchronization, the consultant can try to
resync it.
o If the card's battery is fading, or it cannot be resynchronized,
the consultant can provide a temporary login mechanism and ship a
new card.
Q: What's the best restaurant in Minneapolis and why?
(CUG '99 is in Minneapolis next week. Hungry attendees may seek out
Guy or Tom for preliminary responses to this question. Satisfied
attendees are welcome to submit opinions.)
[ Answers, questions, and tips graciously accepted. ]
Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
E-mail Subscriptions:
- Subscribe to (or unsubscribe from) the e-mail edition of the ARSC HPC Users' Newsletter.
- Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.
