ARSC T3D Users' Newsletter 36, May 17, 1995
New T3D Batch Queues
The T3D batch queues were changed on May 16, 1995. The current T3D queues are:
always on:
  16pe_24h   1 job using at most 16 PEs for 24 hours
  32pe_24h   1 job using at most 32 PEs for 24 hours
  64pe_24h   1 job using at most 64 PEs for 24 hours
  64pe_10m   1 job using at most 64 PEs for 10 minutes
  128pe_5m   1 job using at most 128 PEs for 5 minutes
There is one additional queue that is enabled on Friday at 6:00 PM and disabled at 4:00 AM on Sunday:
  128pe_8h   1 job using at most 128 PEs for 8 hours
A request made to these queues will be run as soon as enough PEs are available to satisfy the request. The intent of this change is to provide more production access to users who are moving from development work to production runs. We will be closely monitoring how this new queue structure is working and we may need to modify it in the future. Please contact Mike Ess if you have any concerns about the batch queues.
User's UDBSEE Limits
Most T3D users currently have a limit of 32 PEs for batch access. Users can check their limits with the udbsee command:
  udbsee | grep jpelimit
The output shows the PE limits for interactive (i) and batch (b) use. For example:
  jpelimit[b] :32: jpelimit[i] :8:
If your batch PE limit is too small to access these new NQS queues and you would like to use them, please contact Mike Ess, either by phone at 907-474-5404 or by email at ess@arsc.edu, to have your batch PE limit increased.
Users can query the NQS batch system with the command:
  qstat -a
to see what other NQS T3D jobs are scheduled to run on the T3D. The utility mppmon is available to see what jobs are currently running on the T3D. T3D jobs are executed on a "first fit" priority and run to completion without interruption.
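As a rough illustration of what "first fit" can mean, the sketch below scans a list of waiting requests in order and picks the first one whose PE count fits in the PEs that are currently free. This is only one reading of the term and not a description of how NQS is actually configured at ARSC. The function name ifirst, the request sizes, and the free-PE count are all made up for the example.

```fortran
! Sketch of a "first fit" pick: scan the waiting requests in
! order and return the first one that fits in the free PEs.
! ifirst and all of the numbers here are hypothetical.
      integer function ifirst(n, npes, nfree)
      implicit none
      integer n, npes(n), nfree, i
! 0 means that no waiting request fits right now
      ifirst = 0
      do i = 1, n
         if (npes(i) .le. nfree) then
            ifirst = i
            return
         end if
      end do
      end

      program fitdemo
      implicit none
      integer ifirst, nreq(4)
      data nreq /128, 64, 16, 32/
! with 48 PEs free, the 128 PE and 64 PE requests do not fit
      write(6,*) 'next request to start: ', ifirst(4, nreq, 48)
      end
```

With these values the 16 PE request (the third in the list) is the first that fits, even though larger requests were submitted ahead of it.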
A Barrier Routine with a Fixed Delay
In developing code for the T3D it is often the case that not all PEs reach a barrier point. When this happens the program appears to hang: the PEs that have reached the barrier are spinning, and the PE that hasn't reached the barrier is holding everyone up.
One of our users, Dr. Alan Wallcraft, a scientist with the Naval Research Center in Stennis, Mississippi, and I needed a barrier function that waits at most a specified number of seconds. If the time delay is exceeded, the T3D job is aborted and the user can tell which PEs have reached the barrier and which one(s) were holding up the show.
Below are two implementations of this "barrier with delay". The first is implemented with PVM and the SET_BARRIER/TEST_BARRIER functions. This version can be used with Fortran 90. The second version is implemented with Craft Fortran. A driver program and the complete source files are listed below. (These routines are also available in /usr/local/examples/mpp/src as barrier1.f and barrier2.f.)
      program test
c
c     a(100000) holds 5000*(mype+1) elements, enough for up to 20 PEs
c
      real a( 100000 )
      intrinsic irtc
      include '/usr/include/mpp/fpvm3.h'
c
      call pvmfmytid(itid)
      call pvmfgetpe(itid, mype)
c
      irtc0 = irtc()
c
c     loop on idelay.
c
      do idelay= 5,1,-1
c
c        generate some unequal size tasks
c
         do i = 1, 5000 * (mype+1)
            a(i) = sqrt( real(i+idelay)**3)
         enddo
         do k= 1,35
            do i = 2, 5000 * (mype+1)
               a(i) = a(i-1) + a(i) + sqrt( real(i)**3 )
            enddo
         enddo
c
c        call a barrier that aborts if any processor waits more than idelay seconds
c
         call debug_barrier(idelay)
         if (mype.eq.0) then
            write(6,*) 'idelay=',idelay,' ok at ',
     +                 (irtc()-irtc0)/150000000.0 ,' sec'
            call flush(6)
         endif
c
c        a test that is always .false., to prevent optimizing away a(:)
c
         if (a(5000*(mype+1)).eq.-999.9) then
            write(6,*) a(1),a(99),a(5000*(mype+1))
         endif
      enddo
      end
      SUBROUTINE DEBUG_BARRIER(IDELAY)
      IMPLICIT NONE
      INTEGER IDELAY
C
C     A VERSION OF BARRIER THAT ABORTS AFTER IDELAY SECONDS.
C
      INTRINSIC IRTC
      INTEGER IRTC
      LOGICAL TEST_BARRIER
C
      INTEGER ITICK,NTICK, ITID,MYPE
C
      INCLUDE '/usr/include/mpp/fpvm3.h'
C
      CALL SET_BARRIER()
      IF (TEST_BARRIER()) THEN
         RETURN
      ENDIF
C
      ITICK = IRTC()
      NTICK = ITICK + IDELAY*150000000
C
      DO WHILE (ITICK.LE.NTICK)
         IF (TEST_BARRIER()) THEN
            RETURN
         ELSE
            ITICK = IRTC()
         ENDIF
      ENDDO
C
C     ONLY GET HERE AFTER IDELAY SECONDS.
C
      CALL PVMFMYTID(ITID)
      CALL PVMFGETPE(ITID, MYPE)
      WRITE(0,*) 'ERROR - DEBUG_BARRIER(',IDELAY,
     +           ') TIMED OUT ON PE ',MYPE
      CALL FLUSH(0)
      CALL ABORT()
      STOP
C     END OF DEBUG_BARRIER.
      END
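The polling loop in DEBUG_BARRIER depends on the Cray IRTC intrinsic and on the constant 150000000, which the listing treats as the number of clock ticks per second. To try the same deadline pattern on a system without IRTC, a minimal sketch can use the standard SYSTEM_CLOCK intrinsic, which reports its own tick rate. The names waitok and ready are hypothetical: ready is only a stand-in for TEST_BARRIER that becomes true after a chosen number of polls (or never, if that number is negative), and waitok returns .false. on a timeout instead of calling ABORT.

```fortran
! Portable sketch of the polling loop in DEBUG_BARRIER.
! SYSTEM_CLOCK replaces the Cray IRTC intrinsic; ready() is a
! hypothetical stand-in for TEST_BARRIER.
      logical function ready(ncall, npoll)
      implicit none
      integer ncall, npoll
! true once ncall polls have been made; never if npoll < 0
      ready = npoll .ge. 0 .and. ncall .ge. npoll
      end

      logical function waitok(idelay, npoll)
      implicit none
      integer idelay, npoll, itick, ntick, irate, ncall
      logical ready
! irate is the number of clock ticks per second on this system
      call system_clock(itick, irate)
      ntick = itick + idelay*irate
      ncall = 0
      waitok = .true.
      do while (itick .le. ntick)
         ncall = ncall + 1
         if (ready(ncall, npoll)) return
         call system_clock(itick)
      end do
! only get here after idelay seconds
      waitok = .false.
      end

      program polldemo
      implicit none
      logical waitok
      write(6,*) 'ready after 1000 polls: ', waitok(1, 1000)
      write(6,*) 'never ready, 1 s limit: ', waitok(1, -1)
      end
```

The second call spins for the full one-second limit before reporting failure, which is the point at which DEBUG_BARRIER prints its error message and aborts.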
A version using Craft Fortran:
      program test
c
c     a(100000) holds 5000*(mype+1) elements, enough for up to 20 PEs
c
      real a( 100000 )
      intrinsic irtc
      intrinsic my_pe
c
c     set up the barrier flags, then loop on idelay.
c
      call mybarriersetup
c
      mype = my_pe()
      irtc0 = irtc()
c
      do idelay= 5,1,-1
c
c        generate some unequal size tasks
c
         do i = 1, 5000 * (mype+1)
            a(i) = sqrt( real(i+idelay)**3)
         enddo
         do k= 1,35
            do i = 2, 5000 * (mype+1)
               a(i) = a(i-1) + a(i) + sqrt( real(i)**3 )
            enddo
         enddo
c
c        call a barrier that aborts if any processor waits more than idelay seconds
c
         delay = idelay
         call mybarrier(delay)
         if (mype.eq.0) then
            write(6,*) 'idelay=',idelay,' ok at ',
     +                 (irtc()-irtc0)/150000000.0 ,' sec'
            call flush(6)
         endif
         if (a(5000*(mype+1)).eq.-999.9) then
            write(6,*) a(1),a(99),a(5000*(mype+1))
         endif
      enddo
      end
      subroutine mybarrier(delay)
c
c     this subroutine is a replacement for the standard call barrier routine
c     if any processor waits at a barrier more than delay seconds a call to
c     abort is made and all PEs dump core
c
c     flags( i ) counts the barriers entered by PE i (at most 128 PEs)
c
      integer flags( 0:127 )
      common /mine/ flags
CDIR$ shared flags(:block)
      intrinsic my_pe
      mype = my_pe()
      flags( mype ) = flags( mype ) + 1
      t1 = real( irtc( ) ) / 150000000.0
 10   continue
      et = real( irtc( ) ) / 150000000.0 - t1
      if( et .gt. delay ) then
         write(0,*) 'error - mybarrier(',delay,
     +              ') timed out on pe ',mype
         do i = 0, N$PES - 1
            if( flags( i ) .lt. flags( mype ) ) then
               write(0,*) 'pe ',i,' not at the barrier'
            endif
         enddo
         call flush(0)
         call abort()
      endif
      do i = 0, N$PES - 1
         if( flags( i ) .lt. flags( mype ) ) then   ! barrier
            goto 10
         endif
      enddo
      end
      subroutine mybarriersetup()
c
c     an initialization routine for the status flags
c
      integer flags( 0:127 )
      common /mine/ flags
CDIR$ shared flags(:block)
      intrinsic my_pe
      flags( my_pe() ) = 0
      call barrier()   ! make sure we're in sync
      end
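The report printed by mybarrier on a timeout comes from comparing barrier counters: every PE increments its own entry in flags each time it enters the barrier, so any PE whose count is lower than the caller's has not arrived yet. The comparison can be tried on any serial Fortran system with the sketch below. The function name nlate and the counter values are made up for the example, and an ordinary local array stands in for the Craft shared array.

```fortran
! Serial sketch of the lagging-PE report in mybarrier.
! nlate and the counter values are hypothetical; an ordinary
! array stands in for the Craft shared array flags.
      integer function nlate(npes, flags, mype)
      implicit none
      integer npes, mype, i
      integer flags(0:npes-1)
      nlate = 0
      do i = 0, npes - 1
! a lower count means PE i has not entered this barrier yet
         if (flags(i) .lt. flags(mype)) then
            write(6,*) 'pe ', i, ' not at the barrier'
            nlate = nlate + 1
         end if
      end do
      end

      program flagdemo
      implicit none
      integer nlate, n, flags(0:3)
      data flags /3, 3, 2, 3/
! PE 0 has entered its third barrier; PE 2 is still at two
      n = nlate(4, flags, 0)
      write(6,*) n, ' pe(s) holding up the barrier'
      end
```

Because the counters only ever increase, the same comparison serves both as the barrier test and as the diagnostic, and no reset is needed between barriers.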
With the 1.2 PE version of totalview, a user can invoke these test programs as:
  totalview a.out
When executed from within totalview, the progress of each PE is shown at the time of the abort initiated by the one PE that has waited at the barrier more than "delay" seconds.
If you know of similar techniques on the T3D please send them to me and I'll pass them on to the readers of the ARSC T3D newsletter.
List of Differences Between T3D and Y-MP
The current list of differences between the T3D and the Y-MP is:
- Data type sizes are not the same (Newsletter #5)
- Uninitialized variables are different (Newsletter #6)
- The effect of the -a static compiler switch (Newsletter #7)
- There is no GETENV on the T3D (Newsletter #8)
- Missing routine SMACH on T3D (Newsletter #9)
- Different Arithmetics (Newsletter #9)
- Different clock granularities for gettimeofday (Newsletter #11)
- Restrictions on record length for direct I/O files (Newsletter #19)
- Implied DO loop is not "vectorized" on the T3D (Newsletter #20)
- Missing Linpack and Eispack routines in libsci (Newsletter #25)
- F90 manual for Y-MP, no manual for T3D (Newsletter #31)
Current Editors:
Ed Kornkven, ARSC HPC Specialist, ph: 907-450-8669
Kate Hedstrom, ARSC Oceanographic Specialist, ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
E-mail Subscriptions:
Subscribe to (or unsubscribe from) the e-mail edition of the ARSC HPC Users' Newsletter.
Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.
