ARSC HPC Users' Newsletter 241, March 15, 2002
SPSoft Bioinformatics Codes on Icehawk
ARSC has installed, on a one-month trial license, Southwest Parallel Software's parallelized versions of phrap, swat, and cross_match. See:
These products are available for user evaluation on the IBM SP, in:
/usr/local/pkg/spsoft/current/bin
ICEHAWK$ ls -l
-rwxr-xr-x 1 root software   33903 Mar 12 14:38 phrapview
-rwxr-xr-x 1 root software 1260821 Mar 12 13:12 pphrap
-rwxr-xr-x 1 root software  750721 Mar 12 13:12 pswat
-rwxr-xr-x 1 root software  924533 Mar 12 13:12 pxm
Set the following environment variable to access the license (the syntax given is for Korn shell users):
export LM_LICENSE_FILE=/usr/local/etc/spsoft.lic
and to use all four CPUs on a node,
export OMP_NUM_THREADS=4
If you use these products, please let us know how they work for you, by email to consult@arsc.edu .
Portable, Fast Fortran IO
[ Thanks to Jeff McAllister of ARSC for this contribution. ]
Read the title of this article again...
Sounds too good to be true. Before I tried this out I thought the tradeoff was clear. Fortran output can be:
- ASCII/formatted (abysmally slow but portable) OR
- binary/unformatted (fast but generally system-specific)
However, there is a way to combine the best attributes of both -- direct access files. They aren't as portable as ASCII, but far more so than binary. Plus they allow binary-level file performance.
There are very few changes required to convert a program which uses sequential IO to direct IO. The open statement requires a record length and each read/write must specify the record being accessed. That's it. While the order is implied for sequential access, the record number can just be a counter to achieve the same effect.
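To make the record idea concrete, here is a small shell sketch (not part of the original article) that treats a file as fixed-length records and fetches one by number, which is essentially what a direct-access read does. It assumes records are stored back to back with no record markers and that the record length is counted in bytes; both are common but compiler-dependent. The file contents and the read_record helper are made up for illustration.

```shell
# Hypothetical file: four 8-byte records written back to back
recl=8
tmpfile=$(mktemp)
printf 'record01record02record03record04' > "$tmpfile"

# Fetch record number $1 (1-based): skip (rec-1) records of recl bytes,
# then read exactly one record.
read_record() {
    dd if="$tmpfile" bs="$recl" skip=$(( $1 - 1 )) count=1 2>/dev/null
}

third=$(read_record 3)
echo "$third"

rm -f "$tmpfile"
```

Under the same assumptions, record i of a direct-access file starts at byte offset (i-1)*recl, so any record can be reached without reading the ones before it.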
Achieving portability on Cray PVP systems requires an extra step, using Cray's "assign" statement.
Here is a demo program to read or write a direct-access file.
! ------------------------------------------------------
! a direct-access demo program
! ------------------------------------------------------
program ftest
  implicit none
  integer,parameter::asize=5
  character(len=10)::mode,statstr
  integer,parameter::mykind=4 ! set to 8 for 64-bit IO
  integer(kind=mykind) i,fsize,ierr
  real(kind=mykind),dimension(asize)::A
  ! set mode to read or write to determine program's action
  mode="write"
  !mode="read"
#ifdef _CRAYSV1
  ! IO on unit 66 converted to IEEE 32-bit format
  call asnunit(66,"-N ieee_32",ierr) ! use ieee_64 for 64-bit IO
#endif
  ! ------------------------------------------------------
  ! open the file
  ! ------------------------------------------------------
  if (mode=="write") then
    statstr="replace"
  else
    statstr="old"
  endif
  open (66,file="test.file",form="unformatted",status=statstr,&
        recl=mykind*asize,access="direct")
  ! ------------------------------------------------------
  ! get/set the number of records
  ! ------------------------------------------------------
  if(mode=="write") then
    A=dble(0.5)
    fsize=30
    write(66,rec=1) fsize
  else
    read (66,rec=1) fsize
  endif
  ! ------------------------------------------------------
  ! main IO -- whole array
  ! ------------------------------------------------------
  if(mode=="write") then
    do i=1,fsize
      write (66,rec=i+1) A
      a=a+dble(1.0)
    end do
  else
    do i=1,fsize
      read (66,rec=i+1) A
      print *,A
    end do
  endif
  close (66)
  ! ------------------------------------------------------
  ! remove assign --
  ! otherwise unit 66 will always be IEEE-converted!
  ! ------------------------------------------------------
#ifdef _CRAYSV1
  call assign("assign -R")
#endif
end program ftest
In the above code, the lines between the #ifdef and #endif directives are compiled only on the Cray SV1, where the _CRAYSV1 macro is defined automatically. (See Cray's online documents, "CF90 Commands and Directives Reference Manual", section 5.3, for macros for other CRAY PVP systems.)
The assign calls are necessary to force a numeric conversion from Cray format to one conforming with the IEEE standard. (See "File Assignment: Dangerous but Useful" in issue 125:
/arsc/support/news/t3enews/t3enews125/index.xml
and its followup "Safer File Assignments" in issue 192:
/arsc/support/news/t3enews/t3enews192/index.xml
for more information on assign.) As the T3E, IBM, and SGI systems are already IEEE compliant, there is no need for assign on those platforms.
Even with the magic of assign, direct access files aren't a panacea. I've tested this between all ARSC systems. Everything works for all combinations except the Linux cluster -- files produced on that platform can only be read correctly there. However, if a machine supports the IEEE standard for numerical storage then direct access should be fine.
The following numbers should be an incentive to consider direct access files, even if ASCII is unbeatable for portability. As I pointed out in "Taming the IO Beast," issue 236:
/arsc/support/news/t3enews/t3enews236/index.xml
unformatted access is the only way to make use of faster filesystems. The poor performance of ASCII files means that performance stays at low levels no matter how fast the hardware potentially could be.
Consider, for example, results obtained from chilkoot's ssd filesystem: (your results will vary)
IO method                        1 MB time   1 MB bandwidth
---------                        ---------   --------------
single,formatted,sequential      19.5 sec      .05 MB/sec
single,unformatted,sequential     4.4          .23
array,formatted,sequential        1.1          .9
array,unformatted,direct           .08       13.0
array,unformatted,sequential       .01       90.2
My notation here:
- single=outputting only one array element at a time, array=whole array at once
- formatted=ASCII formatted output, unformatted=binary unformatted output
- sequential=default access, direct=direct-mode (aka random) access
One entry missing from the table is formatted direct access. This combination is not recommended.
Whole-array unformatted sequential output is clearly the best for performance. Here, writing one element at a time in ASCII is nearly 2000x slower (19.5 sec vs. .01 sec). The difference gets worse as file size increases. Direct access, while not as fast, is only about 7x slower than unformatted sequential output (13.0 vs. 90.2 MB/sec). It is also about 14x faster than whole-array ASCII output (13.0 vs. .9 MB/sec). On other filesystems the difference may not be nearly as clear, but it is always significant.
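For readers who want to check the ratios, this short shell sketch recomputes them from the table entries. The ratio helper and the variable names are made up for illustration; awk is used only because the shell itself has no floating-point arithmetic.

```shell
# Divide $1 by $2 and round to the nearest whole number
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.0f", a / b }'
}

# Values copied from the chilkoot table (seconds and MB/sec)
ascii_single_vs_best=$(ratio 19.5 0.01)    # slowest time / fastest time
direct_vs_best=$(ratio 90.2 13.0)          # sequential / direct bandwidth
direct_vs_ascii_array=$(ratio 13.0 0.9)    # direct / whole-array ASCII bandwidth

echo "single-element ASCII vs. best:  ${ascii_single_vs_best}x slower"
echo "direct vs. best:                ${direct_vs_best}x slower"
echo "direct vs. whole-array ASCII:   ${direct_vs_ascii_array}x faster"
```

The times in the table are rounded, so ratios taken from the time column and from the bandwidth column differ slightly; the sketch uses whichever column carries more digits.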
In the end there is no best solution for all possible tasks. However, this tool might fit some applications better than the other options.
Even Co-Array Fortran Can't Do This...
A T3E user got the following cryptic compiler error on a bit of new Co-Array Fortran code:
yukon$ f90 -Z prog.F90
f90-976 f90: INTERNAL PROG, File = prog.F90, Line = 728, Column = 24
Expected Dv_Deref_Opr from ptr_assign_from_pointer.
Can you spot the error? Here are the relevant variable declarations, then the line of incorrect code:
TYPE TRANS
REAL*8, POINTER, DIMENSION(:,:,:) :: EN
END TYPE TRANS
TYPE(TRANS) T(6)[*]
REAL*8, POINTER, DIMENSION(:,:,:) :: EG
INTEGER, DIMENSION(6) :: P1
INTEGER(KIND=8), DIMENSION(6) :: P0
INTEGER:: M
! error here:
EG => T(P1(M))[P0(M)]%EN
If this worked, it would make a pointer on one processor point to data stored on another processor. That would be pretty cool... but the language doesn't allow it.
While you can't do pointer assignments involving co-arrays, you can do regular assignments. Assuming storage had been allocated for "EG", the following statement:
EG = T(P1(M))[P0(M)]%EN
copies the contents of the array pointed to by T(P1(M))%EN on the source processor to EG on the local processor. In Co-Array Fortran, you talk about "images," not "processors." Thus, this is really copying the array from image P0(M) to the local image.
IBM SCICOMP 5
Announcement from SCICOMP:
SCICOMP 5, the fifth international meeting of SCICOMP, the IBM Scientific Computing User Group, is scheduled for the week of May 6 through 10, 2002, and will be hosted by Daresbury Laboratory, Daresbury, England.
More information about the SCICOMP organization can be found at
http://www.spscicomp.org/index.html
and complete information about this meeting can be found at
http://www.spscicomp.org/ScicomP5/
The agenda still has time slots available for user presentations, and we accept abstract submissions via the web form at
http://www.spscicomp.org/ScicomP5/userpres.html
Information on local lodgings and transportation is available, and you can register to attend the meeting, through the meeting web pages. The meeting web pages are updated regularly, so please check back often.
3D Printer Installed at UAF/ARSC
[This was released Feb 26...]
When we think of a computer printer, we usually think of a two-dimensional process using ink to represent text or images on paper. But, it is possible for scientists, engineers and artists to create tangible objects directly from their three-dimensional (3D) computer files.
This week, staff at the Arctic Region Supercomputing Center (ARSC) installed, in the center's visualization lab, a printer capable of creating three-dimensional objects from computer files. Acquisition of such a printer was brought about through a collaborative effort between ARSC staff, UAF students and UAF researchers using funds awarded by the University's Technology Advisory Board and ARSC.
Obtaining the printer was originally the vision of local artist and ARSC visualization specialist Bill Brody. "It's basically a well-built thermal ink-jet printer that works like a computer-controlled high-resolution hot glue gun," says Brody. "It prints in the same resolution as a laser printer, but in three dimensions."
Brody, along with Computer Science/ARSC joint appointees Chris Hartman and Glenn Chappell, has been working on a three-dimensional computer-user interface that will eliminate the need for a mouse or wand when interacting with the computer. The program, called the Body Language User Interface, or BLUI, currently works similarly to a 3D drawing or sculpting program, creating files like those created by Maya, the animation program used by Hollywood studios to create computer animations and special effects in motion pictures.
Using these files, the ARSC 3D fabrication device creates objects made of casting wax. These objects can be up to 10.5x8x7 inches in size. Currently, Brody is outputting several sculptures he created using BLUI, which he will then have cast in metal. In addition, he plans to output a terrain model of the Brooks Range.
Three-dimensional printers have been in use by industrial designers for several years. Doctors use this technology to prefabricate working artificial joints for replacement surgery. The Nike Corporation owns devices like this to "print" prototypes of soles for shoes. Already at UAF, there are engineers, designers and artists ready to take advantage of the new equipment to enhance their research projects.
### CONTACT: ARSC Public Information Officer Jenn Wagaman, (907) 450-8662, wagaman@arsc.edu or Bill Brody, Visualization Specialist, (907) 474-1895, brody@arsc.edu . ARSC news releases are also available on the Web at http://www.arsc.edu/
Perl Examples
The response to last week's quick-tip calls for a full-blown article... So here goes:
#
# Thanks to Richard Griswold:
#
I guess I've been ruined by my exposure to Windows, but I've started using spaces in my file and directory names. This plays all kinds of havoc when I try to use find/xargs to muck around with my files. Perl (or sed) comes in handy here.
For example, if I want to open all .c files in nedit, using the "nc" launcher (see http://www.nedit.org/ ), I could try:
find . -name "*.c" | xargs nc
but, since I have spaces in my file or directory names, this can cause problems. To escape the spaces, simply do:
find . -name "*.c" | perl -p -e 's/ /\\ /g' | xargs nc
The -p argument tells perl to apply the script to each line of the named files or of stdin, and to print the result, similar to the way sed works. The -e flag tells perl that the next argument is a one-line perl script. In this case, it is a simple regex that puts a backslash before each space.
You could do the same thing using sed:
find . -name "*.c" | sed 's/ /\\ /g' | xargs nc
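Here is a self-contained way to try the escaping trick without touching real files. It is only a sketch: the directory and file names are invented, and "cat" stands in for "nc" so the result can be checked by counting lines.

```shell
# Build a throwaway tree containing names with spaces (hypothetical names)
workdir=$(mktemp -d)
mkdir "$workdir/my project"
printf 'int main(void) { return 0; }\n' > "$workdir/my project/hello world.c"
printf 'int x;\n' > "$workdir/plain.c"

# Backslash-escape every space so xargs passes each path as one argument.
# Both files hold one line, so a correct run reads two lines in total.
escaped_total=$(find "$workdir" -name "*.c" | sed 's/ /\\ /g' | xargs cat | wc -l | tr -d ' ')
echo "lines read: $escaped_total"

rm -rf "$workdir"
```

Note that this only protects spaces; names containing quotes or newlines would need more care.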
You can also use perl to change the contents of a file in place, something that sed cannot easily do.
In one case, I had approximately 430 frames from an animation that I rendered. Since I used several systems to render portions of the animation, I had filenames like "saltyXXX.YYY.iff", where XXX was the machine number and YYY was the frame number. The problem was that the compositing software wanted filenames like "salty.YYY.iff". The only good part was that the frame numbers were all sequential, so I didn't have to renumber the files from different machines.
Since I was working on a Windows machine, I didn't have access to all the nice tools that I was used to. All hope was not lost however. Instead of renaming all the files by hand, I redirected the output from dir to a file and transferred the file to a Unix machine. There I used a couple of perl commands to transform the dir output to a batch file. Once I had a batch file, I transferred it back to the Windows machine and ran it to rename all of the files.
First I had to move each file name onto a line by itself:
perl -pi -e 's/ +/\n/' salty.bat
This is similar to the previous example, but -i tells perl to edit the named file or files in place rather than writing the result to stdout. Next, I made each line into a ren statement:
perl -pi -e 's/(salty)([0-9]+)(.+)$/ren $1$2$3 $1$3/' salty.bat
This is simply a more complex regex that transforms something like "salty005.14.iff" to "ren salty005.14.iff salty.14.iff".
With two simple perl commands, I saved myself from hours of tedious manual file renaming.
#
# Thanks to Kate Hedstrom:
#
A little Perl example:
#!/usr/bin/perl -w
while (<>) {
($lon, $lat, $depth) = split;
print "$depth\n";
}
This is like a cut command, reading three columns of data and printing only the third.
**********************************************************
This is in my .cshrc:
alias noM 'perl -i -pe "s/\015//"'
use as
noM file1 [file2] ...
This deletes the control-M's from files created on a PC. For Mac files, you would need to have:
alias noMac 'perl -i -pe "s/\015/\n/g"'
The -i option is telling it to modify files in place. The -pe options tell it to behave like sed, operating on each line in turn and printing them.
**********************************************************
Here is a little script for doing one phase of converting to f90:
#!/usr/bin/perl -i.bak
#
# Usage: f90_com file
#
# Convert to F90 comments (!), in place, saving old file to file.bak
#
$comment_char = '!';
#
# main loop
#
while (<>) {
# replace upper and lower case c, *
if (/^[*c]/i) {
substr($_,0,1) = $comment_char;
}
print;
}
**********************************************************
This is an example of a tiny program expanded to include built-in documentation. I learned to do this from the Work column in Server-Workstation Expert. The actual program is in these lines:
while (<>) {
print eval $_;
print "\n";
warn $@ if $@;
}
For each line of input, evaluate the expression and print it. It is an interactive calculator somewhat like bc and dc.
The whole thing:
#!/usr/bin/perl -w
use Pod::Usage;
use Getopt::Long;
# From the Pod::Usage man page
our ($opt_help, $opt_man);
GetOptions("help", "man")
or pod2usage("Try '$0 --help' for more information");
pod2usage(-verbose => 1) if $opt_help;
pod2usage(-verbose => 2) if $opt_man;
pod2usage("Too many args: " . @ARGV) if @ARGV > 0;
while (<>) {
print eval $_;
print "\n";
warn $@ if $@;
}
__END__

=head1 NAME

pcalc - Perl calculator

=head1 SYNOPSIS

pcalc [--help] [--man]

=head1 DESCRIPTION

An interactive calculator, much like "bc -l". In addition, it supports
Perl variables and syntax.

=head1 OPTIONS AND ARGUMENTS

=over 4

=item I<-help>

Print more details about the arguments.

=item I<-man>

Print a full man page.

=back

=head1 EXAMPLE

  % pcalc
  use Math::Trig
  $pi = 4*atan(1)
  3.14159265358979
  2*$pi/24/3600
  7.27220521664303e-05

=head1 AUTHOR

Larry Wall

From the first Camel Book.

=cut
#
# One from the editor:
#
cat file | perl -n -e "print if /start_pattern/ .. /end_pattern/"
Prints blocks of text from file. A block starts when "start_pattern" is found and continues until "end_pattern" is found. The ".." is perl's "range" operator. "-n" assumes a loop around the script but doesn't print every line, as "-p" would. Thus, only lines explicitly selected by the "if" condition are printed.
I have this function defined in my .profile file:
function range {
perl -n -e "print if /$1/ .. /$2/"
}
Usage:
cat file | range "start_pattern" "end_pattern"
Quick-Tip Q & A
A:[[ I resolve to stop using sed, awk, cut, split, complicated egrep
[[ commands, etc..., in favor of perl.
[[
[[ My goal is to simplify, and learn just one way to do everything.
[[
[[ Can you help me get started? I'd appreciate a couple examples
[[ --with explanations--of using perl on the command line or in short
[[ scripts, to accomplish common unix tasks.
#
# See last article...
#
Q: Recently I was debugging a code on the SV1 using totalview. In one
of the subroutines there were several matrices declared as:
COMPLEX A( LDA, * ), B( LDB, * )
COMPLEX Z( LDZ, * )
Where LDA = LDB = LDZ = 147. When viewing the values, totalview
displayed all the data arrays as all being dimensioned (147, 1) when
in reality the sizes were A(147,147), B(147,147) and Z(147,15).
I tried changing the sizes in the variable display window, but
couldn't get it to work. I also found that I could type in single
values to look at, say Z(15,15) and that would work. However, I'd
like to be able to see the whole matrix at once. Does anyone know if
this can be done?
[[ Answers, Questions, and Tips Graciously Accepted ]]
Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
E-mail Subscriptions:
Subscribe to (or unsubscribe from) the e-mail edition of the ARSC HPC Users' Newsletter.
Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.
