ARSC HPC Users' Newsletter 241, March 15, 2002

SPSoft Bioinformatics Codes on Icehawk

ARSC has installed, on a one-month trial license, Southwest Parallel Software's parallelized versions of phrap, swat, and cross_match. See:

http://www.spsoft.com/

These products are available for user evaluation on the IBM SP, in:


  /usr/local/pkg/spsoft/current/bin 

  ICEHAWK$ ls -l
  -rwxr-xr-x   1 root     software   33903 Mar 12 14:38 phrapview
  -rwxr-xr-x   1 root     software 1260821 Mar 12 13:12 pphrap
  -rwxr-xr-x   1 root     software  750721 Mar 12 13:12 pswat
  -rwxr-xr-x   1 root     software  924533 Mar 12 13:12 pxm

Set the following environment variable to access the license (the syntax given is for Korn shell users):


  export LM_LICENSE_FILE=/usr/local/etc/spsoft.lic

and to use all four CPUs on a node,


  export OMP_NUM_THREADS=4

If you use these products, please let us know how they work for you, by email to consult@arsc.edu .

Portable, Fast Fortran IO

[ Thanks to Jeff McAllister of ARSC for this contribution. ]

Read the title of this article again...

Sounds too good to be true. Before I tried this out I thought the tradeoff was clear. Fortran output can be:

  1. ASCII/formatted (abysmally slow but portable) OR
  2. binary/unformatted (fast but generally system-specific)

However, there is a way to combine the best attributes of both -- direct access files. They aren't as portable as ASCII, but far more so than binary. Plus they allow binary-level file performance.

There are very few changes required to convert a program which uses sequential IO to direct IO. The open statement requires a record length and each read/write must specify the record being accessed. That's it. While the order is implied for sequential access, the record number can just be a counter to achieve the same effect.
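
To see why a record number is all a direct-access read needs, here is a rough file-level analogy in shell rather than Fortran. It assumes fixed-length records stored back to back with no separators, which is typical of direct-access files although the exact layout is compiler-dependent. Under that assumption record n starts at byte offset (n-1)*recl. The record length, the sample data, and the read_rec helper are all made up for illustration.

```shell
# Hypothetical record length in bytes; every record is exactly this long.
recl=8
tmp=$(mktemp)

# "Write" three fixed-length records back to back, with no separators.
printf 'record01record02record03' > "$tmp"

# Read record number $1: dd skips (n-1) blocks of recl bytes, then reads one.
read_rec() {
    dd if="$tmp" bs="$recl" skip=$(( $1 - 1 )) count=1 2>/dev/null
}

# Records can be fetched in any order, not just sequentially.
third=$(read_rec 3)
first=$(read_rec 1)
echo "$third $first"
rm -f "$tmp"
```

Stepping the record number with a counter, as the Fortran demo below does with rec=i+1, reproduces sequential order.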

Achieving portability on Cray PVP systems requires an extra step, using Cray's "assign" statement.

Here is a demo program to read or write a direct-access file.


! ------------------------------------------------------
!  a direct-access demo program
! ------------------------------------------------------
program ftest
  implicit none

  integer,parameter::asize=5
  character(len=10)::mode,statstr

  integer,parameter::mykind=4 ! set to 8 for 64-bit IO

  integer(kind=mykind) i,fsize,ierr
  real(kind=mykind),dimension(asize)::A

  ! set mode to read or write to determine program's action

  mode="write"
  !mode="read"


#ifdef _CRAYSV1
  ! IO on unit 66 converted to IEEE 32-bit format
  call asnunit(66,"-N ieee_32",ierr) ! use ieee_64 for 64-bit IO
#endif

! ------------------------------------------------------
!  open the file
! ------------------------------------------------------
  if (mode=="write") then
     statstr="replace"
  else
     statstr="old"
  endif

  open (66,file="test.file",form="unformatted",status=statstr,&
       recl=mykind*asize,access="direct")

! ------------------------------------------------------
!  get/set the number of records
! ------------------------------------------------------
  if(mode=="write") then
     A=dble(0.5)
     fsize=30
     write(66,rec=1) fsize
  else
     read (66,rec=1) fsize
  endif

! ------------------------------------------------------
!  main IO -- whole array
! ------------------------------------------------------
  if(mode=="write") then
     do i=1,fsize
        write (66,rec=i+1) A
        a=a+dble(1.0)
     end do
  else
     do i=1,fsize
        read (66,rec=i+1) A
        print *,A
     end do
  endif

  close (66)

! ------------------------------------------------------
!  remove assign --
!   otherwise unit 66 will always be IEEE-converted!
! ------------------------------------------------------
#ifdef _CRAYSV1
  call assign("assign -R")
#endif

end program ftest

In the above code, the lines between each #ifdef and #endif are compiled only on the Cray SV1, where the _CRAYSV1 macro is set automatically. (See Cray's online documents, "CF90 Commands and Directives Reference Manual", section 5.3, for macros for other CRAY PVP systems.)

The assign statements are necessary to force a numeric conversion from Cray format to one conforming with the IEEE standard. (See "File Assignment: Dangerous but Useful" in issue 125:

/arsc/support/news/t3enews/t3enews125/index.xml

and its followup "Safer File Assignments" in issue 192:

/arsc/support/news/t3enews/t3enews192/index.xml

for more information on assign.) As the T3E, IBM, and SGIs are already IEEE compliant there is no need for assign on those platforms.

Even with the magic of assign, direct access files aren't a panacea. I've tested this between all ARSC systems. Everything works for all combinations except the Linux cluster -- files produced on that platform can only be read correctly there. However, if a machine supports the IEEE standard for numerical storage then direct access should be fine.

The following numbers should be an incentive to consider direct access files, even if ASCII is unbeatable for portability. As I pointed out in "Taming the IO Beast" in issue 236:

/arsc/support/news/t3enews/t3enews236/index.xml

unformatted access is the only way to make use of faster filesystems. The poor performance of ASCII files means that performance stays at low levels no matter how fast the hardware potentially could be.

Consider, for example, results obtained from chilkoot's ssd filesystem: (your results will vary)


 IO method                         1 MB time      1 MB bandwidth
 ---------                         ---------      --------------
 single,formatted,sequential       19.5 sec       .05 MB/sec
 single,unformatted,sequential     4.4            .23
 array,formatted,sequential        1.1            .9
 array,unformatted,direct          .08            13.0
 array,unformatted,sequential      .01            90.2

My notation here:

  • single=outputting only one array element at a time, array=whole array at once
  • formatted=ASCII formatted output, unformatted=binary unformatted output
  • sequential=default access, direct=direct-mode (aka random) access

One entry missing from the table is formatted direct access. This combination is not recommended.

Whole-array unformatted output is clearly the best for performance. Here, writing one element at a time in ASCII is nearly 2000x slower. The difference gets worse as file size increases. Direct access, while not as fast as sequential unformatted output, is only about 7x slower, and it is about 15x faster than whole-array ASCII output. On other filesystems the difference may not be nearly as clear, but it is always significant.
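
The ratios quoted above can be checked against the table with a few lines of awk. This is only a sanity check on the published numbers: the first ratio uses the 1 MB times, the other two use the bandwidth column.

```shell
# Figures are copied from the chilkoot ssd table; nothing here is measured.
ratios=$(awk 'BEGIN {
    single_ascii = 19.5 / 0.01   # single,formatted time over array,unformatted,sequential time
    direct_cost  = 90.2 / 13.0   # sequential over direct unformatted bandwidth
    direct_gain  = 13.0 / 0.9    # direct unformatted over array,formatted bandwidth
    printf "%.0f %.1f %.1f", single_ascii, direct_cost, direct_gain
}')
echo "$ratios"
```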

In the end there is no best solution for all possible tasks. However, this tool might fit some applications better than the other options.

Even Co-Array Fortran Can't Do This...

A T3E user got the following cryptic compiler error on a bit of new Co-Array Fortran code:


yukon$  f90 -Z prog.F90
  f90-976 f90: INTERNAL PROG, File = prog.F90, Line = 728, Column = 24 
    Expected Dv_Deref_Opr from ptr_assign_from_pointer.

Can you spot the error? Here are the relevant variable declarations, then the line of incorrect code:


   TYPE TRANS
     REAL*8, POINTER, DIMENSION(:,:,:) ::  EN
   END TYPE TRANS
  
   TYPE(TRANS) T(6)[*]
  
   REAL*8, POINTER, DIMENSION(:,:,:) :: EG
  
   INTEGER, DIMENSION(6) :: P1
   INTEGER(KIND=8), DIMENSION(6) :: P0
   INTEGER:: M
  
! error here:
   EG => T(P1(M))[P0(M)]%EN

If this worked, it would make a pointer on one processor point to data stored on another processor. That would be pretty cool... but the language doesn't allow it.

While you can't do pointer assignments involving co-arrays, you can do regular assignments. Assuming storage had been allocated for "EG", the following statement:


   EG = T(P1(M))[P0(M)]%EN

copies the contents of the array pointed to by T(P1(M))%EN on the source processor to EG on the local processor. In Co-Array Fortran, you talk about "images," not "processors." Thus, this is really copying the array from image P0(M) to the local image.

IBM SCICOMP 5

Announcement from SCICOMP:

SCICOMP 5, the fifth international meeting of SCICOMP, the IBM Scientific Computing User Group, is scheduled for the week of May 6 through 10, 2002, and will be hosted by Daresbury Laboratory, Daresbury, England.

More information about the SCICOMP organization can be found at

http://www.spscicomp.org/index.html

and complete information about this meeting can be found at

http://www.spscicomp.org/ScicomP5/

The agenda still has time slots available for user presentations, and we accept abstract submissions via the web form at

http://www.spscicomp.org/ScicomP5/userpres.html

Information on local lodgings and transportation is available at

http://www.spscicomp.org/ScicomP5/index.html#localinfo

and you can register to attend the meeting at

http://www.spscicomp.org/ScicomP5/register.html

The meeting web pages are updated regularly, so please check back often.

3D Printer Installed at UAF/ARSC

[This was released Feb 26...]

When we think of a computer printer, we usually think of a two-dimensional process using ink to represent text or images on paper. But it is also possible for scientists, engineers and artists to create tangible objects directly from their three-dimensional (3D) computer files.

This week, staff at the Arctic Region Supercomputing Center (ARSC) installed a printer capable of creating three-dimensional objects from computer files in the center's visualization lab. Acquisition of the printer was a collaborative effort among ARSC staff, UAF students and UAF researchers, using funds awarded by the University's Technology Advisory Board and ARSC.

Obtaining the printer was originally the vision of local artist and ARSC visualization specialist Bill Brody. "It's basically a well-built thermal ink-jet printer that works like a computer-controlled high-resolution hot glue gun," says Brody. "It prints in the same resolution as a laser printer, but in three dimensions."

Brody, along with Computer Science/ARSC joint appointees Chris Hartman and Glenn Chappell, has been working on a three-dimensional computer-user interface that will eliminate the need for a mouse or wand when interacting with the computer. The program, called the Body Language User Interface, or BLUI, currently works similarly to a 3D drawing or sculpting program, creating files like those created by Maya, the animation program used by Hollywood studios to create computer animations and special effects in motion pictures.

Using these files, the ARSC 3D fabrication device creates objects made of casting wax. These objects can be up to 10.5x8x7 inches in size. Currently, Brody is outputting several sculptures he created using BLUI, which he will then have cast in metal. In addition, he plans to output a terrain model of the Brooks Range.

Three-dimensional printers have been in use by industrial designers for several years. Doctors use this technology to prefabricate working artificial joints for replacement surgery. The Nike Corporation owns devices like this to "print" prototypes of soles for shoes. Already at UAF, there are engineers, designers and artists ready to take advantage of the new equipment to enhance their research projects.


###
CONTACT: ARSC Public Information Officer Jenn Wagaman, (907) 450-8662, wagaman@arsc.edu or Bill Brody, Visualization Specialist, (907) 474-1895, brody@arsc.edu.

ARSC news releases are also available on the Web at

http://www.arsc.edu/

Perl Examples

The response to last week's quick-tip calls for a full-blown article... So here goes:


# 
# Thanks to Richard Griswold:
# 

I guess I've been ruined by my exposure to Windows, but I've started using spaces in my file and directory names. This plays all kinds of havoc when I try to use find/xargs to muck around with my files. Perl (or sed) comes in handy here.

For example, if I want to open all .c files in nedit, using the "nc" launcher (See http://www.nedit.org/ ), I could try:

  find . -name "*.c" | xargs nc

but, since I have spaces in my file or directory names, this can cause problems. To escape the spaces, simply do:

  find . -name "*.c" | perl -p -e 's/ /\\ /g' | xargs nc

The -p argument tells perl to apply the script to each line of the named files or of stdin, and to print the result, similar to the way sed works. The -e flag tells perl that the next argument is a one-line perl script. In this case, it is a simple regex that puts a backslash before each space.

You could do the same thing using sed:


  find . -name "*.c" | sed 's/ /\\ /g' | xargs nc

You can also use perl to change the contents of a file in place, something that sed cannot easily do.

In one case, I had approximately 430 frames from an animation that I rendered. Since I used several systems to render portions of the animation, I had filenames like "saltyXXX.YYY.iff", where XXX was the machine number and YYY was the frame number. The problem was that the compositing software wanted filenames like "salty.YYY.iff". The only good part was that the frame numbers were all sequential, so I didn't have to renumber the files from different machines.

Since I was working on a Windows machine, I didn't have access to all the nice tools that I was used to. All hope was not lost however. Instead of renaming all the files by hand, I redirected the output from dir to a file and transferred the file to a Unix machine. There I used a couple of perl commands to transform the dir output to a batch file. Once I had a batch file, I transferred it back to the Windows machine and ran it to rename all of the files.

First I had to move each file name onto a line by itself:


  perl -pi -e 's/ +/\n/g' salty.bat

This is similar to the previous example, but -i tells perl to edit the named files in place rather than write the result to stdout. Next, I made each line into a ren statement:

  perl -pi -e 's/(salty)([0-9]+)(.+)$/ren $1$2$3 $1$3/' salty.bat

This is simply a more complex regex that transforms something like "salty005.14.iff" to "ren salty005.14.iff salty.14.iff".

With two simple perl commands, I saved myself from hours of tedious manual file renaming.


# 
# Thanks to Kate Hedstrom:
# 

A little Perl example:


#!/usr/bin/perl -w

while (<>) {
    ($lon, $lat, $depth) = split;
    print "$depth\n";
}

This is like a cut command, reading three columns of data and printing only the third.

**********************************************************

This is in my .cshrc:


  alias noM 'perl -i -pe "s/\015//"'

Use as:

  noM file1 [file2] ...

This deletes the control-M's from files created on a PC. For Mac files, you would need to have:


  alias noMac 'perl -i -pe "s/\015/\n/g"'

The -i option tells perl to modify the files in place. The -pe options tell it to behave like sed, applying the script to each line in turn and printing the result.

**********************************************************

Here is a little script for doing one phase of converting to f90:


#!/usr/bin/perl -i.bak
#
# Usage: f90_com file
#
# Convert to F90 comments (!), in place, saving old file to file.bak
#
$comment_char = '!';
#
# main loop
#
while (<>) {
        # replace upper and lower case c, *
        if (/^[*c]/i) {
                substr($_,0,1) = $comment_char;
        }
        print;
}

**********************************************************

This is an example of a tiny program expanded to include built-in documentation. I learned to do this from the Work column in Server-Workstation Expert. The actual program is in these lines:


while (<>) {
    print eval $_;
    print "\n";
    warn $@ if $@;
}

For each line of input, evaluate the expression and print it. It is an interactive calculator somewhat like bc and dc.

The whole thing:


#!/usr/bin/perl -w

use Pod::Usage;
use Getopt::Long;

# From the Pod::Usage man page
our ($opt_help, $opt_man);
GetOptions("help", "man")
  or pod2usage("Try '$0 --help' for more information");
pod2usage(-verbose => 1) if $opt_help;
pod2usage(-verbose => 2) if $opt_man;
pod2usage("Too many args: " . @ARGV) if @ARGV > 0;

while (<>) {
    print eval $_;
    print "\n";
    warn $@ if $@;
}

__END__

=head1 NAME

pcalc - Perl calculator

=head1 SYNOPSIS

pcalc [--help] [--man]

=head1 DESCRIPTION

An interactive calculator, much like "bc -l". In addition, it supports
Perl variables and syntax.

=head1 OPTIONS AND ARGUMENTS

=over 4

=item I<-help>

Print more details about the arguments.

=item I<-man>

Print a full man page.

=back

=head1 EXAMPLE

    % pcalc
    use Math::Trig

    $pi = 4*atan(1)
    3.14159265358979
    2*$pi/24/3600
    7.27220521664303e-05

=head1 AUTHOR

Larry Wall
From the first Camel Book.

=cut

# 
# One from the editor:
# 

  cat file | perl -n -e "print if /start_pattern/ .. /end_pattern/"

Prints blocks of text from file. A block starts when "start_pattern" is found and continues until "end_pattern" is found. The ".." is perl's "range" operator. "-n" assumes a loop around the script but doesn't print every line, as "-p" would. Thus, only lines explicitly selected by the "if" condition are printed.

I have this function defined in my .profile file:


function range {
   perl -n -e "print if /$1/ .. /$2/"
}

Usage:

  cat file | range "start_pattern" "end_pattern"

Quick-Tip Q & A



A:[[ I resolve to stop using sed, awk, cut, split, complicated egrep
  [[ commands, etc..., in favor of perl. 
  [[ 
  [[ My goal is to simplify, and learn just one way to do everything.
  [[
  [[ Can you help me get started? I'd appreciate a couple examples
  [[ --with explanations--of using perl on the command line or in short
  [[ scripts, to accomplish common unix tasks.


  # 
  # See last article... 
  # 



Q: Recently I was debugging a code on the SV1 using totalview.  In one
   of the subroutines there were several matrices declared as:

          COMPLEX         A( LDA, * ), B( LDB, * )
          COMPLEX         Z( LDZ, * )

   Where LDA = LDB = LDZ = 147.  When viewing the values, totalview
   displayed all the data arrays as all being dimensioned (147, 1) when
   in reality the sizes were A(147,147), B(147,147) and Z(147,15).

   I tried changing the sizes in the variable display window, but
   couldn't get it to work.  I also found that I could type in single
   values to look at, say Z(15,15) and that would work.  However, I'd
   like to be able to see the whole matrix at once.  Does anyone know if
   this can be done?

[[ Answers, Questions, and Tips Graciously Accepted ]]


Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.