ARSC HPC Users' Newsletter 365, July 13, 2007

HPC Faculty Boot Camp Agenda Posted; Drop-In Visitors Welcome

ARSC's annual faculty "boot camp," July 30 - Aug. 17, is an intensive, hands-on opportunity to gain new high performance computing skills or improve existing ones, including how to manage extremely large volumes of data, run simulations for longer durations, and visualize data in 3D.

All faculty camp sessions are open.

Users, prospective users, other members of the university community, and the general public are welcome to attend presentations of interest. Here's the schedule:

http://www.arsc.edu/support/training/HPCBootCampAgenda2007.html

You may contact the coordinator, Tom Logan (logan@arsc.edu), with questions.

WORKDIR Purging on Midnight to Commence, August 1st

This is a notice that purging on midnight will start on Wednesday, August 1, 2007.

As described in our web pages on storage:

http://www.arsc.edu/support/howtos/storage.html

the work directory filesystems on each of our platforms are a finite resource and are intended to be used as short-term storage. These working directories are accessed via the $WORKDIR environment variable and are not backed up. On midnight, this directory is /wrkdir.

In order to ensure that these filesystems continue to provide high-speed access to potentially huge temporary files, ARSC will be re-enabling the automatic purging of files on the $WORKDIR filesystem. Beginning August 1, any file in $WORKDIR that you have not accessed in more than 30 days will automatically be removed. Please move any files you wish to save to $ARCHIVE_HOME now. Note that the $WORKDIR directory is not backed up, and files in this directory which are not archived are at risk.

Here are answers to some common questions you may have about this process. If you have any further questions or concerns, please feel free to contact consult AT arsc.edu or 907-450-8602.

--

Question: How can I tell which files will be purged?

Answer: The purger uses the 'last access' date for each file. You can view this date for each file by using the -u option to ls. For example:

ls -lu

Note that this is different from the 'modified' date that Unix/Linux/OS X tools normally show. Any time you read the contents of a file (for example, with a command like cat), you update the 'accessed' date for that file.
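If you would rather get a single listing of at-risk files, a find command along these lines should also work (a sketch; the exact day-counting behavior of -atime can vary slightly between systems, so check "man find"):

  # list regular files under $WORKDIR not accessed in more than 30 days
  find $WORKDIR -type f -atime +30

Since find only examines file metadata, running it will not itself update the access times of the files it reports.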

You can also use a tool called getPurgable. Invoking

/usr/local/bin/getPurgable

without any options will list all files in $WORKDIR that have not been accessed in over 30 days. You can see all of the available options by invoking getPurgable with -help:

  /usr/local/bin/getPurgable -help 
  Usage: getPurgable [-p path] [-a agelimit] days
       getPurgable [-p path] [-a agelimit] [-n days]
       getPurgable -help | -h | -?

     -p (path)     : Path to the filesystem (default is $WRKDIR)
     -a (agelimit) : Looks for files older than agelimit (default is 30)
    [-n] days      : Looks for files that will be older than agelimit 
                   : over the specified days
     -h (Help)     : Prints usage information

For example,

/usr/local/bin/getPurgable -n 3

will show which files will become eligible for purging over the next 3 days.

--

Question: Where should I put my data?

Answer: Copy the files and directories you want to retain to $ARCHIVE_HOME. As an aside, please remember that none of the $WORKDIR filesystems on any of our systems are backed up. Thus, it is a really good idea to back up your working files to either $ARCHIVE_HOME or $HOME regularly -- regardless of whether they are subject to purging or not. Only files that have not been accessed in more than 30 days will be purged, so files that you are actively accessing in $WORKDIR will not be removed.

We recommend that you copy your files rather than moving them. This way you can, if needed, compare checksums between the copies to verify that they were copied correctly. We also request that you tar up directories prior to copying them, as this can greatly speed up file transfers to the storage server. We recommend that tar files not exceed 200 GB. We do not recommend that you gzip or compress your files prior to copying them: compressing the files on the systems can frequently take much longer than just copying the raw tar files, and data is automatically compressed by the tape drive hardware prior to being written out.
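As an illustration (the directory and file names below are just placeholders), a transfer following these recommendations might look like:

  # bundle the directory into a single, uncompressed tar file
  cd $WORKDIR
  tar cf results.tar results/

  # copy (rather than move) the tar file to the archive; if $ARCHIVE_HOME
  # is not directly accessible from this system, substitute your usual
  # transfer method (e.g., scp)
  cp results.tar $ARCHIVE_HOME/

  # compare checksums to verify the copy before deleting the original
  cksum results.tar $ARCHIVE_HOME/results.tar

Once the checksums and byte counts reported by cksum match, it is reasonable to remove the copy in $WORKDIR.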

Finally, if you need a larger quota in your $HOME directory to store source code, makefiles, etc., please send an email to consult. We are unable to grant $HOME quotas as large as the $WORKDIR quotas, but we can increase $HOME somewhat.

ARSC Summer Science Seminar Series Continues

TUESDAY, JULY 17 1-2 p.m. 010 West Ridge Research Building (WRRB) (See map, at: http://www.arsc.edu/ )

"Overview of ARSC Computational Science Activities"

Chief Scientist for the Arctic Region Supercomputing Center, Greg Newby, will lead a panel discussion for the general public on the breadth and depth of computational science activities at ARSC, Alaska's only full-service high performance supercomputing facility, Tuesday, July 17 at 1 p.m., in room 010 of the West Ridge Research Building.

TUESDAY, JULY 24 1-2 p.m. 010 West Ridge Research Building (WRRB)

"Seeking Admission to Graduate School for the Sciences"

If you are considering going to graduate school and are not sure what to expect, you are invited to a free seminar hosted by Arctic Region Supercomputing Center Chief Scientist Greg Newby and guest panelists Tuesday, July 24 at 1 p.m., in room 010 of the West Ridge Research Building. Prospective graduate students will hear about the best ways to prepare for grad school, how different factors such as standardized test scores, undergraduate research, grades and recommendation letters are considered, how graduate school differs from undergraduate studies, and what different subject areas are available for study.

--

For more information on the science seminar series, contact ARSC Chief Scientist Greg Newby at newby@arsc.edu .

Cerebro to be Retired, July 31, 2007

The cerebro research cluster is scheduled to be retired at 8:00 a.m. on Tuesday, July 31, 2007. Cerebro home directories will be backed up before the scheduled decommission; however, all data in $WORKDIR will be deleted. Therefore, if you have any data stored in your home directory or $WORKDIR, please retrieve it before Tuesday the 31st.

If you have any questions regarding the decommissioning of cerebro or transferring data from your home directory or $WORKDIR, please contact the ARSC Help Desk.

$WORKDIR or $WRKDIR: Which Should I Use?

Both $WORKDIR and $WRKDIR point to the same, purgeable filesystem. You should prefer $WORKDIR, as it is now the standard across all the HPCMP centers, which will make migration between centers easier. Similarly, $ARCHIVE_HOME and $ARCHIVE are equivalent, but you should prefer $ARCHIVE_HOME.
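If you would like to see this for yourself, echoing the variables in your login shell should show matching paths for each pair:

  echo $WORKDIR $WRKDIR
  echo $ARCHIVE_HOME $ARCHIVE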

Common environment variable names are only one part of the Baseline Configuration (BC) Project, which is working to create consistent working environments across all resources of the Modernization Program. We introduced this to you in issue #345:

> /arsc/support/news/hpcnews/hpcnews345/index.xml#article2

For all the details, see:

http://www.afrl.hpc.mil/consolidated/bc/index.php

Quick-Tip Q & A


A:[[ I have a binary formatted file with floating point numbers in it,
  [[ but I can't remember if I compiled with REAL*4 or REAL*8 when I 
  [[ created the output.  I know I should really be using NetCDF or HDF 
  [[ for output files, but I didn't know about those file formats when 
  [[ I created the file!  Is there a simple way for me to see the values 
  [[ in the file without writing a new program?  There ought to be a
  [[ way to do this on the command line!



  #
  # Sean Ziegeler
  #

  Personally, I find that dissecting binary files is an important
  skill. On UNIX-like systems, I use the "od" command.  It can dump the
  file as ints, reals, and other standard types.  For more information,
  type "man od", but in-short, you will want to dump the file as 4-byte
  reals and then as 8-byte reals and see which one makes more sense.
  I am going to assume that you do not have any other non-real values
  before your reals (more on that later).

  So first try the following (piping to "more", "less", "head", etc. since
  the output may be very long depending on your file size):

    od -t fF mydatafile | more

  What you should see is something like:

    0000000   7.845912e-40   1.229055e+03   3.921417e+37   9.261866e-06
    0000020  -7.855965e-31   1.486139e+23  -5.087093e+06  -7.099653e-04
    ...

  The numbers in the left column are byte offsets (octal by default; see
  the man page).  As you can see in the example, the above values fluctuate
  wildly and probably aren't correct.  So, try the command again for 8-byte
  reals:

    od -t fD mydatafile | more

  If those numbers make more sense, then you probably wrote out 8-byte
  floats.

  In the case that you did write out header information or other values
  before your reals, you can use the "-j" flag to skip some bytes before
  writing out values.  The following example skips 100 bytes before printing
  out real values:

    od -j 100 -t fF mydatafile | more

  I mention that because, quite often, Fortran programmers use the
  sequential unformatted format rather than "direct" access.  In those
  cases, you will usually have a 4-byte record separator before your real
  values start.  Thus, you may need to try "od -j 4 ..."

  One other handy hint for Linux users: the "khexedit" program is a
  GUI-based program with some of the above features.  It isn't good at
  listing long strings of real numbers, but it can inspect them one at a
  time and without all the cumbersome command-line switches.


  #
  # Jed Brown
  #

  od will do the trick:

  # in bash...
  $ bindata () { perl -e 'print pack("ffdd", 1.1, 2+1e-8, 3.3, 4+1e-8)'; }
  $ bindata | od -t f4
  0000000   1.100000e+00   2.000000e+00   2.720083e+23   2.162500e+00
  0000020   1.577722e-38   2.250000e+00
  0000030
  $ bindata | od -t f8
  0000000   2.000000473484397e+00   3.300000000000000e+00
  0000020   4.000000010000000e+00
  0000030


  #
  # Kate Hedstrom
  #

  Perl used to come with an xdump script for reading binary files in a
  nifty format. I'd use that to see if there was some pattern that
  repeated every 4 or 8 bytes. I can attach an old xdump to this
  message.



A:[[ **BONUS ANSWER (courtesy of Kurt Carlson) on the problem of
  [[ ** preserving file ownership when moving directories between hosts.

  FYI, using 'sudo tar xf' is STRONGLY discouraged (considered bad
  practice).

  Yes, it preserves ownership... and when installing things it may set
  very wrong ownership.  So many sys admins were burned at some time
  in their careers by untarring as root that it generally makes the
  list now of "don't do that."  Untar as yourself and set ownerships
  appropriately when you're done.

  Anyway, there is no real solution for the question you posed.
  One possible approach is to use source repositories
  (cvs/subversion/...), where you have logs of who checked things in but
  actual file ownership is irrelevant (as it should be).




Q: Can someone please explain this?  When I use over 30000000
   random variables in my code, the sum and average get out of whack.
   Below is a sample code which shows the problem.
   
   mg56% cat t2.f90
   program t2
   
     integer :: p, N
     real, allocatable, dimension (:) :: a
   
     do N=10000000,50000000,5000000
       allocate (a(1:N))
   
       call random_seed ()
       call random_number (a)
   
       print*, "Count:", N, "Sum:", sum (a), "Avg:", sum(a)/real(N), "Min:"  &
         , minval(a), "Max:", maxval(a)
   
       deallocate (a)
     enddo 
   
   end program
   
   mg56% pathf90 t2.f90
   mg56% ./a.out
    Count: 10000000 Sum: 5000621.5 Avg: 0.500062168 Min: 2.980232239E-7 Max: 0.999999702
    Count: 15000000 Sum: 7499592.5 Avg: 0.49997282 Min: 0.E+0 Max: 0.999999881
    Count: 20000000 Sum: 10002506. Avg: 0.500125289 Min: 0.E+0 Max: 0.99999994
    Count: 25000000 Sum: 12498267. Avg: 0.49993068 Min: 5.960464478E-8 Max: 0.99999994
    Count: 30000000 Sum: 15000845. Avg: 0.500028193 Min: 0.E+0 Max: 0.99999994
    Count: 35000000 Sum: 16777216. Avg: 0.479349017 Min: 0.E+0 Max: 0.99999994
    Count: 40000000 Sum: 16777216. Avg: 0.419430405 Min: 0.E+0 Max: 0.99999994
    Count: 45000000 Sum: 16777216. Avg: 0.372827023 Min: 5.960464478E-8 Max: 0.99999994
    Count: 50000000 Sum: 16777216. Avg: 0.335544318 Min: 0.E+0 Max: 0.99999994
   mg56% 

   It happens on both the IBM with xlf90 and the Sun Opteron cluster
   with pathf90.  (Sorry, but 30000000 random numbers just isn't
   enough!)

[[ Answers, Questions, and Tips Graciously Accepted ]]


Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.