ARSC HPC Users' Newsletter 405, July 24, 2009

A Leg Up On R

[ By John Styers ]

One is often tasked with adopting new software. I was recently faced with getting up to speed on R, "a language and environment for statistical computing and graphics." The most natural place to turn is the documentation.

When using new software, I always find that one has to reach a threshold, a grasp of the software's logic, beyond which using it (and learning or adopting new features) becomes natural. From "R-intro.pdf":

"The following session is intended to introduce to you some features of the R environment by using them. Many features of the system will be unfamiliar and puzzling at first, but this puzzlement will soon disappear." - David M. Smith

However, I found the documentation not quite so helpful. Actually blazing the trail from unfamiliarity to producing a plot of the data was a long (five days) and rather arduous journey. The following tutorial is an attempt to help the reader avoid similar difficulties, accelerating their introduction to R.

The first point to make is that virtually EVERY command in R is a function call and is therefore followed by parentheses, even "quit":


> quit()

One of the primary tasks for this type of program is reading data from a file. To read a single column of data, enter:


> xnew <- read.table("data1D")

R can also read multi-column data sets from a comma-separated file. For example, the following file named "data2D.csv":


2741813084,2189.075,2197.116
2020956776,596.208,360.968
1892913644,342.413,342.807
1735159400,317.133,315.858
1299728072,170.853,169.788

This file can be read using the read.table() function, specifying a comma as the separator between values:


> xnew <- read.table("data2D.csv", sep = ",")

[Note: Be sure to remove any text headers on the columns or rows, as non-numerical data can confuse R.]

In the above examples, "xnew" is the data object (a data.frame) that receives the data, and "data1D" and "data2D.csv" are the files being read; only the latter is comma-delimited. Note that the quotes around the file names and the separator are required.
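
For the note about text headers, one option is to strip the header line from the shell before starting R. The sketch below assumes exactly one header line; the input name "raw.csv", the column names, and the scratch directory are hypothetical stand-ins. (read.table() also accepts header = TRUE, which may be the simpler route when the column names are worth keeping.)

```shell
#!/bin/sh
# Sketch: drop one header line from a CSV file before reading it into R.
# "raw.csv", its column names, and the scratch directory are hypothetical.
work=$(mktemp -d)
printf 'size,t1,t2\n2741813084,2189.075,2197.116\n2020956776,596.208,360.968\n' > "$work/raw.csv"

# tail -n +2 prints the file starting at line 2, skipping the header.
tail -n +2 "$work/raw.csv" > "$work/data2D.csv"
cat "$work/data2D.csv"
```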

Something else to note about R is that the result of an operation is stored in an object with the "<-" assignment operator. The direction of the arrow denotes the flow: the value on the right is assigned to the name on the left.

To view an object, just type it:


> xnew
          V1       V2       V3
1 2741813084 2189.075 2197.116
2 2020956776  596.208  360.968
3 1892913644  342.413  342.807
4 1735159400  317.133  315.858
5 1299728072  170.853  169.788

Individual columns from "xnew" can be stored as new data objects:


> col1 <- xnew[1] # the number in brackets selects the column
> col2 <- xnew[2]
> col3 <- xnew[3]

These can then be combined to create the two-dimensional objects needed for a plot:


> colnew <- data.frame(col1, col2)
> colnew2 <- data.frame(col1, col3)

Basic plotting is simple, but can be made quite complex due to the multitude of options available. Here is an example that produces a rather nice plot and shows many of the necessary basics.


# create and title chart, plot points, and label axes
> plot(colnew, main="With and Without Singular Vectors, Time (seconds) vs. Size (bytes)",
  xlab="Size (bytes)", ylab="Time (seconds)", col=4)

# draw lines between the plotted points
> lines(colnew)

# allow multiple plots on the same chart
> par(new = T) # So one can write onto the same plot.

# plot second set of points
> plot(colnew2, pch="+", col=2, xlab="Size (bytes)", ylab="Time (seconds)")

# draw lines between the second set of plotted points
> lines(colnew2)

# create legend for first plot
> legend("topleft", pch=1, col=4, "Without Singular Vectors")

# create legend for second plot
> legend(x = 1.24e+09, y=2090, pch="+", col=2, "With Singular Vectors")

These operations are carried out in "live time," so one can watch the effects of the commands as they are entered.

Finally, to write this chart to a PostScript file, enter:


> dev.copy2eps()

This creates a file with the name "Rplot.eps". Multiple files are put into subsequent pages.

Simple, right? The above took five days. I hope this makes YOUR journey smoother.

Sharing File Access Safely

[ By Craig Stephenson ]

One aspect of Linux that seems to confuse beginner and intermediate users alike is file permissions, particularly the effects of permissions on nested directories. To complicate matters further, ARSC has strict policies concerning what permissions users can and cannot set on their files. If one does not have a thorough understanding of Linux file permissions, it is easy to accidentally violate ARSC's security policies while trying to share the files in a directory with another user. This article demonstrates how to safely share access to your files.

Sharing Run Access To An Executable

Suppose you have written a useful program you would like to allow other members of your group, or perhaps even members of other groups, to execute. If this is a compiled code, users do not necessarily need read access to run the program, so others can run your executable without being able to copy it.

In this example, the program we want to share execute access to is located here:


  $HOME/bin/myProgram

Sharing execute access is, unfortunately, not as simple as giving the myProgram file group or world execute access. Each directory in the path must also have execute access. The following command will allow every user on the system to run your program, without allowing them to copy it, and without allowing them to view the contents of your directories.


> chmod 711 $HOME $HOME/bin $HOME/bin/myProgram

This command gives myProgram and the two directories above it group and world execute access. The permissions look like this:


> ls -ld $HOME $HOME/bin $HOME/bin/myProgram
drwx--x--x 82 user group 8192 2009-07-10 10:34 /u1/uaf/user
drwx--x--x  2 user group 4096 2009-07-10 10:34 /u1/uaf/user/bin
-rwx--x--x  1 user group 8378 2009-07-10 10:34 /u1/uaf/user/bin/myProgram

Setting the execute bit on a directory allows that directory to provide the inode information needed to access the contents of a file or subdirectory. In this case, both $HOME and its subdirectory, $HOME/bin, require the execute bit to allow access to $HOME/bin/myProgram.

As a cautionary note, executable scripts (e.g., Perl or bash scripts) differ from compiled executables in that they require both read and execute permissions to run, as the interpreter must read and process the script's high-level source code on the fly.
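
The rule that every directory in the path needs the execute bit can be tried out safely in a scratch directory. The following is only a sketch: "demo" is a hypothetical stand-in for $HOME, and the placeholder file is empty because only its mode bits matter here.

```shell
#!/bin/sh
# Sketch: reproduce the $HOME/bin/myProgram layout in a scratch directory.
# "demo" and its contents are hypothetical stand-ins, not real ARSC paths.
demo=$(mktemp -d)
mkdir -p "$demo/bin"
: > "$demo/bin/myProgram"      # empty placeholder; only the mode bits matter

# Same modes as in the article: execute-only for group and world.
chmod 711 "$demo" "$demo/bin" "$demo/bin/myProgram"

# Walk from the file up to the scratch root, printing each component's mode.
p="$demo/bin/myProgram"
while :; do
    ls -ld "$p"
    [ "$p" = "$demo" ] && break
    p=$(dirname "$p")
done
```

On a real account, the same loop could start at "$HOME/bin/myProgram" and stop at "/" to show at a glance which component of the path is missing the execute bit.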

Sharing Read Access To Files

Read access gives users the ability to copy, use, or read files and to view the contents of directories. As in the previous example, granting read access to a file or directory requires that execute permissions be set on all of the directories in its path.

If you wanted to give all users on the system the ability to copy myProgram and also list all of the files in your $HOME/bin directory, the following commands would work:


> chmod 711 $HOME
> chmod 755 $HOME/bin $HOME/bin/myProgram

The resulting permissions look like this:


> ls -ld $HOME $HOME/bin $HOME/bin/myProgram
drwx--x--x 82 user group 8192 2009-07-10 10:34 /u1/uaf/user
drwxr-xr-x  2 user group 4096 2009-07-10 10:55 /u1/uaf/user/bin
-rwxr-xr-x  1 user group 8378 2009-07-10 10:34 /u1/uaf/user/bin/myProgram

While these permissions allow all users to read myProgram and list the files in $HOME/bin, only you will be able to list the files in your $HOME directory.
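
For readers less used to octal modes: each digit is the sum of read (4), write (2), and execute (1) for the owner, group, and world, in that order. The sketch below applies the symbolic equivalents of 711 and 755 to a hypothetical scratch copy of the directory layout, so the two notations can be compared.

```shell
#!/bin/sh
# Sketch: symbolic equivalents of the octal modes used in the article,
# applied to a hypothetical scratch copy of the directory layout.
demo=$(mktemp -d)
mkdir "$demo/bin"
: > "$demo/bin/myProgram"      # empty placeholder file

chmod u=rwx,go=x  "$demo"                              # same as 711
chmod u=rwx,go=rx "$demo/bin" "$demo/bin/myProgram"    # same as 755

ls -ld "$demo" "$demo/bin" "$demo/bin/myProgram"
```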

Please also note that the files listed under "Environment (or "Dot") Files/Directories" on the following web page should never have group or world read access:

http://www.arsc.edu/arsc/support/policy/#dotFiles

Sharing Write Access To Files/Directories

It is a violation of ARSC's security policies to have:

  • either group or world write permissions set on your $HOME directory
  • world write permissions set on any of your files
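
One way to check for the second condition is to let "find" look for the world-write bit. This is a sketch rather than an official ARSC tool: "top" is a scratch stand-in for whichever directory you want to audit, and the two file names are hypothetical.

```shell
#!/bin/sh
# Sketch: list entries under a directory that are world-writable.
# "top" is a scratch stand-in; on a real account it might be $HOME or $WORKDIR.
top=$(mktemp -d)
: > "$top/ok.txt";   chmod 644 "$top/ok.txt"
: > "$top/open.txt"; chmod 666 "$top/open.txt"

# -perm -0002 matches anything with the world-write bit set;
# symbolic links are skipped because their own mode bits are not meaningful.
find "$top" ! -type l -perm -0002 -print
```

In this scratch setup only open.txt should be reported; an empty result means nothing under the directory is world-writable.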

You may, however, create a group-writable directory in your $WORKDIR or $ARCHIVE_HOME where other members of your group can write files. If the goal is to allow other members of your group to read, write, and list files in this directory:


$WORKDIR/groupFiles

The following permissions, at a minimum, would be required:


> chmod 710 $WORKDIR
> chmod 770 $WORKDIR/groupFiles

The permissions should look like this:


> ls -ld $WORKDIR $WORKDIR/groupFiles 
drwx--x---  15 user group 4096 2009-07-10 12:48 /wrkdir/user
drwxrwx---   2 user group 4096 2009-07-10 12:50 /wrkdir/user/groupFiles

If members of your group belong to multiple groups, it would probably be wise to set the setgid bit on the $WORKDIR/groupFiles directory, too:


> chmod g+s $WORKDIR/groupFiles

This will ensure that any new files or directories created in the groupFiles directory belong to the same group as its parent directory. In other words, it will help keep group ownership consistent within the directory tree.
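
The whole setup can be rehearsed in a scratch location before touching a real $WORKDIR. The sketch below assumes you are a member of the group that owns the scratch directory (otherwise the system may silently drop the setgid bit); "work" is a hypothetical stand-in for $WORKDIR.

```shell
#!/bin/sh
# Sketch: set up a group-writable, setgid directory in a scratch location.
# "work" stands in for $WORKDIR; nothing here touches real ARSC paths.
work=$(mktemp -d)
mkdir "$work/groupFiles"
chmod 710 "$work"
chmod 770 "$work/groupFiles"
chmod g+s "$work/groupFiles"

# With setgid set, the group execute position shows "s" instead of "x".
ls -ld "$work/groupFiles"

# A file created inside is listed with the directory's group.
: > "$work/groupFiles/newFile"
ls -l "$work/groupFiles"
```

The last two chmod commands can also be combined as "chmod 2770", where the leading 2 is the setgid bit.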

More information on setting up shared group directories can be found in newsletter issue 372:

/arsc/support/news/hpcnews/hpcnews372/index.xml#article2

As these examples have shown, it is important to remember that file permissions are affected by every directory above them. Owing to this, users are given quite a bit of flexibility in how to share individual files or entire directory trees. But also remember to always be aware of ARSC's security policies when exercising this flexibility.

Quick-Tip Q & A


A:[[ I just exceeded my $HOME directory quota.  I would like to move some
 [[ files to my $ARCHIVE_HOME directory to make room, but I'm not sure where
 [[ to start.  Is there some command I can use to help me free up space by
 [[ moving files I haven't used in a while?

#
# The following response from Rahul Nabar shows how to identify older
# files via a "find" option:
#

#Files older than 90 days.
find "$HOME" -name '*' -ctime '+90'

#This tiny bash script might help you copy the files over (not well tested!)
####################################
#!/usr/bin/env bash

files=`find "${HOME}" -name '*' -ctime '+90'`
for file in $files
do
  T1=`echo "${file}" | sed "s@$HOME@$ARCHIVE_HOME@"`
  T2=`dirname "${T1}"`
  mkdir -p "${T2}"
  cp "${file}" "${T2}"
done
####################################

Finally delete using:
find "$HOME" -name '*' -ctime '+90' | xargs rm -f
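
#
# An editorial sketch along the same lines:
#

Two caveats about the commands above may be worth keeping in mind. First, -ctime tests when a file's status (permissions, ownership, and so on) last changed; -mtime (last modification) or -atime (last access, if the file system records it) may be closer to "haven't used in a while." Second, looping over the output of find splits file names that contain spaces. The variant below is a sketch under stated assumptions: it needs bash and GNU touch, SRC and DEST are scratch stand-ins for $HOME and $ARCHIVE_HOME, and it only copies, leaving deletion as a separate, deliberate step.

```shell
#!/usr/bin/env bash
# Sketch: copy files not modified in 90+ days, preserving relative paths.
# SRC and DEST are scratch stand-ins for $HOME and $ARCHIVE_HOME (assumptions).
SRC=$(mktemp -d)
DEST=$(mktemp -d)
mkdir -p "$SRC/old data"
touch -d '100 days ago' "$SRC/old data/run 1.out"   # GNU touch syntax
touch "$SRC/recent.out"

# -type f skips directories; -print0 with read -d '' keeps odd names intact.
find "$SRC" -type f -mtime +90 -print0 |
while IFS= read -r -d '' file; do
    rel=${file#"$SRC"/}                  # path relative to SRC
    mkdir -p "$DEST/$(dirname "$rel")"   # recreate the directory structure
    cp -p "$file" "$DEST/$rel"           # -p keeps the original timestamps
    echo "copied: $rel"
done
```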

#
# Ryan Czerwiec explains how to combine the "ls" and "find" commands to
# sort all files within a directory tree using various "ls" options:
#

This calls for something like ls -R, but that can be unwieldy in the way
it formats subdirectory listings.  Start with something like:
ls -dp `find $HOME -name "*"`

This will list directory names as well, which are not necessarily
germane to this search and can have misleading statistics, but you'll be
able to tell which entries are directories and consider them
appropriately.  -d here makes ls list just directory names, not contents
(find will get the contents anyway, so without -d you'd get them all
twice, and also why I'm not using the -exec option of find to run the ls
command), and p flags directory names with a trailing / to help you
identify them, which your OS might already do by color-coding them.  Add
further options to taste: t will sort by last time file was modified, u
will sort by last time file was accessed, S will sort by file size, 1
(that's numeral one, not lower case L) will force single column output
if that's easier for you to see or use, l (lower case L) will give all
the file statistics: age, size, etc. (and single column output is now
the default), and many more on the ls man page.  If using -l (lower case
L) you can tack on:


| grep -v "^d"

to the end of the command to remove directory names from the listing,
since they're not really what you want to get at (p option then becomes
unnecessary).  So, it sounds like you want something like:

ls -dp1u `find $HOME -name "*"`

or

ls -dlu `find $HOME -name "*"` | grep -v "^d"

maybe even with a "| more" tacked on to each so you see the listing one
screen at a time.

#
# A tip from an editor:
#

The following command will display a list of first-level files and
directories, along with their sizes (in kilobytes), sorted by size:

du -sk * | sort -n -k 1

This might help show where your cleaning efforts are best spent.
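
Note that "*" does not match names beginning with a dot, so hidden directories are left out of that listing. A possible variant, sketched here in a scratch directory with hypothetical file names, uses find to pick up every first-level entry instead:

```shell
#!/bin/sh
# Sketch: size listing that also covers "dot" entries, which "*" skips.
# The scratch directory and file names are hypothetical.
top=$(mktemp -d)
mkdir "$top/.hidden" "$top/visible"
echo data > "$top/.hidden/a"
echo data > "$top/visible/b"

# -mindepth/-maxdepth (GNU and BSD find) limit find to first-level entries.
find "$top" -mindepth 1 -maxdepth 1 -exec du -sk {} + | sort -n -k 1
```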


Q: I am developing a script that reads in a formula from an input file
to be used with a series of values (qp and et).

E.g.
val=qp + 0.000277777 * et

Is there a way to use this formula within my script without writing
my own formula parser?  Here is an example of how I want to use this.

% cat formula
val=qp + 0.000277777 * et

% cat myscript
  ...
for all qp
    for all et
       <execute formula and print val>
  ...

Then the output would be something like:

qp    et    val
0.0   0.0   0.0
0.0   1.0   0.000277777
  ...
1.0   0.0   1.0
1.0   1.0   1.000277777
  ...

The script is currently written in Perl, but I wouldn't be opposed to
using Python.

[[ Answers, Questions, and Tips Graciously Accepted ]]


Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.