ARSC HPC Users' Newsletter 362, May 18, 2007

Using Google Tools for Research Project Management


[ by Jessie Cherry, ARSC/IARC post-doc ]

A few months ago a friend of mine mentioned that he was using a blog to keep track of some of his permafrost research and share it with a collaborator. Pshaw! I thought. I was already an active blogger, but using a blog for each individual project seemed unnecessary. Surely these file folders on my desk were doing the trick? I had become accustomed to emailing around MS or Open Office documents after using "Track Changes" and adding comments. Or there was my graduate advisor who insisted on faxing his chicken scratches from distant hotels.

Well, I'm not sure what pulled me in--maybe the day the file folders all tipped over and spilled my coffee--but there's no looking back. I've been collaborating on a book chapter of late, and a combination of Blogger, Googlegroups, Googledocs, YouTube, and Googlereader has been really handy for this sort of project. Here I will outline how to set up a blog on Blogger and use Googlegroups and Googledocs for management of a research project.

In the event that others might have useful tips to share, I've actually created a blog about blogging for this purpose. This is called "The Research Project Management Metablog" and can be found here: http://scimetablog.blogspot.com . To be even more Escher-esque, I will post this article on the RPM Metablog and will use the blog as an example in this article.

Step 1 : Determine who are the likely users of your blog. Are you co-authoring a project with a group of people who are disorganized and forgetful? Perfect! Are you working on a manuscript with someone who can't figure out how to use email? Not so good! You want to make sure that your collaborators are willing and able to use the forum you are going to establish. They may resist at first, but oral statements like "say it to the blog 'cuz this ear ain't listenin" may help. Finally, the primary users are going to have to accept a certain level of Googleuniverse in their life. That is, to use these particular tools collaborators have to be willing to use a Gmail account. As Google mail and blogs are data-mined, the blog or doc may not be a good place for highly secure information. If people have experience with other blog applications in conjunction with online document editing, etc. by all means share (on the Metablog, of course).

Step 2 : You'll need a Gmail account. If you don't already have one, go to http://mail.google.com/mail . Luckily, you no longer have to ask one of the cool kids to "invite" you to Gmail, which is good for those of us who were always picked last for the school dodgeball team and hate that sort of thing. When you open up your Gmail account, you'll see a listing of services in the upper left corner, including Groups, Documents, and Reader. These will all be helpful, as might other services like Photos. I suggest you set up a Googlegroup with all of the people who are going to use the blog. Blogger is able to notify a single address of changes to the blog. You could make this address yourprojectgroupname@googlegroups.com.

Step 3 : If you don't have one already, set up a Blogger account with your Gmail account information at blogger.com. Now add a blog for the project you want to manage. In the settings for this blog, decide whether you want to make it public or private, directory-listed or not, published as an RSS feed, etc. Set read/write permissions for contributors, if you want to restrict it to specific people. Finally, put in the Googlegroup address you created if you want to notify these folks of blog postings via email. Alternatively, one can use an RSS reader (Googlereader or another) to track the feed.

Step 4 : Go crazy with your blog. Use the Customize link or the Template tab to edit your page elements. For example, you might want a variety of links on one side of the page organized under different headings. Blogger has simple point-and-click editing, or you can write HTML if you prefer. I've posted a number of sample page elements on the RPM Metablog.

Step 5 : Create a Googledoc from your Gmail account. This is similar to MS Office and Open Office word processors, except the document is modified and resides online. The application automatically tracks changes, though the method of version control is somewhat different than in other word processors. The user can upload images and do some formatting, though options are somewhat limited. There is also spellcheck and other standard features. Collaborators need to be invited and read/write permissions can be set. You can publish to the web, so that non-collaborators can see the document. I've done this with a sample document on the Metablog. Finally, the Googledoc can also be set up as a feed and read via your favorite RSS reader. You can also work on a spreadsheet in Googledocs and export/import word processed documents or spreadsheets from a variety of formats.

Here I've outlined some basic steps to get you started using some Google tools to help manage a research project. I've also mentioned the example blog I created at http://scimetablog.blogspot.com . None of these tools is perfect. For example, I haven't found a great way to deal with PDFs other than to post them on a server and link to them on the blog. The same goes for files of code or data. But there are also a variety of tools and features that I haven't mentioned that may be useful. I would love to hear about other tools and tips; feel free to post them to the blog!

Installing programs and libraries locally on ARSC computers


[ by Anton Kulchitsky ]

Many of us need software that is not installed by default on a system. It could be a specific library or a CVS version of software with some new features that you need. It may be preferable to install such software locally if the software is very specific to your needs and might not be interesting for other users. However, if the software package may be useful for other users, please complete a software request instead.

This short article describes how you might want to organize your local directory, install software and libraries from the source and use CVS versions of your favorite software. The focus for this article is on software that uses the autoconf build system (i.e. configure, make, make install).

1. Preparation step #1. Organize your directories. If you are going to install new software to your home directory, your quota could be a limiting factor. If it is an issue, you can request an increase in quota through the ARSC Help Desk. First create a directory for installs:


   mkdir $HOME/usr
and a download directory:

   mkdir $HOME/download

2. Preparation step #2. Environment variables. To be able to run your programs just like the system-installed ones, you need your various path variables to point to the proper directories. For our example we have:


   export LOCAL_USR=${HOME}/usr
   export LD_LIBRARY_PATH=${LOCAL_USR}/lib:${LD_LIBRARY_PATH}
   export LIBRARY_PATH=${LOCAL_USR}/lib:${LIBRARY_PATH}
   export C_INCLUDE_PATH=${LOCAL_USR}/include:${C_INCLUDE_PATH}
   export PATH=${LOCAL_USR}/bin:${PATH}
You need to put these into your .bashrc file if you use the bash shell. For other shells please adjust this as appropriate. NOTE: If you are using gcc, LIBRARY_PATH is where the compiler will look for libraries at link time, and C_INCLUDE_PATH is where the GNU C compiler will look for files to include. LD_LIBRARY_PATH is where the dynamic loader will look for shared libraries at run time. If the compiler you are using doesn't support these variables, you can specify these directories with the -L and -I options. See your compiler documentation for more details.
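
One detail worth knowing about the exports above: if a variable such as LD_LIBRARY_PATH starts out unset, the result ends with a stray ":", and an empty entry in these lists is commonly treated as the current directory. Here is a minimal sketch for bash (or any POSIX shell) that avoids the stray colon. It assumes the same $HOME/usr layout; the ${VAR:+...} expansion adds the separator and the old value only when the old value is non-empty.

```shell
# Same variables as above, but the ":" and the old value are appended
# only if the old value is non-empty, so no empty entry is created.
export LOCAL_USR="${HOME}/usr"
export LD_LIBRARY_PATH="${LOCAL_USR}/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
export LIBRARY_PATH="${LOCAL_USR}/lib${LIBRARY_PATH:+:${LIBRARY_PATH}}"
export C_INCLUDE_PATH="${LOCAL_USR}/include${C_INCLUDE_PATH:+:${C_INCLUDE_PATH}}"
export PATH="${LOCAL_USR}/bin${PATH:+:${PATH}}"
```

Whether the stray colon matters in practice depends on your compiler and loader, so treat this as a precaution rather than a requirement.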

3. Installing programs/libraries from tarballs. Suppose you need to install Common Lisp clisp from source (clisp is a very portable and powerful free ANSI Common Lisp compiler/interpreter). In this case you need to go online to http://sourceforge.net/projects/clisp/ and download the tarball into your $HOME/download directory (for example, the file "clisp-2.41.tar.bz2"). Untar the source files with


   tar jxvf clisp-2.41.tar.bz2
or

   tar zxvf clisp-2.41.tar.gz
Enter the new directory with clisp. Read the INSTALL file for installation instructions. However, usually you just need to run

   ./configure --prefix=$HOME/usr
Please note the --prefix option. This is a key option for your installation, and will define where the package will be installed. In the case of clisp you definitely would like to use --with-readline and some other options. If you are not sure what options are available, run:

   ./configure --help
and read carefully. Then, the rest of the dance:

   make
   make install
Assuming everything worked correctly, you are done! You still need to verify that it works as expected, so just type

   clisp
Installing a library is almost the same process. However, you should not forget to link to your installed library explicitly. Suppose you installed a library parse_conf and use gcc for your program my_program. Then, you would need to link to parse_conf like this:

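   gcc my_program.o -lparse_conf -o my_program
Note that the -l option comes after the object file that uses the library; many linkers resolve symbols from left to right. You also might need to use the -L option to specify the location of the library (for example -L$HOME/usr/lib).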
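
For compilers that ignore LIBRARY_PATH and C_INCLUDE_PATH, the explicit -I and -L route can be tried out without touching your real $HOME/usr. The following is only a sketch: the library "demo", the function demo_answer and the temporary prefix are all made up for illustration, and the build is skipped if gcc or ar is not available.

```shell
# Throwaway prefix standing in for $HOME/usr; every name below is made up.
prefix=$(mktemp -d)
mkdir -p "$prefix/include" "$prefix/lib" "$prefix/src"
cd "$prefix/src"

# A one-function "library" and a program that calls it.
cat > demo.h <<'EOF'
int demo_answer(void);
EOF
cat > demo.c <<'EOF'
#include "demo.h"
int demo_answer(void) { return 42; }
EOF
cat > my_program.c <<'EOF'
#include <stdio.h>
#include <demo.h>
int main(void) { printf("%d\n", demo_answer()); return 0; }
EOF

if command -v gcc >/dev/null 2>&1 && command -v ar >/dev/null 2>&1; then
    # "Install" the library: archive into lib/, header into include/.
    gcc -c demo.c -o demo.o
    ar rcs "$prefix/lib/libdemo.a" demo.o
    mv demo.h "$prefix/include/"

    # Compile with -I so <demo.h> is found, then link with -L and -l.
    # The -l option is placed after the object file that needs it.
    gcc -I"$prefix/include" -c my_program.c -o my_program.o
    gcc my_program.o -L"$prefix/lib" -ldemo -o my_program
    link_result=$(./my_program)
else
    link_result="skipped: gcc or ar not found"
fi
echo "$link_result"
```

For a real package installed with --prefix=$HOME/usr, the same two options would be -I$HOME/usr/include and -L$HOME/usr/lib.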

4. Installing programs/libraries from CVS. Suppose you need a brand new CVS version of gnuplot. In this case change directory to $HOME/download and type


   CVS_RSH=ssh cvs -d:pserver:anonymous@gnuplot.cvs.sourceforge.net:/cvsroot/gnuplot login
press Enter twice and then

   CVS_RSH=ssh cvs -z3 -d:pserver:anonymous@gnuplot.cvs.sourceforge.net:/cvsroot/gnuplot co -P gnuplot
and press Enter again. You should get the source code. Enter the gnuplot directory and read the INSTALL file for instructions. In the case of gnuplot you might need to type

   ./configure --prefix=$HOME/usr --with-readline
or something similar if you need some additional configuration options. Try

   ./configure --help
for more options. After configuration is finished, you need only to run

   make
   make install
If everything was configured correctly, gnuplot should be installed now!
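
Since sections 3 and 4 end with the same configure, make, make install sequence, it can be convenient to wrap that sequence in a small shell function. This is just one way to do it. The function name build_and_install is made up, and it assumes the package uses the standard autoconf steps and that LOCAL_USR (or else $HOME/usr) is your install prefix.

```shell
# Hypothetical helper: configure, build, and install the source tree in
# directory $1 under the local prefix. Any further arguments are passed
# through to ./configure unchanged.
build_and_install() {
    if [ "$#" -lt 1 ]; then
        echo "usage: build_and_install SRC_DIR [configure options...]" >&2
        return 1
    fi
    src_dir=$1
    shift
    prefix=${LOCAL_USR:-$HOME/usr}
    # Run in a subshell so the caller's working directory is unchanged,
    # and stop at the first step that fails.
    (
        cd "$src_dir" &&
        ./configure --prefix="$prefix" "$@" &&
        make &&
        make install
    )
}

# Example calls (commented out, since they need the unpacked sources):
#   build_and_install "$HOME/download/clisp-2.41" --with-readline
#   build_and_install "$HOME/download/gnuplot" --with-readline
```

You should still read the package's INSTALL file first, since not every package follows this pattern exactly.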

5. Cleaning the download directory. You may want to delete packages from your download directory if you no longer need them.

ARSC Summer Tours start June 6th

ARSC conducts public tours in our Discovery Lab every summer. This is a great introduction to the center. No reservations are required, but if you've got a large group, please contact info@arsc.edu first:

ARSC Summer Tours, 2007:
    * June 6-August 29: Wednesdays, 1 p.m. (Except on July 4th)

For more info and directions to the Discovery Lab, see:
     http://www.arsc.edu/news/archive/summer_tours.html

Quick-Tip Q & A



A: [[ It's an easy Unix task to create the union of two simple lists:
   [[
   [[   cat old.txt additions.txt | sort | uniq
   [[
   [[ But how can I create the difference?  I.e., I want to expunge
   [[ every line from "old.txt" which has a duplicate in
   [[ "additions.txt".


#
# Thanks to Oralee Nudson and Alec Bennett for this solution using
# uniq -u.
# 

cat old.txt additions.txt | sort | uniq -u > diff.txt

This grabs only the lines which do not appear in both files.  The
pitfall is that lines which exist in additions.txt but not in old.txt
will be picked up as well.


#
# Thanks to Chris Fallen for this clever solution which handles the
# pitfall in the solution listed above.
#
To find the set-difference "O - A" of the sets "O" and "A" whose
elements correspond to lines of the text files "old.txt" and
"additions.txt", duplicate the set to be subtracted and use the -u
option of GNU uniq to print only the unique lines:

cat old.txt additions.txt additions.txt | sort | uniq -u

prints only the lines in old.txt that do not occur in additions.txt.
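
To see the difference between listing additions.txt once and listing
it twice, here is a small experiment with made-up sample data in a
temporary directory (the fruit names and the variable names are
purely illustrative):

```shell
# Work in a throwaway directory with made-up sample data.
workdir=$(mktemp -d)
cd "$workdir"
printf '%s\n' date apple cherry banana > old.txt
printf '%s\n' banana date elderberry > additions.txt

# One copy of additions.txt: "elderberry" occurs only once overall,
# so uniq -u lets it leak into the result.
one_copy=$(cat old.txt additions.txt | sort | uniq -u)

# Two copies: every line of additions.txt now occurs at least twice,
# so uniq -u drops all of them.
two_copies=$(cat old.txt additions.txt additions.txt | sort | uniq -u)

printf 'one copy:\n%s\n\ntwo copies:\n%s\n' "$one_copy" "$two_copies"
```

With this data the first pipeline yields apple, cherry and elderberry,
while the second yields only apple and cherry.  Keep in mind that both
variants sort the output, and that uniq -u also drops any line which
appears more than once in old.txt itself.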


#
# Thanks to Martin Luthi for this python solution. 
#
If you have Python 2.4 or 2.5 available, the built-in set type does
all you need: 

====
#!/usr/bin/env python

file1 = set([line for line in file('old.txt')])
file2 = set([line for line in file('additions.txt')])

# how big are the intersection, difference, union?
print 'Intersection: ', len(file1.intersection(file2))
print 'Difference:   ', len(file1.difference(file2))
print 'Union:        ', len(file1.union(file2))

# remove all the elements of file2 from file1
file1.difference_update(file2)

# write out the result (caution, the order changed)
file('diffmess', 'w').writelines(file1)


#
# Chris Young's solution uses grep and is the only solution which
# keeps the original list in order.
#
I think that your requirement can be handled with grep.

-x for matching an entire line
-v for outputting only those lines not matched
-F for interpreting the patterns as simple strings, no regex
-f additions.txt for matching against a file with one pattern on each line

So

grep -xvFf additions.txt old.txt
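
A quick way to convince yourself that the grep approach keeps the
original order (and any repeated lines in old.txt) is to try it on
made-up sample data in a temporary directory.  This is only a sketch;
the file contents are invented for the test:

```shell
# Throwaway directory and sample data; old.txt is deliberately unsorted
# and contains "apple" twice.
workdir=$(mktemp -d)
cd "$workdir"
printf '%s\n' date apple cherry banana apple > old.txt
printf '%s\n' banana date elderberry > additions.txt

# Keep the lines of old.txt that do not exactly match any line of
# additions.txt, in their original order.
kept=$(grep -xvFf additions.txt old.txt)
printf '%s\n' "$kept"
```

With this data the result is apple, cherry, apple, in that order.
Note that grep returns a non-zero exit status when nothing is left to
print, which matters if you use it in a script that stops on errors.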


#
# Thanks to Ryan Czerwiec for two separate solutions using comm and
# diff.
# 
If you have the "comm" command on your system, it is the simplest
answer.

By default it acts like this:
   comm file1 file2

It will produce 3 columns: 
 1) lines that appear only in file1 
 2) lines that appear only in file2 
 3) lines that appear in both.

You can add -1 -2 or -3 to shut off each column, so you want (with
trash_em.txt playing the role of additions.txt):
   comm -2 -3 old.txt trash_em.txt | uniq

This will require an extra step beforehand, though, because comm
requires that the two files already be sorted.  You can always create
a single line with multiple commands, from the command line or as
an alias:
   sort old.txt > file1 ; sort trash_em.txt > file2 ; \
   comm -2 -3 file1 file2 | uniq ; rm file1 file2

If you don't have comm, some old standby commands will work, with file1
and file2 already sorted as before:
   diff file1 file2 | grep "^ *<" | cut -f2- -d' ' | uniq

In both methods, the order of the files is important; list the
trash file second, or you'll have to change a few other things in
the command.  If you want to use old.txt and trash_em.txt as mutual
trash files, i.e.  get lines that appear in one but not both, the
diff method is actually easier: just replace < with [<>].  For comm,
you'd have to execute 3 steps (get union with cat, intersection with
comm, difference with comm).
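
The comm recipe can be tried on made-up sample data as shown below.
The second pipeline is a possible shortcut that is not part of the
answer above: comm -3 alone prints the lines unique to either file,
with the second file's lines indented by a tab, so stripping tabs
gives the "one but not both" list in a single step.  That shortcut
assumes none of your lines contain tab characters.  The file names
old.sorted and trash.sorted are invented for this sketch.

```shell
# Throwaway directory and sample data.
workdir=$(mktemp -d)
cd "$workdir"
printf '%s\n' date apple cherry banana > old.txt
printf '%s\n' banana date elderberry > trash_em.txt

# comm needs sorted input.
sort old.txt > old.sorted
sort trash_em.txt > trash.sorted

# Lines only in old.txt (columns 2 and 3 switched off).
only_old=$(comm -2 -3 old.sorted trash.sorted | uniq)

# Lines in exactly one of the two files: switch off column 3 only and
# strip the tab that marks column 2.
either=$(comm -3 old.sorted trash.sorted | tr -d '\t' | uniq)

printf 'only in old.txt:\n%s\n\nin one but not both:\n%s\n' "$only_old" "$either"
rm old.sorted trash.sorted
```

With this data the first list is apple and cherry, and the second is
apple, cherry and elderberry.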



Q:  What's your favorite shell alias or shell one-liner?


  [[ Answers, Questions, and Tips Graciously Accepted ]]



++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Editors:
--------
   Tom Baring, ARSC HPC Specialist, baring@arsc.edu, 907-450-8619
   Don Bahls, ARSC HPC Specialist, bahls@arsc.edu, 907-450-8674

Subscription Information:
-------------------------
   Subscribing: send this message to: "majordomo@arsc.edu":
     subscribe hpc_users
     end
   Unsubscribing: send this:
     unsubscribe hpc_users
     end
   For help with majordomo, send this:
     help
     end
   In all cases, leave the "subject" line of your message blank.

   Messages sent to "owner-hpc_users@arsc.edu" will be forwarded to 
   the editors.  Let us know if you have problems with majordomo.

Back Issues are Available:
--------------------------
   - Web edition:   http://www.arsc.edu/support/news/HPCnews.shtml
   - E-mail edition archive:
                    ftp://ftp.arsc.edu/pub/publications/newsletters/

-----------------------------------------------------------------------
Arctic Region Supercomputing Center          ARSC HPC Users' Newsletter
-----------------------------------------------------------------------

Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.