ARSC HPC Users' Newsletter 357, March 9, 2007

Better C++ Compiler Warnings

A few years ago a friend of mine recommended the book "Effective C++" by Scott Meyers. At the time I was doing a whole lot of C++ programming, so I went out and bought the book. It is an excellent resource for anyone doing object-oriented C++ programming, but that's not the point of this story.

These days I don't do a whole lot of C++ programming, but I had a strange sense of deja vu a few weeks ago when I was reading the g++ man page and found the "-Weffc++" command line option. This warning option checks C++ code for several of the style guidelines that Scott describes in his book.

If you are doing object-oriented programming, this option warns about several C++ gotchas that other compilers don't catch, such as failing to define a copy constructor and assignment operator in classes that have dynamically allocated memory.

Here's a not-so-useful, misbehaving C++ program with these mistakes:


   #include <iostream>
   #include <vector>

   class myclass
   {
   public:
       myclass(int size=10)
       {
           a_size_=size;
       }
       int a_size() { return a_size_; }
   private:
       int * a;
       int a_size_;
       std::vector<int> c;
   };


   int main()
   {
       myclass a;
       std::cout << a.a_size() << std::endl;

   }

If the standard g++ warnings are enabled, no warnings are produced:


  nelchina 1% g++ -Wall  myclass.cpp -o myclass

However, with "-Weffc++", the assignment operator and copy constructor issues in this code are caught, along with several other style violations:


  nelchina 2% g++ -Wall -Weffc++  myclass.cpp -o myclass
  ...
  ...
  myclass.cpp:5: warning: `class myclass' has pointer data members
  myclass.cpp:5: warning:   but does not override `myclass(const myclass&)'
  myclass.cpp:5: warning:   or `operator=(const myclass&)'
  myclass.cpp: In constructor `myclass::myclass(int)':
  myclass.cpp:8: warning: `myclass::a' should be initialized in the
                 member initialization list
  myclass.cpp:8: warning: `myclass::a_size_' should be initialized in
                 the member initialization list
  myclass.cpp:8: warning: `myclass::c' should be initialized in the
                 member initialization list

Be aware that many of the standard C++ header files, such as the vector class, do not follow Scott's good style guidelines. You may use grep to hide the warnings produced by these files, making it easier to spot the warnings which apply to your own code:

E.g.:


  nelchina 3% g++ -Wall -Weffc++  myclass.cpp -o myclass 2>&1 | grep -v "/usr/lib"

If you're not sure how to implement a copy constructor or assignment operator, check out Scott's book.
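As a rough starting point, here is a minimal sketch of how the class above might be reworked to address these warnings. The allocation of the array, the at() accessor, the destructor, the renamed members (a_, c_), and the copies_are_independent() helper are all assumptions added for illustration; they are not part of the original example.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical reworking of myclass. The original never allocates the
// array, so the allocation and the at() accessor are assumptions.
class myclass
{
public:
    // Initialize every member in the member initialization list,
    // in the same order as the members are declared.
    myclass(int size = 10)
        : a_(new int[size]()), a_size_(size), c_()
    {
    }

    // Copy constructor: allocate a separate buffer and copy the elements.
    myclass(const myclass& other)
        : a_(new int[other.a_size_]), a_size_(other.a_size_), c_(other.c_)
    {
        std::copy(other.a_, other.a_ + other.a_size_, a_);
    }

    // Assignment operator: copy first, then swap, so self-assignment and
    // a failed allocation both leave *this intact.
    myclass& operator=(const myclass& other)
    {
        myclass tmp(other);
        swap(tmp);
        return *this;
    }

    ~myclass() { delete[] a_; }

    int a_size() const { return a_size_; }

    int& at(int i)
    {
        assert(i >= 0 && i < a_size_);
        return a_[i];
    }

private:
    void swap(myclass& other)
    {
        std::swap(a_, other.a_);
        std::swap(a_size_, other.a_size_);
        c_.swap(other.c_);
    }

    int* a_;
    int a_size_;
    std::vector<int> c_;
};

// Returns true when a copy and an assigned object own separate storage.
bool copies_are_independent()
{
    myclass x(3);
    x.at(0) = 7;
    myclass y(x);   // copy constructor
    myclass z;
    z = x;          // assignment operator
    x.at(0) = 42;   // must not affect y or z
    return y.at(0) == 7 && z.at(0) == 7 && z.a_size() == 3;
}
```

Without the user-defined copy constructor and assignment operator, y and z would share x's pointer, and each destructor would try to delete the same array.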

"Unbuffering" Standard Output (So You Can See It!)

[ Thanks to Ed Kornkven for this article! ]

Recently I was running a Fortran program ported to our new Sun system, Midnight. The program writes several output files, and sends its diagnostic output to stdout, using PRINT*. Output is sent to stdout in a flurry at program startup, then a line every minute or so thereafter.

I wanted to capture that stdout output to a file and pulled out the trusty Unix utility, "tee." The effect of "tee" is to have one's cake and eat it too, by copying its input to a file while still displaying it on the screen. It is used by piping stdout to it as in


       ./a.out | tee stdout.txt

But when I piped stdout to "tee," I found myself with plumbing troubles -- a clogged pipe and no output. The output had been flowing freely without the "tee", i.e.,


       ./a.out

but when the tee was added, not a trickle.

I worked around the problem, and a few days later got time for more experimentation. I discovered that "tee" did, in fact, produce output, but it was obviously buffering it. Instead of displaying the output a line at a time, the output was being displayed a buffer at a time. The buffers are large enough to hold several output lines and it took around ten minutes for the buffer to fill and be dumped to the window. I just hadn't waited long enough to see the output the first time.

Besides the eventual appearance of output, another clue was that the last line showing in the window was chopped off in mid-sentence. Clearly the output was being written to the window in increments other than whole lines.

But how to fix it? What I needed was something like


       drano ./a.out | tee stdout.txt

to get things moving through that pipe. If only things were that easy, right? Well, in this case they are -- only the "magic" is called "unbuffer". The unbuffer command uses "expect" to unbuffer redirected program output. Running with

       unbuffer ./a.out | tee stdout.txt

works perfectly and my output is displayed line-by-line. "unbuffer" is available on our Linux systems, including Midnight and Nelchina, and also on our AIX systems, iceberg and iceflyer. It doesn't appear on OS X, but my Mac doesn't appear to buffer its output the same way anyway.

For our hands-on readers, here is a simple Fortran program that manifests the symptoms I was seeing.


       program unbuf
       integer i
       do i = 1, 200
          print*, 'Output line ', i
          call sleep (1)
       end do
       end program

Compile this, say with the PathScale compiler on Midnight, by

       module load PrgEnv      #if you haven't already
       pathf90 -freeform unbuf.f

and then execute it by

       ./a.out | tee out

The delay in printing each output line, coupled with the buffering of stdout, will make it appear that the program isn't doing anything. Note that the output to file "out" is also being buffered, so its size appears as zero until the buffer empties. In this case, the output lines are short enough and the buffer is large enough that all 200 lines are buffered until the end of the program, 200 seconds after launching it. Using "unbuffer", however, a line is displayed every second, just as the program logic would lead us to expect.

Experienced programmers may know that the same effect can be accomplished by explicitly flushing the buffers for a given I/O unit, in this case unit 6 which is "Fortran" for stdout:


   program unbuf_flush
   integer i
   do i = 1, 200
       print*, 'Output line ', i
       call flush(6)
       call sleep(1)
   end do
   end program

The call to flush() forces the buffer to be emptied so we will see the output as it is written. Of course this requires access to, and modification of, the source program which may not always be possible or desirable.

Mentor (and Intern) Opportunities for Internship Program

UAF faculty members are invited to submit proposals to mentor summer undergraduate research interns. As part of the ARSC Undergraduate Research Summer Challenge, UAF faculty mentors will work with students to accomplish computational science research goals.

Prospective interns are also invited to apply, but note that your deadline is next Friday.

The ARSC Undergraduate Research Summer Challenge attracts competitive applications from around the US. Interns will be working 40 hours per week on their research projects, under the direction of their mentor.

Visit the program information pages for more background, along with some past project and student profiles:

http://www.arsc.edu/programs/interns/

Intern applications are due March 16. Mentor project applications received before March 20 will have the best chance of being matched to interns.

For more information, or to submit a proposal, send email to internsreu@arsc.edu or telephone Dr. Greg Newby at x8663 (450-8663).

Quick-Tip Q & A


A:[[ I've just extracted a tar file into my home directory.  It turns
  [[ out the contents of the tar file were not contained within a single
  [[ directory, so now I have a bunch of random files and subdirectories
  [[ scattered all over my home directory amidst other, unrelated files.
  [[ How can I sort the recently-untarred files from the rest and move
  [[ them into a new subdirectory?  Sorting by modification date doesn't
  [[ work since the tar file preserved timestamps.


  #
  # From Jed Brown:
  #

  My first thought after,

    $ tar xf archive.tar    # OOPS!

  is:

    $ mkdir archive
    $ for n in `tar tf archive.tar`; do
      if [ -d $n ]; then
        mkdir archive/$n
      else
        mv $n archive/$n
      fi
    done    # You can put this all on one line, but you must use semicolons.

  Because of the order in which tar displays the contents of an archive,
  directories come first, so in the loop, the directories will be created
  before we try to put files in them.  This might leave some empty
  directories, so we can clean them up with

    $ for n in `tar tf archive.tar`; do
      [ -d $n ] && rmdir -p --ignore-fail-on-nonempty $n
    done    # Not elegant, but does the right thing.

  If you had empty directories that the archive unpacked into, they will
  be removed.  This is probably not a problem.  As long as nothing was
  overwritten, you should be back where you started.

  Unpacking an untrusted tar file is a dangerous thing since it can
  overwrite your files---much worse than making a mess of your
  directories.  Better to examine the contents first or make a
  subdirectory to hold its contents.

      
  # 
  # Thanks to Martin Luthi
  # 

  You could use the atime (access time). In the GNU tools you can see it
  with "ls -u". Now you construct a find command like,

    find . -atime -1 -print      # shows the files accessed the last day
    find . -amin  -5 -print      # shows the files accessed in the last
                               # five minutes

  Then to move the files somewhere either use xargs (if available):

    find . -amin -5 -print | xargs -I '{}' mv '{}' /my/save/place

  or the -exec option to find:

    find . -amin -5 -print -exec mv '{}' /my/save/place ';'

  [here the quotes are used to protect the special characters {}; from
  the shell]



  #
  # And from Brad Havel:
  #

  I'd go with perl for this one if the original tar file is still available.

  ----- cleanup.pl
  #!/usr/bin/perl

  # Usage: cleanup.pl [original tar archive] [directory to move to from
  #        current path (1 deep)]

  MAIN:
  {
    # Make the output directory
    mkdir($ARGV[1]);
    # for each file in the archive...
    foreach $filename (sort(`tar tf $ARGV[0]`)) {
      chop($filename);
      # Assures a clean file name (file1 -> sym link to file2 output)
      ($filename,undef)=split / /,$filename;
      # Trim any leading relative path identifiers
      if (substr($filename,0,2) eq "./") {
        $filename=substr($filename,2,length($filename)-2);
      }
      print "Moving: $filename\n";
      rename($filename,"./$ARGV[1]/$filename");
    }
  }
  -----

  Technically the rename fails on any file within a subdirectory that
  has already been moved, but we don't want error checking or something
  might be missed during the move.

  And for those who don't want to actually keep a script around (same
  as above sans output messages):

  -----
  perl -e "mkdir(\$ARGV[1]);
    foreach \$filename (sort(\`tar tf \$ARGV[0]\`)) {
      chop(\$filename);
      (\$filename,undef)=split / /,\$filename;
      if (substr(\$filename,0,2) eq \"./\") {
        \$filename=substr(\$filename,2,length(\$filename)-2); }
      rename(\$filename,\"./\$ARGV[1]/\$filename\");
    }" [TAR_FILE_NAME] [DIRECTORY_TO_MOVE_FILES_TO]
  -----

  If the original tar isn't available for comparison I'm not sure how
  you could determine what was new and what was original.



  #
  # And from Alec Bennett:
  #

  % mkdir foo
  % for i in $( tar -tf foo.tar ); do if [ -e $i ]; then mv $i foo/; fi; done




Q: A utility writes some text, most of which is junk.  There's a
pattern repeated here and there within the text.  I want a simple
filter which prints every instance of the pattern.  E.g., from the
following output of "prog," it should only print the 5-digit numbers:

  $ /usr/local/bin/prog
  Attachments: 31613:  (text/plain / 110b), 31614:  (text/plain / 1k),
               31615:  (text/plain / 2b), 31616:  (text/plain / 0b),
               31619:  (multipart/mixed / 0b),
               31620:  (text/plain / 25b),
               31621:  (multipart/mixed / 702b), 31622:  (text/plain / 0b)
  
In my attempt to solve the problem I use "tr" to remove all linefeeds.
Then with perl, I do a global, non-greedy match (".*?") of everything
prior to the interesting pattern and another non-greedy match of
everything after the pattern, and then replace everything matched with
just the interesting part.

This approach works perfectly... except... as you can see, it doesn't
delete the final unwanted text:

  $ /usr/local/bin/prog | tr -d '\n' | perl -pe "s/.*?(\d+):.*?/\1 /g"
  31613 31614 31615 31616 31619 31620 31621 31622   (text/plain / 0b)

Yes, I know, I could pipe this through another filter to delete the
remaining unwanted text.  But this problem is so darn simple, I'm
frustrated I can't solve it with one regular expression.  Any ideas?

[[ Answers, Questions, and Tips Graciously Accepted ]]


Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.