ARSC HPC Users' Newsletter 273, July 25, 2003

One Day X1 Seminar, Next Tuesday, July 29

John Levesque, a senior analyst at Cray Inc., will be on-site all next week. He'll be giving a one-day class on X1 technology, usage, and code optimization:

UAF Campus, Butrovich Building, Room #109
Tuesday, July 29th, 9:00am - 4:00pm

ARSC, UAF, and HPCMP researchers are invited to attend.

Speeding up Your FTP Transfers: Part II

[ Thanks to Nathan Bills, ARSC Network Specialist, for contributing this two-part series. ]

Last time we looked at a simple way of increasing ftp throughput. We saw that when a Unix system's default transmit and receive buffers are small, the transmit and receive windows are small too, so data is sent across a wide area network in unfortunately small chunks. This produces dead time between chunks as the sender waits for acknowledgments back from the receiver.

We used this simple formula,


  <window size in bytes>/window * windows/sec = bytes/sec
to determine what throughput rate we could get over a wide-area network where the windows/sec was calculated with,

  1 window / <time it takes, in seconds, to send one window of data>
While this is useful for determining how much information can be transmitted based on how many windows can be sent per second, it does not, as one reader pointed out, take into account the bandwidth limitation of the connection. Thus, with our formula above, we could increase the window size to infinity and get basically:

  <infinite window size in bytes>/window * windows/sec = 
         infinite bytes/sec.
which would be excellent--if we lived in a perfect world. However, we are usually limited by a certain bandwidth, like the 100 Mbits/sec of a 10/100Base-T connection, 1 Gbit/sec for Gigabit Ethernet, or 155 Mbits/sec for an OC3 ATM link.
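
The interplay between the window-limited rate and the link limit can be sketched in a few lines of Python. This is only an illustration under simple assumptions (one window sent per round trip, no losses, no other traffic); the function name and the sample values are hypothetical and not part of the original series.

```python
def estimated_throughput(window_bytes, round_trip_sec, link_bits_per_sec):
    """Rough upper bound on throughput, in bytes/sec.

    Hypothetical helper: assumes one window is sent per round trip and
    that the result can never exceed what the link itself can carry.
    """
    window_limited = window_bytes / round_trip_sec  # bytes/window * windows/sec
    link_limited = link_bits_per_sec / 8            # bits/sec -> bytes/sec
    return min(window_limited, link_limited)


# Illustrative values: 100 Mbits/sec link, 0.1 second round trip.
for window in (65536, 1048576, 16777216):
    print(window, estimated_throughput(window, 0.1, 100_000_000))
```

With a small window the first term dominates; once the window is large enough, the estimate stops growing and stays at the link rate.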

How does one include this limitation? As always, our goal is to send as much data as we can as quickly as we can. If we have a 100 Mbits/sec connection, we want to be able to send at that rate, right? We could try to just shove data across the network at the full 100 Mbits/sec, but the link may not be reliable and we might lose some of it. Many applications, such as ftp, www, and email, use TCP to reasonably assure delivery of the data. With TCP, after the sender has transferred a window of data, it waits for the acknowledgment of the first part of that data before sliding its 'send window' forward to send more. As we noted last time, the time it takes for data to be sent and the acknowledgment to come back is the round-trip-time.

We would like to keep the pipe full all the time, but if our TCP window is too small and this round-trip-time is large, then there will be a gap in transmission while the sender waits for the acknowledgment to come back. If we are able to send at the highest data rate the whole time it takes for the initial data to get to the receiver and an acknowledgment to come back, we would get,


  
  bandwidth * round-trip-time = 
      amount of data that can be sent in that round-trip-time
which would keep the pipe full because the sender gets the acknowledgment back from the receiver just at the time it has reached the end of its send window and moves the window forward to send more data.

For example, if it took five seconds for the acknowledgment to come back from the receiver to the sender, and the connection is a 100 Mbits/sec connection, we would be able to send,


  100 Mbits/sec * 5 seconds = 500 Mbits or 62.5 Mbytes
of data during that time. Note that this is the 'window' of information that can be sent during the five seconds of delay between first data sent and the first acknowledgment received and is what we would try to set for our window or buffer size on the system. This result is called the bandwidth*delay product (pronounced 'bandwidth-delay product' rather than 'bandwidth-times-delay product' :) )
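
As a rough sketch, the bandwidth*delay product can be computed with a small Python helper. The function name and the choice of milliseconds for the round-trip-time are assumptions made here for illustration; integer arithmetic is used to avoid rounding surprises.

```python
def bandwidth_delay_product(link_bits_per_sec, round_trip_ms):
    """Bytes that can be in flight during one round trip.

    bits/sec * ms / 1000 gives bits; dividing by 8 gives bytes.
    """
    return link_bits_per_sec * round_trip_ms // 8000


# The five-second example from the text: 100 Mbits/sec * 5 seconds.
print(bandwidth_delay_product(100_000_000, 5000))  # 62500000 bytes
```

Note that this helper counts a Mbyte as 1,000,000 bytes. The lbufsize and rbufsize values used in this article count a Mbyte as 1,048,576 bytes, so they come out slightly larger.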

In kerberos ftp, this would mean we would set the buffer sizes to 62.5 Mbytes:


  ftp> lbufsize 65536000
  ftp> rbufsize 65536000
Five seconds is quite a large delay; on the Internet it is more common to see a delay of 50-200 milliseconds. If we have a 100 Mbits/sec connection and the round-trip-time is 100 milliseconds, or 0.1 seconds, our bandwidth*delay product would be:

  100 Mbits/sec * 0.1 seconds = 10 Mbits or 1.25 Mbytes
        
and we would set our window size in ftp accordingly:

  ftp> lbufsize 1310720
  ftp> rbufsize 1310720
to try to send data at the full 100 Mbits/sec, or 12.5 Mbytes/sec. If we could keep that rate going we could transfer a 100-Gbyte file in,

  100 Gbytes / (12.5 Mbytes/sec) = 8192 seconds or 2.28 hours.
Not bad, eh? Note that our selection of a 1-Mbyte window the last time was close to this size.
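
The transfer-time arithmetic can be checked with a short Python sketch. The helper name is hypothetical, and the comparison rate for a 64-Kbyte window is only an illustrative assumption (64 Kbytes per 0.1-second round trip works out to 0.625 Mbytes/sec), not a measurement.

```python
def transfer_time_sec(file_mbytes, rate_mbytes_per_sec):
    """Seconds needed to move a file at a steady rate."""
    return file_mbytes / rate_mbytes_per_sec


# 100 Gbytes, counted as 100 * 1024 Mbytes, as in the figure above.
file_mbytes = 100 * 1024

full_rate = transfer_time_sec(file_mbytes, 12.5)
print(f"{full_rate:.0f} seconds, {full_rate / 3600:.2f} hours")

# Hypothetical comparison: the same file limited by a 64-Kbyte window.
small_window = transfer_time_sec(file_mbytes, 0.625)
print(f"{small_window:.0f} seconds, {small_window / 3600:.2f} hours")
```

The second figure shows how much a small window can cost over a long round trip, even on a fast link.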

This covers the simple aspects of sending data at or almost the full data rate. There are still a lot of other things that could affect your transfer rates such as the communications links between you and the remote end, the effect of data transmission errors on your data rates, system resource issues at either end, the effect of other people's transfers on yours, etc. But this is a good start at speeding up those ftp transfers.

For further information about increasing performance of data transfers, check out these urls:

http://sd.wareonearth.com/woe/Briefings/tcptune/tsld001.htm
http://www.networkcomputing.com/1013/1013ws1.html
http://dast.nlanr.net/Projects/FTP.html
http://dast.nlanr.net/Guides/GettingStarted/TCP_window_size.html
http://moat.nlanr.net/NATimes/NAT.1.2/phil.htm
http://www.psc.edu/networking/perf_tune.html

ARSC Advanced Display Environments Workshop

As staff and researchers gain experience with ARSC's new four-walled immersive environment, the Discovery Lab, we continue looking to the future of visualization as an aid for analysis and expression of computational results.

To this end, ARSC is sponsoring an "Advanced Display Environments Workshop," here at UAF. The schedule (still subject to minor changes) is posted below. Sessions are open to UAF, ARSC, and HPCMP researchers. Please contact Jon Genetti (ffjdg@uaf.edu) in advance if you are interested in attending.

--

Advanced Display Environments Workshop Aug 6-8, 2003

Wednesday, Aug 6


  109 Butrovich
     8:15 -  8:45 Coffee
     8:45 -  9:00 Welcome
     9:00 - 10:00 Chandrajit Bajaj, UT-Austin, Curved Powerwall
    10:00 - 10:15 Coffee Break
    10:15 - 11:15 John Clyne, NCAR, Stereo and Collaboration
    11:15 - 12:15 John Moreland, SDSC, High-density Tiled Display
    12:15 -  1:30 Catered Lunch
     1:30 -  2:30 Randy Frank, LLNL, Tera-scale Data on Tiled Displays
     2:30 -  3:30 Claudio Silva, OHSU, Massive Polygonal Rendering
     3:30 -  3:45 Coffee Break
     3:45 -  4:45 Sam Uselton, Consultant, The Future Office
     5:30 -  7:30 Group dinner at The Pump House

Thursday, Aug 7


  Discovery Lab, 375C Rasmusson Library
     8:15 -  8:45 Coffee
     8:45 -  9:30 Christoph Sensen, U Calgary, A Java 3D-Enabled CAVE
     9:30 - 10:15 Greg Johnson, TACC, Depth Perception in Immersive Environments
    10:15 - 10:30 Coffee Break
    10:30 - 11:15 Eric Wernert, U Indiana, Display Needs for Diverse Applications
    11:15 - 12:00 Craig Stewart, U Indiana, Bio Computation and Storage
    12:00 -  1:30 Lunch at Pike's

  GI Globe Room, Elvey
     1:30 -  2:00 Panel - Projector/Display Technologies To Watch
                  (Genetti, Johnson, Moreland, Uselton)
     2:00 -  2:30 Panel - How important is pixel density? How many pixels are enough?
                  (Frank, Johnson, Moreland, Wernert)
     2:30 -  3:00 Panel - How important is stereo/immersion?
                  (Clyne, Johnson, Sensen, Wernert)
     3:00 -  3:15 Coffee Break
     3:15 -  4:00 Panel - Flat vs. curved vs. cave vs. ???
                  (Bajaj, Clyne, Moreland, Uselton)
     4:00 -  4:30 Panel - Image generators and data handling requirements
                  (Bajaj, Clyne, Frank, Silva)
     4:30 -  5:00 Panel - Five years from today ...
                  (Bajaj, Frank, Silva, Uselton)
     5:30 -  7:00 No host dinner at Alaska Salmon Bake

Friday, Aug 8


  204 Butrovich
     8:30 -  9:00 Coffee
     9:00 - 12:00 Working group develops recommendations and white paper
    12:00 -  1:00 Lunch / Presentation of recommendations

  Room TBA
     8:30 - 12:00 Biomedical Birds-of-feather meeting

--

As a reminder, ARSC's regular summer tours are held in the Discovery Lab. Every Wednesday, 1pm, through August. For more info on all summer tours at UAF, see:

http://www.uaf.edu/univrel/Tour/tours.html

For more on the Discovery Lab:

http://www.arsc.edu/news/mdflex.html

Quick-Tip Q & A


A:[[ There are commands I'd like to issue to ftp, whenever I use
  [[ it. For instance, "idle 7200" and the lbufsize/rbufsize settings.
  [[ Can I do this automatically, without typing them at the ftp prompt
  [[ every single time?


  # 
  # First, Rich Griswold contributes this Newsletter's first expect script: 
  #

  You can use an expect script to do this:
  
    #!/usr/local/bin/expect -f
    spawn ftp $argv
    expect {
      "Name" {
        expect_user -re "(.*)\n"
        send "$expect_out(1,string)\r"
        exp_continue
      } "Password:" {
        stty -echo
        expect_user -re "(.*)\n"
        send "$expect_out(1,string)\r"
        stty echo
        exp_continue
      } "ftp>" {
        send "idle 7200\r"
        # Add other commands here...
      }
    }
    interact


  #
  # Thanks to Jeff McAllister:
  #

  The kftp utility can accept a Unix input stream, e.g.,

    "kftp < kftp.script"

  Thus, in many cases, you can completely eliminate interactive FTP
  sessions.  The script file can contain any commands that you would
  type in, separated by newlines ("\n").  You should include the FTP
  command, "prompt," to toggle prompting off.

  To transfer the files 'test.out.*' to $ARCHIVE_HOST you could create a
  wrapper script to manage the entire process. In this example,
  "batch_ftp.ksh" does the following:

  1) creates a temporary file, "kftp.script," containing the commands
     kftp will execute as it reads them from the input stream
  2) executes kftp, taking commands from "kftp.script"
  3) removes "kftp.script"
 

  File: "batch_ftp.ksh":
  ----------------------  
    #!/bin/ksh

    echo "open $ARCHIVE_HOST"  > kftp.script
    echo ""                    >> kftp.script
    echo "lbufsize 1000000"    >> kftp.script
    echo "rbufsize 1000000"    >> kftp.script
    echo "binary"              >> kftp.script
    echo "prompt"              >> kftp.script
    echo "cd $ARCHIVE"         >> kftp.script
    echo "mput test.out.*"     >> kftp.script
    echo "quit"                >> kftp.script
  
    kftp < kftp.script > /dev/null 2>&1
    rm kftp.script
  ----------------------  
  
  This example assumes that the environment variables exist.
  $ARCHIVE_HOST is the name of the remote host. $ARCHIVE is the path to
  the destination directory on the remote host.

  ($ARCHIVE and $ARCHIVE_HOST are part of a common set of environment
  variables, now available on all ARSC systems, designed to standardize
  our storage environment.  We recommend using these environment
  variables to "hide" the details of the specific storage setup on each
  machine.  Thus scripts can be moved between machines with less work.)
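
The same idea can be sketched in Python for readers who prefer it to ksh. This is a hedged illustration, not part of the contributed answer: the function names are hypothetical, the host and directory in the example are placeholders, and run_kftp assumes a kftp executable is on your PATH.

```python
import subprocess


def build_kftp_commands(host, remote_dir, pattern, bufsize=1000000):
    """Return the command stream that batch_ftp.ksh writes to kftp.script."""
    lines = [
        f"open {host}",
        "",                      # blank line, as in the ksh script
        f"lbufsize {bufsize}",
        f"rbufsize {bufsize}",
        "binary",
        "prompt",                # toggle interactive prompting off
        f"cd {remote_dir}",
        f"mput {pattern}",
        "quit",
    ]
    return "\n".join(lines) + "\n"


def run_kftp(commands):
    """Feed the commands to kftp on stdin and discard its output.

    Assumption: a kftp executable is available on PATH.
    """
    result = subprocess.run(
        ["kftp"],
        input=commands,
        text=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


# Placeholder values; on a real system these would come from
# $ARCHIVE_HOST and $ARCHIVE.
print(build_kftp_commands("archive.example.edu", "/archive/fred", "test.out.*"), end="")
```

Building the command stream as a string avoids the temporary file; pass the result to run_kftp to perform the transfer.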



  #
  # And finally, the editor's solution...
  #

  FTP's built-in auto-login process will do the trick.

  In your $HOME/.netrc file, list each machine you ftp to along with
  your login, but NO passwords (passwords should never be stored in
  files, on sticky notes, etc., for obvious reasons). Then, for each
  machine separately, use "macdef" to define an "init" macro--and other
  macros, if you wish.

  For example:


  CHILKOOT$  
  CHILKOOT$  cat $HOME/.netrc 
  machine rimegate.arsc.edu login arscfrb 
  macdef init
  idle 7200 
  rbufsize 1000000 
  lbufsize 1000000

  macdef cdt
   cd /scratch/arscfrb 
   pwd 
   ls

  machine klondike.arsc.edu login fred 
  macdef init
  idle 7200 
  rbufsize 1000000 
  lbufsize 1000000

  macdef cdt
   cd /tmp/fred 
   pwd 
   ls

  CHILKOOT$ 
  CHILKOOT$ 

  # Here's a test. Note that "init" is executed automatically, and that
  # other FTP macros are executed with the command "$":

  CHILKOOT$  ftp klondike   
  Connected to klondike.arsc.edu.
  220 klondike FTP server (Version 5.60) ready.
  334 Using authentication type GSSAPI; ADAT must follow
  GSSAPI accepted as authentication type
  GSSAPI authentication succeeded
  232 GSSAPI user fred@ARSC.EDU is authorized as fred
  idle 7200
  200 Maximum IDLE time set to 7200 seconds
  rbufsize 1000000
  200 TCP buffer size set to 1000000 bytes
  lbufsize 1000000
  Set local TCP buffer size to 1000000 bytes
  Remote system type is UNIX.
  Using binary mode to transfer files.
  ftp>  
  ftp>  
  ftp> $
  (macro name) cdt
  cd /tmp/fred
  250 CWD command successful.
  pwd
  257 "/tmp/fred" is current directory.
  ls
  200 PORT command successful.
  150 Opening ASCII mode data connection for /bin/ls.
  total 82512
  drwx------   16 staff         4096 Jul 21 10:50 Progs
  drwx------    2 staff           32 Jul 17 11:19 Scripts
  226 Transfer complete.
  ftp>  



Q: Any "vi" experts out there?  I'm editing a text file, each line starts
   with a version number followed by a space and then a word.  E.g.,

...
33 jade
33.8.2 jasper
10 javelin 
7.1 javelina 
22 juniper
...
   
   Can I move the version numbers to the ends of the lines? Like this:

...
jade 33
jasper 33.8.2
javelin 10
javelina 7.1
juniper 22
...

   Thought I was getting good at vi regexp's, but this is a stumper!  If
   it's impossible in vi, maybe there's another way.

[[ Answers, Questions, and Tips Graciously Accepted ]]


Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
E-mail Subscriptions: Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.