ARSC HPC Users' Newsletter Issue 434 2014-07-16


A publication of the Arctic Region Supercomputing Center.

1 Guest Lecture: Julia For High Performance Computing

ARSC will be hosting a guest lecture with Alan Edelman on August 7, from 1pm to 2pm in the GI Globe Room. Edelman is Professor of Applied Mathematics with the MIT Computer Science and Artificial Intelligence Laboratory. He will be discussing the Julia programming language for High Performance Computing.

Julia is a high-level, high performance dynamic programming language for technical computing. It provides a sophisticated compiler, distributed parallel execution, numerical accuracy, and an extensive mathematical function library. The standard library also integrates mature, best-of-breed C and Fortran libraries for linear algebra, random number generation, signal processing, and regular expression matching.

Registration is not required for this event.

2 Registration Opens For SC14

Attendance registration for SC14 opens next Wednesday, July 23. The SC14 hotel reservation system is already open, with penalty-free reservations possible until October 17.

The submission deadline for technical papers has already passed, but other SC Technical Program submissions, such as posters, are still accepted until July 31.

3 Transfer Queue Now Available On Fish

A "transfer" queue is now available on Fish, our Cray XK6m-200 supercomputer. The transfer queue allows for automated pre- and post-processing to and from the $ARCHIVE file system. Use of this transfer queue is similar to use of the transfer queue on the pacman system.

4 ROMS: More Debugging Fun

Katherine Hedstrom, Oceanographer, UAF Institute of Marine Science

(The following article is also available from the ROMS/TOMS Developers blog.)

This time the question came up about perfect restarts. In ROMS, the option PERFECT_RESTART promises that a run made from beginning to end in one go should match one made with a restart in the middle. With this option, more fields need to be written to the restart file, but were we saving enough? I had done this exercise before for the sea ice model, but not with WET_DRY as well.

The way to test for perfect restarts is to run for some number of time-steps X, save a restart file, run for one more step and save a history file. Then in another directory, copy the first restart file and use it to start at time-step X and run for one step, saving a history file. Now compare the history files with ncdiff and ncview.
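The comparison step can also be scripted. The following Python sketch is not part of ROMS or NCO: it assumes the fields from the two history files have already been read into memory (with whatever NetCDF reader you prefer) as dictionaries mapping a variable name to a 2D list, and it reports the largest difference in each field and where it occurs. The default tolerance of zero reflects the goal of a perfect restart, where the two runs should match exactly.

```python
"""Compare two in-memory snapshots of model fields for a restart test.

Assumption: each snapshot is a dict mapping a variable name (e.g. "u",
"temp") to a 2D list of floats read from one of the two history files.
The names and the data layout are illustrative only.
"""


def max_abs_diff(a, b):
    """Return (max |a - b|, (j, i)) for two equally shaped 2D lists.

    The location is None when the two fields are identical.
    """
    worst, where = 0.0, None
    for j, (row_a, row_b) in enumerate(zip(a, b)):
        for i, (va, vb) in enumerate(zip(row_a, row_b)):
            d = abs(va - vb)
            if d > worst:
                worst, where = d, (j, i)
    return worst, where


def compare_snapshots(full_run, restarted, tol=0.0):
    """Return {name: (max diff, location)} for fields differing by > tol."""
    report = {}
    # Only compare variables present in both snapshots.
    for name in sorted(full_run.keys() & restarted.keys()):
        diff, where = max_abs_diff(full_run[name], restarted[name])
        if diff > tol:
            report[name] = (diff, where)
    return report


if __name__ == "__main__":
    one_go = {"u": [[0.1, 0.2], [0.3, 0.4]], "temp": [[5.0, 5.1], [5.2, 5.3]]}
    restart = {"u": [[0.1, 0.2], [0.3, 0.4]], "temp": [[5.0, 5.1], [5.2, 0.0]]}
    print(compare_snapshots(one_go, restart))  # {'temp': (5.3, (1, 1))}
```

Any field that shows up in the report is a candidate for closer inspection with ncview or a debugger.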

What value of X to use? I didn’t try X=1 because ROMS doesn’t get into its default time-step until time-step 3. I therefore started with X=3 and found lots of differences between the two history files.

Time to go to a debugger. Actually, dueling debuggers: one starts from time 0, the other from step X. For this, you don’t want to try a large X, because the first debugger would then have to run through all those steps. I found that for X=3, the u and v fields matched perfectly at step X, but the value of nrhs differed. nrhs is used in the computation of W, the vertical velocity, and with the vertical velocities mismatched, everything evolved differently.

Rather than fix this, I decided to try again with an even value for X, say X=2. After all, I usually save after an even number of time-steps, and I just wanted to see whether something else was off. Indeed, the differences were more localized, but there were still differences, especially along the shallow parts where WET_DRY would be involved.

After much digging around, I found two things going on. The first is that the wet-dry masks are computed during initialization and also in calls to wetdry.F from step2d. On restart, one must either have saved all those masks or recompute them consistently.

The other source of trouble comes from how ROMS writes out fields to the NetCDF files. Most fields are written with masked areas getting the _FillValue. For WET_DRY, many are being saved with rmask_io, which masks out the dry cells. This can cause a loss of information about the state of the dry cells. For instance, tracer values in the dry cells are time-stepped in ROMS, including advection terms if flow happens to be entering a dry cell. If these dry-cell tracers are read from masked-out values in the restart file, they will be set to zero.
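To see why the choice of mask matters, here is a simplified one-dimensional Python illustration (not ROMS code). It assumes, as described above, that masked points are written as a fill value and come back as zero when the restart file is read; the fill value and mask arrays are made up for the example. Masking with the wet-dry mask wipes out the dry-cell tracer values, while masking with a land-sea mask (which has no land in this slice) keeps them.

```python
"""Show how writing a field through a wet/dry mask can lose dry-cell state.

Simplified 1D illustration, not ROMS code: masks use 1 = keep, 0 = masked,
and masked points are assumed to be read back as zero.
"""

FILL_VALUE = 1.0e37  # stand-in for a NetCDF _FillValue


def write_masked(field, mask):
    """Return what ends up in the file: masked points get the fill value."""
    return [v if m else FILL_VALUE for v, m in zip(field, mask)]


def read_back(stored):
    """Return the field as the model sees it, with fill values set to 0."""
    return [0.0 if v == FILL_VALUE else v for v in stored]


tracer = [10.0, 9.5, 9.0, 8.7]  # dry cells still carry time-stepped values
wet_dry = [1, 1, 0, 0]          # the last two cells are currently dry
land_sea = [1, 1, 1, 1]         # ...but none of them is land

lossy = read_back(write_masked(tracer, wet_dry))
kept = read_back(write_masked(tracer, land_sea))
print(lossy)  # [10.0, 9.5, 0.0, 0.0]
print(kept)   # [10.0, 9.5, 9.0, 8.7]
```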

In the end, I didn’t have to save any more fields; I only had to update the mask initialization and change the masks used when writing the restart fields. I also found a lingering ice restart bug, which was fixed by the same change to how restart fields are written. You might find it interesting, so I’ll try to describe it:

  • The ice is time-stepped before the call to output because it contributes to the computation of surface fluxes.
  • For PERFECT_RESTART, I wrote a routine to overwrite surface fluxes from the saved ones in the restart file, skipping the call to seaice on the first restart step.
  • I found a mismatch at one lone point next to a land mask (not near the WET_DRY cells though).
  • The mismatch stemmed from the computation of the vertical mixing (GLS_MIXING) near the surface.
  • Near-surface mixing gets a contribution from the surface stresses.
  • The stresses are averaged from velocity points to rho points.
  • On restart, the stresses were set to zero for the velocity points at the land-sea mask boundary.
  • Turning off the masking while writing the stresses fixed this restart issue.
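The last few steps in that list can be reproduced in a few lines of Python. This is a simplified one-dimensional C-grid slice, not ROMS source: it assumes u-point i lies on the west face of rho cell i, so each rho-point value is the mean of its two faces, and it models the masked write and read as replacing masked u-points with zero. The coastal u-point is masked, yet it carries a nonzero stress in memory, so zeroing it on restart halves the stress seen by the adjacent water point.

```python
"""Average u-point surface stress to rho points on a 1D C-grid slice.

Simplified illustration, not ROMS source: u-point i is the west face of
rho cell i and u-point i + 1 its east face.
"""


def u_to_rho(sustr):
    """Return rho-point averages of u-point values (len(sustr) - 1 points)."""
    return [0.5 * (sustr[i] + sustr[i + 1]) for i in range(len(sustr) - 1)]


def restart_copy(sustr, umask):
    """Mimic a masked write followed by a read: masked points become 0."""
    return [v if m else 0.0 for v, m in zip(sustr, umask)]


# Rho cell 0 is land and cells 1-3 are water, so u-point 1 is on the coast.
sustr = [0.0, 0.2, 0.2, 0.2, 0.2]  # in-memory stress, nonzero on the coast face
umask = [0, 0, 1, 1, 1]            # the coastal u-point is masked

print(u_to_rho(sustr))                       # [0.1, 0.2, 0.2, 0.2]
print(u_to_rho(restart_copy(sustr, umask)))  # [0.0, 0.1, 0.2, 0.2]
```

Rho cell 1, the first water point next to land, sees 0.2 in the continuous run but 0.1 after the restart, which is the kind of single-point mismatch described above. Writing the stresses without the mask keeps the coastal value intact.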

5 Networking Connectivity Problems: Information Gathering

The Internet is a dangerous place, if you are one of the trillions of tiny data packets racing across cables, switches and routers. Between complicated routing paths, discriminating firewalls, and aging network equipment, it is a real jungle out there. Beyond this, stories are told among network administrators of a race of tiny, mischievous creatures known as "server gremlins", who sneak about in the darkness, unplugging network cables and tweaking router configurations without proper documentation.

Consequently, we at the ARSC Help Desk are not surprised to occasionally receive support requests from users who are unable to access a remote network resource, such as a Web page or a file server. Typically, the user is working from an ARSC system and attempting to access an external resource (such as the Web page for another campus department) or is at her personal computer and attempting to access an ARSC resource.

We are eager to help, but often users do not come prepared with the information we need to track down the problem. It can be difficult for us to gather this information on our own, because the networks involved can be quite complicated, and many of the network devices are not under the direct control of our department.

If you experience difficulty connecting to a network resource, you can speed up resolution of your support request by doing some preliminary information gathering on the computer from which you are connecting. To do this, you will need to open up a terminal, enter some commands, and copy the commands and the output they produce into an e-mail to the ARSC Help Desk.

5.1 Preliminaries

For the sake of brevity, we will assume you are working from either a Red Hat 6 Linux system, an OS X system, or a Windows 7 system.

For Red Hat 6 systems with the Gnome desktop, a Terminal application is located in the Applications >> System Tools menu. For Mac OS X, look in the Finder for Applications >> Utilities >> Terminal. For Windows 7, open the Start Menu and enter cmd in the search box. Windows 7 users will also need to know how to copy text out of the terminal window: right-click inside the window, choose Mark, select the text, and press Enter to copy it.

5.2 Identifying Your System

These commands provide basic information about your computer's network location and identification.

Red Hat 6:

ip addr



Windows 7:

ipconfig /all

5.3 Supporting Network Systems

Here we learn about your network name servers and gateway, which are your computer's "signposts", as it were, to the outside world.

Red Hat 6:

ip route
cat /etc/resolv.conf

OS X:

netstat -nr
networksetup -getdnsservers Ethernet
networksetup -getdnsservers Wi-Fi

Windows 7:

route print
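If Python 3.7 or later is installed on the computer in question, the commands listed above can be collected in one step. This is a convenience sketch rather than an ARSC-supplied tool: the command lists simply copy the ones given above for each platform (keyed by the names Python's platform.system() reports), and the script prints each command followed by its output, ready to paste into your e-mail to the Help Desk.

```python
"""Run the network diagnostic commands and format them for an e-mail.

Convenience sketch: the command lists copy the ones given in this article.
"""
import platform
import subprocess

COMMANDS = {
    "Linux": [["ip", "addr"], ["ip", "route"], ["cat", "/etc/resolv.conf"]],
    "Darwin": [["netstat", "-nr"],
               ["networksetup", "-getdnsservers", "Ethernet"],
               ["networksetup", "-getdnsservers", "Wi-Fi"]],
    "Windows": [["ipconfig", "/all"], ["route", "print"]],
}


def run_command(argv):
    """Run one command and return its stdout and stderr as a single string."""
    try:
        done = subprocess.run(argv, capture_output=True, text=True, timeout=60)
    except (OSError, subprocess.TimeoutExpired) as err:
        # Missing programs or hung commands are reported, not fatal.
        return f"(could not run: {err})"
    return done.stdout + done.stderr


def build_report(commands, runner=run_command):
    """Return one text block: each command line followed by its output."""
    sections = []
    for argv in commands:
        sections.append("$ " + " ".join(argv) + "\n" + runner(argv).rstrip())
    return "\n\n".join(sections) + "\n"


if __name__ == "__main__":
    print(build_report(COMMANDS.get(platform.system(), [])))
```

Including the command lines in the report, not just their output, matches what the Help Desk asks for and makes it clear which output belongs to which command.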

5.4 Following The Network Trail

These commands can give us some notion of the network path between your computer and the remote resource. Replace <hostname> with the hostname of the network resource you are trying to access. For example, the hostname of a Web page is the part of its address between "http://" (or "https://") and the next "/".

Red Hat 6 or OS X:

ping -c 4 <hostname>
ping6 -c 4 <hostname>

Windows 7:

ping -n 4 <hostname>
ping -6 -n 4 <hostname>

5.5 Can You Connect From Another Location?

If you move to a computer on another network, are you able to connect to the network resource? For example, if you cannot view an ARSC Web page from your office PC, try viewing it from your PC at home. This will help us narrow down the location of the network problem.
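A complementary check you can run from each location is to try opening a connection to the resource directly, since some networks block ping while still allowing the service itself. The following Python sketch uses only the standard library socket module; it looks up both IPv4 and IPv6 addresses for the host (much as ping and ping6 do in section 5.4) and tries each one. The function name is hypothetical, and the port you pass must match the service (for example, 443 for an HTTPS Web page).

```python
"""Try to open a TCP connection to every address a host name resolves to."""
import socket


def try_connect(host, port, timeout=5.0):
    """Return a list of (address, outcome) pairs, outcome 'ok' or an error."""
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as err:
        # The name could not be resolved at all: a DNS problem.
        return [(host, f"name lookup failed: {err}")]
    results = []
    for family, socktype, proto, _, sockaddr in infos:
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            try:
                sock.connect(sockaddr)
                results.append((sockaddr[0], "ok"))
            except OSError as err:  # refused, unreachable, or timed out
                results.append((sockaddr[0], f"failed: {err}"))
    return results
```

For example, calling try_connect("www.example.com", 443) from your office and again from home, then including both results in your support request, shows us whether name lookup, IPv4, or IPv6 is where things break down.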

If it is inconvenient to wait until you leave the office, online services let you connect to a resource through a public Internet server. For Web pages, use a Web-based proxy browser.

For other types of resources, you can ping or traceroute them through a public Web portal that offers these tools.

5.6 Who Runs The Network?

Which organization or department owns your computer network, or the network you are trying to connect to (outside of ARSC)? Does it have its own IT department or call center? Armed with this knowledge, we can work with the network administrators on the other end of the connection to diagnose and resolve your networking problem as quickly as possible.

6 More Information

6.1 Editor

Christopher Howard

6.2 Credits

Oralee Nudson, ARSC Lead User Consultant. Reviewer and insider source for ARSC news and tips.

6.3 Publication Schedule

The newsletter is usually released on the third Wednesday of each month.

6.4 Archived Newsletters

6.5 Questions, Comments, And Submissions

Do you want to find out what our readers know about a particular subject? Submit a question about HPC or ARSC software, and we will feature it in a Q&A section in the newsletter.
