ARSC HPC Users' Newsletter 420, July 29, 2011

Update from ARSC’s New Director

[by Greg Newby]

Over the past months, ARSC has restructured and refocused. This change resulted from the end of our long-standing status within the Department of Defense High Performance Computing Modernization Program (HPCMP). As of June 1, ARSC no longer operates a Defense Supercomputing Resource Center (DSRC). Although we remain involved in the HPCMP, our role is now a secondary one: we serve as subcontractors to the maintainers of several other DSRCs.

There is some very good news that has come out of this transition, however: our focus on academic high performance computing (and related infrastructure, services, and science) has become tighter, and the University of Alaska Fairbanks has taken a much more active fiscal and oversight role in ARSC’s future.

Our "pacman" supercomputer, "bigdipper" storage, and many other facilities remain the same, and we are providing the same capable support that we always have. In the near future, we anticipate growing the ARSC user base, as we work with UAF partners on new initiatives including a campus storage cloud, a data portal for federated data sets, and an institutional repository.

It is a bittersweet time at ARSC, because this rebuilding and refocusing was accompanied by significant downsizing. We have reduced from 46 staff and faculty in late 2010 to 28 today. The majority of ARSC personnel are funded, at least partially, through a variety of grants and contracts. During upcoming months and years, we will continue to reset staffing levels, funding sources, and roles at UAF and beyond.

Throughout the downsizing process, it remained clear that ARSC has become a valuable component in the HPC community of users at UAF and beyond. We will continue to play that role, and to seek new opportunities to grow our user base, expand services, and continually upgrade our technologies. All ARSC users can expect to receive the same outstanding support as always, along with a renewed focus on building and expanding our academic partnerships.

GPU programming for dilettantes: Matlab one-liners with GPUmat

[Part 1 of 2 by Chris Fallen]

Graphics processing units (GPUs) can be found almost everywhere lately: in workstations, in laptops, in HPC systems, and soon, even in cellphones. Massively-parallel GPU architectures, once used almost exclusively to render imaginary enemies to shoot at real gamers in 3D virtual dungeons, are now also used by real researchers to perform billions of calculations on imaginary numbers using less time and fewer carbon credits than streaming the latest funny cat video from YouTube (approximately).

However, a non-specialist who wants to use GPU hardware to accelerate the calculation of, say, millions of discrete Fourier Transforms (DFTs) generally must first learn GPU compute kernels and memory spaces through the CUDA or OpenCL application programming interfaces (APIs) for C/C++/Fortran. Many non-specialists find this thought to be nearly as distressing as whatever idea requires so many DFTs in the first place.

Fortunately for those non-specialists, there is a GPU programming model nearly as simple to imagine as an abstract box (the GPU) attached to another abstract box (the CPU). And, of course, the API is written for Matlab. GPUmat (http://gp-you.org/), a GPU toolbox for Matlab, is a freeware API for accelerating (or potentially decelerating) Matlab programs with a CUDA-capable GPU. The key idea in using this "two-box programming model" to accelerate a particular algorithm is that sending data between the CPU and GPU boxes can take a significant amount of time, so those transfers should be minimized. In this two-part article, I will demonstrate how to use GPUmat through simple one-line Matlab benchmarks.

After installing the CUDA library and GPUmat on my workstation TAU, I initialize the GPU interface from the Matlab workspace by executing the GPUstart function. If the GPU initializes successfully, GPUstart will output a few details about my system and ask which of my NVIDIA GeForce GTX 480 accelerators I would like to use for accelerated general-purpose calculations. (The other GPU is presumably free to play Minecraft.)


    >> GPUstart
    Copyright gp-you.org. GPUmat is distributed as Freeware.
    By using GPUmat, you accept all the terms and conditions
    specified in the license.txt file.

    Please send any suggestion or bug report to gp-you@gp-you.org.

    Starting GPU
    There are 2 devices supporting CUDA
    CUDA Driver Version:                           3.20
    CUDA Runtime Version:                          3.10

    Device 0: "GeForce GTX 480"
      CUDA Capability Major revision number:         2
      CUDA Capability Minor revision number:         0
      Total amount of global memory:                 1609760768 bytes

    Device 1: "GeForce GTX 480"
      CUDA Capability Major revision number:         2
      CUDA Capability Minor revision number:         0
      Total amount of global memory:                 1610285056 bytes
      - Your system has multiple GPUs installed
        -> Please specify the GPU device number to use [0-1]: 0
      - CUDA compute capability 2.0
    ...done
    - Loading module EXAMPLES_CODEOPT
    - Loading module EXAMPLES_NUMERICS
      -> numerics20.cubin
    - Loading module NUMERICS
      -> numerics20.cubin
    >> 

The simplicity of the GPUmat interface is apparent when porting a single line of Matlab code that calculates the two-dimensional DFT of a simple 4x4 matrix to use the overloaded GPU-enabled fft2 function.

    >> c = fft2(ones(4))

    c =

        16     0     0     0
         0     0     0     0
         0     0     0     0
         0     0     0     0

    >> g = fft2(GPUsingle(ones(4)))

    ans =

        16     0     0     0
         0     0     0     0
         0     0     0     0
         0     0     0     0

    Single precision COMPLEX GPU type.

Interpreted with the two-box programming model, the compound GPU-enabled command above creates a matrix of ones in the "CPU box" with the ones command, sends it to the GPU as a GPUsingle data type, calculates and stores the DFT inside the GPU box, and finally copies the result back to the CPU box so that it can be displayed on the screen. The variable g itself remains stored on the GPU, as can be verified with Matlab’s class command.

    >> class(g)

    ans =

    GPUsingle

Explicitly copying the result from the GPU memory to the CPU memory is accomplished with a cast.

    >> cg = single(g);
    >> class(cg)

    ans =

    single
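The 16 in the top-left corner of both results is the zero-frequency (DC) term, which for an unnormalized DFT equals the sum of all 16 entries; every other frequency vanishes because the input is constant. As a minimal cross-check outside Matlab, assuming NumPy is available, numpy.fft.fft2 uses the same unnormalized forward-transform convention as Matlab's fft2 and reproduces the CPU result:

```python
import numpy as np

# A 4x4 matrix of ones, mirroring fft2(ones(4)) in the Matlab session.
a = np.ones((4, 4))

# Unnormalized forward 2-D DFT (NumPy's default norm, like Matlab's fft2).
c = np.fft.fft2(a)

# The DC term is the sum of all entries; all other coefficients are zero
# (up to floating-point round-off) because the input is constant.
assert np.isclose(c[0, 0], a.sum())
assert np.allclose(c.ravel()[1:], 0)

print(np.round(c.real).astype(int))
```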

We can create a simple benchmark to verify the GPU is in fact accelerating the code by calculating a much larger DFT, repeatedly, and using the timing functions tic and toc:

    >> tic, for k=1:1000, c = fft2(single(ones(2048))); end, toc
    Elapsed time is 118.726524 seconds.

    >> tic, for k=1:1000, g = single(fft2(GPUsingle(ones(2048)))); end, toc
    Elapsed time is 85.896469 seconds.

The GPU speedup is modest, but note that the CPU calculation is multithreaded and uses all four cores of TAU’s AMD Opteron processor. GPU performance can be improved further by creating the matrix of ones directly on the GPU, eliminating the data transfer from the CPU to the GPU.

    >> tic, for k=1:1000, g = single(fft2(ones(2048,GPUsingle))); end, toc
    Elapsed time is 37.139037 seconds.

Filling matrices directly on the GPU according to a mathematical formula is often well-suited to the GPU architecture and it also eliminates the data transfer bottleneck from the CPU to the GPU.
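The arithmetic behind these three timings can be made explicit. The sketch below only reuses the elapsed times reported above; the byte counts assume a 2048x2048 single-precision real input (4 bytes per element) converted before the copy and a complex single-precision fft2 result (8 bytes per element) copied back by single(...). Those byte counts are an assumption about what GPUmat moves, not something measured here.

```python
# Wall-clock times (seconds) for 1000 iterations, copied from the benchmarks above.
t_cpu = 118.726524          # fft2(single(ones(2048))) on the CPU
t_gpu_copy = 85.896469      # ones created on the CPU, then copied to the GPU
t_gpu_native = 37.139037    # ones created directly on the GPU

n = 2048
bytes_single_real = n * n * 4      # one real single-precision 2048x2048 matrix
bytes_single_complex = n * n * 8   # the complex single-precision fft2 result


def speedup(t_ref, t_new):
    """Ratio greater than 1 means the new variant is faster than the reference."""
    return t_ref / t_new


# Assumed bytes crossing the CPU/GPU boundary per iteration.
moved_copy = bytes_single_real + bytes_single_complex   # input in, result out
moved_native = bytes_single_complex                     # result out only

print(f"GPU, copy in: {speedup(t_cpu, t_gpu_copy):.2f}x, "
      f"{moved_copy / 2**20:.0f} MiB moved per iteration")
print(f"GPU, native:  {speedup(t_cpu, t_gpu_native):.2f}x, "
      f"{moved_native / 2**20:.0f} MiB moved per iteration")
print(f"time saved per iteration: {(t_gpu_copy - t_gpu_native):.1f} ms")
```

The roughly 49 ms saved per iteration covers both the host-to-device copy and the CPU-side creation and conversion of the input matrix, so this arithmetic alone cannot separate the two costs.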

In many real world applications, however, input data is read from disk or from some external instrument and must be explicitly copied to the GPU. One of my tasks is to process ionosphere radar data where the input data is stored in single-precision complex-valued matrices: I need to calculate repeated one-dimensional DFTs along the columns of many input matrices. The computational load can be simulated for benchmarking purposes on the CPU and GPU by applying the fft function to random complex matrices, being careful to explicitly transfer the matrix to the GPU rather than letting the GPU generate the random numbers directly.

    >> tic, for k=1:1000, c = fft(single(randn(1024,5000)*(i+1))); end, toc
    Elapsed time is 268.524532 seconds.
    >> tic, for k=1:1000, g = fft(GPUsingle(randn(1024,5000)*(i+1))); end, toc
    Elapsed time is 294.375797 seconds.

In this case, using the GPU instead of the CPU actually increases processing time! Informally, one significant difference from the previous benchmark is that, for the same amount of data, the one-dimensional DFT requires fewer floating-point operations than the two-dimensional DFT. The cost of moving the data to the GPU is therefore spread over less computation, so the one-dimensional DFT is less likely to benefit from GPU acceleration and may even suffer from GPU deceleration!
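The "fewer operations for the same data" argument can be made rough but concrete with the conventional estimate of 5 N log2 N floating-point operations for one complex FFT of length N. The sketch below is an illustration under that estimate, not a measured operation count, and it counts every element as complex single precision (8 bytes) in both benchmarks, a simplifying assumption since ones(2048) is real.

```python
import math

BYTES_PER_COMPLEX_SINGLE = 8


def fft_flops(n):
    """Conventional rough flop estimate for one complex FFT of length n."""
    return 5 * n * math.log2(n)


def fft2_flops(rows, cols):
    """2-D FFT as 1-D FFTs down every column, then along every row."""
    return cols * fft_flops(rows) + rows * fft_flops(cols)


def column_fft_flops(rows, cols):
    """Independent 1-D FFTs down each column, as Matlab's fft(X) computes."""
    return cols * fft_flops(rows)


# Matrix shapes from the two benchmarks above.
flops_2d = fft2_flops(2048, 2048)
bytes_2d = 2048 * 2048 * BYTES_PER_COMPLEX_SINGLE
flops_1d = column_fft_flops(1024, 5000)
bytes_1d = 1024 * 5000 * BYTES_PER_COMPLEX_SINGLE

print(f"fft2 on 2048x2048: {flops_2d / bytes_2d:.2f} flops per byte")
print(f"fft  on 1024x5000: {flops_1d / bytes_1d:.2f} flops per byte")
```

Under this estimate the two-dimensional transform performs 13.75 flops per byte versus 6.25 for the column transforms: per element, the 2-D cost grows with log2 of the whole matrix size, while the column FFTs only see log2 of the column length. That is roughly half as much work to hide each byte's trip across the bus.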

Does this mean the GPU-enabled one-dimensional DFT calculation is useless? Not necessarily. Often a DFT is just one step in a multi-step algorithm and the performance of the GPU must be evaluated on the entire algorithm rather than as the sum of its constituent parts, given the significant data transfer overhead between the GPU and CPU. In the second part of this article, I will demonstrate this through additional one-line Matlab examples and discuss the use of GPUmat for prototyping stand-alone GPU-accelerated applications.

Random Thoughts from an Editor

1. I gave a talk last month about our fish model and tried to take advantage of the advice from Presentation Zen Design by Garr Reynolds. It’s all about making clean slides, without too many words. Ditch the corporate logos if you can. In terms of design, it’s the same points I learned in a 2-D design art class, but I hadn’t thought about PowerPoint slides in that context.

2. There are also some good tips in Scientifically Speaking, available at http://www.tos.org/pdfs/sci_speaking.pdf, which I recently unearthed while trying to downsize the junk in my cubicle prior to moving upstairs. Notably, there’s a fabulous before and after comparison of a poster - yes, I’m presenting a poster later this month, also on the fish model.

3. On poster creation: we’ve lost the staff who used to help out with such things, so in the new ARSC we have to fend for ourselves. I have neither a license for the old software nor any idea how to use it, so I went online searching for tips on poster creation. The nature of the web is that you can usually find the opinion you were hoping for. I can’t find it now, but I stumbled on a European fellow who highly recommends Inkscape over PowerPoint for making posters. Score! Check out http://inkscape.org/. For tips on using it, try http://screencasters.heathenx.org/.

4. Some lessons from the fish model:

  • Model coupling is tricky business; people have spent millions on model coupling frameworks. We’re just hardcoding the fish to be part of the ROMS ocean model.
  • Getting funding for this stuff is tricky since it doesn’t quite fit the mold in review panels for either biology or oceanography.
  • Working with diverse people requires care and generous attributions - lots of co-authors on presentations, etc.
  • There have been surprises. I was brought in for my knowledge of the ROMS model, but a data structures class I sat in on has been extremely helpful.
  • Another surprise is that the zooplankton in one class of ecosystem models doesn’t really match the zooplankton needed by the fish model. There are Eulerian "ecosystem" models with nitrate, phytoplankton, and zooplankton, in which the zooplankton are mostly there to limit the phytoplankton in realistic ways - and all the validation is done on the phytoplankton fields. The Lagrangian fish model requires a zooplankton field as food. Somehow the one doesn’t generate enough for the other and the fish all starve to death.

Online Integrated Field Guide to Alaska

Just in time for the annual flood of summer tourists to the Last Frontier, a new website connecting Alaskan field scientists with those curious about Arctic discoveries is now available online at http://frontierscientists.com/. Frontier Scientists shares first-person accounts and real-time insights from leading archeologists, grizzly bear biologists, volcano researchers, climate change specialists and other scientists.

"We want to let travelers, teachers, students, aspiring scientists, and anyone else interested in science feel as if they are along when our team is tracking a grizzly or documenting how climate change is disrupting Alaskan ways of life," explains Greg Newby, Director of the Arctic Region Supercomputing Center at the University of Alaska Fairbanks. "Visitors to Frontier Scientists can ask their own questions to our scientists directly, follow some of them on Twitter and Facebook, and converse on their blogs."

Newby has received funding from the National Science Foundation EAGER (EArly-concept Grants for Exploratory Research) program to support "small-scale, exploratory, high-risk, potentially transformative, time-critical research in the fields of science, engineering and education." The NSF-funded project, "DIG Frontier Scientists" (DIG: Digitally Integrated Guide), will "widen the general public’s interaction with the cultural and natural environment by allowing them to access Web sites and load their handheld mobile devices with engaging descriptions of research," according to Newby.

The one-year, experimental DIG project also includes funding for one graduate and two undergraduate students at UAF to assist in the delivery of digital videos to computers and mobile devices.

Co-investigator on the project with Newby is Elizabeth O’Connell, producer, editor and project manager for WonderVisions. O’Connell is a University of Alaska broadcast media graduate. Her award-winning programs include the Science of Whales, which aired nationally on the Discovery Channel.

According to O’Connell, current scientific discoveries in some of the Arctic’s most remote and dramatic landscapes are chronicled on Frontier Scientists through fascinating footage in short vodcasts, as well as in Twitter feeds, blogs and web reports. The research is organized into categories:

  • Grizzlies
  • Petroglyphs
  • Cook Inlet Volcanoes
  • Alutiiq Weavers
  • Climate Change Watch

Frontier Scientists also provides resources and tips on things to do in Alaska for those considering travel to the Last Frontier state, whether on an Alaska cruise or a backpacking expedition.

Quick-Tip Q & A

A:[[ I’m setting up a new ocean simulation and we have an idea for how to
  [[ handle land-fast ice. It requires reading in the land-fast ice from some
  [[ observations. My problem is that the model grid is in lat/lon and the
  [[ model reads netCDF files while the observations are in various GIS
  [[ formats. Available formats are GeoTiff, ESRI shapefiles, and Microsoft
  [[ Access Database. Any advice on dealing with these?

#
# We received this tip from reader Jed Brown:
#

http://gdal.org/

#
# Elaborating quite a bit:
#

This points to a web site for the Geospatial Data Abstraction Library (GDAL), which can process the GeoTiff files. It also contains a related OGR library for processing the ESRI shapefiles. This gives us choices to make, including which language to access the library from. I chose to go with the GeoTiff files and with the Python language.
For working with GDAL from Python, I found this site quite helpful:

http://www.gis.usu.edu/~chrisg/python/

as was this one:

http://svn.osgeo.org/gdal/trunk/gdal/swig/python/samples/

The files I started with have the fast ice as a 1, free ice as a 0, and only cover the months with ice. I started by copying the July file to August, then transforming any 1 to a 0. I then had the basis for creating a monthly climatology, using the ice-free month for September as well.
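A minimal NumPy sketch of that climatology step follows. It assumes each monthly GeoTiff has already been read into a 2-D integer array (for example with GDAL's ReadAsArray) and collected in a dict keyed by month number; the function names and the dict layout are hypothetical, and the file I/O is omitted.

```python
import numpy as np


def ice_free_from(template):
    """Copy a fast-ice mask, turning every fast-ice cell (1) into free ice (0).

    Any other values in the template (for example a nodata or land flag)
    are kept, so the new month shares the template's grid and masking.
    """
    return np.where(template == 1, 0, template)


def fill_climatology(monthly):
    """Return a copy of `monthly` with August and September derived from July.

    `monthly` is a hypothetical dict mapping month number (1-12) to a 2-D
    mask array; initially only the months with ice are present.
    """
    filled = dict(monthly)           # shallow copy; input dict is untouched
    ice_free = ice_free_from(monthly[7])
    filled[8] = ice_free             # August: July with all fast ice removed
    filled[9] = ice_free.copy()      # September: same ice-free field
    return filled
```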
In Python, I imported all the tools for dealing with our ocean model files as well as the GDAL library. I read the map transform from a GeoTiff file and created the inverse map transform. Using these map transform parameters, I mapped each of my lat/lon points onto the same mapping used in the GeoTiff file, then queried each point to see what value the GeoTiff image has there. Finally, I wrote out the stuck ice (which I want as zero) and the free ice (one) into a NetCDF file on my grid.
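The inverse-transform lookup can be sketched in plain NumPy. In GDAL's Python bindings, ds = gdal.Open(path), ds.GetGeoTransform() and ds.GetRasterBand(1).ReadAsArray() supply the six-element affine transform and the image; GDAL also offers gdal.InvGeoTransform, but the inversion is written out here so the sketch runs without GDAL. The helper names are hypothetical, and the sketch assumes the model points have already been converted into the GeoTiff's own coordinate system; if the GeoTiff is projected (polar stereographic, say), lat/lon must first be transformed, for example with osr.CoordinateTransformation.

```python
import numpy as np


def invert_geotransform(gt):
    """Invert a GDAL-style affine geotransform (x0, dx, rx, y0, ry, dy).

    GDAL maps pixel (col, row) to georeferenced coordinates as
        X = gt[0] + col * gt[1] + row * gt[2]
        Y = gt[3] + col * gt[4] + row * gt[5]
    The returned function maps (X, Y) back to fractional (col, row).
    """
    det = gt[1] * gt[5] - gt[2] * gt[4]
    if det == 0:
        raise ValueError("geotransform is not invertible")

    def to_pixel(x, y):
        dx, dy = x - gt[0], y - gt[3]
        col = (gt[5] * dx - gt[2] * dy) / det
        row = (-gt[4] * dx + gt[1] * dy) / det
        return col, row

    return to_pixel


def sample_mask(image, gt, xs, ys, fill=0):
    """Look up image values at georeferenced points; outside points get `fill`."""
    to_pixel = invert_geotransform(gt)
    cols, rows = to_pixel(np.asarray(xs, float), np.asarray(ys, float))
    cols = np.floor(cols).astype(int)   # pixel containing each point
    rows = np.floor(rows).astype(int)
    inside = ((rows >= 0) & (rows < image.shape[0])
              & (cols >= 0) & (cols < image.shape[1]))
    out = np.full(cols.shape, fill, dtype=image.dtype)
    out[inside] = image[rows[inside], cols[inside]]
    return out


def free_ice_mask(fast_ice):
    """Model convention from the text: stuck (fast) ice -> 0, free ice -> 1."""
    return 1 - fast_ice
```

Points outside the image default to 0 (no fast ice), which the last step turns into free ice; a different fill value may suit other grids better.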
With this sort of job, the devil is in the details and you want to check things at each step of the way.


Q: I have some Fortran 90 code I want to call from inside Python. I have a setup.py file that can be used to build it, but I need to invoke it with "python setup.py build --fcompiler=gnu95". Is there any way to put the fcompiler option into the setup.py file, at least as a default?

[[ Answers, Questions, and Tips Graciously Accepted ]]


Current Editors:

  • Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
  • Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678

Archives:

    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.