ARSC HPC Users' Newsletter 415, September 24, 2010

Call for Papers: PCO’11 Workshop

New Trends in Parallel Computing and Optimization

http://spiderman-2.laas.fr/PCO/

To be held in conjunction with IEEE IPDPS, Anchorage, USA, May 16-20, 2011

http://www.ipdps.org

Scope of the Workshop:

The workshop on new trends in Parallel Computing and Optimization aims to provide a forum for scientific researchers and engineers to present recent advances in parallel and distributed computing for difficult combinatorial optimization problems, such as 0-1 multidimensional knapsack problems, cutting stock problems, large scale linear programming problems, nonlinear optimization problems, and global optimization problems. Emphasis will be placed on new techniques for solving these difficult problems, such as cooperative methods for integer programming and polynomial optimization methods. Aspects related to Combinatorial Scientific Computing (CSC) will also be treated, as will the use of new approaches in parallel computing such as GPU and hybrid computing, peer-to-peer computing, and cloud computing. Applications to planning, logistics, manufacturing, finance, telecommunications, and computational biology will be considered.

Important Dates:

  • Paper submission deadline: October 15, 2010
  • Notification of acceptance: December 22, 2010
  • Camera ready papers due: February 1, 2011
  • Workshop: May 20, 2011

Workshop Chair: Dr. Didier El Baz, LAAS-CNRS, France, elbaz@laas.fr

Community Earth System Model Tutorial Materials

This past summer NCAR ran a CESM tutorial for 72 students. The tutorial materials, including PDFs of lectures and practical exercises as well as video of the presentations, are now available to the CESM community.

The materials are posted at http://www.cesm.ucar.edu/events/tutorials/ and a bulletin board page for questions on the tutorial materials is available at http://bb.cgd.ucar.edu .

The tutorial registration asks for an email address so the providers can gather metrics on who is using the tutorial and send notices of updates.

Debugging Python Code Which Uses MPI

[ By Don Bahls and Alec Bennett ]

Recently we have been helping a user get a Python-based code running on Midnight. The code uses a Python interface to MPI for interprocess communication. Whenever the sendreceive routine within the Python MPI layer was called, the code would segfault. Normally Python is fairly easy to debug because a stack trace is displayed when a program crashes. Unfortunately, this error was occurring inside a native C call, so we weren't getting any feedback from the Python interpreter. We tried several other MPI routines and everything worked except sendreceive, so we were fairly confident the problem was in that routine.

While we did have a copy of the source code, we opted to use two debugging techniques that we've used quite a bit in the past to look for MPI-related issues:

1) Use the PMPI interface to see what's going on in the MPI environment. See also:
   /arsc/support/news/hpcnews/hpcnews329/index.xml#article1
   /arsc/support/news/hpcnews/hpcnews328/index.xml#article2

2) Get a core dump from the code

Both of these strategies are attractive because they minimize changes to the code being debugged.

Based on the name of the routine, we guessed that sendreceive was calling MPI_Sendrecv, which fortunately was correct.

The MPI standard provides two entry points for each MPI routine: the standard MPI_ name and a PMPI_-prefixed name. This lets developers interpose their own version of any routine by providing an alternate definition for the MPI_ entry point. In this case, we wrote our own version of MPI_Sendrecv which printed the parameters passed to the routine and then called PMPI_Sendrecv, so our version behaved the same as the original.

e.g.


    midnight % cat sendrecv.c

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int MPI_Sendrecv( void *sendbuf, int sendcount, MPI_Datatype sendtype,
                      int dest, int sendtag,
                      void *recvbuf, int recvcount, MPI_Datatype recvtype,
                      int source, int recvtag, MPI_Comm comm, MPI_Status *status )
    {

        printf("MPI_Sendrecv(%s%x, %s%d, %s%d, %s%d, %s%d, %s%x, %s%d, %s%d, %s%d, %s%d, %s%d, %s%x)\n",
                 "sendbuf=", sendbuf,
                 "sendcount=", sendcount,
                 "sendtype=", sendtype,
                 "dest=", dest,
                 "sendtag=", sendtag,
                 "recvbuf=", recvbuf,
                 "recvcount=", recvcount,
                 "recvtype=", recvtype,
                 "source=", source,
                 "recvtag=", recvtag,
                 "comm=", comm, 
                 "status=", status);

        return PMPI_Sendrecv(sendbuf, sendcount, sendtype, dest, sendtag, recvbuf, recvcount, 
            recvtype, source, recvtag, comm, status);
    }

Then we compiled our version of MPI_Sendrecv as a shared library so we could use it as a preload library:


    midnight % mpicc -fPIC sendrecv.c -c
    midnight % mpicc sendrecv.o -shared -o libsendrecv.so

The advantage of this approach is that the code will use our version of MPI_Sendrecv rather than the system version, without being recompiled. Had the code been linked against the static version of the MPI library, this wouldn't have been an option.

At run time we added the LD_PRELOAD environment variable to the mpirun command so that our version of MPI_Sendrecv would be used.

e.g.


    mpirun LD_PRELOAD=$WORKDIR/libsendrecv.so python myscript.py

This resulted in additional output for each "sendreceive" call similar to the following:


    MPI_Sendrecv(sendbuf=120d7c0, sendcount=32, sendtype=3, dest=6, sendtag=52, 
       recvbuf=1162090, recvcount=32, recvtype=3, source=4, recvtag=52, comm=91, 
       status=0)

When the program was run, it looked like MPI_Sendrecv was being called with valid values, so we adjusted the Python script to allow core dumps. We did this with the Python "resource" module, though we could also have done it with the ulimit shell command.


    midnight % cat myscript.py
    #!/usr/bin/env python

    ...
    import resource
    ...

         # get current CORE limits
         (score, hcore) = resource.getrlimit(resource.RLIMIT_CORE)
         # set CORE soft limit to the hard limit
         resource.setrlimit(resource.RLIMIT_CORE, (hcore, hcore))

         # run the rest of the program.

When the program was run, we got the core files we wanted and examined them with gdb. After staring at the PMPI and core output, we finally noticed something strange about the MPI_Status value and tested our hypothesis by writing another version of sendrecv.c that corrected the issue.
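
For those unfamiliar with the process, loading a core file into gdb looks something like the following (the core file name here is illustrative; the exact name varies by system configuration):


    midnight % gdb `which python` core.12345
    (gdb) bt

The bt (backtrace) command prints the call stack at the time of the crash, including the native C frames that the Python interpreter itself could not report.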

Ultimately, this turned out to be a bug in the MPI stack, which we worked around with a minor code modification.

An Introduction to CMake

[ By Bob Torgerson ]

Imagine a common scenario in the world of software development: a project's build has become complex enough that a set of makefiles is desirable to ease the compilation process. There are a number of options available to programmers, including GNU Make, Autotools, and qmake, and the difficulty and time required to learn each system varies greatly. For those looking to compile C, C++, or Fortran code, an excellent alternative with a swiftly growing user base is CMake. CMake is a cross-platform, open source makefile generator that allows the same build files to be used on multiple operating systems, with less work required of end-users and developers alike. CMake is only moderately difficult to learn, using a small set of keywords and commands that are both robust and easy to use.

Almost all of the CMake code used to generate a native makefile on a host OS is placed in files called "CMakeLists.txt". These are simple text files containing the "skeleton" of the build, to which the CMake system adds system-dependent settings when it generates the makefiles. One thing to note about this process is that every subdirectory of a project that contains code to build must have its own CMakeLists.txt file, pulled in from its parent with the add_subdirectory command. This allows large-scale projects spanning multiple directories to be built without the need to open each directory and retype make.
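
For example, a top-level CMakeLists.txt for a hypothetical project whose sources live in a src/ subdirectory might look like this (the project name and layout are illustrative):


    cmake_minimum_required(VERSION 2.4.6)
    project(myproject)
    # src/ must contain its own CMakeLists.txt describing its targets
    add_subdirectory(src)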

Let’s start with a very basic example, which creates a single executable from C++ source files found in the same directory. First, create a file called CMakeLists.txt containing these two commands:


    cmake_minimum_required(VERSION 2.4.6) 

    add_executable(helloWorld main.cpp) 

Run cmake on the directory and then run make:


    cmake .

    make

That's it! A new executable, "helloWorld", will be created in the same directory as the source files, and make will rebuild it only if the CMakeLists.txt or source files are modified. You may have noticed that neither of the statements above names a compiler. This is because CMake performs dependency checks before creating a makefile, identifying what language the source is written in and which compiler should be used automatically.
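
If you would rather choose the compiler yourself, you can tell CMake which one to use when you first configure the build (a standard CMake variable; the compiler path below is just an example):


    cmake -DCMAKE_CXX_COMPILER=/usr/bin/g++ .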

The CMakeLists.txt above also shows two of the more common commands used in CMake:

cmake_minimum_required() indicates the minimum version of CMake that must be available on a user's system for the project to build. If this line is excluded from the CMakeLists.txt file, CMake will issue a warning about the unspecified minimum version.

add_executable() is one of the three commands that CMake uses to create targets in the generated makefile: add_executable, add_library, and add_custom_target. These three commands do the bulk of the work involved in creating executable binaries and libraries on a system. We will use them in a more advanced example soon, but if you wish to learn more about any CMake command, simply type:


    cmake --help-command [cmd]

While simplistic projects like the one above are wonderfully easy to write makefiles for, we rarely see them in the real world (outside of tutorials), and most would wonder why a makefile was necessary at all. A more common task is converting old makefiles into CMakeLists.txt files. For the following example, we will convert a GNU makefile into a CMakeLists.txt file capable of building the same binaries and libraries.

Let's take a look at the GNU makefile that we want to convert:


    CC = gcc
    CFLAGS = -g -O2 -Wall -m64
    LDFLAGS = -L/usr/local/lib

    AR = ar
    ARFLAGS = cru
    LIBS = -lgu -lm

    SRC =\
    grids.c\
    geom.c\
    hash.c\
    istack.c\
    vertlist.c

    HEADERS =\
    grids.h\
    geom.h\
    hash.h\
    istack.h\
    vertlist.h

    LIBOBJS = $(SRC:.c=.o)

    SLIBOBJS = $(SRC:.c=.t)
    %.t: %.c
            $(CC) $(CFLAGS) -fPIC -c $*.c -o $*.t

    all: gridbuild

    lib: libgrid.a

    shlib: libgrid.so

    gridbuild: main.c $(HEADERS) $(SRC) triangle.o
            $(CC) -o gridbuild $(CFLAGS) main.c $(SRC) $(LDFLAGS) $(LIBS)

    libgrid.a: $(HEADERS) $(LIBOBJS)
            $(AR) $(ARFLAGS) $@ $(LIBOBJS)

    libgrid.so: $(HEADERS) $(SLIBOBJS)
            rm -f $@
            $(CC) -shared -o $@ $(SLIBOBJS) $(LDFLAGS) $(LIBS)

Now let’s go line by line through the equivalent CMakeLists.txt file to see what needs to be changed to make this work in CMake.

First we should add the minimum version of CMake that can process this CMakeLists.txt file (a recommended first step).


    cmake_minimum_required(VERSION 2.4.6)

Next we need to set a variable to hold the source file names. Variables are declared in the following manner:

    set(SRC
    grids.c
    geom.c
    hash.c
    istack.c
    vertlist.c
    )

Notice that we don't include the header files; CMake's dependency checker will pull the headers into the build as long as they are in the same directory.

    set(LIBSRC
    ${SRC}
    cmplx.c
    clog.c
    stubs.c
    )
    set(CFLAGS "-g -O2 -Wall -m64")
    set(CFLAGS_SHARED "-g -O2 -Wall -m64 -fPIC")

The link_directories command serves the same purpose as -L/usr/local/lib does in the GNU makefile.

    link_directories("/usr/local/lib")

To apply specialized compiler flags to certain files, use the set_source_files_properties command, as shown below. Note that the flags variable is quoted so that CMake treats it as a single property value.

    set_source_files_properties(${LIBSRC} PROPERTIES
    COMPILE_FLAGS "${CFLAGS}"
    )

add_executable creates an executable with the given target name (gridbuild in this case).

    add_executable(gridbuild main.c ${SRC})

Next we link libraries to this executable using target_link_libraries. This command is the equivalent of -l<library_name>.

    target_link_libraries(gridbuild
    m
    gu
    )

Now we need to build the two libraries: one static and one shared.

    add_library(grid
    STATIC
    ${LIBSRC}
    )

We can change the compile flags applied to the source files to change how the shared-library objects are built.

    set_source_files_properties(${LIBSRC} PROPERTIES
    COMPILE_FLAGS "${CFLAGS_SHARED}"
    )

    add_library(grid_shared
    SHARED
    ${LIBSRC}
    )

Notice that CMake will not allow two targets with the same name, so the shared library gets its own target name, grid_shared. The set_target_properties command then sets the output name back to "grid", so the build still produces libgrid.so just as the original makefile did. Finally, we link the same libraries into the shared library by target name.

    set_target_properties(grid_shared PROPERTIES OUTPUT_NAME grid)

    target_link_libraries(grid_shared
    m
    gu
    )
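
As a usage note, many CMake users prefer an "out-of-source" build, which keeps every generated file out of the source tree (a standard CMake workflow; the build directory name is up to you):


    mkdir build
    cd build
    cmake ..
    make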

Overall, this process of converting to CMake was relatively painless. This build will now run on multiple operating systems, as long as the end-user has CMake and gcc, since we have only included compiler flags for gcc in this example. You can also see the power of CMake: only eight distinct commands do all of the work of the original makefile. The number of commands to remember is small compared to similar makefile software, easing the learning curve for new users.

From these examples, you should have a basic understanding of CMake's functionality. With CMake installed on your system, you can look up the documentation for any of the commands discussed in this article, as well as several more that you may find useful in future projects. A good resource for more information about CMake is Mastering CMake, the only book available for CMake, written by the developers at Kitware who created it. You can also sign up for the CMake mailing list by visiting: http://www.cmake.org/mailman/listinfo/cmake

This mailing list is a useful source of information on problems that may be similar in nature to your own, and it allows you to post a query to the members of the list to help solve your own quandary.

While CMake might not pull experienced users away from their preferred makefile software, it is a good alternative to other cross-platform build options. With CMake's ease of use and an active community that welcomes new users, everyone should take a look at what CMake has to offer their project.

Changing of the Guard

Today is co-editor Craig Stephenson’s last day at ARSC. Craig has accepted the Web Manager position at the International Arctic Research Center on the UAF campus, where he will maintain the IARC website and help faculty and staff create and maintain their own research project web pages. We are of course happy for Craig in this new opportunity but will miss his skills, diligence and good humor at ARSC and at the HPC Newsletter in particular. We wish you well, Craig, and thank you for your service.

The news is not all bad though, because joining the HPC Newsletter as co-editor beginning with the next issue is ARSC Oceanographic Specialist Kate Hedstrom. Kate has been working with the ROMS family of ocean models for over twenty years and has recently completed a User's Guide for it. She came to Alaska to run simulations of the waters (and sea ice) around Alaska, but has also become involved with projects coupling ROMS to both a climate model and a fish model. She has written for the HPC Newsletter in the past, notably about GNU Make and Git. She has also been a force on the powerful (five-time city champion) CRAYons Spelling Bee team ( /arsc/support/news/hpcnews/hpcnews401/index.xml#article2 ), a talent that will no doubt come in handy in her new role.

Quick-Tip Q & A


A:[[ The colored text in my terminal window can be pretty annoying.  It’s 
[[ useful sometimes, but difficult to see other times.  What are my options?  
[[ Any of the following would be useful:
[[ 
[[  1) Turn off colored text altogether
[[  2) Remove the color, but allow styled text (e.g., bold, underline) to 
[[     point out particular things like executable files or directories
[[  3) Toggle text coloring on and off with the press of a button or a short 
[[     command

#
# Greg Newby responded with the following advice:
#

One of the most common colorized commands I have seen is "ls",
which on Linux systems is often aliased or otherwise defaults to
"ls --color", colorizing the output of ls.

Search your favorite search engine for "LS_COLORS" if you want to know
how to set an environment variable to select which colors are displayed.
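
For instance, to address option 2 from the question, you can keep styled
text but drop the colors by using only attribute codes.  Something like
this (bash/ksh syntax; see the dircolors man page on GNU systems for the
full list of keys) renders directories in bold and executables underlined,
with no colors at all:
 export LS_COLORS='di=01:ex=04:ln=00'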

To temporarily unalias the ls command, or any other command, just
preface it with a backslash.  So, if you have something like this
in your .cshrc:
 alias ls 'ls --color'
or your .profile:
 alias ls='ls --color'

you can run "regular" ls (often /bin/ls or /usr/bin/ls) from your
shell prompt with:
 \ls

to remove the aliases.  You could also type the full path to 
avoid alias matching:
 /bin/ls
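
To address option 3 (toggling with a short command), one approach is
a pair of aliases along these lines (bash syntax; the names lson and
lsoff are our own invention):
 alias lson='alias ls="ls --color=auto"'
 alias lsoff='unalias ls'

Running lsoff returns plain ls output for the rest of the session,
and lson brings the colors back.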

# An editor had the following trick to contribute:

Whenever I encounter troublesome colors in the "ls" output, I just pipe 
the ls command to "cat" to strip the filenames of their colors.  For 
example:

ls | cat

The above command has the side effect of arranging the filenames into a 
single column, but typically I’m more interested in the "long listing" 
ls output anyway:

ls -l | cat

The output of this command looks just like the original "ls -l" output,
but without the colors.  (This works when ls is using "--color=auto" or
"--color=tty": ls colorizes only when writing to a terminal, and a pipe
to cat is not a terminal, so the escape sequences are suppressed.)


Q: Sometimes I need to loosen up the permissions on a file so someone
else can access it. Of course, I use 'chmod' to do this to the file and
possibly even the directories above the file. No matter how many times
this comes up, I sometimes fail to fully achieve my goal and need to
make a second effort. The problem is, the only way I can find out that
I failed is when the "someone else" tells me that they still can't
reach the file.

My question then is, is there a utility that will tell me whether a
specific (or arbitrary) user can read/write/execute a given file?
[ Thanks to Bob Moench for this question! -- Eds. ]

[[ Answers, Questions, and Tips Graciously Accepted ]]


Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.