
Introduction to Using the Sun Opteron Cluster


 

Introduction

The Arctic Region Supercomputing Center (ARSC) operates a Sun Opteron cluster (Midnight) running SuSE Linux Enterprise Server 9.3.

 

"Midnight"

The ARSC Sun Opteron cluster consists of the following hardware:

 

Operating System / Shells

The operating system on Midnight is SuSE Linux Enterprise Server (SLES) version 9.3.

The following shells are available on Midnight:

If you would like to have your login shell changed, please contact User Support.

 

System News & Status

System news is available via the news command when logged on to Midnight. For example, the command "news queues" gives news about the current queue configuration. System status and non-sensitive news items are available on the web.

 

Storage

Several environment variables are defined to point to the available storage on Midnight.

Name | Notes | Quota (1) | Purge Policy | Backup Policy
$HOME | $HOME directories are intended for locally compiled executables and libraries, dot files, and small data sets. ARSC recommends that you avoid using $HOME in parallel jobs. | 500 MB | not purged | backed up
$WORKDIR, $WRKDIR (2) | $WORKDIR is a high performance parallel Lustre file system available from all nodes on midnight. This is the preferred location to run large parallel jobs. | 100 GB | 30 day purge policy (3) | not backed up
$ARCHIVE_HOME, $ARCHIVE (2) | long-term storage, not available from compute nodes | none | not purged | backed up

NOTES:

  1. Requests for increased quotas should be sent to User Support.
  2. $WRKDIR and $ARCHIVE are deprecated versions of the environment variables and may be phased out in the near future.
  3. The getPurgable command can be used to determine which files are susceptible to purging (see getPurgable -help for available options).

See http://www.arsc.edu/support/howtos/storage.html for more information on the storage environment at ARSC.
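
As a minimal sketch of how a job script might choose its run directory from the variables in the table above, the function below prefers $WORKDIR, falls back to the deprecated $WRKDIR, and finally to $HOME. The function name pick_run_root and the fallback order are illustrative assumptions, not an ARSC-provided tool.

```shell
# pick_run_root: print the directory a job should run in.
# Hypothetical helper; the fallback order is an assumption.
pick_run_root() {
    if [ -n "${WORKDIR:-}" ]; then
        printf '%s\n' "$WORKDIR"      # preferred: parallel Lustre file system
    elif [ -n "${WRKDIR:-}" ]; then
        printf '%s\n' "$WRKDIR"       # deprecated name, may be phased out
    else
        printf '%s\n' "$HOME"         # last resort: small quota, avoid for parallel jobs
    fi
}

# Example use inside a job script:
run_root=$(pick_run_root)
echo "run directory: $run_root"
```

Falling back to $HOME keeps the script usable on systems where neither work variable is defined, but the 500 MB quota makes it unsuitable for large runs.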

 

Sample Code Repository ($SAMPLES_HOME)

The $SAMPLES_HOME directory on midnight contains a number of examples including, but not limited to:

A description of the items in the Sample Code Repository is available on the web; however, you must log in to midnight to access these examples.

If you would like to see additional samples or would like to contribute an example, please contact User Support.

 

Parallel Programming Models

Several types of parallelism can be exploited on Midnight using different programming models and methods. These are listed in the table below.

Hardware Level | Model | Description
Shared-memory node | Auto | Automatic shared-memory parallel executables can be built by compiling and linking with the -apo option to pathf95, pathcc, or pathCC. Since only a subset of loops can generally be parallelized this way, OpenMP directives can further improve performance.
Shared-memory node | OpenMP | This is a form of explicit parallel programming in which the programmer inserts directives into the program to spawn multiple shared-memory threads, typically at the loop level. It is common, portable, and relatively easy. On the downside, it requires shared memory, which limits scaling to the number of processors on a node. To activate OpenMP directives in your code, use the -mp option to pathf95, pathcc, or pathCC. OpenMP can be used in conjunction with autoparallelization.
Shared-memory node | pthreads | The system supports POSIX threads.
Distributed memory system | MPI | This is the most common and portable method for parallelizing codes for scalable distributed memory systems. MPI is a library of subroutines for message passing, collective operations, and other forms of inter-processor communication. The programmer is responsible for implementing data distribution, synchronization, and reassembly of results using explicit MPI calls. Using MPI, the programmer can largely ignore the physical organization of processors into nodes and simply treat the system as a collection of independent processors.

 

Programming Environment

The following programming tools are available on Midnight.
Item | PathScale | MPI compilers (*) | Portland Group | GNU Compilers v3.3 | Sun Studio
Fortran 77 | pathf95 | mpif77 | pgf77 | g77 | sunf77
Fortran 90/95 | pathf95 | mpif90 | pgf90 | - | sunf90 / sunf95
C compiler | pathcc | mpicc | pgcc | gcc / cc | suncc
C++ compiler | pathCC | mpicxx | pgCC | g++ / c++ | sunCC
Debuggers | pathdb | totalview | pgdbg | gdb | dbx
Performance Analysis | pathprof | - | pgprof | gprof | analyzer
Default MPI module (*) | PrgEnv or Prgenv.path | - | PrgEnv.pgi | PrgEnv.gcc | PrgEnv.sun
Batch queueing system | PBS Professional

There are several compiler suites available on Midnight. The PathScale compilers and tools are the recommended set. The GNU compiler and tools are also included for completeness of the system software.

* The default environment for new accounts loads the PathScale MPI compilers into the PATH via the PrgEnv module. Should you wish to use the GNU MPI compilers instead, load the PrgEnv.gcc module. For more information on the available programming environments see news PrgEnv and the modules section below.

 

Other Available PathScale Tools

Command Notes
assign Alters the way Fortran I/O is performed.
explain Provides extended information for compiler error and diagnostic messages.
pathhow-compiled Shows the compiler options that were used to produce an object file.
e.g.
  midnight% pathhow-compiled sample.o

 

Modules

Midnight has the modules package installed. This tool allows a user to quickly and easily switch between different versions of a package (e.g. compilers). The module package sets common environment variables used by applications such as PATH, MANPATH, LD_LIBRARY_PATH, etc. The PrgEnv module is loaded by default in the shell skeleton files for new accounts. The PrgEnv module loads the PathScale compilers and MPI compiler wrappers into your PATH. Alternately the PrgEnv.gcc module is provided for users who prefer to use the MPI compiler wrappers with the GNU compilers.

Command | Example Use | Purpose
module avail | module avail | Lists all available modules for the system.
module load pkg | module load PrgEnv | Loads a module file into the environment.
module unload pkg | module unload PrgEnv | Unloads a module file from the environment.
module list | module list | Displays the modules that are currently loaded.
module switch old new | module switch PrgEnv PrgEnv.gcc | Replaces the module old with the module new in the environment.
module purge | module purge | Unloads all modules, restoring the environment to the state before any modules were loaded.

See "man module" for more information on the module command. See news modules for more information on using modules at ARSC.

Typically modules only need to be loaded on login. This can be done by placing module load commands in your .login (csh/tcsh users), .profile (sh/ksh/bash) or .bash_profile (bash).

e.g.
####
# Run common module commands
#
# load the PrgEnv module 
#   (uses Pathscale compilers for MPI environment)
module load PrgEnv
# alternately load the PrgEnv.gcc module 
#   (uses GNU compilers for the MPI environment)
#module load PrgEnv.gcc
# list currently loaded modules
module list



Compiling and Linking Fortran Programs

The default Fortran 90 compiler is:

pathf90

The Fortran 90 compiler with MPI support is:

mpif90

NOTE: the PrgEnv module must be loaded to use this compiler.

Here's a sample compiler command showing several common options:
midnight% pathf90 -O3 reduce.f90

See man pathf90 for additional information.

PathScale Fortran 90 Compiler Options

Option Description
-show-defaults Lists the default compiler options and exits.
-c Generates an intermediate object file but does not attempt to link.
-g Adds information for debugging to the object file and/or executable.
-I<directory> Tells the preprocessor to search in directory for include or module files.
-L<directory> Tells the linker to search in directory for libraries.
-r8 Promotes REALs from the default size of 4 bytes to 8 bytes.
-i8 Promotes INTEGERs from the default size of 4 bytes to 8 bytes.
-cpp Preprocesses files with the C preprocessor. Enabled by default for files ending in .F, .F90, or .F95.
-ftpp Preprocesses files with the Fortran preprocessor. Useful when portions of the Fortran code could be misinterpreted as C preprocessor directives (e.g. "//").
-O3 Higher level of optimization than -O2 (the default optimization level).
-O3 -OPT:Ofast Higher optimization level than -O3
-ipa Tells the compiler to perform interprocedural analysis. Can be very time consuming to perform. This flag should also be used in both compilation and linking steps. Not recommended for programs over 100,000 lines for the current compiler release.
-intrinsic=PGI Enables intrinsic functions that are available in the PGI compiler which are not ANSI standard (e.g. rand)
-apo Enables autoparallelization.
-mp Enables parallelization via OpenMP directives.

Many other compiler options are available. Check the online manuals, and if you have difficulty finding specific information, contact User Support for additional help.
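
Because -ipa should be given at both the compile and the link step, it can help to keep the flags in one variable. The sketch below only prints the commands for a separate compile (-c) and link, so it can be reviewed without a compiler installed. The helper name print_build and the source file names are hypothetical.

```shell
FC="${FC:-pathf90}"              # use mpif90 for MPI codes
FFLAGS="${FFLAGS:--O3 -ipa}"     # same flags for compiling and linking

# print_build: print the compile and link commands for the given sources.
# Usage: print_build <target> <source.f90>...
print_build() {
    target="$1"; shift
    objs=""
    for src in "$@"; do
        echo "$FC $FFLAGS -c $src"       # one object file per source
        objs="$objs ${src%.f90}.o"
    done
    echo "$FC $FFLAGS -o $target$objs"   # link step repeats the flags
}

print_build prog main.f90 solver.f90
```

After reviewing the printed commands, they can be run by piping the output to sh on a system where the compiler is available.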

 

Compiling and Linking C/C++ Programs

The C compiler is:

pathcc

C++ uses the command:

pathCC

Here are sample compiler commands showing common options:

midnight% pathcc -O3 -o prog prog.c

midnight% pathCC -O3 -o prog prog.cpp

Details concerning the examples (see "man pathcc" or "man pathCC" for more):

Option Description
-show-defaults Lists the default compiler options and exits.
-c Generates an intermediate object file but does not attempt to link.
-g Adds information for debugging to the object file and/or executable.
-I<directory> Tells the preprocessor to search in directory for include or module files.
-L<directory> Tells the linker to search in directory for libraries.
-O3 Higher level of optimization than -O2 (the default optimization level).
-Ofast Higher level optimization (default is -O2). This flag should be used in both compilation and linking steps.
-ipa Tells the compiler to perform interprocedural analysis. This option can be very time consuming to perform. This flag should be used in both compilation and linking steps. Not recommended for programs over 100,000 lines for the current compiler release.
-apo Enables autoparallelization.
-mp Enables parallelization via OpenMP directives.

 

Libraries

Libraries on midnight are generally available for the PathScale, Portland Group and GNU compiler suites. Most current versions of libraries and include files are available in the following directories:

PathScale Compilers
  Libraries: /usr/local/pathscale/lib
  Include Files: /usr/local/pathscale/include
PGI Compilers
  Libraries: /usr/local/pgi/lib
  Include Files: /usr/local/pgi/include
GNU Compilers
  Libraries: /usr/local/GNU/lib or /usr/local/lib
  Include Files: /usr/local/GNU/include or /usr/local/include

The following Libraries are currently available on midnight:

Library Notes
ACML AMD Core Math Library including BLAS, LAPACK, and FFT routines
Current Pathscale Version:
/usr/local/pathscale/pathscale64/lib (single threaded)
/usr/local/pathscale/pathscale64_mp/lib (multi-threaded)
Current PGI Version: /usr/local/pgi/pgi64/lib (single threaded)
/usr/local/pgi/pgi64_mp/lib (multi-threaded)
Current GNU Version:
/usr/local/GNU/lib
Alternate Versions: /usr/local/pkg/acml
Note: this library makes use of shared libraries.
BLACS Basic Linear Algebra Communication Subprograms
Current Pathscale Version: /usr/local/pathscale/lib
Current PGI Version: /usr/local/pgi/lib
Current GNU Version:
/usr/local/GNU/lib
Alternate Versions:
/usr/local/pkg/blacs
BLAS See ACML
FFTW C subroutine library for computing the Discrete Fourier Transform
Current Pathscale Version:
/usr/local/pathscale/lib
Current PGI Version: /usr/local/pgi/lib
Current GNU Version:
/usr/local/GNU/lib
Alternate Versions:
/usr/local/pkg/fftw
GSL GNU Scientific Mathematic Library for numerical C and C++ programs
Current Pathscale Version:
/usr/local/pathscale/lib
Current PGI Version: /usr/local/lib (Use GNU Version)
Current GNU Version:
/usr/local/GNU/lib
Alternate Versions: /usr/local/pkg/gsl
Note: this library makes use of shared libraries.
HDF Hierarchical Data Format for transferring graphical and numerical data among computers
Current Pathscale Version:
/usr/local/pathscale/lib
Current PGI Version: /usr/local/pgi/lib
Current GNU Version:
/usr/local/GNU/lib
Alternate Versions:
/usr/local/pkg/hdf
LAPACK See ACML
NCAR Graphics Graphics Visualization Software in Fortran and C
Current Pathscale Version:
Current PGI Version: /usr/local/pgi/lib
Current GNU Version:
/usr/local/GNU/lib
Alternate Versions:
/usr/local/pkg/ncar
NetCDF network Common Data Form
Current Pathscale Version:
/usr/local/pathscale/lib
Current PGI Version: /usr/local/pgi/lib
Current GNU Version:
/usr/local/GNU/lib
Alternate Versions:
/usr/local/pkg/netcdf
ScaLAPACK Scalable LAPACK, Library of high-performance linear algebra routines
Current Pathscale Version:
/usr/local/pathscale/lib
Current PGI Version: /usr/local/pgi/lib
Current GNU Version:
/usr/local/GNU/lib
Alternate Versions:
/usr/local/pkg/scalapack

Special Notes for Shared Libraries

Midnight has libraries built for both the PathScale and GNU compiler suites. If you use a package which requires shared libraries, it is recommended that you use the "-Wl,-R" flag on the link line for your application to ensure you get the proper shared library at runtime.

e.g.

midnight% mpif90 blas_app.o -Wl,-R/usr/local/pathscale/pathscale64/lib -L/usr/local/pathscale/pathscale64/lib -lacml -lacml_mv -o blas_app

This will ensure that the resultant executable can resolve the location of the shared library at runtime without requiring LD_LIBRARY_PATH or LD_PRELOAD environment variables to be set.
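
The link line above repeats the library directory for -Wl,-R and -L, and the two must match. A small sketch that generates both flags from one directory is shown below; rpath_flags is a hypothetical helper name, and the command is only printed, not executed.

```shell
# rpath_flags: print matching run-time (-Wl,-R) and link-time (-L)
# search flags for one library directory. Hypothetical helper name.
rpath_flags() {
    printf '%s %s\n' "-Wl,-R$1" "-L$1"
}

ACML_DIR=/usr/local/pathscale/pathscale64/lib

# Print the full link command for review.
echo "mpif90 blas_app.o $(rpath_flags "$ACML_DIR") -lacml -lacml_mv -o blas_app"
```

Switching to the multi-threaded ACML build would then only require changing ACML_DIR to /usr/local/pathscale/pathscale64_mp/lib.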

 

PathScale Runtime Environment Variables

Below are some environment variables which affect the run time function of executables compiled with the PathScale compilers.
Environment Variable | Languages | Description
PSC_OMP_AFFINITY | OpenMP codes | Specifies whether or not OpenMP tasks should be bound to particular processors. Valid values are TRUE or FALSE.
PSC_FDEBUG_ALLOC | Fortran codes | Specifies the initial value to be placed in Fortran memory allocations for debugging purposes. Valid values are ZERO, NaN or NaN8.
FILENV | Fortran codes | Specifies the name of the file which contains assign options to be used by a Fortran program.

 

MPI Environment

Midnight uses the Voltaire MVAPICH MPI stack which is based on MPICH.

Passing stdin to MPI applications

The MPI environment on midnight will attempt to send stdin to all tasks by default. Many programs expect stdin only for task 0. The "-input0" flag for mpirun specifies that only task 0 should read stdin.

mpirun -np 16 -input0 ./lammps < lammps.inp

Setting Environment Variables for use in MPI applications

Environment variables set within a PBS script are not available to the MPI environment unless they are explicitly listed in the mpirun statement. The example below sets the environment variable APP_ROOT for the shell and then "exports" the same variable to the MPI environment.

export APP_ROOT=$WORKDIR
mpirun -np 4 APP_ROOT=$APP_ROOT ./a.out
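
When several variables have to be passed, writing each NAME=$NAME pair by hand is easy to get wrong. The sketch below builds the list from variable names and prints the resulting mpirun line. The helper name mpi_env_args is hypothetical, and values containing whitespace are not handled.

```shell
# mpi_env_args: print NAME=value for each named variable that is set.
# Hypothetical helper; values containing whitespace are not supported.
mpi_env_args() {
    args=""
    for name in "$@"; do
        # Accept only valid shell variable names before using eval.
        case "$name" in
            ''|[0-9]*|*[!A-Za-z0-9_]*) continue ;;
        esac
        eval "value=\${$name-}"
        if [ -n "$value" ]; then
            args="$args $name=$value"
        fi
    done
    echo "${args# }"    # drop the leading space
}

# The /tmp fallback is an assumption so the sketch also runs off the cluster.
export APP_ROOT="${WORKDIR:-/tmp}"
echo "mpirun -np 4 $(mpi_env_args APP_ROOT) ./a.out"
```

Unset or empty variables are skipped, so the same call can list optional settings without producing empty NAME= arguments.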

Exceptions:

The following environment variables are available to MPI jobs without using the syntax above.

Repeated allocation and deallocation of memory

Programs that repeatedly allocate and deallocate large segments of memory may run into issues with the Voltaire MPI stack. The environment variable LAZY_UNREG_DISABLE can be used to override the default memory allocation behavior.

mpirun -np 4 LAZY_UNREG_DISABLE=1 ./a.out

This option is recommended only for codes which experience memory issues.

Setting Limits for MPI applications

System limits set within a PBS script do not propagate to an MPI application. Limits for an MPI application must be set in either the .cshrc (csh/tcsh) or the .bashrc (bash).

csh/tcsh version in .cshrc
# set core file size limit to unlimited
limit coredumpsize unlimited

bash version in .bashrc
# set core file size limit to unlimited
ulimit -Sc unlimited

 

Performance Analysis

Midnight has several profiling tools available. These tools are generally used for serial applications and may be less useful for parallel applications. The GNU tool, gprof, offers a command line interface for profiling executables, while the PathScale profiler, pathprof, is GUI driven. To use gprof, follow these steps:

  1. Build your program with the flags -g -pg. This will add code to support profiling and debugging.
    pathf90 program.f90 -pg -g -o profiled-program
  2. Run the executable as normal. This will produce a file (gmon.out), containing output statistics for the run.
    ./profiled-program
  3. Generate a human-readable report from the gmon.out file using a profiling tool such as gprof. Since gprof sends the results to stdout you may want to redirect the output to a file as shown below.
    gprof profiled-program gmon.out > profiled-program.gprof_report
  4. View the report:

    more profiled-program.gprof_report

Running Interactive Jobs

You are encouraged to use the PBS batch system, but may run interactive jobs as well via the debug queue. An interactive command is simply typed at the prompt in a terminal window. Standard error and standard output are displayed on the terminal, redirected to a file, or piped to another command using appropriate Unix shell syntax.

Alternatively, you may use the batch system to spawn an interactive job by using the following command:

qsub -q debug -l select=1:ncpus=4:node_type=4way -I

Once your job is started, you may then run interactive commands on the compute node(s) PBS assigned your session.

 

Running Batch Jobs

All production work on midnight is run through the PBS batch scheduler. The batch system offers greater memory and wall time limits than the interactive nodes. Jobs run through the batch system also conveniently save stdout/stderr to one or more files for each run, and will continue after you log off.

A batch job is a shell script prefaced by a statement of resource requirements and instructions which PBS will use to manage the job.

PBS scripts are submitted for processing with the command, qsub.

The following steps are common in the most basic batch processing:

  1. Create a batch script:

    Normally, users embed all PBS request options within the batch script.

    In a batch script conforming to PBS syntax, all PBS options must precede all shell commands. Each line containing a PBS option must begin with the character string "#PBS", followed by one or more spaces, followed by the option.

    PBS scripts begin execution from your home directory; thus, the first executable shell command is usually a "cd" to the work or run directory. The environment variable PBS_O_WORKDIR is set to the directory from which the script was submitted.

  2. PBS Script- MPI Example using Sun Fire x2200 nodes

    Script:

    #!/bin/bash
    #PBS -q standard
    #PBS -l select=8:ncpus=4:node_type=4way
    #PBS -l walltime=8:00:00
    #PBS -j oe

    cd $PBS_O_WORKDIR

    mpirun -np 32 ./myprog

    Notes, one per script line in order:

    Select the shell.
    The -q option requests the queue to run in.
    The first -l option requests 8 "blocks" of 4 processors on x2200 (i.e. 4-way) nodes.
    The second -l option requests that the job be allowed to run for 8 hours.
    The -j option joins the output and error files.
    Change to the directory from which the job was submitted.
    Run the MPI program.

    PBS Script- MPI Example using Sun Fire x4600 nodes

    Script:

    #!/bin/bash
    #PBS -q standard
    #PBS -l select=2:ncpus=16:node_type=16way
    #PBS -l walltime=8:00:00
    #PBS -j oe

    cd $PBS_O_WORKDIR

    mpirun -np 32 ./myprog

    Notes, one per script line in order:

    Select the shell.
    The -q option requests the queue to run in.
    The first -l option requests 2 "blocks" of 16 processors on x4600 (i.e. 16-way) nodes.
    The second -l option requests that the job be allowed to run for 8 hours.
    The -j option joins the output and error files.
    Change to the directory from which the job was submitted.
    Run the MPI program.

    PBS Script- OpenMP Example using Sun Fire x4600 nodes

    Script:

    #!/bin/bash
    #PBS -q standard
    #PBS -l select=1:ncpus=16:node_type=16way
    #PBS -l walltime=8:00:00
    #PBS -j oe

    cd $PBS_O_WORKDIR
    export OMP_NUM_THREADS=16
    export PSC_OMP_AFFINITY=TRUE

    ./myprog

    Notes, one per script line in order:

    Select the shell.
    The -q option requests the queue to run in.
    The first -l option requests 1 "block" of 16 processors on x4600 (i.e. 16-way) nodes.
    The second -l option requests that the job be allowed to run for 8 hours.
    The -j option joins the output and error files.
    Change to the directory from which the job was submitted.
    Set the number of OpenMP threads to use.
    Set threads to have CPU affinity.
    Run the OpenMP program.

    PBS Script- Data Staging Script

    Script:

    #!/bin/bash
    #PBS -q data
    #PBS -l select=1:ncpus=1
    #PBS -l walltime=4:00:00
    #PBS -j oe

    cd $PBS_O_WORKDIR

    cp -r $ARCHIVE_HOME/mydataset/* . || exit 1

    qsub mpi_job.pbs

    Notes, one per script line in order:

    Select the shell.
    The -q option requests the queue to run in.
    The first -l option requests 1 processor (data queue jobs must be serial).
    The second -l option requests that the job be allowed to run for 4 hours.
    The -j option joins the output and error files.
    Change to the directory from which the job was submitted.
    Copy files from long-term storage to the current working directory, exiting if the copy fails.
    Submit the compute job.
  3. Submit a batch script to PBS, using qsub:

    The script file can be given any name. If the above sample were named myprog.pbs, it would be submitted for processing with the command

    qsub myprog.pbs
  4. Monitor the job

    To check the status of the submitted PBS job, execute this command:

    qstat -a
  5. Delete the job

    Given its PBS identification number (returned when you "qsub" the job and shown in "qstat -a" output), you can delete the job from the batch system with this command:

    qdel <PBS-ID>
  6. Examine Output

    When the job completes, PBS will save the stdout and stderr from the job to a file in the directory from which it was submitted. These files will be named using the script name and PBS identification number. For example,

    myprog.pbs.o<PBS-ID>
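
In the MPI example scripts in step 2, the task count given to mpirun -np (32) equals the select count multiplied by ncpus. The sketch below generates such a script from those two numbers so that the values cannot drift apart. The helper name make_mpi_pbs is hypothetical; the queue, walltime, and directives are copied from the examples above.

```shell
# make_mpi_pbs: print a PBS script whose mpirun task count matches
# the requested resources. Hypothetical helper name.
# Usage: make_mpi_pbs <blocks> <cpus-per-block> <node_type> <program>
make_mpi_pbs() {
    blocks="$1"; ncpus="$2"; node_type="$3"; prog="$4"
    np=$((blocks * ncpus))    # total MPI tasks = blocks x cpus per block
    cat <<EOF
#!/bin/bash
#PBS -q standard
#PBS -l select=${blocks}:ncpus=${ncpus}:node_type=${node_type}
#PBS -l walltime=8:00:00
#PBS -j oe

cd \$PBS_O_WORKDIR

mpirun -np ${np} ${prog}
EOF
}

# Print the x2200 (4-way) example: 8 blocks of 4 processors.
make_mpi_pbs 8 4 4way ./myprog
```

Redirecting the output to a file, for example make_mpi_pbs 2 16 16way ./myprog > myprog.pbs, gives a script that can then be submitted with qsub as in step 3.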

PBS Queues

List all available queues with the command qstat -Q. List details on any queue, for instance, "standard," with the command qstat -Qf standard. You may also read news queues for information on all queues, but note that the most current information is always available using the qstat commands.

Currently, these are the available queues:

Queue Purpose
background
For projects with little or no remaining allocation. This queue has the lowest priority; however, projects running jobs in this queue do not have allocation deducted.
challenge
For projects with challenge status. (Limited Access)
data
Provides access to long term storage (i.e. $ARCHIVE)
debug
Quick turn-around queue for debugging work.
high
Limited access.
special
For jobs which do not fit into the normal queue limits for the system. (Limited access)
standard
General use by all allocated users.
urgent
Limited access.

 

Online Manuals and Documentation

External Links

Local Documentation Information

The documentation links above are available on midnight in the documents news item. Active users may type the command:

midnight% news documents

 

More Information

General information on ARSC and its other resources is available in a number of forms

 

Arctic Region Supercomputing Center
PO Box 756020, Fairbanks, AK 99775 | voice: 907-450-8600 | email:
