
Introduction to Using the Cray XT5 (Pingo)


Introduction

The Arctic Region Supercomputing Center (ARSC) operates a Cray XT5 (pingo) running UNICOS/lc 2.1.

 

"Pingo"

The ARSC Cray XT5 consists of the following hardware:

 

Operating System / Shells

The operating system on pingo is UNICOS/lc 2.1 based on the SuSE Linux Enterprise Server.

The following shells are available on pingo:

If you would like to have your login shell changed, please contact User Support.

 

System News & Status

System news is available via the news command when logged on to pingo. For example, the command "news queues" gives news about the current queue configuration. System status and non-sensitive news items are also available on the web.

 

Storage

Several environment variables are defined to point to the available storage on Pingo.

Name: $HOME
  Notes: $HOME directories are intended for locally compiled executables and libraries, dot files, and small data sets.
  Quota (see note 1): 1 GB (see note 2)
  Purge Policy: not purged
  Back Up Policy: backed up

Name: $WORKDIR
  Notes: $WORKDIR is a high performance parallel Lustre file system available from all nodes on pingo.
  Quota (see note 1): currently there are no quotas enabled
  Purge Policy: 30 day purge policy (see note 3)
  Back Up Policy: not backed up

Name: $ARCHIVE_HOME
  Notes: long-term storage, not available from compute nodes
  Quota (see note 1): none
  Purge Policy: not purged
  Back Up Policy: backed up

NOTES:

  1. Requests for increased quotas should be sent to User Support.
  2. The qcheck command should be used to check storage usage and quota information.
  3. The getPurgable command can be used to determine which files are susceptible to purging (see getPurgable -help for available options).

See http://www.arsc.edu/support/howtos/storage.html for more information on the storage environment at ARSC.
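
As a rough illustration of how these variables might be used, the sketch below creates a per-run directory under $WORKDIR. The helper name make_run_dir and the fallback to a temporary directory when $WORKDIR is unset (for example, off pingo) are assumptions made for this example and are not part of the ARSC environment.

```shell
#!/bin/sh
# Sketch: create a per-run directory on the high performance file system.
# make_run_dir is a hypothetical helper, not an ARSC-provided command.
make_run_dir() {
    run_name="$1"
    # $WORKDIR is defined on pingo; the fallback to a temporary
    # directory is an assumption for use on other systems.
    base="${WORKDIR:-${TMPDIR:-/tmp}}"
    run_dir="$base/$run_name"
    mkdir -p "$run_dir" || return 1
    # Print the path so the caller can cd into it.
    echo "$run_dir"
}

# Example call: cd "$(make_run_dir myrun)"
```

Because $WORKDIR is subject to the 30 day purge policy and is not backed up, results worth keeping should be copied to $ARCHIVE_HOME, which is not available from the compute nodes.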

 

Sample Code Repository ($SAMPLES_HOME)

The $SAMPLES_HOME directory on pingo contains a number of examples including, but not limited to:

An index of the items in the Sample Code Repository is available on the web; however, you must log in to pingo to access the examples themselves.

If you would like to recommend a new example, please contact User Support.

 

Parallel Programming Models

Several types of parallelism can be exploited on pingo using different programming models and methods. These are listed in the table below.

Hardware Level Model Description
Shared-memory node Auto When enabled, automatic parallelization can be used on shared-memory programs. Basic loops can be automatically parallelized, but for additional performance improvements, try using OpenMP directives. This option is available with the PGI and Pathscale environments.
Shared-memory node OpenMP This is a form of explicit parallel programming in which the programmer inserts directives into the program to spawn multiple shared-memory threads, typically at the loop level. It is a common method of parallelization, portable, and relatively easy to implement. On the downside, it requires shared memory which limits scaling to the number of processors on just one node. OpenMP can be used in conjunction with autoparallelization and is available with the PGI, Pathscale, and GNU environments.
Distributed memory MPI

This is the most common and portable method for parallelizing codes for scalable distributed memory systems. MPI is a library of subroutines for message passing, collective operations, and other forms of inter-processor communication. The programmer is responsible for implementing data distribution, synchronization, and reassembly of results using explicit MPI calls.

Using MPI, the programmer can largely ignore the physical organization of processors into nodes and simply treat the system as a collection of independent processors. MPI is available with the PGI, Pathscale, and GNU environments.

Distributed memory SHMEM This library supports one-sided communications between tasks.

 

Programming Environment

The following programming tools are available on Pingo. Be sure to swap the appropriate modules when switching among compilers.
Item PGI PathScale GNU Compilers
Fortran 77/90/95 (Compute Node) ftn ftn ftn
C compiler (Compute Node) cc cc cc
C++ compiler (Compute Node) CC CC CC
Fortran 77/90/95 (Login Node Only) pgf90 pathf90 gfortran
C compiler (Login Node Only) pgcc pathcc gcc
C++ compiler (Login Node Only) pgCC pathCC g++
Debuggers totalview totalview totalview
Performance Analysis CrayPat CrayPat CrayPat
Default module (*) PrgEnv-pgi (default) PrgEnv-pathscale PrgEnv-gnu
Batch queueing system
PBS Professional 9.2

The PGI programming environment is loaded by default for all accounts.

 

Modules

Pingo has the modules package installed. This tool allows a user to quickly and easily switch between different versions of a package (e.g. compilers). The module package sets common environment variables used by applications such as PATH, MANPATH, etc. The PrgEnv-pgi module is loaded by default for all accounts. The PrgEnv-pgi module loads the current version of the PGI compilers with MPI support into your PATH.

Command Example Use Purpose
module avail module avail lists all available modules for the system.
module load pkg module load PrgEnv-pathscale loads a module file into the environment
module unload pkg module unload PrgEnv-pgi unloads a module file from the environment
module list module list displays the modules which are currently loaded.
module switch old new module switch PrgEnv-pgi PrgEnv-gnu replaces the module old with module new in the environment
module purge module purge unloads all modules, restoring the environment to the state before any modules were loaded.

See "man module" for more information on the module command. See news modules for more information on using modules at ARSC.

Typically modules only need to be loaded on login. This can be done by placing module load commands in your .cshrc (csh/tcsh users), .bashrc (bash) or .profile (ksh).
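
A minimal sketch of such a login fragment for bash or ksh users follows. The helper name use_prgenv is hypothetical; the fragment assumes PrgEnv-pgi is still the loaded default when it runs, and it does nothing if the module command is not available (for example, on another system).

```shell
# Fragment for ~/.bashrc or ~/.profile (sh-compatible syntax).
# use_prgenv is a hypothetical helper name chosen for this example.
use_prgenv() {
    target="$1"
    # If the module command does not exist, leave the environment alone.
    if ! command -v module >/dev/null 2>&1; then
        return 0
    fi
    # PrgEnv-pgi is loaded by default, so only switch for other targets.
    if [ "$target" != "PrgEnv-pgi" ]; then
        module switch PrgEnv-pgi "$target"
    fi
}

# Example call for a login script: use_prgenv PrgEnv-gnu
```

Running "module list" afterwards shows whether the switch took effect.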

 

PGI - Compiling and Linking Fortran Programs

The Fortran 90 compiler is:

ftn

This compiler has OpenMP, MPI, and Shmem support.

Here's a sample compiler command showing several common options:
pingo% ftn -O3 reduce.f90

Portland Group Fortran 90 Compiler Options

Option Description
-c Generates an intermediate object file but does not attempt to link.
-g Adds information for debugging to the object file and/or executable.
-I<directory> Tells the preprocessor to search in directory for include or module files.
-L<directory> Tells the linker to search in directory for libraries.
-r8 Promotes REALs from the default size of 4 bytes to 8 bytes.
-i8 Promotes INTEGERs from the default size of 4 bytes to 8 bytes.
-default64 Passes the -i8 and -r8 options to the compiler.
-O3 Higher level of optimization than -O2 (the default optimization level).
-fast Higher optimization level than -O3
-tp barcelona-64 Use optimizations for the AMD Barcelona Quad Core processor (default).
-Mipa Tells the compiler to perform interprocedural analysis. Can be very time consuming to perform. This flag should also be used in both compilation and linking steps.
-Mconcur Enables autoparallelization. Additional options can be used with -Mconcur to provide more fine-grained control of autoparallelization, see man pgf90 for details.
-mp=nonuma Enables parallelization via OpenMP directives.
-Minfo Displays useful information to stderr, see man pgf90 for details.
-Mneginfo Displays information on why a particular optimization was not performed.

Many other compiler options are available. View the man pages for additional information:

 

PGI - Compiling and Linking C/C++ Programs

The C compilers are called using the following command:

cc

Compiling in C++ requires the command:

CC

These compilers have OpenMP, MPI, Shmem and Pthreads support.

Here are sample compiler commands showing common options:

pingo% cc -O3 -o prog prog.c

pingo% CC -O3 -o prog prog.cpp

Option Description
-c Generates an intermediate object file but does not attempt to link.
-g Adds information for debugging to the object file and/or executable.
-I<directory> Tells the preprocessor to search in directory for include or module files.
-L<directory> Tells the linker to search in directory for libraries.
-O3 Higher level of optimization than -O2 (the default optimization level).
-fast Higher level optimization (default is -O2). This flag should be used in both compilation and linking steps.
-Mipa Tells the compiler to perform interprocedural analysis. This option can be very time consuming to perform. This flag should be used in both compilation and linking steps.
-Mconcur Enables autoparallelization. Additional options can be used with -Mconcur to provide more fine-grained control of autoparallelization, see man pgcc or man pgCC for details.
-mp=nonuma Enables parallelization via OpenMP directives.
-Minfo Displays useful information to stderr, see man pgcc or man pgCC for details.
-Mneginfo Displays information on why a particular optimization was not performed.

Many other compiler options are available. View the man pages for additional information:

 

PathScale - Compiling and Linking Fortran Programs

The Fortran 90 compiler is:

ftn

This compiler has OpenMP, MPI, Shmem and Pthreads support.

Here's a sample compiler command showing several common options:

pingo% ftn -O3 reduce.f90

PathScale Fortran 90 Compiler Options

Option Description
-show-defaults Lists the default compiler options and exits.
-c Generates an intermediate object file but does not attempt to link.
-g Adds information for debugging to the object file and/or executable.
-I<directory> Tells the preprocessor to search in directory for include or module files.
-L<directory> Tells the linker to search in directory for libraries.
-r8 Promotes REALs from the default size of 4 bytes to 8 bytes.
-i8 Promotes INTEGERs from the default size of 4 bytes to 8 bytes.
-default64 Passes the -i8 and -r8 options to the compiler.
-O3 Higher level of optimization than -O2 (the default optimization level).
-cpp Preprocess files with the C preprocessor. Enabled by default for files ending in .F, .F90, or .F95.
-ftpp Preprocess files with the Fortran preprocessor. Useful when portions of the Fortran code could be misinterpreted as C preprocessor directives (e.g. "//")
-O3 -OPT:Ofast Higher optimization level than -O3
-ipa Tells the compiler to perform interprocedural analysis. Can be very time consuming to perform. This flag should also be used in both compilation and linking steps. Not recommended for programs over 100,000 lines for the current compiler release.
-intrinsic=PGI Enables intrinsic functions that are available in the PGI compiler which are not ANSI standard (e.g. rand)
-apo Enables autoparallelization.
-mp Enables parallelization via OpenMP directives.

Many other compiler options are available. View the man pages for additional information:

 

PathScale - Compiling and Linking C/C++ Programs

The C compilers are called using the following command:

cc

Compiling in C++ requires the command:

CC

This compiler has OpenMP, MPI, Shmem and Pthreads support.

Here are sample compiler commands showing common options:

pingo% cc -O3 -o prog prog.c

pingo% CC -O3 -o prog prog.cpp

Option Description
-show-defaults Lists the default compiler options and exits.
-c Generates an intermediate object file but does not attempt to link.
-g Adds information for debugging to the object file and/or executable.
-I<directory> Tells the preprocessor to search in directory for include or module files.
-L<directory> Tells the linker to search in directory for libraries.
-O3 Higher level of optimization than -O2 (the default optimization level).
-Ofast Higher level optimization (default is -O2). This flag should be used in both compilation and linking steps.
-ipa Tells the compiler to perform interprocedural analysis. This option can be very time consuming to perform. This flag should be used in both compilation and linking steps. Not recommended for programs over 100,000 lines for the current compiler release.
-apo Enables autoparallelization.
-mp Enables parallelization via OpenMP directives.
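
Since the ftn, cc, and CC commands keep the same names in every programming environment while the OpenMP option is spelled differently, a build script may want to pick the option from the environment name. The sketch below is one way to do that. The helper name openmp_flag is hypothetical; the PGI and PathScale options come from the option tables in this guide, while -fopenmp for the GNU environment is GCC's standard OpenMP option and is not listed here.

```shell
# openmp_flag is a hypothetical helper: it prints the OpenMP compiler
# option for a given programming environment module name.
openmp_flag() {
    case "$1" in
        PrgEnv-pgi)       echo "-mp=nonuma" ;;  # PGI option tables in this guide
        PrgEnv-pathscale) echo "-mp" ;;         # PathScale option tables in this guide
        PrgEnv-gnu)       echo "-fopenmp" ;;    # GCC option, not listed in this guide
        *)
            echo "unknown programming environment: $1" >&2
            return 1
            ;;
    esac
}

# Example on pingo with PrgEnv-pathscale loaded (not run here):
#   ftn -O3 $(openmp_flag PrgEnv-pathscale) -o reduce reduce.f90
```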

Many other compiler options are available. View the man pages for additional information:

 

PathScale - Other Tools

The following tools are available with the PathScale compiler suite. 

Command Notes
assign Alters the way Fortran I/O is performed.
explain Provides extended information for compiler error and diagnostic messages.
pathhow-compiled Shows the compiler options used to produce an object file.
e.g.
  pingo% pathhow-compiled sample.o

 

PathScale - Runtime Environment Variables

Below are some environment variables which affect the run time function of executables compiled with the PathScale compilers. 

Environment Variable Languages Description
PSC_OMP_AFFINITY OpenMP codes PSC_OMP_AFFINITY specifies whether or not OpenMP tasks should be bound to particular processors. Valid values are TRUE or FALSE
PSC_FDEBUG_ALLOC Fortran codes PSC_FDEBUG_ALLOC specifies the initial value to be placed in Fortran memory allocations for debugging purposes. Valid values are ZERO, NaN or NaN8
FILENV Fortran codes FILENV specifies the file name which contains assign options to be used by a Fortran program.
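
As an illustration, these variables could be set in a job script before the program is launched. The sketch below wraps the settings in a hypothetical function, set_pathscale_debug_env; the values are examples chosen from the valid values listed above, and the sketch assumes the job's environment is passed through to the running program.

```shell
# Sketch: debugging-oriented settings for a PathScale-compiled program.
# set_pathscale_debug_env is a hypothetical helper name.
set_pathscale_debug_env() {
    # Bind OpenMP tasks to particular processors.
    export PSC_OMP_AFFINITY=TRUE
    # Place NaN in Fortran memory allocations so that use of
    # uninitialized values is easier to spot.
    export PSC_FDEBUG_ALLOC=NaN
}

# Example: call set_pathscale_debug_env in a PBS script before aprun.
```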

 

Libraries

Libraries on pingo are generally available for the PathScale, Portland Group and GNU compiler suites. Most current versions of libraries and include files are available in the following directories:

PathScale Compilers
  Libraries: /usr/local/pathscale/lib
  Include Files: /usr/local/pathscale/include
PGI Compilers
  Libraries: /usr/local/pgi/lib
  Include Files: /usr/local/pgi/include
GNU Compilers
  Libraries: /usr/local/gnu/lib or /usr/local/lib
  Include Files: /usr/local/gnu/include or /usr/local/include

Performance Analysis

The preferred performance analysis tool on pingo is CrayPat. Here are the basic steps required to build and run an instrumented executable:

  1. Load the "xt-craypat" module

    pingo1% module load xt-craypat

  2. Recompile the code as you normally would to generate an executable.

    pingo1% ftn mycode.f90 -o mycode

  3. Use the pat_build command to generate an instrumented executable.

    pingo1% pat_build -g mpi -u mycode

    This generates an instrumented executable called mycode+pat. Here the "-g" option enables the "mpi" tracegroup. See "man pat_build" for available tracegroups.

  4. Run the instrumented executable with aprun via PBS.

    aprun -n 80 ./mycode+pat

    This generates an instrumented output file (e.g. mycode+pat+2007-12tdt.xf).

  5. Use pat_report to display the statistics from the output file

    pingo1% pat_report mycode+pat+2007-12tdt.xf > mycode.pat_report

Additional profiling options are available. See "man pat_build" for additional instrumentation options.
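
Steps 1 through 3 above can be chained in a small shell helper, sketched below. The function name build_instrumented is hypothetical, and the sketch assumes the module, ftn, and pat_build commands are available as they are on pingo. It uses only the options shown in the steps above.

```shell
# build_instrumented is a hypothetical helper that chains steps 1-3.
# It assumes module, ftn, and pat_build are available (as on pingo).
build_instrumented() {
    src="$1"
    exe="${src%.*}"                        # mycode.f90 -> mycode
    module load xt-craypat     || return 1 # step 1
    ftn "$src" -o "$exe"       || return 1 # step 2
    pat_build -g mpi -u "$exe" || return 1 # step 3
    # Print the name of the instrumented executable for use with aprun.
    echo "${exe}+pat"
}

# Example on pingo (not run here): build_instrumented mycode.f90
```

The printed name (mycode+pat in this example) is what step 4 passes to aprun inside a PBS job.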

Running Interactive Jobs

You may use the batch system to spawn an interactive job by using the following command:

qsub -q debug -l mppwidth=16 -I

Once your job is started, you may run interactive commands on the compute node(s) PBS assigned your session. For example:

aprun -n 16 ./myprog

Running Batch Jobs

All production work on pingo is run through the PBS batch scheduler. The batch system offers greater memory and wall time limits than the interactive nodes. Jobs run through the batch system also save stdout and stderr to a file for each run and continue running after you log off.

A batch job is a shell script prefaced by a statement of resource requirements and instructions which PBS will use to manage the job.

PBS scripts are submitted for processing with the command, qsub.

As outlined below, the following steps are common in the most basic batch processing:

  1. Create a batch script:

    Normally, users embed all PBS request options within the batch script.

    In a batch script conforming to PBS syntax, all PBS options must precede all shell commands. Each line containing a PBS option must begin with the string "#PBS", followed by one or more spaces, followed by the option.

    PBS scripts begin execution from your home directory; thus, the first executable shell command is usually a "cd" to the work or run directory. The environment variable PBS_O_WORKDIR is set to the directory from which the script was submitted.

  2. PBS Script- MPI Example

    Script:

    #!/bin/bash
    #PBS -q standard
    #PBS -l mppwidth=32
    #PBS -l walltime=8:00:00
    #PBS -j oe

    cd $PBS_O_WORKDIR

    aprun -n 32 ./myprog

    Notes:

    #!/bin/bash                  Select the shell.
    #PBS -q standard             The -q option requests the queue to run in.
    #PBS -l mppwidth=32          Requests 32 cores.
    #PBS -l walltime=8:00:00     Requests that the job be allowed to run 8 hours.
    #PBS -j oe                   The -j option joins output and error files.
    cd $PBS_O_WORKDIR            Change to the directory from which the job was submitted.
    aprun -n 32 ./myprog         Run the MPI program.

    A copy of this script is also available on Pingo at $SAMPLES_HOME/jobSubmission.

    PBS Script- OpenMP Example

    Script:

    #!/bin/bash
    #PBS -q standard
    #PBS -l mppwidth=8
    #PBS -l walltime=8:00:00
    #PBS -j oe

    cd $PBS_O_WORKDIR
    export OMP_NUM_THREADS=8

    aprun -n 1 -d 8 ./myprog

    Notes:

    #!/bin/bash                  Select the shell.
    #PBS -q standard             The -q option requests the queue to run in.
    #PBS -l mppwidth=8           Requests 8 cores.
    #PBS -l walltime=8:00:00     Requests that the job be allowed to run 8 hours.
    #PBS -j oe                   The -j option joins output and error files.
    cd $PBS_O_WORKDIR            Change to the directory from which the job was submitted.
    export OMP_NUM_THREADS=8     Set the number of OpenMP threads to use.
    aprun -n 1 -d 8 ./myprog     Run the OpenMP program using aprun.

    A copy of this script is also available on Pingo at $SAMPLES_HOME/jobSubmission.

    PBS Script- Data Staging Script

    Script:

    #!/bin/bash
    #PBS -q data
    #PBS -l walltime=4:00:00
    #PBS -j oe

    cd $PBS_O_WORKDIR

    find $ARCHIVE_HOME/mydataset/ -offline | batch_stage -i

    cp -r $ARCHIVE_HOME/mydataset/* . || exit 1

    qsub mpi_job.pbs

    Notes:

    #!/bin/bash                  Select the shell.
    #PBS -q data                 The -q option requests the queue to run in.
    #PBS -l walltime=4:00:00     Requests that the job be allowed to run 4 hours.
    #PBS -j oe                   The -j option joins output and error files.
    cd $PBS_O_WORKDIR            Change to the directory from which the job was submitted.
    find ... | batch_stage -i    Bring offline files from tape back online to disk.
    cp -r ... . || exit 1        Copy files from long term storage to the current working directory.
    qsub mpi_job.pbs             Submit the compute job.

    A copy of this script is also available on Pingo at $SAMPLES_HOME/jobSubmission.

    Please note that $ARCHIVE_HOME is not available from the compute nodes. This example script accesses files from $ARCHIVE_HOME by running on the login node.
  3. Submit a batch script to PBS, using qsub:

    The script file can be given any name. If the above sample were named myprog.pbs, it would be submitted for processing with the command

    qsub myprog.pbs
  4. Monitor the job

    To check the status of the submitted PBS job, execute this command:

    qstat -a
  5. Delete the job

    Given its PBS identification number (returned when you "qsub" the job and shown in "qstat -a" output), you can delete the job from the batch system with this command:

    qdel <PBS-ID>
  6. Examine Output

    When the job completes, PBS will save the stdout and stderr from the job to a file in the directory from which it was submitted. These files will be named using the script name and PBS identification number. For example,

    myprog.pbs.o<PBS-ID>
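
The naming rule in step 6 can be sketched as a small shell helper. The function name pbs_output_name is hypothetical. The sketch assumes that the job ID printed by qsub has the form <number>.<server>, that only the numeric part appears in the file name, and that the job name was not overridden, so it matches the script name. The server name "sdb" in the example is made up.

```shell
# pbs_output_name is a hypothetical helper. Assumptions: the job ID has
# the form <number>.<server>, and the job name equals the script name.
pbs_output_name() {
    script="$1"
    job_id="$2"
    seq="${job_id%%.*}"    # keep the part before the first dot
    echo "$(basename "$script").o${seq}"
}

# Example on pingo (not run here):
#   job_id=$(qsub myprog.pbs)
#   qstat -a               # repeat until the job has finished
#   cat "$(pbs_output_name myprog.pbs "$job_id")"
```

For instance, pbs_output_name myprog.pbs 12345.sdb prints myprog.pbs.o12345.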

PBS Queues

List all available queues with the command qstat -Q. List details on any queue, for instance, "standard," with the command qstat -Qf standard. You may also read news queues for information on all queues, but note that the most current information is always available using the qstat commands.

Currently, these are the available queues:

Queue Purpose
background
For projects with little or no remaining allocation. This queue has the lowest priority; however, projects running jobs in this queue do not have allocation deducted.
challenge
For projects with challenge status. (Limited Access)
data
Provides access to long term storage (i.e. $ARCHIVE_HOME)
debug
Quick turn-around queue for debugging work.
high
(Limited Access)
special
For jobs which do not fit into the normal queue limits for the system. (Limited access)
standard
General use by all allocated users.
urgent
(Limited Access)

 

Charging Allocation Hours to an Alternate Project

Users with membership in more than one project should select which project to charge allocation hours towards. The directive for selecting which project is the "-W group_list" PBS option. If the "-W group_list" option is not specified within a user's PBS script, the account charged will default to the user's primary group (i.e. project).

The following is an example "-W group_list" statement.

   #PBS -W group_list=proja

The "-W group_list" option can also be used on the command line, e.g.

   pingo % qsub -W group_list=proja script.bat

Each project has a corresponding UNIX group, therefore the "groups" command will show all projects (or groups) of which you are a member.

   pingo % groups
   proja projb

Without the "-W group_list" PBS option, allocation hours would be charged to proja by default, but could be charged to projb by setting "-W group_list=projb" in the PBS script.
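
To avoid charging a project you do not belong to, a script could check group membership before emitting the directive. The sketch below is one way to do that. The helper name group_directive is hypothetical; by default it reads your groups with "id -Gn", and the optional second argument exists only so that the example can be tried with a fixed group list.

```shell
# group_directive is a hypothetical helper. It prints a PBS group_list
# directive for a project only if that project is in the group list.
# The second argument defaults to the caller's groups from "id -Gn".
group_directive() {
    project="$1"
    groups_list="${2:-$(id -Gn)}"
    for g in $groups_list; do
        if [ "$g" = "$project" ]; then
            echo "#PBS -W group_list=$project"
            return 0
        fi
    done
    echo "not a member of project: $project" >&2
    return 1
}

# Example: group_directive projb "proja projb"
```

With the group list from the example above, group_directive projb "proja projb" prints "#PBS -W group_list=projb", while a project that is not in the list produces an error and a non-zero return status.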

 

Online Manuals and Documentation

External Links

News Items

Some documentation is available directly on pingo via the "news" command. This documentation covers a variety of topics, including availability of software, sample job scripts, and more.

Many news items available on the system through "news <topic>" are also available on our website:

 

More Information

General information on ARSC and its other resources is available in a number of forms

 

Arctic Region Supercomputing Center
PO Box 756020, Fairbanks, AK 99775 | voice: 907-450-8600 | email:
