Introduction to Using the Cray XK6m, Fish

 

 

Introduction

The Arctic Region Supercomputing Center (ARSC) operates a Cray XK6m ("fish") running the Cray Linux Environment version 4 (CLE4).

Fish is a resource dedicated to University of Alaska affiliated academic users performing non-commercial, scientific research of Arctic interest.

 

"Fish"

The Cray XK6m at ARSC consists of the following hardware:

  • 2 Login Nodes: "fish1.arsc.edu" and "fish2.arsc.edu"
    • One six-core, 2.6 GHz AMD Istanbul processor per node
    • 16 GB of memory per node
  • 48 GPU-Enabled Sixteen-Core Compute Nodes
    • One sixteen-core, 2.1 GHz AMD Interlagos processor per node
    • 64 GB of memory per node (4 GB per core)
    • One NVIDIA Tesla X2090 GPU accelerator with 6 GB GDDR5 memory
  • 32 Twelve-Core Compute Nodes
    • Two six-core, 2.6 GHz AMD Istanbul processors per node
    • 32 GB of memory per node (about 2.7 GB per core)
  • Cray Proprietary Gemini Interconnect
  • 20 TB Lustre $HOME file system

 

Operating System / Shells

The operating system on fish is the Cray Linux Environment version 4 (CLE4). Under CLE4, the fish login nodes run 64-bit SUSE Linux. The fish compute nodes run Compute Node Linux (CNL), a lightweight operating system designed to improve resource availability and scalability for user jobs.

The following shells are available on fish:

  • sh (Bourne Shell)
  • ksh (Korn Shell)
  • bash (Bourne-Again Shell) (default)
  • csh (C Shell)
  • tcsh (Tenex C Shell)

If you would like to change your default login shell, please contact User Support.

 

System News, Status and RSS Feeds

System news is available via the "news" command when logged on to fish. For example, the command "news queues" gives news about the current queue configuration. System status and public news items are available on the web.

 

Storage

Several environment variables are defined to point to the available storage on fish.

  • $HOME
    Notes: $HOME directories are intended for locally compiled executables and libraries, dot files, and small data sets. ARSC recommends that you avoid accessing $HOME in parallel jobs.
    Default quota: 8 GB (notes 1, 3)
    Purge policy: not purged
    Backup policy: backed up

  • $CENTER
    Notes: $CENTER is a high performance Lustre file system available from all nodes on fish. This is the preferred location to run large parallel jobs.
    Default quota: 750 GB (notes 1, 3)
    Purge policy: 30-day purge policy (note 4)
    Backup policy: not backed up

  • $ARCHIVE
    Notes: $ARCHIVE directories are intended for long term storage of executables, large datasets, and compressed backups of important data. $ARCHIVE is only accessible from the fish login nodes and the transfer queue.
    Default quota: no quota
    Purge policy: not purged
    Backup policy: backed up

  • /center/d
    Notes: /center/d is an alternate directory used to store large group data sets. This directory is available upon request.
    Default quota: based on request and availability (notes 1, 3)
    Purge policy: not purged
    Backup policy: not backed up

NOTES:

  1. Individual users are responsible for monitoring their data usage. If usage exceeds the set quota, further writes may fail.
  2. Requests for increased quotas should be sent to User Support.
  3. The "show_storage" command will display current usage for the $HOME, $CENTER, and /center/d directories.
  4. The $CENTER file system is not currently being purged on a daily basis; however, purging may be enabled at a future date. Data residing in $CENTER for inactive accounts may be purged without warning.
  5. The "show_storage" command is run automatically at login to warn when disk usage is approaching set quotas. This automatic check can be disabled by running "touch $HOME/.hush_storage" (see the example below).

See http://www.arsc.edu/support/howtos/storage for more information on the storage environment at ARSC.

 

Available Software

Fish third party software configuration is in progress. Thank you for your patience while we work to complete common package installations.

Open source and commercial applications have been installed on the system in /usr/local/pkg.  In most cases, the most recent versions of these packages are easily accessible via modules.  Additional packages may be installed upon request.  
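
For example, to see which packages are installed and which versions are available through modules (the output will vary as software is added):

fish1% ls /usr/local/pkg
fish1% module avail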

 

Sample Code Repository ($SAMPLES_HOME)

The $SAMPLES_HOME directory on fish contains a number of examples including, but not limited to:

  • Torque/Moab scripts for MPI, OpenMP and Hybrid applications
  • Examples for Installed Commercial Software Packages and other installed applications.
  • Examples using common libraries

A description of items available in the Sample Code Repository is available on the web and in the file $SAMPLES_HOME/INDEX.txt on fish; however, you must log in to fish to access the examples.

% less $SAMPLES_HOME/INDEX.txt

If you would like to see additional samples or would like to contribute an example, please contact User Support.

 

User Installed Software

The following directory on fish is intended as a place for users to build third-party software packages and provide them for others to use:

/usr/local/unsupported

The purpose of this directory is to share your efforts with others. Packages built for personal use only should be installed in your $HOME directory. To request a subdirectory for your project, please contact the ARSC Help Desk with the following information:

  • the name of your requested subdirectory, which can be your project's name (e.g., UAFCLMIT) or the type of software you intend to install in the directory (e.g., "ClimateModels")
  • a general description of what you intend to install
  • a rough estimate of the amount of storage you will need (e.g., 100 MB)
  • the member of your group who will maintain this directory
    • whether this person would like their email address listed for other users to contact them

An entry for your software directory will be placed in the /usr/local/unsupported/INDEX file. Entries take the following form:

DIRECTORY     DESCRIPTION                        CONTACT
---------     -----------                        -------
UAFCLMIT      Climate Models and example runs    johndoe@alaska.edu

Commercial & Export Controlled Software Policies

Please do not install any commercial or export controlled software in the /usr/local/unsupported directory without explicit approval from the ARSC Help Desk.

File Permission Policies

You have the option of sharing the packages you install with either:

  1. the other members of your project
  2. all other users on fish

Access is controlled via Linux groups. For example:

The following command will grant read/execute access to members of your group:

chmod -R g+rX /usr/local/unsupported/UAFCLMIT/fftw-3.2.2.pgi

The following command will grant read/execute access to all users on fish:

chmod -R go+rX /usr/local/unsupported/UAFCLMIT/fftw-3.2.2.pgi

While group-write access is allowed for these directories, please remember that world-write access is never permitted. For more information, please refer to the "Non-Home Directory/File Permissions" policies described on the following web page:

http://www.arsc.edu/support/policy/secpolicy/#other

Storage Policies

Only the files belonging to a package's installation should be included in the /usr/local/unsupported directory. Input/output files, PBS scripts, etc., should be excluded from the installation directory unless they came with the package itself.

Due to the way the /usr/local/unsupported directory is configured, files installed in this directory will count toward your $HOME file system quota. If necessary, please send a request for an increased $HOME quota to the ARSC Help Desk.

Daily backups are performed on the /usr/local/unsupported directory.

Create a README File

If you plan to share your package installations with all other users on the system, we recommend that you create a README file in your group's root directory. E.g.,

/usr/local/unsupported/ClimateModels/README

In this file, you may choose to include descriptions of each package, how they were built, and contact information.
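
A hypothetical README entry might look like the following (the build details shown are placeholders):

fftw-3.2.2.pgi
  FFTW 3.2.2 built with the Portland Group compilers (PrgEnv-pgi loaded).
  Configured with: ./configure --prefix=/usr/local/unsupported/ClimateModels/fftw-3.2.2.pgi
  Contact: johndoe@alaska.edu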

Library Naming Scheme Recommendation

If you are building a library package from source, we recommend that you add a suffix to your installation directory indicating which compiler suite was used to build the library. This will help other users determine whether the installation is compatible with the compiler they are using.

The following compiler suite suffix conventions are recommended:

Compiler Suite    Suffix
Portland Group    .pgi
GNU               .gnu

For example, the directory structure for FFTW 3.2.2, built with the Portland Group compilers, might look like this:

/usr/local/unsupported/UAFCLMIT/fftw-3.2.2.pgi

Setting Up Module Files

If you would like your package installations to be widely used on fish, you may want to create a module to set up the proper environment to run the package. Please refer to the files in the following directory as examples of how to create a module:

/usr/local/pkg/modulefiles

To provide a module for your package, put the module file in a "modulefiles" subdirectory. E.g.:

/usr/local/unsupported/ClimateModels/fftw-3.2.2.pgi/modulefiles/fftw-3.2.2

Then, contact the ARSC Help Desk with the location of this module for it to be made available to all other users of the system via the "module" command.
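
Until the module is made available centrally, you can test it yourself by adding its directory to your module search path (a sketch using the hypothetical path above):

fish1% module use /usr/local/unsupported/ClimateModels/fftw-3.2.2.pgi/modulefiles
fish1% module load fftw-3.2.2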

More information on how to create module files is available via the "man modulefile" page.

 

Parallel Programming Models

Several types of parallelism can be employed on fish using different programming models and methods. These are summarized below.

  • Auto (shared memory, within a node): Automatic shared-memory parallel executables can be compiled and linked with the -Mconcur option to pgf90, pgcc, or pgCC. Since only a subset of loops can generally be parallelized this way, OpenMP directives can further improve performance.

  • OpenMP (shared memory, within a node): A form of explicit parallel programming in which the programmer inserts directives into the program to spawn multiple shared-memory threads, typically at the loop level. It is common, portable, and relatively easy. On the downside, it requires shared memory, which limits scaling to the number of processors on a node. To activate OpenMP directives in your code, use the -mp option to pgf90, pgcc, or pgCC. OpenMP can be used in conjunction with autoparallelization.

  • pthreads (shared memory, within a node): The system supports POSIX threads.

  • MPI (distributed memory, across nodes): The most common and portable method for parallelizing codes for scalable distributed-memory systems. MPI is a library of subroutines for message passing, collective operations, and other forms of inter-processor communication. The programmer is responsible for implementing data distribution, synchronization, and reassembly of results using explicit MPI calls. Using MPI, the programmer can largely ignore the physical organization of processors into nodes and simply treat the system as a collection of independent processors.

  • GPU (within a node): Fish supports several programming models for interacting with GPU devices. The PGI and Cray compilers support OpenACC directives. The PGI compiler supports CUDA Fortran and CUDA C. The NVIDIA nvcc compiler is also available.

  • PGAS (distributed memory, across nodes): The Cray Compiler Environment supports the Partitioned Global Address Space (PGAS) languages: UPC and Coarray Fortran.
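
For example, illustrative compile lines for several of these models (a sketch assuming the default PrgEnv-pgi module is loaded; the source file names are placeholders):

fish1% ftn -Mconcur -Minfo -o auto_prog prog.f90   # autoparallelization
fish1% ftn -mp -o omp_prog omp_prog.f90            # OpenMP
fish1% cc -o mpi_prog mpi_prog.c                   # MPI (the cc wrapper links the MPI library)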

 

Programming Environment

The default environment for new accounts loads the PGI compilers with MPI into the PATH via the PrgEnv-pgi module. Should you wish to use the Cray or GNU MPI compilers instead, load the appropriate module. 

Item                     Portland Group   Cray Compiler Environment   GNU Compiler
Fortran 77               ftn              ftn                         ftn
Fortran 90/95            ftn              ftn                         ftn
C compiler               cc               cc                          cc
C++ compiler             CC               CC                          CC
Debuggers                lgdb             lgdb                        lgdb
Performance Analysis     CrayPat          CrayPat                     CrayPat
Default MPI module (*)   PrgEnv-pgi       PrgEnv-cray                 PrgEnv-gnu
Batch queueing system    Torque/Moab (all compiler suites)

(*) All compilers listed above include MPI support when the "xt-mpich2" module is loaded. The "xt-mpich2" module is loaded by default for accounts on fish.
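
For example, to switch from the default PGI environment to the Cray compilers and rebuild a program (hello.f90 is a placeholder source file):

fish1% module switch PrgEnv-pgi PrgEnv-cray
fish1% ftn -o hello hello.f90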

 

Modules

Fish has the modules package installed. This tool allows a user to quickly and easily switch between different versions of a package (e.g., compilers). The modules package sets common environment variables used by applications, such as PATH, MANPATH, LD_LIBRARY_PATH, etc. The PrgEnv-pgi module is loaded by default in the shell skeleton files for new accounts; it loads the PGI compilers and MPI compiler wrappers into your PATH. Alternatively, the PrgEnv-gnu and PrgEnv-cray modules are available for users who prefer to use the MPI compiler wrappers with the GNU and Cray compilers.

Command                  Example Use                            Purpose
module avail             module avail                           Lists all modules available on the system.
module load pkg          module load PrgEnv-pgi                 Loads a module into the environment.
module unload pkg        module unload PrgEnv-pgi               Unloads a module from the environment.
module list              module list                            Displays the modules that are currently loaded.
module switch old new    module switch PrgEnv-pgi PrgEnv-gnu    Replaces the module "old" with the module "new" in the environment.
module purge             module purge                           Unloads all modules, restoring the environment to the state before any modules were loaded.

See "man module" for more information on the module command. See "news modules" for more information on using modules at ARSC.

Typically, modules only need to be loaded at login. To change your default modules, modify your .cshrc (csh/tcsh), .kshrc (ksh), or .bashrc (sh/bash). Note that only one PrgEnv module should be loaded at any given time.

e.g.

####
# Common module commands
#

# Load the PrgEnv-pgi module
# (uses the PGI compilers for the MPI environment)
module load PrgEnv-pgi

# Alternatively, switch to the PrgEnv-gnu module
# (uses the GNU compilers for the MPI environment)
#module switch PrgEnv-pgi PrgEnv-gnu

# List the currently loaded modules
module list

 

PGI - Compiling and Linking Fortran Programs

The PGI Fortran 90 compiler is:

ftn

The Fortran 90 compiler with MPI support is:

ftn

NOTE: the PrgEnv-pgi module must be loaded to use this compiler.

Here's a sample compiler command showing several common options:

fish1% ftn -O3 -g -o reduce reduce.f90

See "man pgf90" for additional information.

Portland Group Fortran 90 Compiler Options

Option Description
-c Generate intermediate object file but does not attempt to link.
-g Adds information for debugging to the object file and/or executable.
-I<directory> Tells the preprocessor to search in directory for include or module files.
-L<directory> Tells the linker to search in directory for libraries.
-r8 Promotes REALs from the default size of 4 bytes to 8 bytes.
-i8 Promotes INTEGERs from the default size of 4 bytes to 8 bytes.
-O3 Higher level of optimization than -O2 (the default optimization level).
-fast Higher optimization level than -O3
-Mipa Tells the compiler to perform interprocedural analysis, which can be very time consuming. This flag should be used in both the compilation and linking steps.
-Mconcur Enables autoparallelization. Additional options can be used with -Mconcur to provide more fine-grained control of autoparallelization; see "man pgf90" for details.
-Minfo Instructs the compiler to report optimizations that are made.
-Mneginfo Instructs the compiler to report optimizations that are not made.
-mp Enables parallelization via OpenMP directives.
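
For example, a few illustrative compile lines using options from the table above (the source and module file names are placeholders):

fish1% ftn -c -I./include mymod.f90                # compile only, searching ./include for include/module files
fish1% ftn -fast -Minfo -o prog prog.f90 mymod.o   # optimize and report the optimizations performed
fish1% ftn -mp -r8 -o omp_prog omp_prog.f90        # OpenMP with 8-byte default REALs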

Many other compiler options are available. Check the online manuals, and if you have difficulty finding specific information, contact User Support for additional help.

 

PGI - Compiling and Linking C/C++ Programs

The PGI C compiler for serial and MPI applications is:

cc

The PGI C++ compiler for serial and MPI applications is:

CC

Here are sample compiler commands showing common options:

fish1 % cc -O3 -o prog prog.c

fish1 % CC -O3 -o prog prog.cpp

See "man pgcc" or "man pgCC" for more information.

Option Description
-c Generate intermediate object file but does not attempt to link.
-g Adds information for debugging to the object file and/or executable.
-I<directory> Tells the preprocessor to search in directory for include or module files.
-L<directory> Tells the linker to search in directory for libraries.
-O3 Higher level of optimization than -O2 (the default optimization level).
-fast Higher level optimization (the default is -O2). This flag should be used in both the compilation and linking steps.
-Mipa Tells the compiler to perform interprocedural analysis, which can be very time consuming. This flag should be used in both the compilation and linking steps.
-Mconcur Enables autoparallelization. Additional options can be used with -Mconcur to provide more fine-grained control of autoparallelization; see "man pgcc" or "man pgCC" for details.
-Minfo Instructs the compiler to report optimizations that are made.
-Mneginfo Instructs the compiler to report optimizations that are not made.
-mp Enables parallelization via OpenMP directives.
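
For example (the source file names are placeholders; the PrgEnv-pgi module is assumed to be loaded):

fish1% cc -c -g utils.c                        # compile only, with debugging information
fish1% cc -fast -Minfo -o prog prog.c utils.o  # optimize and report the optimizations performed
fish1% CC -mp -o omp_prog omp_prog.cpp         # C++ with OpenMP directives enabled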

 


Running Batch Jobs

All production work on fish is run through the Torque/Moab batch scheduler. Jobs run through the batch system execute programs on the compute nodes. The scheduler assigns a unique job ID to each job submission, conveniently saves stdout/stderr to files for each run, and allows jobs to continue running after the user logs off.

A batch job is a shell script prefaced by a statement of resource requirements and instructions which Torque/Moab will use to manage the job.

Torque/Moab scripts are submitted for processing with the qsub command.

As outlined below, five steps are common in the most basic batch processing:

1. Create a batch script:

Normally, users embed all Torque/Moab request options within the batch script.

In a batch script conforming to Torque/Moab syntax, all Torque/Moab options must precede all shell commands. Each line containing a Torque/Moab option must begin with the string "#PBS", followed by one or more spaces, followed by the option.

Torque/Moab scripts begin execution from your home directory; thus, the first executable shell command is usually a "cd" to the work or run directory. The environment variable PBS_O_WORKDIR is set to the directory from which the script was submitted.

 

Torque Script: MPI Example requesting 4 CPU-only nodes (all 12 processors on each node)

Script:

#!/bin/bash
#PBS -q standard
#PBS -l walltime=8:00:00
#PBS -l nodes=4:ppn=12
#PBS -j oe

cd $PBS_O_WORKDIR

NP=$(( $PBS_NUM_NODES * $PBS_NUM_PPN ))
aprun -n $NP ./myprog

Notes:

  • #!/bin/bash selects the shell.
  • The -q option requests that the job run in the standard queue.
  • The walltime request allows the job to run for a maximum of 8 hours.
  • nodes=4:ppn=12 requests 4 nodes and all 12 processors on each node.
  • The -j oe option joins output and error messages into one file.
  • cd $PBS_O_WORKDIR changes to the initial working directory.
  • NP is set to the total number of processors to use.
  • aprun launches the executable on the compute nodes, passing the executable name.

 

Torque Script: OpenMP Example requesting 1 CPU-only node (all 12 processors on the node)

Script:

#!/bin/bash
#PBS -q standard
#PBS -l walltime=8:00:00
#PBS -l nodes=1:ppn=12
#PBS -j oe

cd $PBS_O_WORKDIR

export OMP_NUM_THREADS=12
aprun -n 1 -d 12 ./myprog

Notes:

  • #!/bin/bash selects the shell.
  • The -q option requests that the job run in the standard queue.
  • The walltime request allows the job to run for a maximum of 8 hours.
  • nodes=1:ppn=12 requests 1 node and all 12 processors on the node.
  • The -j oe option joins output and error messages into one file.
  • cd $PBS_O_WORKDIR changes to the initial working directory.
  • OMP_NUM_THREADS sets the maximum number of OpenMP threads.
  • aprun launches one process (-n 1) with 12 threads per process (-d 12).

 

Torque Script: MPI and OpenMP Example using 4 GPU nodes

Script:

#!/bin/bash
#PBS -q gpu
#PBS -l walltime=8:00:00
#PBS -l nodes=4:ppn=16
#PBS -j oe

cd $PBS_O_WORKDIR

export OMP_NUM_THREADS=$PBS_NUM_PPN
NP=$PBS_NUM_NODES

aprun -n $NP -d ${OMP_NUM_THREADS} ./myprog

Notes:

  • #!/bin/bash selects the shell.
  • The -q option requests that the job run in the gpu queue.
  • The walltime request allows the job to run for a maximum of 8 hours.
  • nodes=4:ppn=16 requests 4 GPU nodes and all 16 processors on each node.
  • The -j oe option joins output and error messages into one file.
  • cd $PBS_O_WORKDIR changes to the initial working directory.
  • OMP_NUM_THREADS sets the maximum number of OpenMP threads per task.
  • NP is set to the total number of MPI tasks (one per node when accessing GPU accelerators).
  • aprun launches the executable on the compute nodes.

 

2. Submit a batch script to Moab/Torque, using qsub

The script file can be given any name, and should be submitted with the "qsub" command.  For example, if the script is named myprog.pbs, it would be submitted for processing with the following:

qsub myprog.pbs

3. Monitor the job

To check the status of the submitted Torque/Moab job, execute this command:

qstat -a
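
Additional qstat options provide more detail; for example (the job ID is a placeholder):

qstat -u $USER      # list only your own jobs
qstat -f <PBS-ID>   # full status information for a single job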

4. Delete the job

Given its Torque/Moab identification number (returned when you "qsub" the job and shown in "qstat -a" output), you can delete the job from the batch system with this command:

qdel <PBS-ID>

5. Examine Output

When the job completes, Torque/Moab will save the stdout and stderr from the job to a file in the directory from which it was submitted. These files will be named using the script name and Torque/Moab identification number. For example,

myprog.pbs.o<PBS-ID>

Torque/Moab Queues

List all available queues with the command "qstat -Q". List details on any queue, for instance "standard", with the command "qstat -Qf standard". You may also read "news queues" for information on all queues, but note that the most current information is always available using the qstat commands.

 

Charging Allocation Hours to an Alternate Project

Users with membership in more than one project should select which project to charge allocation hours towards. The directive for selecting which project is the "-W group_list" Torque/Moab option. If the "-W group_list" option is not specified within a user's Torque/Moab script, the account charged will default to the user's primary group (i.e. project).

The following is an example "-W group_list" statement.

#PBS -W group_list=proja

The "-W group_list" option can also be used on the command line, e.g.:

fish1 % qsub -W group_list=proja script.bat

Each project has a corresponding UNIX group; therefore, the "groups" command will show all projects (or groups) of which you are a member.

fish1% groups
proja projb

Without the "-W group_list" Torque/Moab option, allocation hours would be charged to proja by default, but could be charged to projb by setting "-W group_list=projb" in the Torque/Moab script.
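
For example, a script header charging hours to projb might look like this (a sketch; the resource requests are illustrative):

#!/bin/bash
#PBS -q standard
#PBS -l walltime=8:00:00
#PBS -l nodes=1:ppn=12
#PBS -j oe
#PBS -W group_list=projb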

 

Online Manuals and Documentation

News Items

Some documentation is available directly on the system via the "news" command. This documentation covers a variety of topics, including availability of software, sample job scripts, and more.

 

More Information

General information on ARSC and its other resources is available in a number of forms:

  • Online, on the web:

    The ARSC web site covers policies, research, events, and technical information.

  • The ARSC HPC Users' Newsletter

    This bi-weekly newsletter is designed to inform users of HPC systems.

  • Training Classes

    The ARSC training schedule is maintained on our web site and you are always welcome to schedule a one-on-one work session with ARSC specialists or consultants.

  • Consultants

    Contact User Support for more information.