ARSC system news for fish
How to update your LDAP password
========================================

User authentication and login to ARSC systems use University of Alaska (UA) passwords and follow the LDAP protocol to connect to the University's Enterprise Directory. Because of this, users must change their passwords using the UA Enterprise tools.

If you see the following message while logging into ARSC systems, please change your password at https://elmo.alaska.edu:

  Password:
  You are required to change your LDAP password immediately.
  Enter login(LDAP) password:

Attempts to change your password on ARSC systems will fail. Please contact the ARSC Help Desk if you are unable to log into https://elmo.alaska.edu to change your login password.
Programming Environment on Fish
========================================

Compiler and MPI library versions on fish are controlled via the modules package. All accounts load the "PrgEnv-pgi" module by default, which adds the PGI compilers to the PATH. Should you experience problems with a compiler or library, in many cases a newer programming environment may be available. Below is a description of the available programming environments:

  Module Name      Description
  ===============  ==============================================
  PrgEnv-pgi       Programming environment using PGI compilers
                   and MPI stack (default version).
  PrgEnv-cray      Programming environment using Cray compilers
                   and MPI stack.
  PrgEnv-gcc       Programming environment using GNU compilers
                   and MPI stack.

Additionally, multiple compiler versions may be available:

  Module Name      Description
  ===============  ==============================================
  pgi              The PGI compilers and related tools.
  cce              The Cray Compiler Environment and related tools.
  gcc              The GNU Compiler Collection and related tools.

To list the available versions of a package, use the "module avail pkg" command:

  % module avail pgi
  -------------------------- /opt/modulefiles --------------------------
  pgi/12.3.0(default)  pgi/12.4.0

Programming Environment Changes
================================

The following is a table of recent additions and changes to the Programming Environment on fish.

  Date        Module Name            Description
  ----------  ---------------------  -----------------------------------
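A switch between programming environments can be sketched as the following session (module names are those listed above; this is illustrative only, and the exact versions available on fish may differ):

```shell
# Show what is currently loaded (PrgEnv-pgi by default).
module list

# Swap the PGI environment for the GNU environment.
module switch PrgEnv-pgi PrgEnv-gcc

# List the gcc versions that could then be loaded explicitly.
module avail gcc
```

These commands require the modules package environment on fish, so they will not run on a generic workstation.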
Using the Modules Package
=========================

The modules package is used to prepare the environment for various applications before they are run. Loading a module sets the environment variables required for a program to execute properly; conversely, unloading a module unsets the environment variables that had previously been set. This functionality is ideal for switching between different versions of the same application, keeping differences in file paths transparent to the user.

Sourcing the Module Init Files
---------------------------------------------------------------------

Before the modules package can be used, its init file must first be sourced. For some jobs it may be necessary to source this file explicitly, as it may not be sourced automatically as it is for login shells.

To do this using tcsh or csh, type:

  source /opt/modules/default/init/tcsh

To do this using bash, type:

  source /opt/modules/default/init/bash

To do this using ksh, type:

  source /opt/modules/default/init/ksh

Once the modules init file has been sourced, the following commands become available:

  Command                    Purpose
  ---------------------------------------------------------------------
  module avail               - list all available modules
  module load <pkg>          - load a module file into the environment
  module unload <pkg>        - unload a module file from the environment
  module list                - display currently loaded modules
  module switch <old> <new>  - replace module <old> with module <new>
  module purge               - unload all modules (not recommended on fish)
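For example, a bash batch script (which is not a login shell) might source the init file before using any module commands. A minimal sketch using the paths and module names given above:

```shell
#!/bin/bash
# Non-login shells may not have the "module" command defined,
# so source the modules init file first.
source /opt/modules/default/init/bash

# The module command is now available.
module load PrgEnv-pgi
module list
```

As with all module commands, this fragment only runs on a system with the modules package installed.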
Fish Queues
========================================

The queue configuration is described below. It is subject to review and further updates.

Login Node Use:
=================

Login nodes are a shared resource and are not intended for computationally or memory intensive work. Processes using more than 30 minutes of CPU time on login nodes may be killed by ARSC without warning. Please use compute nodes for computationally or memory intensive work.

Queues:
===============

Specify one of the following queues in your Torque/Moab qsub script (e.g., "#PBS -q standard"):

  Queue Name     Purpose of queue
  -------------  ------------------------------
  standard       Runs on 12-core nodes without GPUs.
  standard_long  Runs longer jobs on 12-core nodes without GPUs.
  gpu            Runs on 16-core nodes with one NVIDIA X2090 GPU
                 per node.
  gpu_long       Runs longer jobs on 16-core nodes with one NVIDIA
                 X2090 GPU per node.
  debug          Quick-turnaround debug queue. Runs on GPU nodes.
  debug_cpu      Quick-turnaround debug queue. Runs on 12-core nodes.
  transfer       For data transfer to and from $ARCHIVE.
                 NOTE: the transfer queue is not yet functional.

See 'qstat -q' for a complete list of system queues. Note that some queues are not available for general use.

Maximum Walltimes:
===================

The maximum allowed walltime for a job depends on the number of processors requested. The table below describes the maximum walltime for each queue.

  Queue            Min    Max    Max        Notes
                   Nodes  Nodes  Walltime
  ---------------  -----  -----  ---------  --------------------------
  standard         1      32     24:00:00
  standard_long    1      2      168:00:00  12 nodes are available to
                                            this queue.
  gpu              1      32     24:00:00
  gpu_long         1      2      168:00:00  12 nodes are available to
                                            this queue.
  debug            1      2      1:00:00    Runs on GPU nodes.
  debug_cpu        1      2      1:00:00    Runs on 12-core nodes
                                            (no GPU).
  transfer         1      1      24:00:00   Not currently functioning
                                            correctly.

NOTES:
  * August 11, 2012 - the transfer queue is not yet functional.
  * October 16, 2012 - debug queues and long queues were added to fish.
PBS Commands:
=============

Below is a list of common PBS commands. Additional information is available in the man pages for each command.

  Command       Purpose
  ------------  -----------------------------------------
  qsub          submit jobs to a queue
  qdel          delete a job from the queue
  qsig          send a signal to a running job

Running a Job:
==============

To run a batch job, create a qsub script which, in addition to running your commands, specifies the processor resources and time required. Submit the job to PBS with the following command. (For more PBS directives, type "man qsub".)

  qsub <script file>

Sample PBS scripts:
--------------

  ## Beginning of MPI Example Script ############
  #!/bin/bash
  #PBS -q standard
  #PBS -l walltime=24:00:00
  #PBS -l nodes=4:ppn=12
  #PBS -j oe

  cd $PBS_O_WORKDIR
  NP=$(( $PBS_NUM_NODES * $PBS_NUM_PPN ))
  aprun -n $NP ./myprog
  #### End of Sample Script ##################

  ## Beginning of OpenMP Example Script ############
  #!/bin/bash
  #PBS -q standard
  #PBS -l nodes=1:ppn=12
  #PBS -l walltime=8:00:00
  #PBS -j oe

  cd $PBS_O_WORKDIR
  export OMP_NUM_THREADS=12
  aprun -d $OMP_NUM_THREADS ./myprog
  #### End of Sample Script ##################

NOTE: jobs using the "standard" and "gpu" queues must run compute and memory intensive applications using the "aprun" or "ccmrun" command. Jobs failing to use "aprun" or "ccmrun" may be killed without warning.

Resource Limits:
==================

The only resource limits users should specify are the walltime and the nodes and ppn limits. The "nodes" statement requests that a job be allocated a number of chunks of the given "ppn" size.

Tracking Your Job:
==================

To see which jobs are queued and/or running, execute this command:

  qstat -a

Current Queue Limits:
=====================

Queue limits are subject to change and this news item is not always updated immediately.
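The NP arithmetic from the MPI sample script can be checked in isolation. Here the PBS_NUM_NODES and PBS_NUM_PPN values are set by hand to the values the sample script requests; inside a real job, Torque/Moab exports them automatically:

```shell
# Stand-in values matching "#PBS -l nodes=4:ppn=12".
PBS_NUM_NODES=4
PBS_NUM_PPN=12

# Total MPI task count, as passed to "aprun -n" in the sample script.
NP=$(( PBS_NUM_NODES * PBS_NUM_PPN ))
echo "NP=$NP"
```

With nodes=4:ppn=12 this yields NP=48, so aprun launches 48 MPI tasks across the 4 allocated nodes.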
For a current list of all queues, execute:

  qstat -Q

For all limits on a particular queue:

  qstat -Q -f <queue-name>

Maintenance
============

Scheduled maintenance activities on fish use the reservation functionality of Torque/Moab to reserve all available nodes on the system. This reservation keeps Torque/Moab from scheduling jobs which would still be running during maintenance, which allows the queues to be left running until maintenance begins.

Because walltime is used to determine whether or not a job will complete prior to maintenance, using a shorter walltime in your job script may allow your job to begin running sooner. For example, if maintenance begins at 10AM and it is currently 8AM, jobs specifying walltimes of 2 hours or less will start if there are available nodes.

CPU Usage
==========

Only one job may run per node for most queues on fish (i.e. jobs may not share nodes). If your job uses fewer than the number of available processors on a node, the job will be charged for all processors on the node unless you use the "shared" queue. Utilization for all other queues is charged for the entire node regardless of the number of tasks using that node:

  * standard      - 12 CPU hours per node per hour
  * standard_long - 12 CPU hours per node per hour
  * gpu           - 16 CPU hours per node per hour
  * gpu_long      - 16 CPU hours per node per hour
  * debug         - 16 CPU hours per node per hour
  * debug_cpu     - 12 CPU hours per node per hour
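As a worked example of the per-node charging above (a sketch, not an accounting tool): a job in the standard queue occupying 4 nodes for 2 hours is charged for every core on those nodes, whether or not it uses them all:

```shell
NODES=4            # nodes allocated to the job
CORES_PER_NODE=12  # standard-queue nodes have 12 cores each
HOURS=2            # walltime actually consumed

# The job is charged for whole nodes, regardless of how many
# cores it actually used on each node.
CPU_HOURS=$(( NODES * CORES_PER_NODE * HOURS ))
echo "$CPU_HOURS CPU hours"
```

The same 4-node, 2-hour job in the gpu queue would be charged 4 x 16 x 2 = 128 CPU hours, since GPU nodes have 16 cores.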
Sample Code Repository
========================

Filename: INDEX.txt

Description: This file contains the name, location, and a brief explanation of the "samples" included in this Sample Code Repository. There are several subdirectories within this code repository containing frequently-used procedures, routines, scripts, and code used on this allocated system, fish. This sample code repository can be accessed from fish by changing directories to $SAMPLES_HOME, or to the following location:

  fish% cd /usr/local/pkg/samples

This particular file can be viewed on the internet at:

  http://www.arsc.edu/support/news/systemnews/fishnews.xml#samples_home

Contents:
  applications
  jobSubmission
  libraries

*****************************************************************************
Directory: applications

Description: This directory contains sample PBS batch scripts for applications installed on fish.

Contents:
  abaqus

*****************************************************************************
Directory: jobSubmission

Description: This directory contains sample PBS batch scripts and helpful commands for monitoring job progress. Examples include options for submitting a job, such as declaring which group membership you belong to (for allocation accounting), how to request a particular software license, etc.

Contents:
  MPI_OpenMP_scripts
  MPI_scripts
  OpenMP_scripts

*****************************************************************************
Directory: libraries

Description: This directory contains examples of common libraries and programming paradigms.

Contents:
  cuda
  openacc
  scalapack
Fish Software
========================================

python: python version 2.7.2 (2013-02-26)
  This version includes various popular add-ons including numpy, scipy,
  matplotlib, basemap and more.
    module load python/2.7.2

abaqus: abaqus version 6.11 (2012-12-26)
  Version 6.11 of abaqus is available via modules:
    module load abaqus/6.11

matlab: matlab version R2012b (2012-12-26)
  Matlab R2012b is now available to UAF users via modules:
    module load matlab/R2012b

matlab: matlab version R2012a (2012-12-07)
  Matlab R2012a is now available to UAF users via modules:
    module load matlab/R2012a

comsol: comsol version 4.3a (2012-11-30)
  This version of comsol is now available to UAF users via modules:
    module load comsol/4.3a

idl/envi: idl-8.2 and envi 5.0 (2012-10-31)
  IDL version 8.2 and ENVI version 5.0 are now available on fish via
  modules:
    module load idl/8.2
Fish Storage
========================================

The environment variables listed below represent paths. They are expanded to their actual values by the shell and can be used in commands (e.g. ls $ARCHIVE). From the command prompt, the expanded path and the variable are usually interchangeable. However, in non-shell settings like ftp, you will need to use the actual path, not the variable. In the listing below, $USER is an environment variable holding your ARSC username.

  Filesystem  Purpose                   Default Allowed Use  Path
  ----------  ------------------------  -------------------  ---------------
  $HOME       dotfiles, sm. files       8 GB                 /u1/uaf/$USER
  $CENTER     do work here              750 GB               /center/w/$USER
  $ARCHIVE    long-term remote storage  no quota             /archive/$HOME

-- $HOME:
Home directories are intended primarily for basic account info (e.g. dotfiles). Please use $CENTER (your /center/w/$USER directory) for compiles, inputs, outputs, etc. Files in $HOME are backed up periodically. Quotas are enabled on this filesystem. Use the command "show_storage" to show your current $HOME use.

-- $ARCHIVE:
Long-term backed-up storage is only available in your $ARCHIVE directory. As this is an NFS-mounted file system from bigdipper, files will be temporarily unavailable when bigdipper goes down for maintenance. I/O performance in this directory will be much slower, so compiles in $ARCHIVE are not recommended. $ARCHIVE is not available from compute nodes.

-- $CENTER:
Short-term, not backed up, purged file system. This is a large, fast local disk. The $CENTER file system is available to all nodes on fish. This is the recommended location for input, output, and temporary files. The $ARCHIVE file system is available for long-term storage.

NOTE: Usage of $CENTER and $HOME can be monitored with the "show_storage" command.

Long Term Storage Use
======================

batch_stage
------------

Files saved in $ARCHIVE can potentially be offline (i.e. not on disk).
When accessing multiple files in $ARCHIVE, the "batch_stage" command can significantly speed up the process of retrieving files from tape. e.g.

  cd $ARCHIVE/somedirectory
  find . -type f | batch_stage -i

See "man batch_stage" for additional examples.

/usr/bin/rcp
-------------

While $ARCHIVE is available as an NFS file system, higher transfer rates can be obtained by using the "rcp" command for large transfers to and from $ARCHIVE. The non-kerberized version of rcp may be used to transfer files to $ARCHIVE using the "bigdip-s" hostname. e.g.

  /usr/bin/rcp results.tar "bigdip-s:$ARCHIVE"

NOTE: The full path to rcp (i.e. /usr/bin/rcp) must be used to make transfers without a ticket.

$CENTER file purging
---------------------

File purging on the $CENTER directory is not currently enabled. Data not in use should be migrated to $ARCHIVE. Usage may be monitored with the "show_storage" command.

See http://www.arsc.edu/support/howtos/storage for more information on storage best practices at ARSC.
Totalview on Fish
========================================

Totalview is available on fish and can be used to debug MPI, OpenMP, and serial applications. Generally, debugging should occur on compute nodes through the use of an interactive PBS job. Totalview may also be run on login nodes to inspect core files.

The instructions below are prefaced by a prompt corresponding to the system where each command should be run:

  * fish %         corresponds to a fish login node (e.g. fish1 or fish2).
  * fish-compute % corresponds to a fish compute node.
  * local %        corresponds to your local workstation.

I. MPI Code Compilation

MPI applications should be compiled with "-g" in order to get the best possible debugging information.

II. Starting an interactive job with X11 forwarding enabled.

A) Log into fish1 or fish2 with X11 forwarding enabled.

  local % ssh -X -Y email@example.com

B) Start an interactive PBS job requesting the number of processors required for your job.

  * The "standard" queue may be used to debug applications requiring up to
    12 MPI tasks per node.

  fish % qsub -q standard -l nodes=2:ppn=12 -X -I

The "-X" qsub option enables X11 forwarding and the "-I" option requests that the job be interactive. When a sufficient number of nodes is available, torque will start the job.

III. Running totalview.

For MPI applications, start the application using "aprun":

  fish-compute % module load xt-totalview

  # serial example
  fish-compute % totalview ./a.out

  # MPI example
  fish-compute % totalview aprun -a -n 24 ./a.out
  # Command line arguments after "-a" are passed to the command being
  # run, in this case "aprun".

Additional hints:

1) Code should be compiled with -g. This makes it possible for totalview to refer back to the source code. Code compiled without -g will appear as assembly and you will not have meaningful access to variable values.

2) You can view core files with totalview by passing the executable and core file to totalview.
A core file from an MPI application can be viewed without using aprun.

  fish % totalview ./a.out core.1234