Job Submission on Pacman

Running Interactive Jobs

You are encouraged to submit jobs to the Torque/Moab batch system, but interactive jobs are also available. To spawn an interactive job through the batch system (with X11 forwarding enabled), type the following command:

qsub -q debug -l nodes=1:ppn=16 -X -I

Standard output and standard error are displayed in the terminal, and may be redirected to a file or piped to another command using the usual Unix shell syntax.
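
For example, within an interactive session (./myprog stands in for your own executable):

    # Send stdout to one file and stderr to another
    ./myprog > myprog.out 2> myprog.err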

After your job has started, you may then run interactive commands on the compute node(s) that Torque/Moab assigned to your session.

Running Batch Jobs

All production work on pacman is run through the Torque/Moab batch scheduler. Jobs run through the batch system execute on pacman's compute nodes. The scheduler assigns a unique job ID to each submission, saves stdout/stderr to files for each run, and allows jobs to continue running after the user logs off.

A batch job is a shell script prefaced by a statement of resource requirements and instructions that Torque/Moab uses to manage the job.

Torque/Moab scripts are submitted for processing with the qsub command.

The most basic batch processing involves five steps, outlined below:

1. Create a batch script:

Normally, users embed all Torque/Moab request options within the batch script.

In a batch script conforming to Torque/Moab syntax, all Torque/Moab options must precede all shell commands. Each line containing a Torque/Moab option must begin with the string "#PBS", followed by one or more spaces, followed by the option.

Torque/Moab scripts begin execution from your home directory, so the first executable shell command is usually a "cd" to the work or run directory. The environment variable PBS_O_WORKDIR is set to the directory from which the script was submitted.

Torque Script - MPI Example Requesting 4 Nodes (Reserving All 16 Processors on Each Node)

    #!/bin/bash
    #PBS -q standard_16
    #PBS -l walltime=8:00:00
    #PBS -l nodes=4:ppn=16
    #PBS -j oe

    cd $PBS_O_WORKDIR

    mpirun --mca mpi_paffinity_alone 1 ./myprog

Script notes:

  • #!/bin/bash selects the shell.
  • The -q option requests that the job run in the standard_16 queue.
  • walltime requests that the job be allowed to run for a maximum of 8 hours.
  • nodes=4:ppn=16 requests 4 nodes and all 16 processors on each node.
  • The -j oe option joins output and error messages into one file.
  • cd changes to the initial working directory ($PBS_O_WORKDIR).
  • mpirun executes the program, passing the executable name.
    
Torque Script - MPI Example Running 2 Tasks on Each of 6 Nodes

    #!/bin/bash
    #PBS -q standard_16
    #PBS -l walltime=8:00:00
    #PBS -l nodes=6:ppn=16
    #PBS -j oe

    cd $PBS_O_WORKDIR

    mpirun -np 12 -npernode 2 ./myprog

Script notes:

  • #!/bin/bash selects the shell.
  • The -q option requests that the job run in the standard_16 queue.
  • walltime requests that the job be allowed to run for a maximum of 8 hours.
  • nodes=6:ppn=16 reserves all 16 processors on each of 6 nodes.
  • The -j oe option joins output and error messages into one file.
  • cd changes to the initial working directory.
  • mpirun executes the program. Note that this command runs only 2 tasks on each node despite all 16 processors being reserved; this is a technique used when individual processes demand very large memory resources. The -np flag specifies the total number of executable copies run on the reserved nodes.
    
Torque Script - OpenMP Example Requesting All 16 Processors on 1 Node

    #!/bin/bash
    #PBS -q standard_16
    #PBS -l walltime=8:00:00
    #PBS -l nodes=1:ppn=16
    #PBS -j oe

    cd $PBS_O_WORKDIR
    export OMP_NUM_THREADS=16

    ./myprog

Script notes:

  • #!/bin/bash selects the shell.
  • The -q option requests that the job run in the standard_16 queue.
  • walltime requests that the job be allowed to run for a maximum of 8 hours.
  • nodes=1:ppn=16 requests 1 node and all 16 processors.
  • The -j oe option joins output and error messages into one file.
  • cd changes to the initial working directory.
  • export OMP_NUM_THREADS=16 makes all 16 processors available to the OpenMP program.
  • ./myprog executes the program.
    
Torque Script - Serial Example Requesting 1 Processor on a Shared Node

    #!/bin/bash
    #PBS -q shared
    #PBS -l walltime=8:00:00
    #PBS -l nodes=1:ppn=1
    #PBS -j oe

    cd $PBS_O_WORKDIR

    ./myprog

Script notes:

  • #!/bin/bash selects the shell.
  • The -q option requests that the job run in the shared queue.
  • walltime requests that the job be allowed to run for a maximum of 8 hours.
  • nodes=1:ppn=1 requests 1 node and 1 processor.
  • The -j oe option joins output and error messages into one file.
  • cd changes to the initial working directory.
  • ./myprog executes the serial program on a shared node.
    
Torque Script - Transfer Queue Example Copying Data from $ARCHIVE

    #!/bin/bash
    #PBS -q transfer
    #PBS -l walltime=8:00:00
    #PBS -l nodes=1:ppn=1
    #PBS -j oe

    cd $PBS_O_WORKDIR

    batch_stage $ARCHIVE/results.tar
    /usr/bin/rcp "bigdip-s:$ARCHIVE/results.tar" $CENTER || exit 1

    qsub mpi_job.pbs

Script notes:

  • #!/bin/bash selects the shell.
  • The -q option requests that the job run in the transfer queue.
  • walltime requests that the job be allowed to run for a maximum of 8 hours.
  • nodes=1:ppn=1 requests 1 node and 1 processor.
  • The -j oe option joins output and error messages into one file.
  • cd changes to the initial working directory.
  • batch_stage brings the $ARCHIVE files online; rcp then copies them to the working directory.
  • qsub submits the computational job once the data has been staged.
    

2. Submit the batch script to Torque/Moab, using qsub:

The script file can be given any name. If the first sample above were named myprog.pbs, it would be submitted for processing with the command:

    qsub myprog.pbs
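
On submission, qsub prints the job's Torque/Moab identification number; the exact format varies, but it looks something like the following (the job number and server name here are illustrative):

    pacman1 % qsub myprog.pbs
    123456.pacman1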

3. Monitor the job:

To check the status of the submitted Torque/Moab job, execute this command:

    qstat -a
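
To list only your own jobs, pass a username to qstat; a quick sketch (-u is a standard Torque qstat option, and $USER expands to your login name):

    # Show all jobs owned by the current user
    qstat -a -u $USER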

4. Delete the job:

Given its Torque/Moab identification number (returned when you qsub the job and shown in qstat -a output), you can delete the job from the batch system with this command:

    qdel <PBS-ID>

5. Examine output:

When the job completes, Torque/Moab saves the stdout and stderr from the job to a file in the directory from which it was submitted. (With the -j oe option used in the examples above, output and error are joined into a single .o file; without it, stderr goes to a separate .e file.) These files are named using the script name and the Torque/Moab identification number. For example:

    myprog.pbs.o<PBS-ID>
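
A quick way to inspect the results once the job finishes (the job number here is illustrative):

    ls myprog.pbs.o*
    less myprog.pbs.o123456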

Torque/Moab Queues

List all available queues with the command qstat -Q. List details on any queue, for instance "standard", with the command qstat -Qf standard. You may also read news queues for information on all queues, but note that the most current information is always available using the qstat commands.
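
For example:

    qstat -Q             # one-line summary of every queue
    qstat -Qf standard   # full details for the standard queue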

Currently, these are the available queues:

    Queue        Purpose
    -----------  ------------------------------------------------------------
    background   For projects with little or no remaining allocation. This
                 queue has the lowest priority; however, projects running
                 jobs in this queue do not have allocation deducted. Runs on
                 16-core nodes only.
    bigmem       Queue for jobs requiring significant memory. Runs on 32-core
                 nodes with 256 GB of memory.
    debug        Quick turn-around queue for debugging work. Runs on 12-core
                 and 16-core nodes only.
    shared       For sharing a node with other serial or small parallel
                 jobs. Runs on 12-core nodes only.
    standard     General use by all allocated users; routes to standard_16.
    standard_4   General use by all allocated users. Runs on 4-core nodes
                 only.
    standard_12  General use by all allocated users. Runs on 12-core nodes
                 only.
    standard_16  General use by all allocated users. Runs on 16-core nodes
                 only.
    transfer     For transferring data to and from $ARCHIVE. Be sure to
                 bring all $ARCHIVE files online using batch_stage prior to
                 the file copy.

MPI Run Time Options

The OpenMPI stack has a number of options that affect the run-time behavior of MPI applications. Below are a few useful options.

Enable CPU Affinity

Standard MPI applications not using OpenMP or pthreads may see a performance improvement if CPU affinity is enabled. The flag "--mca mpi_paffinity_alone 1" enables CPU affinity.

    mpirun -np 16 --mca mpi_paffinity_alone 1 ./a.out

MPI jobs using the standard_4 queue may see noticeable performance improvements by enabling CPU affinity.

Running Fewer Tasks Per Node

At times it may be advantageous to run with fewer tasks per node while still reserving all processors on the node in the Torque script. The "--npernode" option allows for finer control of task placement.

    mpirun -np 4 --npernode 2 ./a.out

Charging Allocation Hours to an Alternate Project

Users with membership in more than one project should select which project to charge allocation hours to. The directive for selecting a project is the "-W group_list" Torque/Moab option. If the "-W group_list" option is not specified within a user's Torque/Moab script, the account charged defaults to the user's primary group (i.e., project).

The following is an example "-W group_list" statement.

    #PBS -W group_list=proja

The "-W group_list" option can also be used on the command line, e.g.

    pacman1 % qsub -W group_list=proja script.bat

Each project has a corresponding UNIX group, so the "groups" command will show all projects (or groups) of which you are a member.

    pacman1 % groups
    proja projb

Without the "-W group_list" Torque/Moab option, allocation hours would be charged to proja by default, but could be charged to projb by setting "-W group_list=projb" in the Torque/Moab script.
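
In context, the directive sits alongside the other #PBS options at the top of the batch script; a minimal sketch (the queue, walltime, node counts, and program name are illustrative):

    #!/bin/bash
    #PBS -q standard_16
    #PBS -l walltime=8:00:00
    #PBS -l nodes=1:ppn=16
    #PBS -j oe
    #PBS -W group_list=projb

    cd $PBS_O_WORKDIR
    ./myprog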

Monitoring Project Allocation Usage

Each project is allocated time on pacman. The "show_usage" command can be used to monitor allocation use for projects.

    pacman1 % show_usage 
    
                ARSC - Subproject Usage Information (in CPU Hours)
                         As of 04:24:01 hours ADT 24 May 2012
              For Fiscal Year 2012 (01 October 2011 - 30 September 2012)
                      Percentage of Fiscal Year Remaining: 35.62% 
    
                              Hours      Hours      Hours      Percent  Background
    System     Subproject     Allocated  Used       Remaining  Remaining Hours Used
    ========== ============== ========== ========== ========== ========= ==========
    pacman     proja             20000.00       0.00   20000.00   100.00%       0.00
    pacman     projb            300000.00  195887.32  104112.68    34.70%   11847.78
    