Job Submission (Torque and Moab)
Overview
Teaching: 30 min
Exercises: 30 min
Questions
How to submit jobs on the HPC cluster?
Objectives
Learn the most frequently used commands for Torque and Moab
When you use your own computer, you run your calculations and you are responsible for not overloading the machine with more work than it can process efficiently. You also probably have only one machine to work with; if you have several, you log in to each one individually and run your calculations there directly.
On a shared resource like an HPC cluster, things are very different. You and several others, maybe hundreds, are competing to get their calculations done. A Resource Manager takes care of receiving job submissions. On the other side, a Job Scheduler is in charge of associating jobs with the appropriate resources, trying to maximize an objective function such as total utilization, constrained by priorities and the best balance between the resources requested and the resources available.
TORQUE Resource and Queue Manager
Terascale Open-source Resource and QUEue Manager (TORQUE) is a distributed resource manager providing control over batch jobs and distributed compute nodes. TORQUE can be integrated with both commercial and non-commercial schedulers. In the case of Mountaineer and Spruce, TORQUE is used with the commercial Moab scheduler.
This is a list of TORQUE commands:
Command | Description |
---|---|
momctl | Manage/diagnose MOM (node execution) daemon |
pbsdsh | Launch tasks within a parallel job |
pbsnodes | View/modify batch status of compute nodes |
qalter | Modify queued batch jobs |
qchkpt | Checkpoint batch jobs |
qdel | Delete/cancel batch jobs |
qgpumode | Specifies new mode for GPU |
qgpureset | Reset the GPU |
qhold | Hold batch jobs |
qmgr | Manage policies and other batch configuration |
qmove | Move batch jobs |
qorder | Exchange order of two batch jobs in any queue |
qrerun | Rerun a batch job |
qrls | Release batch job holds |
qrun | Start a batch job |
qsig | Send a signal to a batch job |
qstat | View queues and jobs |
qsub | Submit jobs |
qterm | Shut down the pbs_server daemon |
tracejob | Trace job actions and states recorded in Torque logs (see Using “tracejob” to Locate Job Failures) |
Of these commands, a basic knowledge of qsub, qstat, and qdel is sufficient for most purposes in normal usage of the cluster.
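A minimal session (assuming a hypothetical script called job.pbs and a made-up job ID) looks like this:
$ qsub job.pbs          # submit the job script; prints the job ID
$ qstat -u $USER        # check the status of your jobs
$ qdel 123456           # cancel the job with that ID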
TORQUE includes numerous directives, which are used to specify resource
requirements and other attributes for batch and interactive jobs.
TORQUE directives can appear as header lines (lines that start with #PBS)
in a batch job script or as command-line options to the qsub
command.
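For example, the directives used in the scripts below could instead be passed directly on the command line (job.pbs is a hypothetical script name):
$ qsub -l nodes=1:ppn=1,walltime=00:30:00 -N JobName -q standby job.pbs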
Submission scripts
A TORQUE job script for a serial job might look like this:
#!/bin/bash
#PBS -k o
#PBS -l nodes=1:ppn=1,walltime=00:30:00
#PBS -M username@mix.wvu.edu
#PBS -m abe
#PBS -N JobName
#PBS -j oe
#PBS -q standby
cd $PBS_O_WORKDIR
./a.out
The following table describes the most basic directives:
TORQUE directive | Description |
---|---|
#PBS -k o | Keeps the job output |
#PBS -l nodes=1:ppn=1,walltime=00:30:00 | Indicates the job requires one node, one processor per node, and 30 minutes of wall-clock time |
#PBS -M username@mix.wvu.edu | Sends job-related email to username@mix.wvu.edu |
#PBS -m abe | Sends email if the job is (a) aborted, when it (b) begins, and when it (e) ends |
#PBS -N JobName | Names the job JobName |
#PBS -j oe | Joins standard output and standard error |
#PBS -q standby | Submits the job to the standby queue |
A parallel job using MPI could look like this:
#!/bin/bash
#PBS -k o
#PBS -l nodes=1:ppn=16,walltime=00:30:00
#PBS -M username@mix.wvu.edu
#PBS -m abe
#PBS -N JobName
#PBS -j oe
#PBS -q standby
cd $PBS_O_WORKDIR
mpirun -np 16 -machinefile $PBS_NODEFILE ./a.out
The directives are very similar to the serial case:
TORQUE directive | Description |
---|---|
#PBS -l nodes=1:ppn=16,walltime=00:30:00 | Indicates the job requires one node, using 16 processors per node, and 30 minutes of runtime. |
Exercise: Create a job script and submit it
In 1.Intro-HPC/07.jobs you will find the same 3 ABINIT files that we worked with in the Command Line Interface episode. The exercise is to prepare a submission script to run the calculation. This is all that you need to know:
- You need to load these modules to use ABINIT:
module load compilers/gcc/6.3.0 mpi/openmpi/2.1.2_gcc63 libraries/fftw/3.3.6_gcc63 compilers/intel/17.0.1_MKL_only atomistic/abinit/8.6.3_gcc63
It is a good idea to purge the modules first to avoid conflicts with modules that you may be loading by default.
- ABINIT works in parallel using MPI; for this exercise let's request 4 cores on a single node. The actual command to be executed is:
mpirun -np 4 abinit < t17.files
This assumes that you have moved into the folder that contains the 3 files.
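One possible solution is sketched here (the standby queue, the job name, and the 30-minute walltime are assumptions; adjust them to your cluster and allocation):
#!/bin/bash
#PBS -N ABINIT_test
#PBS -l nodes=1:ppn=4,walltime=00:30:00
#PBS -q standby
#PBS -j oe
module purge
module load compilers/gcc/6.3.0 mpi/openmpi/2.1.2_gcc63 libraries/fftw/3.3.6_gcc63 compilers/intel/17.0.1_MKL_only atomistic/abinit/8.6.3_gcc63
cd $PBS_O_WORKDIR
mpirun -np 4 abinit < t17.files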
Job Arrays
A job array is a way to submit many jobs that can be indexed. The jobs are independent of each other, but you can submit them all with a single qsub. A template looks like this:
#!/bin/sh
#PBS -N <jobname>_${PBS_ARRAYID}
#PBS -t <num_range>
#PBS -l nodes=<number_of_nodes>:ppn=<PPN number>,walltime=<time_needed_by_job>
#PBS -m ae
#PBS -M <email_address>
#PBS -q <queue_name>
#PBS -j oe
cd $PBS_O_WORKDIR
# Enter the command here
mpirun -np <PPN number> ./a.out
There are a few new elements here: ${PBS_ARRAYID} is a variable that receives a different value for each job in the range given by the -t option.
#PBS -j oe is an option often used for job arrays; it merges standard output and standard error into a single file, avoiding a proliferation of files for large job arrays.
Exercise: Job arrays
Using the same 3 files from our previous exercise, prepare a set of 5 folders called 1, 2, 3, 4, and 5. The file 14si.pspnc is better kept as a symbolic link, as that file will never change.
Solution
$ mkdir 1 2 3 4 5
$ cp t17.* 1
$ cp t17.* 2
$ cp t17.* 3
$ cp t17.* 4
$ cp t17.* 5
$ ln -s ../14si.pspnc 1/14si.pspnc
$ ln -s ../14si.pspnc 2/14si.pspnc
$ ln -s ../14si.pspnc 3/14si.pspnc
$ ln -s ../14si.pspnc 4/14si.pspnc
$ ln -s ../14si.pspnc 5/14si.pspnc
Modify the submission script from the previous exercise to create a job array. When executed, each job in the array receives a different value inside ${PBS_ARRAYID}; we use that value to move into the corresponding folder and execute ABINIT there.
Solution
#!/bin/bash
#PBS -N ABINIT_${PBS_ARRAYID}
#PBS -l nodes=1:ppn=4
#PBS -q debug
#PBS -t 1-5
#PBS -j oe
module purge
module load compilers/gcc/6.3.0 mpi/openmpi/2.1.2_gcc63 libraries/fftw/3.3.6_gcc63 compilers/intel/17.0.1_MKL_only atomistic/abinit/8.6.3_gcc63
cd $PBS_O_WORKDIR
cd ${PBS_ARRAYID}
mpirun -np 4 abinit < t17.files
Environment variables
We are using PBS_O_WORKDIR to change directory to the place where the job was submitted. The following environment variables are available to the batch job.
Variable | Description |
---|---|
PBS_O_HOST | the name of the host upon which the qsub command is running. |
PBS_SERVER | the hostname of the pbs_server which qsub submits the job to. |
PBS_O_QUEUE | the name of the original queue to which the job was submitted. |
PBS_O_WORKDIR | the absolute path of the current working directory of the qsub command. |
PBS_ARRAYID | each member of a job array is assigned a unique identifier (see -t) |
PBS_ENVIRONMENT | set to PBS_BATCH for a batch job or PBS_INTERACTIVE for an interactive job. |
PBS_JOBID | the job identifier assigned to the job by the batch system. |
PBS_JOBNAME | the job name supplied by the user. |
PBS_NODEFILE | the name of the file containing the list of nodes assigned to the job (for parallel and cluster systems). |
PBS_QUEUE | the name of the queue from which the job is executed. |
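As a quick way to inspect these variables, you could add a few echo lines to any job script (a sketch; the output appears in the job's standard output file):
echo "Job ID:     $PBS_JOBID"
echo "Job name:   $PBS_JOBNAME"
echo "Queue:      $PBS_QUEUE"
echo "Submit dir: $PBS_O_WORKDIR"
cat $PBS_NODEFILE    # list of nodes assigned to the job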
Monitoring jobs
To monitor the status of a queued or running job, use the qstat command from Torque or showq from Moab.
Useful qstat options include:
qstat option | Description |
---|---|
-Q | show all queues available |
-u user_list | Displays jobs for users listed in user_list |
-a | Displays all jobs |
-r | Displays running jobs |
-f | Displays the full listing of jobs (returns excessive detail) |
-n | Displays nodes allocated to jobs |
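Typical invocations look like this (username and the job ID are placeholders):
$ qstat -u username     # jobs belonging to a given user
$ qstat -a              # all jobs
$ qstat -n 123456       # nodes allocated to a specific job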
Moab showq offers:
showq option | Description |
---|---|
-b | blocked jobs only |
-c | details about recently completed jobs. |
-g | grid job and system id’s for all jobs. |
-i | extended details about idle jobs. |
-l | local/remote view. For use in a Grid environment, displays job usage of both local and remote compute resources. |
-n | normal showq output, but lists job names under JOBID |
-o | jobs in the active queue in the order specified |
-p | only jobs assigned to the specified partition. |
-r | extended details about active (running) jobs. |
-R | only jobs which overlap the specified reservation. |
-u | specified user’s jobs. Use showq -u -v to display the full username if it is truncated in normal -u output. |
-v | local and full resource manager job IDs as well as partitions. |
-w | only jobs associated with the specified constraint. Valid constraints include user, group, acct, class, and qos. |
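Common uses of showq include (username is a placeholder):
$ showq                 # summary of active, idle, and blocked jobs
$ showq -r              # extended details about running jobs
$ showq -u username     # jobs belonging to a given user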
Deleting jobs
With Torque you can use qdel
. Moab uses mjobctl -c
. For example:
$ qdel 1045
$ mjobctl -c 1045
Prologue and Epilogue
Torque provides administrators the ability to run scripts before and/or after each job executes. With such a script, you can prepare the system, perform node health checks, prepend and append text to output and error log files, cleanup systems, and so forth.
You can add prologue and epilogue scripts from the command line.
$ qsub -l prologue=/home/user/prologue.sh,epilogue=/home/user/epilogue.sh
or by adding the corresponding lines to your submission script.
#!/bin/bash
...
#PBS -l prologue=/home/user/prologue.sh
#PBS -l epilogue=/home/user/epilogue.sh
...
A simple prologue.sh could look like this:
#!/bin/sh
echo "Prologue Args:"
echo "Job ID: $1"
echo "User ID: $2"
echo "Group ID: $3"
echo ""
exit 0
And a simple epilogue.sh could look like this:
#!/bin/sh
echo "Epilogue Args:"
echo "Job ID: $1"
echo "User ID: $2"
echo "Group ID: $3"
echo "Job Name: $4"
echo "Session ID: $5"
echo "Resource List: $6"
echo "Resources Used: $7"
echo "Queue Name: $8"
echo "Account String: $9"
echo ""
exit 0
The purpose of the scripts above is to store information about the job in the same file as the standard output created for the job. That is very useful if you want to adjust resources based on previous executions, since the epilogue records the information in that file.
The prologue is also useful: it can check whether the proper environment for the job is present and, based on the script's return code, tell Torque whether the job should continue to execution, be requeued, or be canceled. The following table describes each exit code for prologue scripts and the action taken; a minimal example follows the table.
Error | Description | Action |
---|---|---|
-4 | The script timed out | Job will be requeued |
-3 | The wait(2) call returned an error | Job will be requeued |
-2 | Input file could not be opened | Job will be requeued |
-1 | Permission error (script is not owned by root, or is writable by others) | Job will be requeued |
0 | Successful completion | Job will run |
1 | Abort exit code | Job will be aborted |
>1 | other | Job will be requeued |
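As an illustration (a sketch only; the scratch path is hypothetical), a prologue that aborts the job when a required directory is missing could look like this:
#!/bin/sh
# $2 is the user ID passed to the prologue by Torque
SCRATCH=/scratch/$2
if [ ! -d "$SCRATCH" ]; then
    echo "Scratch directory $SCRATCH not found, aborting job"
    exit 1    # abort the job (see the table above)
fi
exit 0        # environment is fine, let the job run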
Exercise: Prologue and epilogue in a job array
Add a prologue and an epilogue to the job array from the previous exercise.
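A possible starting point (a sketch; replace username with your own and adjust the paths) is to add these directives to the job array script from the previous exercise, making sure both scripts exist and are executable (chmod +x):
#PBS -l prologue=/home/username/prologue.sh
#PBS -l epilogue=/home/username/epilogue.sh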
Interactive jobs with X11 Forwarding
Sometimes you need to run an interactive job to create some plots, and you would like to see the figures on the fly rather than bringing all the data back to your desktop. This example shows how to achieve that.
$ ssh -Y username@spruce.hpc.wvu.edu
Test the X11 Forwarding
$ xeyes
A window should appear with two eyes facing you. Close the window and create a new interactive session with qsub:
$ qsub -X -I
After you get the new session, test it again with xeyes. This time the eyes come from the compute node, not from the head node.
$ xeyes
Let's suppose that the plot will be created with R. Load the module for R 3.4.1 (the latest version at the time this was written):
$ module load compilers/R/3.4.1
Enter the text-based R interpreter:
$ R
Test a simple plot in R
> # Define the cars vector with 5 values
> cars <- c(1, 3, 6, 4, 9)
>
> # Graph the cars vector with all defaults
> plot(cars)
>
You should get a new window with a plot of a few points. In the episode about Singularity we will see other packages that use the X11 server, such as RStudio and VisIt.
Key Points
It is a good idea to keep aliases to common Torque commands for easy execution.