Thursday, September 4, 2014

Clustering : PBS: Portable Batch System [HPC = JOB SCHEDULER]


The Portable Batch System (PBS) is a workload management system for Linux clusters. It supplies commands to submit, monitor, and manage jobs.

PBS Job Script

  1. Create a job script containing PBS options that request the resources the job needs (e.g. number of processors, wall-clock time) and commands that prepare for running the executable (e.g. cd to the working directory).
  2. Submit the job script file to PBS.
  3. Monitor the job.
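
The three steps above can be sketched as a short shell session. This is only a minimal sketch: the file name hello.pbs, the job name, and the resource values are made up for illustration, and the submit and monitor steps run only when qsub is actually found on the PATH.

```shell
#!/bin/bash
# Step 1: create a job script (file name and resource values are illustrative).
job_script="hello.pbs"
cat > "$job_script" <<'EOF'
#!/bin/bash
#PBS -N hello
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:05:00
cd "$PBS_O_WORKDIR"
echo "Hello from $(hostname)"
EOF

# Steps 2 and 3 need a PBS installation, so they are skipped elsewhere.
if command -v qsub >/dev/null 2>&1; then
    job_id=$(qsub "$job_script")   # Step 2: qsub prints the new job's identifier
    echo "Submitted $job_id"
    qstat "$job_id"                # Step 3: show the job's current status
else
    echo "qsub not found; wrote $job_script but did not submit it"
fi
```

The heredoc delimiter is quoted ('EOF') so that $PBS_O_WORKDIR and $(hostname) are written to the file literally and expanded only when the job runs.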

Common PBS Options

Below are some of the commonly used PBS options in a job script file. Each option line starts with "#PBS".

#PBS -N myJob
Assigns a job name. The default is the name of PBS job script.
#PBS -l nodes=4:ppn=2
The number of nodes and processors per node.
#PBS -q queuename
Assigns the queue your job will use.
#PBS -l walltime=01:00:00
The maximum wall-clock time during which this job can run.
#PBS -o mypath/my.out
The path and file name for standard output.
#PBS -e mypath/my.err
The path and file name for standard error.
#PBS -j oe
Join option that merges the standard error stream with the standard output stream of the job.
#PBS -W stagein=file_list
Copies the file onto the execution host before the job starts. (*)
#PBS -W stageout=file_list
Copies the file from the execution host after the job completes. (*)
#PBS -m b
Sends mail to the user when the job begins.
#PBS -m e
Sends mail to the user when the job ends.
#PBS -m a
Sends mail to the user when job aborts (with an error).
#PBS -m ba
Combines several mail options in a single flag; here, mail is sent when the job begins and when it aborts. If the options are given on separate -m lines instead, only the last one takes effect.
#PBS -r n
Indicates that a job should not rerun if it fails.
#PBS -V
Exports all environment variables to the job.
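
As an illustration, several of the options above might be combined in one job script as follows. This is a sketch, not a site-ready script: queuename and mypath are the placeholders used in the list above and must be replaced with real values, and the start_msg variable is a made-up name used only for this example.

```shell
#!/bin/bash
#PBS -N myJob
#PBS -q queuename
#PBS -l nodes=4:ppn=2
#PBS -l walltime=01:00:00
#PBS -j oe
#PBS -o mypath/my.out
#PBS -m ba
#PBS -r n
#PBS -V

# Everything below the directives is ordinary shell run on the execution host.
# Outside PBS the "#PBS" lines are plain comments, so the script still runs.
start_msg="Job started on $(uname -n)"
echo "$start_msg"
```

With -j oe, standard error is merged into standard output, so only the -o path is needed.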


There are a number of predefined environment variables. These include the following:
  • Variables defined on the execution host;
  • Variables exported from the submission host to the execution host; and
  • Variables defined by PBS.
The following environment variables relate to the submission machine:
PBS_O_HOST
The host machine on which the qsub command was run.
PBS_O_LOGNAME
The login name on the machine on which the qsub was run.
PBS_O_HOME
The home directory from which the qsub was run.
PBS_O_WORKDIR
The working directory from which the qsub was run.

The following variables relate to the environment where the job is executing:
PBS_ENVIRONMENT
This is set to PBS_BATCH for batch jobs and to PBS_INTERACTIVE for interactive jobs.
PBS_O_QUEUE
The original queue to which the job was submitted.
PBS_JOBID
The identifier that PBS assigns to the job.
PBS_JOBNAME
The name of the job.
PBS_NODEFILE
The file containing the list of nodes assigned to a parallel job.
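
A job script can read these variables like any other shell variable. The sketch below uses the standard PBS variable names (PBS_O_HOST, PBS_O_WORKDIR, PBS_JOBID, PBS_JOBNAME, PBS_O_QUEUE, PBS_NODEFILE) and falls back to a placeholder when run outside PBS. The count_slots helper is a hypothetical name introduced here, and the assumption that the node file holds one line per allocated processor slot is typical but may vary by site.

```shell
#!/bin/bash
# Print PBS-provided variables, falling back to a placeholder outside PBS.
echo "Submit host : ${PBS_O_HOST:-not set}"
echo "Submit dir  : ${PBS_O_WORKDIR:-not set}"
echo "Job ID      : ${PBS_JOBID:-not set}"
echo "Job name    : ${PBS_JOBNAME:-not set}"
echo "Queue       : ${PBS_O_QUEUE:-not set}"

# count_slots (hypothetical helper): number of lines in the node file,
# or 1 when no readable node file is available.
count_slots() {
    if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
        echo $(( $(wc -l < "$PBS_NODEFILE") ))
    else
        echo 1
    fi
}

echo "Processor slots: $(count_slots)"
```

The slot count is often passed on to an MPI launcher so that the number of processes matches what was requested from PBS.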

Submitting a Job

A job is submitted with the 'qsub' command. Job attributes can be set in two different ways.

Method 1: on the qsub command line

qsub -<other options> -N <job_name> <job_script>

ex: qsub -l select=1:ncpus=1:mem=100MB -l walltime=01:00:00 -N my_job myscript

Method 2: within a job script as a PBS directive

#! /bin/bash
#PBS -l walltime=10:00:00
#PBS -N my_job_mpi
#PBS -q workq
#PBS -l select=2:ncpus=12:mpiprocs=12
#PBS -l place=scatter:excl

# Go to the directory from which you submitted the job
cd "$PBS_O_WORKDIR"

mpiexec_mpt ./a.out

Note:
- Place the PBS directives directly after the interpreter line (#!), before any command. PBS stops scanning for directives at the first executable line, so a #PBS line that appears later is treated as an ordinary comment. Blank lines and plain comment lines before that point do not end the scan.
- Command-line arguments to qsub override the matching PBS directives in the script.
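
Both points in the note can be seen in a small experiment. The sketch below writes a job script in which one directive appears after the first command, then submits it with an override on the command line. The file name late_directive.pbs and the job names are hypothetical, and qsub is called only if it is installed.

```shell
#!/bin/bash
cat > late_directive.pbs <<'EOF'
#!/bin/bash
#PBS -N early_name
# An ordinary comment between directives does not end the scan.
#PBS -l walltime=00:10:00
echo "first executable line: directive scanning stops here"
#PBS -q workq
EOF
# The "#PBS -q workq" line comes after a command, so PBS reads it as a
# plain comment and the job goes to the default queue instead.

if command -v qsub >/dev/null 2>&1; then
    # -N on the command line takes precedence over "#PBS -N early_name".
    qsub -N cli_name late_directive.pbs
else
    echo "qsub not found; skipping submission"
fi
```

After submission, qstat -f on the returned job ID should show the job name cli_name rather than early_name.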

Monitoring a Job

Below are commands for monitoring and managing a job:
qstat -a
check status of jobs, queues, and the PBS server
qstat -f job.ID
get all the information about a job, i.e. resources requested, resource limits, owner, source, destination, queue, etc.
qdel job.ID
delete a job from the queue
qhold job.ID
hold a job if it is in the queue
qrls job.ID
release a job from hold
tracejob job.ID
comprehensive information about a job
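
These commands are easy to wrap in a script. The sketch below polls qstat until a job is no longer listed. The function name wait_for_job and the 30-second default interval are choices made for this example, and it assumes qstat exits with a non-zero status once the job has left the queue, which site settings such as job history may change.

```shell
#!/bin/bash
# wait_for_job JOB_ID [INTERVAL]: poll qstat until the job is no longer listed.
# Assumption: qstat returns non-zero for a job that has left the queue.
wait_for_job() {
    local job_id=$1 interval=${2:-30}
    while qstat "$job_id" >/dev/null 2>&1; do
        sleep "$interval"
    done
    echo "Job $job_id is no longer in the queue"
}

# Run only when qstat exists and a job ID was passed as the first argument.
if command -v qstat >/dev/null 2>&1 && [ -n "${1:-}" ]; then
    wait_for_job "$1"
fi
```

A typical use is to save this as a script and call it with the identifier printed by qsub, for example after job_id=$(qsub myscript).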

Components of PBS

 Batch Server (PBS_Server) : provides the basic batch services such as receiving/creating a batch job, modifying the job, protecting the job against system crashes, and running the job.
  • central focus for a PBS complex
  • routes jobs to compute hosts
  • processes all PBS related commands
  • provides the basic batch services
  • server maintains its own server and queue settings
  • daemon executes as pbs_server

Scheduler (PBS_Scheduler) : a daemon that contains the site's policy controlling which job is run, and where and when it is run. PBS allows each site to create its own Scheduler.
  • queries list of running and queued jobs from the PBS Server
  • queries queue, server, and node properties
  • queries resource consumption and availability from the PBS MOM
  • sorts available jobs according to local scheduling policies
  • determines which job is eligible to run next
  • daemon executing as pbs_sched

MOM (PBS_MOM) : places the job into execution when it receives a copy of the job from the Batch Server. MOM creates a new session that is as close as possible to a user login session, and returns the job's output to the user.
  • executes jobs sent by the PBS Server once the Scheduler has selected them
  • monitors resource usage of running jobs
  • enforces resource limits on jobs
  • reports system resource limits, configuration
  • daemon executing as pbs_mom   
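
To see which of the three daemons are running on a given host, something like the following sketch can be used. The daemon names come from the lists above; the function name pbs_daemon_status is made up for this example, and the check relies on pgrep being installed.

```shell
#!/bin/bash
# pbs_daemon_status (hypothetical helper): report the state of each PBS daemon
# on the local host, using an exact process-name match.
pbs_daemon_status() {
    local daemon state
    for daemon in pbs_server pbs_sched pbs_mom; do
        if ! command -v pgrep >/dev/null 2>&1; then
            state="unknown (pgrep not available)"
        elif pgrep -x "$daemon" >/dev/null 2>&1; then
            state="running"
        else
            state="not running"
        fi
        echo "$daemon: $state"
    done
}

pbs_daemon_status
```

On a typical cluster, pbs_server and pbs_sched run on the head node while pbs_mom runs on each compute node, so seeing only some of them on one host is normal.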
