PBS
The Portable Batch System (PBS) is a workload management system for Linux clusters. It provides commands to submit, monitor, and manage jobs.
PBS Job Script
- Create a job script containing PBS options and shell commands. The options request the resources the job needs (e.g. number of processors, wall-clock time); the commands prepare for and start the executable (e.g. cd to the working directory).
- Submit the job script file to PBS.
- Monitor the job.
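The three steps above can be sketched as a short shell session. This is only a minimal sketch: the file name hello.pbs, the job name, and the walltime value are hypothetical, and the submit and monitor steps are skipped when qsub is not installed on the machine.

```shell
#!/bin/bash
# Step 1: write a minimal job script (file name and values are hypothetical).
cat > hello.pbs <<'EOF'
#!/bin/bash
#PBS -N hello
#PBS -l walltime=00:05:00
cd "$PBS_O_WORKDIR"
echo "Running on $(hostname)"
EOF

# Steps 2 and 3: submit and monitor, only if PBS is available on this machine.
if command -v qsub >/dev/null 2>&1; then
    job_id=$(qsub hello.pbs)   # qsub prints the new job identifier
    qstat "$job_id"            # show the job's current status
else
    echo "qsub not found; hello.pbs was written but not submitted"
fi
```

The quoted heredoc delimiter ('EOF') keeps $PBS_O_WORKDIR and $(hostname) unexpanded, so they are evaluated when the job runs rather than when the file is written.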
Common PBS Options
Below are some commonly used PBS options for a job script file. Each option line starts with "#PBS".
Option | Description
#PBS -N myJob | Assigns a job name. The default is the name of the PBS job script.
#PBS -l nodes=4:ppn=2 | Requests the number of nodes and the number of processors per node.
#PBS -q queuename | Assigns the queue your job will use.
#PBS -l walltime=01:00:00 | Sets the maximum wall-clock time during which the job can run.
#PBS -o mypath/my.out | Sets the path and file name for standard output.
#PBS -e mypath/my.err | Sets the path and file name for standard error.
#PBS -j oe | Join option that merges the standard error stream with the standard output stream of the job.
#PBS -W stagein=file_list | Copies the files onto the execution host before the job starts. (*)
#PBS -W stageout=file_list | Copies the files from the execution host after the job completes. (*)
#PBS -m b | Sends mail to the user when the job begins.
#PBS -m e | Sends mail to the user when the job ends.
#PBS -m a | Sends mail to the user when the job aborts (with an error).
#PBS -m ba | Combines several mail events in a single -m flag (here: begin and abort). If -m appears on several lines instead, only the last one takes effect.
#PBS -r n | Indicates that the job should not be rerun if it fails.
#PBS -V | Exports all environment variables to the job.
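To show how these options combine, here is a sketch of a job script that uses several of them together. The queue name workq and the output file name are placeholders, and the resource syntax your site accepts may differ. Because the shell treats "#PBS" lines as comments, the script also runs under plain bash, where the PBS variables fall back to assumed defaults.

```shell
#!/bin/bash
#PBS -N myJob
#PBS -q workq
#PBS -l nodes=4:ppn=2
#PBS -l walltime=01:00:00
#PBS -j oe
#PBS -o my.out
#PBS -m abe
#PBS -r n

# Fall back to defaults so the script can also be tried outside PBS.
job_name=${PBS_JOBNAME:-myJob}
work_dir=${PBS_O_WORKDIR:-$PWD}

cd "$work_dir" || exit 1

# 4 nodes with 2 processors each gives 8 processors in total.
summary="job=$job_name nodes=4 ppn=2 total_procs=$((4 * 2))"
echo "$summary"
```

Here -m abe asks for mail on abort, begin, and end in one flag, and -j oe together with -o sends both output streams to my.out.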
PBS Environment Variables
PBS predefines a number of environment variables. The following variables relate to the submission machine:
Variable | Description
PBS_O_HOST | The host machine on which the qsub command was run.
PBS_O_LOGNAME | The login name on the machine on which qsub was run.
PBS_O_HOME | The user's home directory on the machine on which qsub was run.
PBS_O_WORKDIR | The working directory from which qsub was run.
The following variables relate to the environment where the job is executing:
Variable | Description
PBS_ENVIRONMENT | Set to PBS_BATCH for batch jobs and to PBS_INTERACTIVE for interactive jobs.
PBS_O_QUEUE | The original queue to which the job was submitted.
PBS_JOBID | The identifier that PBS assigns to the job.
PBS_JOBNAME | The name of the job.
PBS_NODEFILE | The file containing the list of nodes assigned to a parallel job.
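As an illustration of how a job script might use these variables, the sketch below logs where the job came from and summarizes the node list in $PBS_NODEFILE. The helper name count_slots and the fallback values are assumptions, added so that the snippet also runs outside a PBS job. A host name usually appears in the node file once per assigned processor slot, but that depends on the PBS variant and the resource request.

```shell
#!/bin/bash
# count_slots FILE: print "<unique hosts> <total lines>" for a node file.
# (Hypothetical helper; expects one host name per line.)
count_slots() {
    local file=$1
    local hosts lines
    hosts=$(sort -u "$file" | wc -l)
    lines=$(wc -l < "$file")
    # Arithmetic expansion strips any padding that wc adds on some systems.
    echo "$((hosts)) $((lines))"
}

echo "Job ${PBS_JOBID:-unknown} (${PBS_JOBNAME:-unnamed}) in queue ${PBS_O_QUEUE:-unknown}"
echo "Submitted from ${PBS_O_HOST:-localhost}:${PBS_O_WORKDIR:-$PWD}"

# PBS_NODEFILE is only set inside a running job, so guard its use.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    read -r n_hosts n_slots <<< "$(count_slots "$PBS_NODEFILE")"
    echo "Running on $n_hosts host(s) with $n_slots slot(s)"
fi
```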
Submitting a Job
Jobs are submitted with the 'qsub' command. Job attributes can be set in two different ways.
Method 1: on the qsub command line
qsub <other_options> -N <job_name> <job_script>
Example: qsub -l select=1:ncpus=1:mem=100MB -l walltime=01:00:00 -N my_job myscript
Method 2: within a job script as a PBS directive
#!/bin/bash
#PBS -l walltime=10:00:00
#PBS -N my_job_mpi
#PBS -q workq
#PBS -l select=2:ncpus=12:mpiprocs=12
#PBS -l place=scatter:excl
#PBS -V
# Go to the directory from which you submitted the job
cd "$PBS_O_WORKDIR"
# Launch the MPI executable
mpiexec_mpt ./a.out
Note:
- PBS expects the directives to begin on the second line, directly after the interpreter line, and to appear before any command.
- qsub stops processing directives at the first executable line. Any "#PBS" line after that point is treated as an ordinary comment and ignored. Plain comment lines among the directives are skipped.
- Command-line options override the corresponding PBS directives in the script.
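The precedence rule can be illustrated with a small sketch. It assumes a script named myscript that carries "#PBS -N my_job_mpi" and "#PBS -l walltime=10:00:00", and overrides both for a single run. The wrapper submit_job and the QSUB variable are hypothetical; they exist only so the command can be dry-run on a machine without PBS.

```shell
#!/bin/bash
# QSUB may name a stand-in command for a dry run; it defaults to qsub.
submit_job() {
    "${QSUB:-qsub}" "$@"
}

# Command-line options take precedence over the #PBS lines inside myscript,
# so this run would be named "short_test" and limited to 30 minutes.
if command -v "${QSUB:-qsub}" >/dev/null 2>&1 && [ -f myscript ]; then
    job_id=$(submit_job -N short_test -l walltime=00:30:00 myscript)
    echo "Submitted as $job_id"
fi
```

Keeping the defaults in the script and overriding only what changes on the command line avoids maintaining several nearly identical job scripts.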
Monitoring a Job
Below are commands for monitoring a job:
Command | Function
qstat -a | Check the status of all jobs (qstat -Q and qstat -B report on queues and the PBS server).
qstat -f job.ID | Get all the information about a job, e.g. resources requested, resource limits, owner, source, destination, and queue.
qdel job.ID | Delete a job from the queue.
qhold job.ID | Hold a job if it is in the queue.
qrls job.ID | Release a job from hold.
tracejob job.ID | Show comprehensive information about a job, collected from the PBS log files.
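For scripted monitoring, one option is to pull the state out of the full "qstat -f" listing, which contains a "job_state = ..." line. The sketch below does this with awk. The function names and the polling interval are hypothetical choices, and the way finished jobs are reported (an error from qstat, or a lingering completed state such as C or F) varies between PBS variants and site settings, so the loop treats both cases as finished.

```shell
#!/bin/bash
# job_state: read `qstat -f` output on stdin and print the one-letter state.
job_state() {
    awk -F' = ' '$1 ~ /^[[:space:]]*job_state$/ { print $2; exit }'
}

# wait_for_job JOBID [SECONDS]: poll until qstat no longer reports the job
# as active (function name and default interval are illustrative).
wait_for_job() {
    local job_id=$1 interval=${2:-30} state
    while state=$(qstat -f "$job_id" 2>/dev/null | job_state) && [ -n "$state" ]; do
        case $state in
            C|F) break ;;   # completed/finished states on some PBS variants
        esac
        sleep "$interval"
    done
    echo "Job $job_id is no longer active"
}
```

A call such as wait_for_job "$job_id" 60 would then block until the job leaves the system, checking once a minute.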
===========================================================
Components of PBS
Batch Server (PBS_Server): provides the basic batch services, such as receiving or creating a batch job, modifying the job, protecting the job against system crashes, and running the job.
Scheduler (PBS_Scheduler): a daemon that implements the site's policy controlling which job is run, and where and when it is run. PBS allows each site to create its own Scheduler.
MOM (PBS_MOM): the daemon that actually places the job into execution when it receives a copy of the job from the Batch Server. MOM creates a new session that is as identical to a user login session as possible, and returns the job's output to the user.