
Preparing a PBS batch job script

Any parallel program that takes more than a few minutes should normally be run as a PBS batch job. To do so, you need to prepare a PBS batch script, which is an ordinary shell script with some additional PBS directives. Here is a sample PBS batch script (~/amit/cs430/lab/PVM/tools/psort.pbs):

#!/bin/sh
#PBS -l nodes=1:master+16:node
# This is a PBS job submission script. It asks for the master node 
# and 16 nodes in the PBS cluster to run the PVM application on.
#
# IMPORTANT NOTE:  Be sure to modify the "cd" command below to switch
# to the directory in which you are currently working!  
#
#------------------------------------------------------------------------

cd /home/faculty/amit/cs430/lab/PVM/tools
pvmrun -np 16 psort 20000000 16

The line starting with #PBS is a PBS directive. There are many PBS directives, but the one we will mainly use is the one that lists the nodes needed to run our program. The following table shows some common options that can be used in PBS directives:
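
Because a #PBS line begins with #, the shell treats it as an ordinary comment; only PBS interprets it when the job is submitted. As a rough illustration, the following sketch writes a small job script to a temporary file, lists its directives, and then runs it directly with sh to show that the directive has no effect there. The helper list_pbs_directives and the temporary file are hypothetical, introduced only for this demonstration; they are not part of PBS.

```shell
#!/bin/sh
# Sketch: show which lines of a job script look like PBS directives.
# list_pbs_directives is a hypothetical helper, not a PBS command.
list_pbs_directives() {
    # Print every line that starts with "#PBS", with the prefix removed.
    sed -n 's/^#PBS[[:space:]]*//p' "$1"
}

# A throwaway copy of a job script, for demonstration only.
demo_script=$(mktemp)
cat > "$demo_script" <<'EOF'
#!/bin/sh
#PBS -l nodes=1:master+16:node
# An ordinary comment.
echo "job body runs here"
EOF

directives=$(list_pbs_directives "$demo_script")
echo "$directives"

# Running the script directly with sh skips the directive as a comment.
body_output=$(sh "$demo_script")
echo "$body_output"

rm -f "$demo_script"
```
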

PBS option            Description
-N jobname            name the job jobname
-l cput=N             request N seconds of CPU time; N can also be in hh:mm:ss form
-l mem=N[KMG][BW]     request N kilo-, mega-, or gigabytes (B) or words (W) of memory
-l nodes=N:ppn=M      request N nodes with M processors per node
-m e                  mail the user when the job completes
-m a                  mail the user if the job aborts
-a 1800               start the job after 18:00 (6 pm)
-o outfile            redirect standard output to outfile
-e errfile            redirect standard error to errfile
-j oe                 combine standard output and standard error

For a full list, see the man page for pbs_resources on the cluster.
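
Several of these options can be combined at the top of one script, one directive per line. The sketch below is a hypothetical job header: the job name demo_job, the output file demo_job.out, and the 30-minute CPU limit are placeholders, and the node request is copied from the sample above. It also assumes that the cluster's PBS sets the PBS_O_WORKDIR variable (the directory from which the job was submitted), as standard PBS installations do; the fallback to the current directory only exists so that the script can also be tried outside PBS.

```shell
#!/bin/sh
#PBS -N demo_job
#PBS -l nodes=1:master+16:node
#PBS -l cput=00:30:00
#PBS -j oe
#PBS -o demo_job.out
#PBS -m ae
# Hypothetical header: names the job, requests nodes and CPU time,
# merges stderr into stdout, writes the output to demo_job.out, and
# mails the user when the job ends or aborts.

# Assumption: PBS_O_WORKDIR is set by PBS; fall back to $PWD otherwise.
workdir="${PBS_O_WORKDIR:-$PWD}"
cd "$workdir" || exit 1
echo "running in $workdir"
```

Changing to the submission directory this way is an alternative to editing a hard-coded cd path each time the script is copied.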

Here is another sample PBS batch script. In this case pvmrun starts a single process, and the psum program itself is assumed to spawn processes on the 16 nodes.

#!/bin/sh
#PBS -l nodes=1:master+16:node
# This is a PBS job submission script. It runs a master/slave PVM program.
# Note that even though we are specifying only one process to pvmrun, we
# need to reserve the appropriate number of nodes to match what the parallel
# program requires.
# 
# IMPORTANT NOTE:  Be sure to modify the "cd" command below to switch
# to the directory in which you are currently working!  
#
#------------------------------------------------------------------------
cd /home/faculty/amit/cs430/lab/PVM/parallel_sum
pvmrun -np 1 psum 10000 16
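
The requirement stated in the comments above (reserve as many nodes as the program will actually use) can be checked from inside the job. The following is a minimal sketch, assuming the cluster's PBS provides the standard PBS_NODEFILE variable, which names a file listing the allocated nodes one entry per line. The helper count_nodes is hypothetical, and the node names in the fake node file are made up so that the sketch can be tried outside PBS.

```shell
#!/bin/sh
# count_nodes: hypothetical helper that counts entries in a PBS node file.
count_nodes() {
    # grep -c . counts the non-empty lines of the file.
    grep -c . "$1"
}

# Outside PBS there is no node file, so build a fake one for demonstration.
fake_nodefile=""
if [ -z "${PBS_NODEFILE:-}" ]; then
    fake_nodefile=$(mktemp)
    printf 'node01\nnode02\nnode03\n' > "$fake_nodefile"
    nodefile=$fake_nodefile
else
    nodefile=$PBS_NODEFILE
fi

allocated=$(count_nodes "$nodefile")
echo "allocated $allocated node entries"

# Remove the fake node file if one was created.
if [ -n "$fake_nodefile" ]; then
    rm -f "$fake_nodefile"
fi
```

Inside a real job, the count (which covers every reserved node) could be compared with the number of processes the program is expected to spawn before pvmrun is called.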

Here is a sample PBS batch script for an MPICH2 program.

#!/bin/sh
#PBS -l nodes=1:master+4:node

#------------------------------------------------------------------------
# setup for MPICH2
MPICH2_HOME=/usr/local/mpich2
export PATH=$MPICH2_HOME/bin:$PATH
export MANPATH=$MPICH2_HOME/man:$MANPATH
unset MPI_HOST
#------------------------------------------------------------------------
cd /home/amit/MPI/hello_world

mpdboot
mpiexec -n 4 spmd_hello_world
mpdallexit

Here is a sample PBS batch script for a LAM MPI program.

#!/bin/sh
#PBS -l nodes=1:master+4:node

# reset paths to point to LAM MPI
PATH=/usr/bin/:$PATH
MANPATH=/usr/share/man:/usr/man:$MANPATH
export PATH MANPATH
hash -r

cd /home/amit/MPI/hello_world
lamboot
mpiexec -n 4 spmd_hello_world
lamhalt -v
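
Both MPI scripts above follow the same three steps: start the runtime daemons, run the program, and shut the daemons down. One possible refinement, sketched below, is to always run the shutdown step and still report the program's exit status, so that daemons are not left running when the program fails. The helper run_with_runtime is hypothetical, and the echo commands stand in for the real boot, run, and halt commands so that the sketch can be tried without MPI installed.

```shell
#!/bin/sh
# run_with_runtime BOOT RUN HALT: hypothetical helper.
# Each argument is a complete command line passed as one string.
run_with_runtime() {
    boot_cmd=$1
    run_cmd=$2
    halt_cmd=$3

    # Start the runtime; give up early if it cannot boot.
    sh -c "$boot_cmd" || return 1

    # Run the program and remember its exit status.
    status=0
    sh -c "$run_cmd" || status=$?

    # Always shut the runtime down, even if the program failed.
    sh -c "$halt_cmd" || echo "warning: shutdown command failed" >&2

    return "$status"
}

# Stand-in commands. In the LAM script above these would be
# "lamboot", "mpiexec -n 4 spmd_hello_world", and "lamhalt -v".
run_with_runtime "echo boot" "echo run" "echo halt"
```
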


Amit Jain 2010-09-02