Any parallel program that runs for more than a few minutes should normally be run as a PBS batch job. To do so, you need to prepare a PBS batch script, which is an ordinary shell script with some additional directives. Here is a sample PBS batch script (~/amit/cs430/lab/MPI/parallel-sum/sum.pbs):
#!/bin/sh
#PBS -l nodes=16:node
# This is a PBS job submission script. It asks for 16 nodes in the cluster
# to run the MPI application on.
#
# IMPORTANT NOTE: Be sure to modify the "cd" command below to switch
# to the directory in which you are currently working!
#
#------------------------------------------------------------------------
cd /home/faculty/amit/cs430/lab/MPI/parallel-sum
mpiexec -n 16 spmd_sum_3 20000000
The line starting with #PBS is a PBS directive. There are many PBS directives, but the one we will mainly use is -l nodes, which lists the nodes needed to run the program. The following list shows some common options that can be used in PBS directives:
PBS option | Description |
-N jobname | name the job jobname |
-l cput=N | request N seconds of CPU time; N can also be in hh:mm:ss form |
-l mem=N[KMG][BW] | request N kilo-, mega-, or gigabytes (B) or words (W) of memory |
-l nodes=N:ppn=M | request N nodes with M processors per node |
-m b | mail the user when the job begins execution |
-m e | mail the user when the job completes |
-m a | mail the user if the job aborts |
-a 1800 | start job on or after 6pm |
-o outfile | redirect standard output to outfile |
-e errfile | redirect standard error to errfile |
-j oe | combine standard output and standard error |
For a full list, see the man page for pbs_resources on the cluster.
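To the shell, a #PBS directive is only a comment; PBS is what reads it as an option. The following minimal sketch illustrates this with a stand-in job script written to a temporary file. The job name demo and the option values are made up for the illustration, and nothing here contacts PBS itself.

```shell
#!/bin/sh
# Sketch: write a tiny stand-in job script to a temporary file.
# The directives use options from the list above; the values are illustrative.
script=$(mktemp)
cat > "$script" <<'EOF'
#!/bin/sh
#PBS -N demo
#PBS -l nodes=2:ppn=1
#PBS -j oe
echo "job body runs"
EOF

# Run directly with sh: the #PBS lines are treated as comments,
# so only the echo in the body executes.
body_output=$(sh "$script")

# PBS, in contrast, looks at the lines that begin with #PBS.
# grep shows which lines those are and how many there are.
directives=$(grep '^#PBS' "$script")
directive_count=$(grep -c '^#PBS' "$script")

echo "$body_output"
echo "$directives"
echo "number of directives: $directive_count"

rm -f "$script"
```

Running a job script directly with sh in this way can be a quick check of the commands in its body before the script is handed to PBS.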
Here is another sample PBS batch job.
#!/bin/sh
#PBS -l nodes=16:node
#PBS -m be
#PBS -a 2200
#PBS -o psum.log
# This is a PBS job submission script. It asks for 16 nodes in the cluster
# and asks the job to be scheduled at 10pm or later, the user to be mailed
# when the job begins and when it ends and capture the output in the file psum.log
#
# IMPORTANT NOTE: Be sure to modify the "cd" command below to switch
# to the directory in which you are currently working!
#
#------------------------------------------------------------------------
cd /home/faculty/amit/cs430/lab/MPI/parallel-sum
mpiexec -n 16 spmd_sum_3 1000000000
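Both sample scripts hard-code the working directory and repeat the number 16 on the mpiexec line. One possible way to avoid editing these by hand is sketched below. It assumes that the PBS installation on the cluster sets the environment variables PBS_O_WORKDIR (the directory from which the job was submitted) and PBS_NODEFILE (a file that normally has one line per allocated processor), as standard PBS and Torque installations do. The helper name pbs_nprocs is hypothetical, and the final line only prints the mpiexec command so that the sketch can be tried outside PBS.

```shell
#!/bin/sh
#PBS -l nodes=16:node

# Hypothetical helper: count the entries in the node file that PBS is
# assumed to provide. Outside a PBS job (variable unset or file not
# readable) it falls back to a single process.
pbs_nprocs() {
    if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
        # Count the non-empty lines, one per allocated processor.
        grep -c . "$PBS_NODEFILE"
    else
        echo 1
    fi
}

# Switch to the submission directory if PBS supplied it; otherwise stay put.
cd "${PBS_O_WORKDIR:-$PWD}" || exit 1

nprocs=$(pbs_nprocs)

# Print the command that would be run; drop the leading "echo" to launch it.
echo mpiexec -n "$nprocs" spmd_sum_3 20000000
```

With this approach, changing the nodes request in the directive is the only edit needed when the job size changes, provided the node file on the cluster really does list one line per process to be started.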