
Acquiring nodes from the PBS system

The cluster uses the TORQUE system to manage its resources. TORQUE stands for Terascale Open-Source Resource and QUEue Manager; it is an open-source distributed resource manager originally based on OpenPBS, the Portable Batch System (PBS). We will refer to the resource manager simply as the PBS system.

To run a parallel program, the user must first request nodes from the PBS system. The master node is a shared resource and is always available to the user. The compute nodes are allocated in exclusive mode. Currently, interactive use of the compute nodes is limited to one hour at a time.

To check the status of the nodes in the PBS system, use qstat -n (or the cnodes script), as shown below.
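
For example (only the commands are shown; the output depends on the current cluster state):

[amit@onyx ~]$ qstat -n      # list current jobs and the nodes assigned to each
[amit@onyx ~]$ cnodes        # local script that summarizes node availability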

To request n nodes, use the command pbsget -n on the master node. Here is a sample session.

[amit@onyx ~]$ pbsget -8
#####################################################################
Allocate cluster nodes via PBS for running interactive parallel jobs.
#####################################################################
Trying for 8 nodes
*****************************************************
 Scheduling an interactive cluster session with PBS.
 Please end session by typing in exit.
 Use qstat -n to see nodes allocated by PBS.

 You may now run your mpi programs. They will automatically use
 only the nodes allocated by PBS.
 For running MPI programs use the following commands:
    mpiexec [options]  <program> [<prog args>]
*****************************************************

qsub: waiting for job 2608.onyx.boisestate.edu to start
qsub: job 2608.onyx.boisestate.edu ready

[amit@node14 PBS ~]:qstat -n
onyx.boisestate.edu: 
                                                                           Req'd      Elap
Job ID                  Username Queue    Jobname SessID NDS TSK    Memory Time     S Time
----------------------- -------- -------- ------- ------ --- ------ ------ -------- - --------
2608.onyx.boisestate.e  amit     interact STDIN        0   8 8:node     -- 00:30:00 R 00:00:04
   node14/0+node21/0+node20/0+node19/0+node18/0+node17/0+node16/0+node15/0
[amit@node14 PBS ~]:cat $PBS_NODEFILE 
node14
node21
node20
node19
node18
node17
node16
node15
[amit@node14 PBS ~]:exit
logout

qsub: job 2608.onyx.boisestate.edu completed
[amit@onyx ~]$

The command pbsget attempts to allocate the requested number of nodes from the PBS system. If it succeeds, it starts a new shell with PBS added to the prompt. Note that the environment variable PBS_NODEFILE contains the name of a file listing the nodes allocated to the user by PBS. The user can now run MPI parallel programs. When done, the user types exit to end the interactive PBS session.
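
For example, a minimal sketch of running an MPI program inside the PBS session (hello is a hypothetical MPI binary; substitute your own program):

[amit@node14 PBS ~]:wc -l $PBS_NODEFILE      # count the allocated nodes (8 in the session above)
[amit@node14 PBS ~]:mpiexec -n 8 ./hello     # run the hypothetical hello program with 8 processes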

If the required number of nodes is not available, then pbsget will wait. The user can cancel the request by typing Ctrl-C and try again later, as sketched below. Remember to use qstat -n (or cnodes) to check the status of the nodes.
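
For example, a typical recovery when the cluster is busy might look like this (the retry size of 4 nodes is illustrative):

^C                           # cancel the waiting pbsget request
[amit@onyx ~]$ cnodes        # check how many nodes are currently free
[amit@onyx ~]$ pbsget -4     # try again, perhaps with a smaller request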

