pvm
This starts the PVM console (control) program. On this cluster the pvm console is set up to automatically add the nodes allocated to you by PBS. You can check which machines are in the PVM system with the conf command.
    [amit@onyx amit]$ pbsget -4
    #####################################################################
    Allocate cluster nodes via PBS for running interactive parallel jobs.
    #####################################################################
    Trying for 4 nodes ...
    qsub: waiting for job 806.onyx.boisestate.edu to start
    qsub: job 806.onyx.boisestate.edu ready
    [amit@onyx PBS ~]:pvm
    pvm: Using list of machines from PBS.
    pvm> conf
    5 hosts, 1 data format
        HOST     DTID       ARCH   SPEED        DSIG
        ws00    40000  LINUXI386    1000  0x00408841
        ws04    80000  LINUXI386    1000  0x00408841
        ws03    c0000  LINUXI386    1000  0x00408841
        ws02   100000  LINUXI386    1000  0x00408841
        ws01   140000  LINUXI386    1000  0x00408841
    pvm>
In the above example, we allocated 4 nodes using the PBS system. Note that the PVM console program adds the four compute nodes plus the master node to the PVM system, which is why conf reports 5 hosts. PBS always allocates the master node since it is a shared resource. Normally you would not run your programs on the master node but use it for monitoring purposes.
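If you want to check the host list from a script, the conf output above is easy to pick apart. The following is only a sketch: parse_conf is a hypothetical helper (not part of PVM), and the text it parses is copied from the session above. It assumes that host rows have five columns and that the last column (DSIG) starts with 0x.

```python
# Sketch: parse the text printed by the PVM console's "conf" command.
# CONF_OUTPUT is copied from the session above; parse_conf is a
# hypothetical helper, not a PVM function.
CONF_OUTPUT = """\
5 hosts, 1 data format
    HOST     DTID       ARCH   SPEED        DSIG
    ws00    40000  LINUXI386    1000  0x00408841
    ws04    80000  LINUXI386    1000  0x00408841
    ws03    c0000  LINUXI386    1000  0x00408841
    ws02   100000  LINUXI386    1000  0x00408841
    ws01   140000  LINUXI386    1000  0x00408841
"""


def parse_conf(text):
    """Return a list of (host, dtid) pairs; DTID is read as hexadecimal."""
    hosts = []
    for line in text.splitlines():
        fields = line.split()
        # Assumption: only host rows have five columns ending in a 0x... DSIG.
        if len(fields) == 5 and fields[4].startswith("0x"):
            hosts.append((fields[0], int(fields[1], 16)))
    return hosts


hosts = parse_conf(CONF_OUTPUT)
print(len(hosts), "hosts:", [name for name, _ in hosts])
```

With 4 nodes requested from PBS, the parsed list has 5 entries, one more than requested, because the master node is included.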
If the PVM daemon is already running, the pvm console program tells you so when it starts. The simplest thing to do is to halt the already running daemon and start a fresh one, as shown below:
    [amit@onyx PBS ~]:pvm
    pvm: Using list of machines from PBS.
    pvmd already running.
    pvm> halt
    Terminated
    [amit@onyx PBS ~]:pvm
    pvm: Using list of machines from PBS.
    pvm>
You can type help in the pvm console program to get a list of all commands. You can run your application using the spawn command from the PVM console. Here is the help on the spawn command.
    pvm> help spawn
    spawn      Spawn task
    Syntax:  spawn [ options ] file [ arg ... ]
    Options:  -(count)         number of tasks, default is 1
              -(host)          spawn on host, default is any
              -(host):(wd)     spawn on host, in directory 'wd'
              --host=(IP)      spawn on host with given IP addr
              --host=(IP):(wd) spawn on IP, in directory 'wd'
              -(ARCH)          spawn on hosts of ARCH
              -(ARCH):(wd)     spawn on hosts of ARCH, in 'wd'
              -:(wd)           spawn in working directory 'wd'
              -?               enable debugging
              ->               redirect job output to console
              ->(file)         redirect output of job to file
              ->>(file)        append output of job to file
              -@               trace job, output to terminal
              -@(file)         trace job, output to file
    pvm>
The following session continues the example by running a parallel program using the spawn command. The program is an SPMD style program, so we need to spawn several copies of it at once (four here, using -4). The output from the various nodes is captured in the console with the -> option to the spawn command. Note that each output line is tagged with the task id of the task that produced it.
    pvm> spawn -4 -> spmd_sum 10000
    [1]
    4 successful
    t80001
    tc0001
    t100001
    t140001
    pvm> [1:tc0001] EOF
    [1:t80001] I got 2500 from 0
    [1:t80001] I got 2500 from 1
    [1:t80001] I got 2500 from 3
    [1:t80001] I got 2500 from 2
    [1:t80001] The total is 10000
    [1:t80001] EOF
    [1:t140001] EOF
    [1:t100001] EOF
    [1] finished
    pvm>
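The task ids in the output above can be related back to the hosts listed by conf. The sketch below assumes that the low 18 bits of a task id hold a per-host local task number and that the remaining bits identify the host; this matches the DTID values shown by conf, but treat it as an assumption rather than a definition. The names HOSTS and host_of are made up for this illustration.

```python
# Host DTIDs as printed by "conf" in the session above (hexadecimal).
HOSTS = {
    0x40000: "ws00",
    0x80000: "ws04",
    0xC0000: "ws03",
    0x100000: "ws02",
    0x140000: "ws01",
}

# Assumption: the low 18 bits of a task id are the local task number.
LOCAL_BITS = 18
LOCAL_MASK = (1 << LOCAL_BITS) - 1


def host_of(tid_label):
    """Map a label such as 't80001' to the host whose DTID it contains."""
    tid = int(tid_label[1:], 16)    # drop the leading 't', parse as hex
    return HOSTS[tid & ~LOCAL_MASK]  # clear the local part, keep the host part


for label in ["t80001", "tc0001", "t100001", "t140001"]:
    print(label, "->", host_of(label))
```

Under that assumption, the four tasks in this run map to ws04, ws03, ws02 and ws01, one per compute node, and none to ws00.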
PVM does not limit the number of tasks relative to the number of nodes. So we could have spawned 12 tasks in the above example even though we acquired only 4 compute nodes. By default the PVM system distributes the tasks over the available nodes in a round-robin fashion.
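To picture what round-robin placement means for 12 tasks on 4 compute nodes, here is a small simulation. It is an illustration only: the host order, and the choice to leave out the master node, are assumptions, and round_robin is a hypothetical helper rather than anything PVM provides.

```python
from collections import Counter
from itertools import cycle, islice


def round_robin(hosts, ntasks):
    """Assign ntasks task slots to hosts in cyclic order."""
    return list(islice(cycle(hosts), ntasks))


# Assumed host order; the order PVM actually uses may differ.
compute_nodes = ["ws01", "ws02", "ws03", "ws04"]
placement = round_robin(compute_nodes, 12)

print(placement)
print(Counter(placement))
```

Each of the four nodes ends up with three tasks, so asking for more tasks than nodes simply stacks several tasks on each node.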
Finally, remember to halt the PVM system and exit the PBS shell to release all resources.
    pvm>
    pvm> halt
    Terminated
    [amit@onyx PBS ~]:exit
    logout
    qsub: job 806.onyx.boisestate.edu completed
    [amit@onyx amit]$
Note that MPMD (or master/slave) style programs can be invoked directly from the PBS shell once the PVM daemon has been started. Leaving the console with quit, rather than halt, keeps the daemon running. For example:
    [amit@onyx amit]$ pbsget -4
    #####################################################################
    Allocate cluster nodes via PBS for running interactive parallel jobs.
    #####################################################################
    Trying for 4 nodes ...
    qsub: waiting for job 807.onyx.boisestate.edu to start
    qsub: job 807.onyx.boisestate.edu ready
    [amit@onyx PBS ~]:pvm
    pvm: Using list of machines from PBS.
    pvm> quit
    Console: exit handler called
    pvmd still running.
    [amit@onyx PBS ~]:psum 10000 4
    Starting 4 copies of spsum
    I got 2500 from 2
    I got 2500 from 0
    I got 2500 from 1
    I got 2500 from 3
    The total is 10000
    [amit@onyx PBS ~]:exit
    logout
    qsub: job 807.onyx.boisestate.edu completed
    [amit@onyx amit]$
The above technique does not work for SPMD programs. It also will not show you the output from the other nodes (unless you capture it using PVM library calls). Finally, note that it is possible to embed starting the PVM daemon and adding the nodes inside a program, which makes PVM transparent to the user.
For more information on the pvm console program, see the man page for it.