
Dealing with obscure errors

  1. If you see the following errors, the PVM daemon (pvmd) is not running. Start it using the pvm or xpvm control program.

    [amit@kohinoor lab]: psum 100 2
    Starting 2 copies of spsum
    libpvm [pid12121] /tmp/pvmd.221: No such file or directory
    libpvm [pid12121] /tmp/pvmd.221: No such file or directory
    libpvm [pid12121] /tmp/pvmd.221: No such file or directory
    libpvm [pid12121]: pvm_mytid(): Can't contact local daemon
    libpvm [pid12121] /tmp/pvmd.221: No such file or directory
    libpvm [pid12121] /tmp/pvmd.221: No such file or directory
    libpvm [pid12121] /tmp/pvmd.221: No such file or directory
    libpvm [pid12121]: pvm_spawn(): Can't contact local daemon
    Trouble spawning slaves. Aborting. Error codes are:
    TID -140
    TID -130
    TID -120
    TID -110
    TID -100
    TID -90
    TID -80
    TID -70
    TID -60
    TID -50
    TID -40
    TID -30
    TID -20
    TID -117
    TID 00
    TID 10
    libpvm [pid12121] /tmp/pvmd.221: No such file or directory
    [amit@kohinoor lab]:
    

  2. If you see the following error message(s):

    [amit@onyx PBS ~]:xpvm
    xpvm: Using list of machines from PBS.
    libpvm [pid21054] mksocs() connect: Connection refused
    libpvm [pid21054]       socket address tried: /tmp/pvmtmp020154.0
    Connecting to PVMD already running... libpvm [pid21054] mksocs() connect: Connection refused
    libpvm [pid21054]       socket address tried: /tmp/pvmtmp020154.0
    libpvm [pid21054] mksocs() connect: Connection refused
    libpvm [pid21054]       socket address tried: /tmp/pvmtmp020154.0
    libpvm [pid21054] mksocs() connect: Connection refused
    libpvm [pid21054]       socket address tried: /tmp/pvmtmp020154.0
    libpvm [pid21054]: pvm_mytid(): Can't contact local daemon
    libpvm [pid21054]: Error Joining PVM: Can't contact local daemon
    

    This means that although the PVM system thinks the pvmd is running, the daemon is not responding. This can happen if the system was rebooted without a proper shutdown after you last started the pvmd, or if at some point you unceremoniously killed the PVM programs and daemon, leaving stale pvm files behind in /tmp.

    Check whether your pvmd is still running with the following command:

    pdsh -a ps -ax | grep pvmd

    Kill all instances of the pvmd using the following command:

    pdsh -a killall -9 pvmd

    Then go to the /tmp directory and look for files whose names start with pvm. Remove all such files that are owned by you. You can simply use:

    pdsh -a rm -f /tmp/pvm*

    Ignore warnings about failed deletes on other users' pvm files. You should now be able to restart the PVM daemon. You can also use the command pvmclean to help with this cleanup.
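
The file-removal step above can also be restricted to your own files, which avoids the failed-delete warnings altogether. The following is a minimal sketch, not the lab's pvmclean script: the function name clean_pvm_files is hypothetical, it cleans only the machine it runs on, and it assumes a find that supports -maxdepth (GNU and BSD find do).

```shell
#!/bin/sh
# Sketch: remove leftover PVM files owned by the current user from one directory.
# clean_pvm_files is a hypothetical helper name, not part of PVM.
clean_pvm_files() {
    # Directory to clean; defaults to /tmp, where the errors above point.
    dir="${1:-/tmp}"

    # -maxdepth 1        : look only at entries directly inside $dir
    # ! -type d          : leave directories alone (sockets and plain files match)
    # -name 'pvm*'       : names starting with pvm, e.g. pvmd.<uid> or pvmtmp*
    # -user "$(id -u)"   : only entries owned by the invoking user
    find "$dir" -maxdepth 1 ! -type d -name 'pvm*' -user "$(id -u)" \
        -exec rm -f {} +
}
```

Calling clean_pvm_files with no argument cleans /tmp on the local machine. Passing a different directory is useful for trying the function out on a scratch directory first.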


Amit Jain 2010-09-02