1) I created the file "mynodes":

panda #localhost
panda2
panda3

2) I ran

[proffess@panda work]$ lamboot mynodes

and got:

LAM 7.0/MPI 2 C++/ROMIO - Indiana University

ERROR: LAM/MPI unexpectedly received the following on stderr:
bash: line 1: hboot: command not found
-----------------------------------------------------------------------------
LAM attempted to execute a process on the remote node "panda2",
but received some output on the standard error. LAM tried to use
the remote agent command "rsh" to invoke "hboot" on the remote node.

This can indicate an authentication error with the remote agent, or
can indicate an error in your $HOME/.cshrc, $HOME/.login, or
$HOME/.profile files. The following is a list of items that you may
wish to check on the remote node:

- You have an account and can login to the remote machine
- Incorrect permissions on your home directory (should probably be 0755)
- Incorrect permissions on your $HOME/.rhosts file (if you are using
  rsh -- they should probably be 0644)
- You have an entry in the remote $HOME/.rhosts file (if you are using
  rsh) for the machine and username that you are running from
- Your .cshrc/.profile must not print anything out to the standard error
- Your .cshrc/.profile should set a correct TERM type
- Your .cshrc/.profile should set the SHELL environment variable to your
  default shell

Try invoking the following command at the unix command line:

rsh panda2 -n hboot -t -c lam-conf.lamd -s -I "-H 195.208.40.134 -P 34550 -n 1 -o 0"

You will need to configure your local setup such that you will *not*
be prompted for a password to invoke this command on the remote node.
No output should be printed from the remote node before the output of
the command is displayed.

When you can get this command to execute successfully by hand, LAM
will probably be able to function properly.
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
lamboot encountered some error (see above) during the boot process,
and will now attempt to kill all nodes that it was previously able to
boot (if any).

Please wait for LAM to finish; if you interrupt this process, you may
have LAM daemons still running on remote nodes.
-----------------------------------------------------------------------------

ERROR: LAM/MPI unexpectedly received the following on stderr:
bash: line 1: tkill: command not found
-----------------------------------------------------------------------------
LAM attempted to execute a process on the remote node "panda2",
but received some output on the standard error. LAM tried to use
the remote agent command "rsh" to invoke "tkill" on the remote node.

This can indicate an authentication error with the remote agent, or
can indicate an error in your $HOME/.cshrc, $HOME/.login, or
$HOME/.profile files. The following is a list of items that you may
wish to check on the remote node:

- You have an account and can login to the remote machine
- Incorrect permissions on your home directory (should probably be 0755)
- Incorrect permissions on your $HOME/.rhosts file (if you are using
  rsh -- they should probably be 0644)
- You have an entry in the remote $HOME/.rhosts file (if you are using
  rsh) for the machine and username that you are running from
- Your .cshrc/.profile must not print anything out to the standard error
- Your .cshrc/.profile should set a correct TERM type
- Your .cshrc/.profile should set the SHELL environment variable to your
  default shell

Try invoking the following command at the unix command line:

rsh panda2 -n tkill

You will need to configure your local setup such that you will *not*
be prompted for a password to invoke this command on the remote node.
No output should be printed from the remote node before the output of
the command is displayed.

When you can get this command to execute successfully by hand, LAM
will probably be able to function properly.
-----------------------------------------------------------------------------

My /etc/hosts file:

# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1        localhost.localdomain localhost
195.208.40.134   panda
192.168.100.1    panda1
192.168.100.2    panda2
192.168.100.3    panda3
192.168.100.4    panda4
192.168.100.5    panda5
192.168.100.6    panda6
192.168.100.7    panda7
192.168.100.8    panda8
192.168.100.9    panda9
192.168.100.10   panda10
192.168.100.11   panda11
192.168.100.12   panda12
192.168.100.13   panda13
192.168.100.14   panda14
192.168.100.15   panda15
192.168.100.16   panda16
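
Both "command not found" lines come from bash on panda2, which suggests that the non-interactive shell started by rsh there does not have hboot and tkill on its PATH. A minimal sketch of a by-hand check follows, using the same "rsh HOST -n COMMAND" form that lamboot prints above. The function name check_remote_cmd and its output wording are hypothetical, and it assumes the remote login shell understands "command -v" (bash does).

```shell
# Hypothetical helper: ask the shell that AGENT starts on HOST where CMD lives.
# Usage: check_remote_cmd AGENT HOST CMD
check_remote_cmd() {
    agent=$1; host=$2; cmd=$3
    # rsh does not pass back the remote exit status, so inspect the output:
    # "command -v" prints the path when CMD is found and nothing otherwise.
    path=$($agent "$host" -n "command -v $cmd" 2>/dev/null)
    if [ -n "$path" ]; then
        echo "$host: $cmd -> $path"
    else
        # An empty answer can also mean that the agent itself failed to connect.
        echo "$host: $cmd not found on non-interactive PATH"
    fi
}
```

It could be called as "check_remote_cmd rsh panda2 hboot" and "check_remote_cmd rsh panda2 tkill". If the commands are reported as not found, the directory holding the LAM binaries on panda2 probably needs to be added to PATH in a startup file that bash reads for non-interactive rsh sessions.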