[Pw_forum] pw.x slurm srun failed with intel mpi?
Rolly Ng
rollyng at gmail.com
Sun Jan 15 08:51:49 CET 2017
Dear QE users,
I have a problem with srun on an Ubuntu 16.04 cluster with Intel MPI. Could you
please help me check what is going on? Thank you!
I am trying to install Slurm on a cluster running Ubuntu 16.04.
I am using Intel MPI, and its installation directory is on the
head node at /opt/intel/impi_5.01.
According to the Slurm instructions, the I_MPI_PMI_LIBRARY variable needs to
be exported so that it points to libpmi.so:
https://slurm.schedmd.com/mpi_guide.html#intel_mpi
However, I installed slurm-llnl through the Ubuntu package manager,
|sudo apt-get install slurm-llnl |
and I am not sure where libpmi.so is located. So I did a search and
found a file here. Is this the file I'm looking for?
|/usr/lib/x86_64-linux-gnu/libpmi.so |
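A quick way to check where libpmi.so lives is to search the usual library directories. The sketch below is only an illustration and not part of Slurm: the function name find_libpmi and the default directory list are assumptions.

```shell
#!/bin/sh
# find_libpmi: hypothetical helper; prints the first libpmi.so found
# under the directories given as arguments.
# The default directory list is an assumption, not taken from Slurm.
find_libpmi() {
    if [ "$#" -eq 0 ]; then
        set -- /usr/lib /usr/lib64 /usr/local/lib
    fi
    for dir in "$@"; do
        # Skip directories that do not exist on this machine.
        [ -d "$dir" ] || continue
        # -name matches the exact file name; head keeps only the first hit.
        match=$(find "$dir" -name 'libpmi.so' 2>/dev/null | head -n 1)
        if [ -n "$match" ]; then
            printf '%s\n' "$match"
            return 0
        fi
    done
    # Nothing found in any of the directories.
    return 1
}
```

For example, |find_libpmi /usr/lib/x86_64-linux-gnu | prints the path if the file is there and returns a non-zero status otherwise.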
Anyway, I exported the variable and tried
|srun -p old -N3 -n24 hostname |
It returns,
|rolly at head:~$ srun -p old -N3 -n24 hostname
node02 node02 node02 node02 node02 node02 node02 node02
node01 node01 head head node01 head head head
node01 node01 head node01 head head node01 node01 |
It appears to be working.
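To see the distribution at a glance, the hostname output can be tallied per node. This is a small sketch using standard tools; the function name count_tasks is made up for illustration.

```shell
#!/bin/sh
# count_tasks: hypothetical helper; reads whitespace-separated hostnames
# on stdin and prints "<node> <count>" per node, sorted by node name.
count_tasks() {
    awk '{ for (i = 1; i <= NF; i++) n[$i]++ }
         END { for (h in n) print h, n[h] }' | LC_ALL=C sort
}
```

Used as |srun -p old -N3 -n24 hostname | count_tasks |. For the 24 hostnames shown above this reports 8 tasks each on head, node01 and node02.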
But when I run my task,
|srun -p old -N3 -n24 ~/QE530-CPU/espresso-5.3.0/bin/pw.x |
it produces errors,
|mpiexec_node02: cannot connect to local mpd (/tmp/mpd2.console_rolly);
possible causes:
1. no mpd is running on this host
2. an mpd is running but was started without a "console" (-n option) |
(The same message is printed 16 times in total: 8 times as mpiexec_node02
and 8 times as mpiexec_node01.)
I believe the errors occur because mpiexec is being run with Intel MPI;
it should be using mpirun instead.
I can confirm that exporting the environment variable,
|export I_MPI_PMI_LIBRARY=/usr/lib/x86_64-linux-gnu/libpmi.so |
breaks mpirun. If it is set,
|mpirun -n 24 -ppn 8 -f ~/machines.LINUX ~/QE530-CPU/espresso-5.3.0/bin/pw.x |
fails. If it is unset, mpirun works again.
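Since an exported variable affects every later command in the shell, one option is to set it only for the srun command instead of exporting it. This is a minimal sketch, assuming the libpmi.so path found above is the right one; the function name run_with_pmi is made up for illustration, and this does not by itself explain the mpd errors from srun.

```shell
#!/bin/sh
# Path found on this cluster; whether it is the correct PMI library
# for Intel MPI is an assumption.
PMI_LIB=/usr/lib/x86_64-linux-gnu/libpmi.so

# run_with_pmi: hypothetical helper; runs the given command with
# I_MPI_PMI_LIBRARY set only in that command's environment.
run_with_pmi() {
    env I_MPI_PMI_LIBRARY="$PMI_LIB" "$@"
}

# Make sure the variable is not left exported in the current shell,
# so that a later mpirun does not see it.
unset I_MPI_PMI_LIBRARY
```

With this, |run_with_pmi srun -p old -N3 -n24 ~/QE530-CPU/espresso-5.3.0/bin/pw.x | sees the variable, while a later mpirun in the same shell does not.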
How can I correct the problem?
--
PhD. Research Fellow,
Dept. of Physics & Materials Science,
City University of Hong Kong
Tel: +852 3442 4000
Fax: +852 3442 0538