[Pw_forum] pw.x slurm srun failed with intel mpi?

Rolly Ng rollyng at gmail.com
Sun Jan 15 08:51:49 CET 2017


Dear QE users,

I have a problem with srun on an Ubuntu 16.04 cluster with Intel MPI. Could you 
please help me check what is going on? Thank you!

I am trying to set up Slurm on a cluster running Ubuntu 16.04.

I am using Intel MPI, which is installed on the head node under 
/opt/intel/impi_5.01.

According to the Slurm instructions, I need to export the I_MPI_PMI_LIBRARY 
variable, pointing to libpmi.so: https://slurm.schedmd.com/mpi_guide.html#intel_mpi

However, I installed slurm-llnl from the Ubuntu repositories,

|sudo apt-get install slurm-llnl |

and I am not sure where libpmi.so is located. I searched and found the 
following file. Is this the file I'm looking for?

|/usr/lib/x86_64-linux-gnu/libpmi.so |
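
A minimal sketch of such a search, in case more than one copy of the library is installed. The helper name find_pmi_libs and the default search directories are illustrative assumptions; they do not come from the Slurm documentation.

```shell
#!/bin/sh
# Hypothetical helper: list every libpmi*.so* file found under the given
# directories. With no arguments it falls back to a guess at common
# library locations, which may not match your system.
find_pmi_libs() {
    if [ "$#" -eq 0 ]; then
        set -- /usr/lib /usr/lib64 /usr/local/lib /opt
    fi
    for dir in "$@"; do
        # Skip directories that do not exist on this machine.
        [ -d "$dir" ] || continue
        # Symlinks such as libpmi.so are reported as well as real files.
        find "$dir" -name 'libpmi*.so*' 2>/dev/null
    done
}
```

Calling find_pmi_libs with no arguments prints one path per line. On a Debian or Ubuntu system, dpkg -S /usr/lib/x86_64-linux-gnu/libpmi.so should additionally report which package installed that file, which helps confirm that it belongs to Slurm.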

Anyway, I exported the variable and tried

|srun -p old -N3 -n24 hostname |

It returns,

|rolly@head:~$ srun -p old -N3 -n24 hostname
node02
node02
node02
node02
node02
node02
node02
node02
node01
node01
head
head
node01
head
head
head
node01
node01
head
node01
head
head
node01
node01 |

It appears to be working.
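
For reference, the export step can be sketched as follows. The variable name I_MPI_PMI_LIBRARY and the library path are taken from this message; the helper use_slurm_pmi and the variable PMI_LIB are hypothetical names introduced only for illustration.

```shell
#!/bin/sh
# Path reported earlier in this message; adjust it if libpmi.so lives
# elsewhere on your system.
PMI_LIB=/usr/lib/x86_64-linux-gnu/libpmi.so

# Hypothetical helper: export I_MPI_PMI_LIBRARY only if the file exists,
# so that a wrong path is caught before srun is launched.
use_slurm_pmi() {
    if [ -e "$1" ]; then
        export I_MPI_PMI_LIBRARY="$1"
    else
        echo "PMI library not found: $1" >&2
        return 1
    fi
}
```

With this in place, the test above would be run as use_slurm_pmi "$PMI_LIB" && srun -p old -N3 -n24 hostname. Note that hostname is not an MPI program, so this test only shows that Slurm can start tasks on the nodes; it does not exercise the PMI library.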

But when I run my actual job,

|srun -p old -N3 -n24 ~/QE530-CPU/espresso-5.3.0/bin/pw.x |

it produces errors,

|mpiexec_node02: cannot connect to local mpd (/tmp/mpd2.console_rolly); possible causes:
  1. no mpd is running on this host
  2. an mpd is running but was started without a "console" (-n option)
mpiexec_node01: cannot connect to local mpd (/tmp/mpd2.console_rolly); possible causes:
  1. no mpd is running on this host
  2. an mpd is running but was started without a "console" (-n option) |

(The same message is printed 16 times in total: 8 times by mpiexec_node02 and 
8 times by mpiexec_node01.)

I believe these errors occur because mpiexec is being invoked with Intel MPI, 
whereas mpirun should be used instead.

I can also confirm that exporting the environment variable 
I_MPI_PMI_LIBRARY=/usr/lib/x86_64-linux-gnu/libpmi.so breaks mpirun. 
When it is set, mpirun -n 24 -ppn 8 -f ~/machines.LINUX 
~/QE530-CPU/espresso-5.3.0/bin/pw.x fails. When it is unset, mpirun 
works again.
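
The toggle described above can be reproduced without editing the shell environment by hand each time. The following is a minimal sketch; the wrapper name without_pmi is hypothetical, and it only automates the unset step already described, it does not fix the srun failure.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command with I_MPI_PMI_LIBRARY removed from
# its environment. The subshell keeps the calling shell's setting intact,
# so srun can still see the variable afterwards.
without_pmi() {
    (
        unset I_MPI_PMI_LIBRARY
        "$@"
    )
}
```

For example, without_pmi mpirun -n 24 -ppn 8 -f ~/machines.LINUX ~/QE530-CPU/espresso-5.3.0/bin/pw.x runs the mpirun case with the variable cleared, while a later srun in the same shell still inherits the exported value.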

How can I correct the problem?

-- 
PhD. Research Fellow,
Dept. of Physics & Materials Science,
City University of Hong Kong
Tel: +852 3442 4000
Fax: +852 3442 0538
