[Pw_forum] Disk_io error using MPI

Cameron Foss cjfoss at umass.edu
Fri Oct 23 17:55:38 CEST 2015


Hello,

I am trying to extract the Kohn-Sham (KS) eigenvalues on a dense grid of
126040 k-points using calculation='bands', preceded by the corresponding scf
and nscf calculations. I have done this successfully along high-symmetry
paths using a list of ~100 k-points. I am using espresso-5.1.2 built with
MPI and OpenMP enabled, running under an LSF scheduling system.

When I submit the job (script below, for the case where the scf and nscf
runs have already completed), the code does not run and I get this error
message in the .err file:

[proxy:0:0@c16b01] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:966): process reading stdin too slowly; can't keep up
[proxy:0:0@c16b01] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0@c16b01] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event

I have also tried splitting the k-points into smaller batches (4 separate
runs of 31510 k-points each; 4*31510=126040), but I got the same error with
31510 k-points. I was told that the disk_io option might help, so I re-ran
the scf and nscf calculations with disk_io='low' and the bands calculation
with disk_io='none'. This did not solve the error above; in fact, it
introduced the following error after each scf and nscf calculation:

libibverbs: Warning: couldn't load driver 'mlx5': /usr/lib64/libmlx5-rdmav2.so: symbol ibv_cmd_create_qp_ex, version IBVERBS_1.1 not defined in file libibverbs.so.1 with link time reference

(Note: this message was printed once for each MPI process, 16 in this case.)
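The splitting into 4 runs of 31510 k-points described above can be automated with a short sh function. This is a minimal sketch, assuming the full k-point list is stored one point per line as "kx ky kz wk" in a plain file; the function name split_kpoints and the file names kpoints.dat and kpoints_partN.dat are hypothetical and not part of Quantum ESPRESSO. Each output file starts with its own point count, so its contents can be pasted below a K_POINTS card in a separate bands input.

```shell
#!/bin/sh
# Sketch: split a flat k-point list (one "kx ky kz wk" line each) into
# nchunk files, each prefixed by its own number of points.
# split_kpoints and the kpoints_partN.dat names are hypothetical.

split_kpoints() {
    kfile=$1     # full k-point list, e.g. 126040 lines
    nchunk=$2    # number of separate bands runs, e.g. 4

    nk=$(wc -l < "$kfile")
    # Ceiling division so no point is dropped; 126040 / 4 gives 31510
    per=$(( (nk + nchunk - 1) / nchunk ))

    i=1
    while [ "$i" -le "$nchunk" ]; do
        start=$(( (i - 1) * per + 1 ))
        end=$(( i * per ))
        out="kpoints_part$i.dat"
        # Lines start..end of the full list; the last chunk may be shorter
        sed -n "${start},${end}p" "$kfile" > "$out.tmp"
        # Write the point count of this chunk, then the points themselves
        { wc -l < "$out.tmp"; cat "$out.tmp"; } > "$out"
        rm -f "$out.tmp"
        i=$(( i + 1 ))
    done
}
```

Usage would be `split_kpoints kpoints.dat 4`, producing kpoints_part1.dat to kpoints_part4.dat. Ceiling division is used so that a total not divisible by the number of runs still places every point in some chunk.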

%%%%%%%

#!/bin/sh
#BSUB -J GrGrid
#BSUB -o grGrid.out
#BSUB -e grGrid.err
#BSUB -q long
#BSUB -W 500:00
#BSUB -R select[ncpus=20]
#BSUB -R rusage[mem=1024]
#BSUB -R "span[hosts=1] affinity[core(1):distribute=pack]"
#BSUB -L /bin/sh

export OMP_NUM_THREADS=1

module load gcc/4.7.4
module load mvapich2/2.0a

mpirun -n 16 ~/espresso-par/espresso-5.1.2/bin/pw.x < gr.bands.dense.in > gr.bands.dense.out

%%%%%%%%
The script is submitted with the following command:

$ bsub -n 20 < ./runscript
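For comparison, a variant of the same job script that passes the input file to pw.x by name instead of redirecting it through mpirun's stdin might look like the sketch below. This is an untested assumption rather than a confirmed fix: it relies on pw.x accepting the -inp command-line option, and on the Hydra message above being related to stdin forwarding of the large input file. Everything else is kept identical to the original script.

```shell
#!/bin/sh
#BSUB -J GrGrid
#BSUB -o grGrid.out
#BSUB -e grGrid.err
#BSUB -q long
#BSUB -W 500:00
#BSUB -R select[ncpus=20]
#BSUB -R rusage[mem=1024]
#BSUB -R "span[hosts=1] affinity[core(1):distribute=pack]"
#BSUB -L /bin/sh

export OMP_NUM_THREADS=1

module load gcc/4.7.4
module load mvapich2/2.0a

# Assumption: pw.x opens the file given after -inp itself, so mpirun
# no longer has to forward the large k-point input over stdin.
mpirun -n 16 ~/espresso-par/espresso-5.1.2/bin/pw.x -inp gr.bands.dense.in > gr.bands.dense.out
```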

I am unsure what exactly the problem is, as I have not encountered these
errors before.

Best,
Cameron