[Pw_forum] Disk_io error using MPI

Paolo Giannozzi p.giannozzi at gmail.com
Fri Oct 23 21:59:24 CEST 2015


Calculations with many k-points may occasionally run into problems, due to
the opening of too many files and directories. Option "lkpoint_dir" may be
useful. In any event: none of the error messages you report comes from the
code itself, but either from system libraries or from the operating system.
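
For instance, something along these lines in the &CONTROL namelist of pw.x
(prefix and outdir are placeholders to be adapted to your run; check the
input documentation of your version for the exact meaning and defaults):

   &CONTROL
      calculation = 'bands'
      prefix      = 'gr'       ! placeholder: must match the scf/nscf runs
      outdir      = './tmp'    ! placeholder
      disk_io     = 'low'      ! reduce the amount of data written to disk
      lkpoint_dir = .false.    ! do not open one subdirectory per k-point
   /

With lkpoint_dir=.false. the Kohn-Sham eigenvalues should be collected into a
single file instead of one subdirectory per k-point under the prefix.save
directory.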

Paolo


On Fri, Oct 23, 2015 at 5:55 PM, Cameron Foss <cjfoss at umass.edu> wrote:

> Hello,
>
> I am trying to extract the KS eigenvalues on a dense grid of 126040
> k-points using calculation='bands', preceded by the corresponding scf and
> nscf calculations. I have done this successfully along symmetry paths using
> a list of ~100 k-points. I am using espresso-5.1.2 with MPI and OpenMP
> enabled, under an LSF scheduling system.
>
> Upon submitting the job (script below, for the case where the scf and nscf
> calculations have already been completed), I get the following error message
> in the .err file and the code does not run.
>
> [proxy:0:0 at c16b01] HYD_pmcd_pmip_control_cmd_cb
> (./pm/pmiserv/pmip_cb.c:966): process reading stdin too slowly; can't keep
> up
> [proxy:0:0 at c16b01] HYDT_dmxu_poll_wait_for_event
> (./tools/demux/demux_poll.c:77): callback returned error status
> [proxy:0:0 at c16b01] main (./pm/pmiserv/pmip.c:206): demux engine error
> waiting for event
>
> Furthermore, I have tried splitting the k-points into smaller sets (that
> is, 4 individual runs of 31510 k-points each; 4*31510=126040), but I got the
> same error for 31510 k-points as well. I was informed that the disk_io
> option may help, so I re-ran the scf+nscf calculations with disk_io='low'
> and the bands calculation with disk_io='none'. This did not solve the error
> stated above; in fact, doing so introduced the following error after each
> scf and nscf calculation.
>
> libibverbs: Warning: couldn't load driver 'mlx5':
> /usr/lib64/libmlx5-rdmav2.so: symbol ibv_cmd_create_qp_ex, version
> IBVERBS_1.1 not defined in file libibverbs.so.1 with link time reference
> (Note: this message was printed once for each MPI process, 16 in this case)
>
> %%%%%%%
>
> #!/bin/sh
> #BSUB -J GrGrid
> #BSUB -o grGrid.out
> #BSUB -e grGrid.err
> #BSUB -q long
> #BSUB -W 500:00
> #BSUB -R select[ncpus=20]
> #BSUB -R rusage[mem=1024]
> #BSUB -R "span[hosts=1] affinity[core(1):distribute=pack]"
> #BSUB -L /bin/sh
>
> export OMP_NUM_THREADS=1
>
> module load gcc/4.7.4
> module load mvapich2/2.0a
>
> mpirun -n 16 ~/espresso-par/espresso-5.1.2/bin/pw.x < gr.bands.dense.in > gr.bands.dense.out
>
> %%%%%%%%
> with the following submission cmd:
>
> $ bsub -n 20 < ./runscript
>
> I am unsure what exactly the problem is, as I have not encountered such
> errors before.
>
> Best,
> Cameron
>
> _______________________________________________
> Pw_forum mailing list
> Pw_forum at pwscf.org
> http://pwscf.org/mailman/listinfo/pw_forum
>



-- 
Paolo Giannozzi, Dept. Chemistry&Physics&Environment,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222