[Pw_forum] error in the NSCF calculation with a large number of kpoints.

dingfu.shao dingfu.shao at gmail.com
Tue Apr 22 12:24:35 CEST 2014


Dear Axel and Paolo,

My situation is that the nscf calculation cannot run at all with very dense k-points.
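
For reference, the k-point card in my nscf input looks something like this (the grid values below are illustrative, not my actual ones):

    K_POINTS {automatic}
    20 20 20 0 0 0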

I used a small cluster belonging to our group, and as its administrator I am sure the job did not overload the computing resources.


I tested several situations, such as running the calculation with and without MPI. When the number of k-points is smaller, it runs in both cases.


When the number of k-points is very large, the job runs if I use pw.x without MPI. But when I use MPI, no matter how many processes I use, it cannot run at all and immediately produces the errors quoted below.
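
Concretely, the two invocations look roughly like this (process count, file names, and the stdin redirection are illustrative of my setup, not exact):

    pw.x < nscf.in > nscf.out                   # without MPI: runs fine
    mpiexec -np 8 pw.x < nscf.in > nscf.out     # with MPICH: fails immediately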

I used MPICH3. Maybe I should reconfigure it, or use Open MPI instead?


Thanks a lot.

Yours, Dingfu





Dingfu Shao, Ph.D.
Institute of Solid State Physics
Chinese Academy of Sciences
P. O. Box 1129
Hefei 230031
Anhui Province
P. R. China

Email: dingfu.shao at gmail.com

From: Axel Kohlmeyer
Date: 2014-04-21 16:02
To: PWSCF Forum
Subject: Re: [Pw_forum] error in the NSCF calculation with a large number of kpoints.
On Mon, Apr 21, 2014 at 12:08 AM, Dingfu Shao <dingfu.shao at gmail.com> wrote:
> Dear QE users,
>
> I want to plot a Fermi surface with very dense k-points, so that I
> can do some calculations such as the Fermi surface nesting function.
> A smaller number of k-points, such as 1200, works fine. But when I
> run the nscf calculation with a large number of k-points, such as
> 2000, the following error occurs:
>
> [proxy:0:0 at node6] HYD_pmcd_pmip_control_cmd_cb
> (./pm/pmiserv/pmip_cb.c:939): process reading stdin too slowly; can't
> keep up
> [proxy:0:0 at node6] HYDT_dmxu_poll_wait_for_event
> (./tools/demux/demux_poll.c:77): callback returned error status
> [proxy:0:0 at node6] main (./pm/pmiserv/pmip.c:206): demux engine error
> waiting for event
> [mpiexec at node0] control_cb (./pm/pmiserv/pmiserv_cb.c:202): assert
> (!closed) failed
> [mpiexec at node0] HYDT_dmxu_poll_wait_for_event
> (./tools/demux/demux_poll.c:77): callback returned error status
> [mpiexec at node0] HYD_pmci_wait_for_completion
> (./pm/pmiserv/pmiserv_pmci.c:197): error waiting for event
> [mpiexec at node0] main (./ui/mpich/mpiexec.c:331): process manager error
> waiting for completion
>
> Does anybody know what the problem is?

First of all, these are messages from your MPI library. My guess is
that you are overloading your compute node with I/O, probably caused
by excessive swapping. Do you use a sufficient number of nodes and
k-point parallelization? You should first talk to your system
administrators, since overloading nodes can disrupt the overall
service of the cluster, and then work with them to run your
calculation better.
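
As a rough illustration (the process and pool counts are placeholders, not a recommendation for your machine), k-point parallelization is requested on the pw.x command line:

    mpirun -np 32 pw.x -nk 8 -i nscf.in > nscf.out

This splits the 32 MPI processes into 8 pools, each handling a subset of the k-points. Passing the input file with -i instead of redirecting stdin is also more robust with some MPI launchers.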

axel.


>
> Best regards,
>
> Yours, Dingfu Shao
>
> --
> Dingfu Shao, Ph.D.
> Institute of Solid State Physics
> Chinese Academy of Sciences
> P. O. Box 1129
> Hefei 230031
> Anhui Province
> P. R. China
>
> Email: dingfu.shao at gmail.com



-- 
Dr. Axel Kohlmeyer  akohlmey at gmail.com  http://goo.gl/1wk0
College of Science & Technology, Temple University, Philadelphia PA, USA
International Centre for Theoretical Physics, Trieste, Italy.