[Pw_forum] Error in the NSCF calculation with a large number of k-points.
dingfu.shao
dingfu.shao at gmail.com
Tue Apr 22 12:24:35 CEST 2014
Dear Axel and Paolo,
My situation is that the nscf calculation cannot run at all with very dense k-points.
I used a small cluster belonging to our group, and since I am its administrator, I am sure it did not overload the computing resources.
I tested several situations, such as running the calculation with and without MPI. With a smaller number of k-points, it runs in both cases.
With a very large number of k-points, pw.x runs without MPI, but with MPI it fails immediately with the errors below, no matter how many processes I use.
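To rule out memory pressure, one thing I can do is watch swap activity on the node while the job runs, for example (node6 is just the node named in the error messages):

    ssh node6 'vmstat 5'    # nonzero si/so columns would indicate swapping
    ssh node6 'free -m'     # snapshot of RAM and swap usage, in MB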
I used MPICH3. Maybe I should reconfigure it, or use Open MPI instead?
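One detail that might matter: the first MPICH message complains that a process is reading stdin too slowly, which can happen when a large input file (here, one with thousands of explicit k-points) is fed to pw.x through stdin redirection. As a sketch (file names are illustrative), pw.x can open the input file itself via its -inp option instead of reading it from stdin, which avoids MPICH's stdin forwarding:

    # instead of piping the input through mpiexec's stdin:
    mpiexec -np 8 pw.x < nscf.in > nscf.out
    # let pw.x open the input file directly:
    mpiexec -np 8 pw.x -inp nscf.in > nscf.out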
Thanks a lot.
Yours Dingfu
Dingfu Shao, Ph.D
Institute of Solid State Physics
Chinese Academy of Sciences
P. O. Box 1129
Hefei 230031
Anhui Province
P. R. China
Email: dingfu.shao at gmail.com
From: Axel Kohlmeyer
Date: 2014-04-21 16:02
To: PWSCF Forum
Subject: Re: [Pw_forum] Error in the NSCF calculation with a large number of k-points.
On Mon, Apr 21, 2014 at 12:08 AM, Dingfu Shao <dingfu.shao at gmail.com> wrote:
> Dear QE users,
>
> I want to plot a Fermi surface with very dense k-points, so that I
> can do some calculations such as the Fermi-surface nesting function. A
> smaller number of k-points, such as 1200, works fine. But when I run
> the nscf calculation with a large number of k-points, such as 2000, the
> following error occurs:
>
> [proxy:0:0 at node6] HYD_pmcd_pmip_control_cmd_cb
> (./pm/pmiserv/pmip_cb.c:939): process reading stdin too slowly; can't
> keep up
> [proxy:0:0 at node6] HYDT_dmxu_poll_wait_for_event
> (./tools/demux/demux_poll.c:77): callback returned error status
> [proxy:0:0 at node6] main (./pm/pmiserv/pmip.c:206): demux engine error
> waiting for event
> [mpiexec at node0] control_cb (./pm/pmiserv/pmiserv_cb.c:202): assert
> (!closed) failed
> [mpiexec at node0] HYDT_dmxu_poll_wait_for_event
> (./tools/demux/demux_poll.c:77): callback returned error status
> [mpiexec at node0] HYD_pmci_wait_for_completion
> (./pm/pmiserv/pmiserv_pmci.c:197): error waiting for event
> [mpiexec at node0] main (./ui/mpich/mpiexec.c:331): process manager error
> waiting for completion
>
>
>
> Does anybody know what the problem is?
First of all, these are messages from your MPI library. My guess is
that you are overloading your compute node with I/O, probably caused
by excessive swapping. Do you use a sufficient number of nodes and
k-point parallelization? You would do best to talk to your system
administrators, since overloading nodes can disrupt the overall
service of the cluster, and then work with them to run your
calculation better.
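As an illustration (the process and pool counts here are only placeholders), k-point parallelization splits the MPI processes into pools, each of which handles a subset of the k-points:

    # 16 MPI processes split into 4 pools of 4; each pool works on its own k-points
    mpiexec -np 16 pw.x -npool 4 -inp nscf.in > nscf.out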
axel.
>
>
> Best regards,
>
> Yours Dingfu Shao
>
>
> --
>
> Dingfu Shao, Ph.D
>
> Institute of Solid State Physics
>
> Chinese Academy of Sciences
>
> P. O. Box 1129
>
> Hefei 230031
>
> Anhui Province
>
> P. R. China
>
> Email: dingfu.shao at gmail.com
>
--
Dr. Axel Kohlmeyer akohlmey at gmail.com http://goo.gl/1wk0
College of Science & Technology, Temple University, Philadelphia PA, USA
International Centre for Theoretical Physics, Trieste, Italy.
_______________________________________________
Pw_forum mailing list
Pw_forum at pwscf.org
http://pwscf.org/mailman/listinfo/pw_forum