[QE-users] DFPT getting stuck [MPI_ERR_TRUNCATE]
M.J. Hutcheon
mjh261 at cam.ac.uk
Wed May 20 10:34:36 CEST 2020
Dear Lorenzo,
> I'm quite sure that the pw code stops if you try to run with more pools than k-points !
This doesn't seem to be the case? I ran a vc-relax and an scf (attached
output) with these (terrible) parallelism settings, and they ran just
fine.
Best,
Michael
On 2020-05-20 09:25, Lorenzo Paulatto wrote:
>> While I am quite sure that such a wasteful parallelization works anyway for the self-consistent code,
>
> I'm quite sure that the pw code stops if you try to run with more pools than k-points !
>
>> I am not equally sure it will for the phonon code.
>
> If the ph code does not stop in this case, I'm confident it will not work properly!
>
> cheers
>
> It presumably isn't difficult to fix, but I would move to a more
> sensible parallelization. For 20 k-points and 32 processors, I would
> try 4 pools of 8 processors (mpirun -np 32 ph.x -nk 4 ...)
> Paolo
>
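As a rough illustration of the pool arithmetic behind the suggestion above, here is a minimal Python sketch. It assumes that a sensible layout is one where the number of pools divides both the MPI process count and the k-point count evenly, so no pool sits idle or carries extra k-points. The helper name `pool_options` is hypothetical and not part of Quantum ESPRESSO; only the `-nk` flag and the 20 k-point / 32 processor figures come from the thread.

```python
def pool_options(n_procs, n_kpoints):
    """List (n_pools, procs_per_pool, kpoints_per_pool) layouts with no idle pool.

    Hypothetical helper, not part of Quantum ESPRESSO. It keeps only pool
    counts that divide both the process count and the k-point count evenly.
    """
    options = []
    # Never consider more pools than k-points or than processes.
    for n_pools in range(1, min(n_procs, n_kpoints) + 1):
        if n_procs % n_pools == 0 and n_kpoints % n_pools == 0:
            options.append((n_pools, n_procs // n_pools, n_kpoints // n_pools))
    return options


# Figures from the thread: 20 k-points on 32 processors.
options = pool_options(32, 20)
best = options[-1]  # largest pool count that splits both numbers evenly
print(f"mpirun -np 32 ph.x -nk {best[0]} ...")
```

For these numbers the largest even split is 4 pools of 8 processors with 5 k-points each, which matches the command suggested above. Whether a smaller pool count performs better depends on memory and communication costs that this sketch does not model.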
> On Tue, May 19, 2020 at 2:12 PM M.J. Hutcheon <mjh261 at cam.ac.uk> wrote:
>
> Dear QE users/developers,
>
> Following from the previous request, I've changed to a newer MPI
> library which gives a little more error information, specifically it
> does now crash with the following message:
>
> An error occurred in MPI_Allreduce
> reported by process [1564540929,0]
> on communicator MPI COMMUNICATOR 6 SPLIT FROM 3
> MPI_ERR_TRUNCATE: message truncated
> MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> and potentially your MPI job)
>
> It appears that this is thrown at the end of a self-consistent DFPT
> calculation (see the attached output file - it appears the final
> iteration has converged). I'm using the development version of QE,
> so I suspect that the error arises from somewhere inside
> https://gitlab.com/QEF/q-e/-/blob/develop/PHonon/PH/solve_linter.f90.
>
> I don't really know how to debug/workaround this further, any
> ideas/suggestions would be most welcome.
>
> Best,
>
> Michael Hutcheon
>
> TCM group, University of Cambridge
>
> On 2020-05-12 13:29, M.J. Hutcheon wrote:
>
> Dear QE users/developers,
>
> I am running an electron-phonon coupling calculation at the gamma
> point for a large unit cell Calcium-Hydride (Output file
> attached). The calculation appears to get stuck during the DFPT
> stage. It does not crash, or produce any error files/output of any
> sort, or run out of walltime, but the calculation does not
> progress either. I have tried different parameter sets (k-point
> grids + cutoffs), which changes the representation where the
> calculation gets stuck, but it still gets stuck. I don't really
> know what to try next, short of compiling QE in debug mode and
> running under a debugger to see where it gets stuck. Any ideas
> before I head down this laborious route?
>
> Many thanks,
>
> Michael Hutcheon
>
> TCM group, University of Cambridge
>
> _______________________________________________
> Quantum ESPRESSO is supported by MaX (www.max-centre.eu/quantum-espresso [1])
> users mailing list users at lists.quantum-espresso.org
> https://lists.quantum-espresso.org/mailman/listinfo/users
>
> -- Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
> Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
> Phone +39-0432-558216, fax +39-0432-558222
>
Links:
------
[1] http://www.max-centre.eu/quantum-espresso
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: scf.out
URL: <http://lists.quantum-espresso.org/pipermail/users/attachments/20200520/dbf328d3/attachment.ksh>