[QE-users] DFPT getting stuck [MPI_ERR_TRUNCATE]

M.J. Hutcheon mjh261 at cam.ac.uk
Tue May 19 14:12:02 CEST 2020


Dear QE users/developers, 

Following on from my previous request, I've switched to a newer MPI library
that gives a little more error information. Specifically, the calculation
now crashes with the following message: 

An error occurred in MPI_Allreduce
reported by process [1564540929,0]
on communicator MPI COMMUNICATOR 6 SPLIT FROM 3
MPI_ERR_TRUNCATE: message truncated
MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort, and
potentially your MPI job) 

The error seems to be thrown at the end of a self-consistent DFPT
calculation (see the attached output file, in which the final
iteration appears to have converged). I'm using the development version
of QE, so I suspect that the error arises from somewhere inside
https://gitlab.com/QEF/q-e/-/blob/develop/PHonon/PH/solve_linter.f90. 
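
For context, MPI_ERR_TRUNCATE from a collective such as MPI_Allreduce
typically means that the ranks on that communicator did not agree on the
message size, or entered different collectives. One low-effort check is to
print each rank's buffer length just before the suspect call and look for the
odd one out. The Python sketch below only illustrates that comparison: the
function name and the logged values are hypothetical and are not taken from
QE or from the attached output.

```python
from collections import Counter


def find_count_mismatches(counts_by_rank):
    """Return {rank: count} for ranks whose count differs from the most common one.

    counts_by_rank maps an MPI rank to the element count that rank was about
    to pass to the collective (hypothetical values, e.g. scraped from debug
    prints added just before the call).
    """
    if not counts_by_rank:
        return {}
    # Treat the most frequent count as the "expected" one.
    expected, _ = Counter(counts_by_rank.values()).most_common(1)[0]
    return {
        rank: n
        for rank, n in sorted(counts_by_rank.items())
        if n != expected
    }


# Hypothetical per-rank buffer sizes logged just before the collective call.
logged = {0: 4096, 1: 4096, 2: 4100, 3: 4096}
print(find_count_mismatches(logged))
```

If every rank reports the same count, the mismatch is more likely in which
collective each rank is calling than in the buffer size itself.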

I don't really know how to debug or work around this further, so any
ideas or suggestions would be most welcome. 

Best, 

Michael Hutcheon 

TCM group, University of Cambridge 

On 2020-05-12 13:29, M.J. Hutcheon wrote:

> Dear QE users/developers, 
> 
> I am running an electron-phonon coupling calculation at the gamma point for a large-unit-cell calcium hydride (output file attached). The calculation appears to get stuck during the DFPT stage. It does not crash, produce any error files or output of any sort, or run out of walltime, but it does not progress either. I have tried different parameter sets (k-point grids and cutoffs), which changes the representation at which the calculation gets stuck, but it still gets stuck. I don't really know what to try next, short of compiling QE in debug mode and running under a debugger to see where it gets stuck. Any ideas before I head down this laborious route? 
> 
> Many thanks, 
> 
> Michael Hutcheon 
> 
> TCM group, University of Cambridge
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: elph.out
URL: <http://lists.quantum-espresso.org/pipermail/users/attachments/20200519/2a7ff566/attachment.ksh>

