[QE-users] DFPT getting stuck [MPI_ERR_TRUNCATE]

Lorenzo Paulatto paulatz at gmail.com
Wed May 20 10:25:10 CEST 2020


> While I am quite 
> sure that such a wasteful parallelization works anyway for the 
> self-consistent code, 

I'm quite sure that the pw code stops if you try to run with more pools 
than k-points!

> I am not equally sure it will for the phonon code. 

If the ph code does not stop in this case, I'm confident it will not 
work properly!
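
For what it's worth, here is a rough sketch of how one could list sensible 
pool counts before launching, using the 20 k-points / 32 processors case 
quoted below. It is not part of QE: the helper name valid_pool_counts is 
made up, and it assumes (as I believe is the case) that the number of pools 
must divide the number of MPI processes and should not exceed the number of 
k-points.

```python
def valid_pool_counts(n_procs, n_kpoints):
    """Pool counts that divide n_procs evenly and do not exceed n_kpoints.

    Hypothetical helper for illustration only; it encodes the two
    assumptions stated above and does not call any QE code.
    """
    upper = min(n_procs, n_kpoints)
    return [nk for nk in range(1, upper + 1) if n_procs % nk == 0]


if __name__ == "__main__":
    n_procs, n_kpoints = 32, 20  # the case discussed in this thread
    for nk in valid_pool_counts(n_procs, n_kpoints):
        procs_per_pool = n_procs // nk
        # Ceiling division: the most k-points any single pool would hold.
        max_k_per_pool = -(-n_kpoints // nk)
        print(f"-nk {nk}: {procs_per_pool} processes per pool, "
              f"at most {max_k_per_pool} k-points per pool")
```

With these numbers the list includes the 4 pools of 8 processors suggested 
below, and excludes 32 pools, which would be more pools than k-points.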

cheers

> It isn't presumably difficult to fix it, but I would move to a more 
> sensible parallelization. For 20 k points and 32 processors, I would try 
> 4 pools of 8 processors (mpirun -np 32 ph.x -nk 4 ...)
> Paolo
> 
> On Tue, May 19, 2020 at 2:12 PM M.J. Hutcheon <mjh261 at cam.ac.uk> wrote:
> 
>     Dear QE users/developers,
> 
>     Following from the previous request, I've changed to a newer MPI
>     library which gives a little more error information, specifically it
>     does now crash with the following message:
> 
>     An error occurred in MPI_Allreduce
>     reported by process [1564540929,0]
>     on communicator MPI COMMUNICATOR 6 SPLIT FROM 3
>     MPI_ERR_TRUNCATE: message truncated
>     MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>     and potentially your MPI job)
> 
>     It appears that this is thrown at the end of a self-consistent DFPT
>     calculation (see the attached output file - it appears the final
>     iteration has converged). I'm using the development version of QE,
>     so I suspect that the error arises from somewhere inside
>     https://gitlab.com/QEF/q-e/-/blob/develop/PHonon/PH/solve_linter.f90.
> 
>     I don't really know how to debug/workaround this further, any
>     ideas/suggestions would be most welcome.
> 
>     Best,
> 
>     Michael Hutcheon
> 
>     TCM group, University of Cambridge
> 
> 
> 
>     On 2020-05-12 13:29, M.J. Hutcheon wrote:
> 
>>     Dear QE users/developers,
>>
>>     I am running an electron-phonon coupling calculation at the gamma
>>     point for a large-unit-cell calcium hydride (output file
>>     attached). The calculation appears to get stuck during the DFPT
>>     stage. It does not crash, or produce any error files/output of any
>>     sort, or run out of walltime, but the calculation does not
>>     progress either. I have tried different parameter sets (k-point
>>     grids + cutoffs), which changes the representation where the
>>     calculation gets stuck, but it still gets stuck. I don't really
>>     know what to try next, short of compiling QE in debug mode and
>>     running under a debugger to see where it gets stuck. Any ideas
>>     before I head down this laborious route?
>>
>>     Many thanks,
>>
>>     Michael Hutcheon
>>
>>     TCM group, University of Cambridge
>>
> 
>     _______________________________________________
>     Quantum ESPRESSO is supported by MaX
>     (www.max-centre.eu/quantum-espresso
>     <http://www.max-centre.eu/quantum-espresso>)
>     users mailing list users at lists.quantum-espresso.org
>     <mailto:users at lists.quantum-espresso.org>
>     https://lists.quantum-espresso.org/mailman/listinfo/users
> 
> 
> 
> -- 
> Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
> Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
> Phone +39-0432-558216, fax +39-0432-558222
> 
> 
> 

-- 
Lorenzo Paulatto - Paris

