<div dir="ltr"><div>Yes, I changed it recently:</div>   <a href="https://gitlab.com/QEF/q-e/-/commit/334e70c7c6c61f5a16fc5d9027fed52bcf0ffdcf">https://gitlab.com/QEF/q-e/-/commit/334e70c7c6c61f5a16fc5d9027fed52bcf0ffdcf</a><div> I was fed up with automated tests crashing if k-point parallelization was used.</div><div><br></div><div>Paolo<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, May 20, 2020 at 10:54 AM Lorenzo Paulatto <<a href="mailto:paulatz@gmail.com">paulatz@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">You are right, in the sense that now the code just writes<br>

"suboptimal parallelization: some nodes have no k-points"<br>

I'm quite sure I remember the code stopping because it was run with more <br>

pools than k-points. Was this changed recently, Paolo?<br>

<br>

<br>

On 5/20/20 10:34 AM, M.J. Hutcheon wrote:<br>

> Dear Lorenzo,<br>

> <br>

>> I'm quite sure that the pw code stops if you try to run with more <br>

>> pools than k-points!<br>

>><br>

> This doesn't seem to be the case? I ran a vc-relax and an scf (attached <br>

> output) with these (terrible) parallelism settings, and they ran just fine.<br>

> <br>

> Best,<br>

> <br>

> Michael<br>

> <br>

> <br>

> On 2020-05-20 09:25, Lorenzo Paulatto wrote:<br>

> <br>

>>> While I am quite sure that such a wasteful parallelization works <br>

>>> anyway for the self-consistent code,<br>

>><br>

>> I'm quite sure that the pw code stops if you try to run with more <br>

>> pools than k-points!<br>

>><br>

>>> I am not equally sure it will for the phonon code. <br>

>><br>

>> If the ph code does not stop in this case, I'm confident it will not <br>

>> work properly!<br>

>><br>

>> cheers<br>

>><br>

>>> It presumably isn't difficult to fix, but I would move to a more <br>

>>> sensible parallelization. For 20 k-points and 32 processors, I would <br>

>>> try 4 pools of 8 processors (mpirun -np 32 ph.x -nk 4 ...)<br>

>>> Paolo<br>

>>><br>

>>> On Tue, May 19, 2020 at 2:12 PM M.J. Hutcheon <<a href="mailto:mjh261@cam.ac.uk" target="_blank">mjh261@cam.ac.uk</a>> wrote:<br>

>>><br>

>>>     Dear QE users/developers,<br>

>>><br>

>>>     Following up on my previous request, I've switched to a newer MPI<br>

>>>     library, which gives a little more error information. Specifically, it<br>

>>>     now crashes with the following message:<br>

>>><br>

>>>     An error occurred in MPI_Allreduce<br>

>>>     reported by process [1564540929,0]<br>

>>>     on communicator MPI COMMUNICATOR 6 SPLIT FROM 3<br>

>>>     MPI_ERR_TRUNCATE: message truncated<br>

>>>     MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,<br>

>>>     and potentially your MPI job)<br>

>>><br>

>>>     It appears that this is thrown at the end of a self-consistent DFPT<br>

>>>     calculation (see the attached output file - it appears the final<br>

>>>     iteration has converged). I'm using the development version of QE,<br>

>>>     so I suspect that the error arises from somewhere inside<br>

>>> <a href="https://gitlab.com/QEF/q-e/-/blob/develop/PHonon/PH/solve_linter.f90" rel="noreferrer" target="_blank">https://gitlab.com/QEF/q-e/-/blob/develop/PHonon/PH/solve_linter.f90</a>.<br>

>>><br>

>>>     I don't really know how to debug or work around this further; any<br>

>>>     ideas or suggestions would be most welcome.<br>

>>><br>

>>>     Best,<br>

>>><br>

>>>     Michael Hutcheon<br>

>>><br>

>>>     TCM group, University of Cambridge<br>

>>><br>

>>><br>

>>><br>

>>>     On 2020-05-12 13:29, M.J. Hutcheon wrote:<br>

>>><br>

>>>>     Dear QE users/developers,<br>

>>>><br>

>>>>     I am running an electron-phonon coupling calculation at the gamma<br>

>>>>     point for a large unit cell of calcium hydride (output file<br>

>>>>     attached). The calculation appears to get stuck during the DFPT<br>

>>>>     stage. It does not crash, or produce any error files/output of any<br>

>>>>     sort, or run out of walltime, but the calculation does not<br>

>>>>     progress either. I have tried different parameter sets (k-point<br>

>>>>     grids + cutoffs), which changes the representation where the<br>

>>>>     calculation gets stuck, but it still gets stuck. I don't really<br>

>>>>     know what to try next, short of compiling QE in debug mode and<br>

>>>>     running under a debugger to see where it gets stuck. Any ideas<br>

>>>>     before I head down this laborious route?<br>

>>>><br>

>>>>     Many thanks,<br>

>>>><br>

>>>>     Michael Hutcheon<br>

>>>><br>

>>>>     TCM group, University of Cambridge<br>

>>>><br>

>>><br>

>>>     _______________________________________________<br>

>>>     Quantum ESPRESSO is supported by MaX<br>

>>>     (<a href="http://www.max-centre.eu/quantum-espresso" rel="noreferrer" target="_blank">www.max-centre.eu/quantum-espresso</a>)<br>

>>>     users mailing list <a href="mailto:users@lists.quantum-espresso.org" target="_blank">users@lists.quantum-espresso.org</a><br>

>>>     <a href="https://lists.quantum-espresso.org/mailman/listinfo/users" rel="noreferrer" target="_blank">https://lists.quantum-espresso.org/mailman/listinfo/users</a><br>

>>><br>

>>><br>

>>><br>

>>> -- Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,<br>

>>> Univ. Udine, via delle Scienze 208, 33100 Udine, Italy<br>

>>> Phone +39-0432-558216, fax +39-0432-558222<br>

>>><br>

>>><br>


>>><br>

> <br>

<br>

-- <br>

Lorenzo Paulatto - Paris<br>

</blockquote></div><br clear="all"><br>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div>Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,<br>Univ. Udine, via delle Scienze 208, 33100 Udine, Italy<br>Phone +39-0432-558216, fax +39-0432-558222<br><br></div></div></div></div></div>
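<div>The advice quoted in this thread (4 pools of 8 processors for 20 k-points on 32 MPI processes) can be turned into a quick check before submitting a job. The following is only a rough sketch, not part of Quantum ESPRESSO: the helper name <code>pool_layouts</code> is hypothetical, and it assumes that the number of pools must divide the number of MPI processes evenly and that <code>n_kpoints</code> is the k-point count actually used by the code after symmetry reduction.</div>

```python
def pool_layouts(n_procs, n_kpoints):
    """List (n_pools, procs_per_pool, max_kpoints_per_pool) candidates.

    Hypothetical helper. Assumptions: pools must divide the MPI
    processes evenly, and no pool should be left without k-points.
    """
    layouts = []
    # Never use more pools than k-points, so every pool has work to do.
    for n_pools in range(1, min(n_procs, n_kpoints) + 1):
        if n_procs % n_pools != 0:
            continue  # skip pool counts that do not divide n_procs
        # Ceiling division: the busiest pool determines the wall time.
        max_k = -(-n_kpoints // n_pools)
        layouts.append((n_pools, n_procs // n_pools, max_k))
    return layouts


if __name__ == "__main__":
    # The case discussed in the thread: 32 processes, 20 k-points.
    for n_pools, per_pool, max_k in pool_layouts(32, 20):
        print(f"mpirun -np 32 ph.x -nk {n_pools} ...   "
              f"# {per_pool} procs per pool, at most {max_k} k-points per pool")
```

<div>For 32 processes and 20 k-points this lists 1, 2, 4, 8 and 16 pools. The 4-pool layout suggested above gives 8 processes per pool and exactly 5 k-points in each pool, so no pool sits idle. Which candidate is fastest still depends on memory and on how well the plane-wave parallelization scales within a pool.</div>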