<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /></head><body style='font-size: 10pt'>
<p>Dear Lorenzo,</p>
<blockquote type="cite" style="padding: 0 0.4em; border-left: #1010ff 2px solid; margin: 0">
<p>I'm quite sure that the pw code stops if you try to run with more pools than k-points !</p>
</blockquote>
<p>This doesn't seem to be the case: I ran a vc-relax and an scf (outputs attached) with these (terrible) parallelization settings, and both ran just fine.</p>
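<p>For concreteness, the kind of invocation in question looks roughly like the sketch below. The pool and processor counts are illustrative (borrowed from the 20 k-points / 32 processors example Paolo gives further down, not the exact settings in the attached outputs), and the input/output file names are placeholders:</p>
<div class="pre" style="margin: 0; padding: 0; font-family: monospace"># wasteful settings: one pool per MPI rank, i.e. more pools than k-points<br />mpirun -np 32 pw.x -nk 32 -in scf.in &gt; scf.out<br /><br /># the more sensible split suggested below: 4 pools of 8 processors each<br />mpirun -np 32 ph.x -nk 4 -in ph.in &gt; ph.out</div>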
<p>Best,</p>
<p>Michael</p>
<p><br /></p>
<p id="reply-intro">On 2020-05-20 09:25, Lorenzo Paulatto wrote:</p>
<blockquote type="cite" style="padding: 0 0.4em; border-left: #1010ff 2px solid; margin: 0">
<div class="pre" style="margin: 0; padding: 0; font-family: monospace">
<blockquote type="cite" style="padding: 0 0.4em; border-left: #1010ff 2px solid; margin: 0">While I am quite sure that such a wasteful parallelization works anyway for the self-consistent code,</blockquote>
<br />I'm quite sure that the pw code stops if you try to run with more pools than k-points !<br /><br />
<blockquote type="cite" style="padding: 0 0.4em; border-left: #1010ff 2px solid; margin: 0"><span style="white-space: nowrap;">I am not equally sure it will for the phonon code. </span></blockquote>
<br />If the ph code does not stop in this case, I'm confident it will not work properly!<br /><br />cheers<br /><br />
<blockquote type="cite" style="padding: 0 0.4em; border-left: #1010ff 2px solid; margin: 0">It isn't presumably difficult to fix it, but I would move to a more sensible parallelization. For 20 k points and 32 processors, I would try 4 pools of 8 processors (mpirun -np 32<br /><span style="white-space: nowrap;"> ph.x -nk 4 ...)</span><br />Paolo<br /><br />On Tue, May 19, 2020 at 2:12 PM M.J. Hutcheon <<a href="mailto:mjh261@cam.ac.uk">mjh261@cam.ac.uk</a> <mailto:<a href="mailto:mjh261@cam.ac.uk">mjh261@cam.ac.uk</a>>> wrote:<br /><br /><span style="white-space: nowrap;"> Dear QE users/developers,</span><br /><br /><span style="white-space: nowrap;"> Following from the previous request, I've changed to a newer MPI</span><br /><span style="white-space: nowrap;"> library which gives a little more error information, specifically it</span><br /><span style="white-space: nowrap;"> does now crash with the following message:</span><br /><br /><span style="white-space: nowrap;"> An error occurred in MPI_Allreduce</span><br /><span style="white-space: nowrap;"> eported by process [1564540929,0]</span><br /><span style="white-space: nowrap;"> on communicator MPI COMMUNICATOR 6 SPLIT FROM 3</span><br /><span style="white-space: nowrap;"> MPI_ERR_TRUNCATE: message truncated</span><br /><span style="white-space: nowrap;"> MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,</span><br /><span style="white-space: nowrap;"> and potentially your MPI job)</span><br /><br /><span style="white-space: nowrap;"> It appears that this is thrown at the end of a self-consistent DFPT</span><br /><span style="white-space: nowrap;"> calculation (see the attached output file - it appears the final</span><br /><span style="white-space: nowrap;"> iteration has converged). I'm using the development version of QE,</span><br /><span style="white-space: nowrap;"> so I suspect that the error arises from somewhere inside</span><br /><span style="white-space: nowrap;"> <a href="https://gitlab.com/QEF/q-e/-/blob/develop/PHonon/PH/solve_linter.f90" target="_blank" rel="noopener noreferrer">https://gitlab.com/QEF/q-e/-/blob/develop/PHonon/PH/solve_linter.f90</a>.</span><br /><br /><span style="white-space: nowrap;"> I don't really know how to debug/workaround this further, any</span><br /><span style="white-space: nowrap;"> ideas/suggestions would be most welcome.</span><br /><br /><span style="white-space: nowrap;"> Best,</span><br /><br /><span style="white-space: nowrap;"> Michael Hutcheon</span><br /><br /><span style="white-space: nowrap;"> TCM group, University of Cambridge</span><br /><br /><br /><br /><span style="white-space: nowrap;"> On 2020-05-12 13:29, M.J. Hutcheon wrote:</span><br /><br />
<blockquote type="cite" style="padding: 0 0.4em; border-left: #1010ff 2px solid; margin: 0"><span style="white-space: nowrap;"> Dear QE users/developers,</span><br /><br /><span style="white-space: nowrap;"> I am running an electron-phonon coupling calculation at the gamma</span><br /><span style="white-space: nowrap;"> point for a large unit cell Calcium-Hydride (Output file</span><br /><span style="white-space: nowrap;"> attached). The calculation appears to get stuck during the DFPT</span><br /><span style="white-space: nowrap;"> stage. It does not crash, or produce any error files/output of any</span><br /><span style="white-space: nowrap;"> sort, or run out of walltime, but the calculation does not</span><br /><span style="white-space: nowrap;"> progress either. I have tried different parameter sets (k-point</span><br /><span style="white-space: nowrap;"> grids + cutoffs), which changes the representation where the</span><br /><span style="white-space: nowrap;"> calculation gets stuck, but it still gets stuck. I don't really</span><br /><span style="white-space: nowrap;"> know what to try next, short of compiling QE in debug mode and</span><br /><span style="white-space: nowrap;"> running under a debugger to see where it gets stuck. Any ideas</span><br /><span style="white-space: nowrap;"> before I head down this laborious route?</span><br /><br /><span style="white-space: nowrap;"> Many thanks,</span><br /><br /><span style="white-space: nowrap;"> Michael Hutcheon</span><br /><br /><span style="white-space: nowrap;"> TCM group, University of Cambridge</span><br /><br /></blockquote>
<br />-- Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,<br /><span style="white-space: nowrap;">Univ. Udine, via delle Scienze 208, 33100 Udine, Italy</span><br /><span style="white-space: nowrap;">Phone +39-0432-558216, fax +39-0432-558222</span><br /><br /></blockquote>
</div>
</blockquote>
<p><br /></p>
</body></html>