[QE-users] Running efficiently on multiple nodes

Antoine Jay ajay at laas.fr
Fri Nov 6 08:14:44 CET 2020


Dear Brad,
I can only confirm what Paolo and Michal suggested.
Even with InfiniBand, the efficiency of the FFT parallelization drastically decreases with each additional node, WHATEVER THE CODE (not only QE) or the library.
For SLURM jobs, if you ask for 2 nodes of 16 cores, the first 16 MPI processes are indexed 1 to 16 on the first node and the last 16 are indexed 17 to 32 on the second: exactly the same distribution that QE implements for k-point, band, or image parallelization.
Thanks to this, I never run into trouble with the way the MPI processes are spread across the cores when the number of pools (or images) equals the number of nodes.
For this reason, except for large supercells at Gamma only, I always set npool = nodes.
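
As an illustration, a minimal SLURM batch script along these lines might look as follows (a sketch only: the input/output file names are placeholders, and it assumes pw.x is in your PATH):

    #!/bin/bash
    #SBATCH --nodes=2              # two nodes
    #SBATCH --ntasks-per-node=16   # 16 MPI ranks per node, 32 in total
    #SBATCH --distribution=block   # ranks 0-15 on node 1, 16-31 on node 2 (the SLURM default)

    # npool = number of nodes, so each k-point pool (and hence the
    # communication-heavy parallel FFT) stays inside a single node.
    srun pw.x -nk 2 -in scf.in > scf.out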

Regards,
Antoine Jay
LAAS CNRS
Toulouse France

On Friday, November 6, 2020 at 01:04 CET, Michal Krompiec via users <users at lists.quantum-espresso.org> wrote:
Dear Brad,

"Fast communications" here means InfiniBand or another RDMA interconnect. Make sure your MPI uses RDMA; I've seen systems where it isn't enabled by default. That said, if you use k-point parallelization you can get away with gigabit Ethernet, as Paolo mentioned.

Best wishes,
Michal Krompiec
Merck KGaA

On Thu, Nov 5, 2020 at 11:40 PM Baer, Bradly via users <users at lists.quantum-espresso.org> wrote:

Paolo,

I believe the nodes I am using have gigabit connections. There are additional nodes that have 10 or 25 gigabit connections, but I don't think I would land on one of them without specifically requesting them. What communication speed would be appropriate for QE's needs?

I also considered setting the parallelization manually, but I don't currently know enough about SLURM to identify each node and ensure that all 16 cores assigned to a pool are on the same node. I will keep it in mind, though, as a possible future solution.

Thanks,
Brad

--------------------------------------------------------
Bradly Baer
Graduate Research Assistant, Walker Lab
Interdisciplinary Materials Science
Vanderbilt University

______________________________________________________________________________
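
Regarding the RDMA point above: a quick way to check what your MPI stack can use is sketched below (assuming Open MPI, possibly with UCX; the exact component names and output format depend on your installation):

    # List the transport components Open MPI was built with;
    # 'openib' or 'ucx' indicates InfiniBand/RDMA support.
    ompi_info | grep -i -e btl -e ucx

    # If UCX is installed, list the transports it detects on this node
    # (look for verbs/rc/dc entries rather than only tcp).
    ucx_info -d | grep -i transport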
From: Paolo Giannozzi <p.giannozzi at gmail.com>
Sent: Thursday, November 5, 2020 3:54 PM
To: Baer, Bradly <bradly.b.baer at Vanderbilt.Edu>; Quantum ESPRESSO users Forum <users at lists.quantum-espresso.org>
Subject: Re: [QE-users] Running efficiently on multiple nodes

Are there fast communications between the two nodes? If not, the parallel distributed 3D FFT will be very slow (note the time taken by fft_scatt_yz). You might find it convenient to exploit k-point parallelization, which requires much less communication: for instance, "mpirun -n 32 pw.x -nk 2" (2 pools of 16 processors, each pool performing a parallel FFT). You have to figure out a way, though, to convince MPI to place the first pool of 16 processors on node 1 and the second on node 2 (or vice versa, as long as FFT parallelization happens inside a node and k-point parallelization across nodes).

Paolo

On Thu, Nov 5, 2020 at 7:29 PM Baer, Bradly via users <users at lists.quantum-espresso.org> wrote:

Paolo,

Thank you for your suggestion. I will add recompiling to move to 6.6 to my to-do list. For now, I corrected the pseudopotential files as you indicated and the calculation ran successfully. It has become slightly faster, but it is still much slower than running on a single node (3:30 vs 0:30). Is there more that I should be doing to improve performance, or is my test problem too small to see the benefits of parallelization?

Thanks,
Brad

--------------------------------------------------------
Bradly Baer
Graduate Research Assistant, Walker Lab
Interdisciplinary Materials Science
Vanderbilt University

______________________________________________________________________________
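
To verify how SLURM actually maps ranks to nodes before trusting the pool-to-node placement, a quick test inside the same 2x16 allocation is (a sketch; with the default block distribution, ranks 0-15 should report one hostname and ranks 16-31 the other):

    # Print which node each of the 32 ranks lands on, sorted by rank.
    srun --nodes=2 --ntasks-per-node=16 \
        bash -c 'echo "rank $SLURM_PROCID on $(hostname)"' | sort -n -k 2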
From: users <users-bounces at lists.quantum-espresso.org> on behalf of Paolo Giannozzi <p.giannozzi at gmail.com>
Sent: Thursday, November 5, 2020 10:01 AM
To: Quantum ESPRESSO users Forum <users at lists.quantum-espresso.org>
Subject: Re: [QE-users] Running efficiently on multiple nodes

On Thu, Nov 5, 2020 at 3:05 PM Baer, Bradly <bradly.b.baer at vanderbilt.edu> wrote:

Pseudo file Ga.pbe-dn-kjpaw_psl.1.0.0.UPF has been fixed on the fly. To avoid this message in the future, permanently fix your pseudo files following these instructions: https://gitlab.com/QEF/q-e/blob/master/upftools/how_to_fix_upf.md

This is a possible source of trouble if the output directory is not visible to all processors. Please try one of the following:
- do what is suggested there (or simply edit Ga.pbe-dn-kjpaw_psl.1.0.0.UPF and replace all occurrences of "&" with "&amp;");
- get version 6.6, which reads the pseudopotential file on one processor and broadcasts its contents to all other processes;
- get the development version, which in addition is not sensitive to the presence of the nonstandard "&" in the files.

Paolo
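
For the manual fix above, a one-line sketch (assuming the file contains only bare "&" characters; an already-fixed file would end up double-escaped, so keep a backup):

    # Back up the original, then escape bare '&' as the XML entity '&amp;'.
    # In sed's replacement text a literal '&' must be written as '\&'.
    cp Ga.pbe-dn-kjpaw_psl.1.0.0.UPF Ga.pbe-dn-kjpaw_psl.1.0.0.UPF.bak
    sed -i 's/&/\&amp;/g' Ga.pbe-dn-kjpaw_psl.1.0.0.UPF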
-Brad

--------------------------------------------------------
Bradly Baer
Graduate Research Assistant, Walker Lab
Interdisciplinary Materials Science
Vanderbilt University

______________________________________________________________________________
From: users <users-bounces at lists.quantum-espresso.org> on behalf of Paolo Giannozzi <p.giannozzi at gmail.com>
Sent: Thursday, November 5, 2020 2:33 AM
To: Quantum ESPRESSO users Forum <users at lists.quantum-espresso.org>
Subject: Re: [QE-users] Running efficiently on multiple nodes

On Wed, Nov 4, 2020 at 11:28 PM Baer, Bradly <bradly.b.baer at vanderbilt.edu> wrote:

Now that I have two nodes, the script for a single node results in a crash shortly after reading in the pseudopotentials.

Which version of QE are you using, and which crash do you obtain, with which executable?

Paolo

--
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222
_______________________________________________
Quantum ESPRESSO is supported by MaX (www.max-centre.eu)
users mailing list  users at lists.quantum-espresso.org
https://lists.quantum-espresso.org/mailman/listinfo/users
