[QE-users] Running efficiently on multiple nodes

Baer, Bradly bradly.b.baer at Vanderbilt.Edu
Sat Nov 7 03:54:59 CET 2020


Thank you all for taking the time to share your experience.  It looks like I have some work to do this weekend to learn more about how our cluster handles inter-node communication.  I appreciate all the help.

-Brad

--------------------------------------------------------
Bradly Baer
Graduate Research Assistant, Walker Lab
Interdisciplinary Materials Science
Vanderbilt University


________________________________
From: users <users-bounces at lists.quantum-espresso.org> on behalf of Antoine Jay via users <users at lists.quantum-espresso.org>
Sent: Friday, November 6, 2020 1:14 AM
To: Quantum ESPRESSO users Forum <users at lists.quantum-espresso.org>
Subject: Re: [QE-users] Running efficiently on multiple nodes

Dear Brad,
I can only confirm what Paolo and Michal suggested.
Even with InfiniBand, the efficiency of FFT parallelization decreases drastically with each additional node, WHATEVER THE CODE (not only QE) or the library.
For SLURM jobs, if you ask for 2 nodes of 16 cores, the first 16 processes are indexed 1 to 16 and the last 16 are indexed 17 to 32, which is exactly the same distribution that QE uses for k-point, band, or image parallelization.
Thanks to this, I never have trouble with the way the MPI processes are spread over the cores when the number of pools (or images) equals the number of nodes.
For these reasons, except for large supercells at gamma only, I always set npool equal to the number of nodes.

Regards,
Antoine Jay
LAAS CNRS
Toulouse France
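
The rank layout Antoine describes can be checked with a small Python sketch. It assumes that the scheduler fills one node completely before moving to the next (block placement) and that pools are contiguous blocks of MPI ranks, as stated in the message above. The function names are hypothetical, and ranks are counted from 0 as MPI does.

```python
def node_of_rank(rank: int, cores_per_node: int) -> int:
    """Node index under block placement: ranks fill node 0 first, then node 1, ..."""
    return rank // cores_per_node


def pool_of_rank(rank: int, n_ranks: int, n_pools: int) -> int:
    """Pool index, assuming each pool is a contiguous block of ranks."""
    ranks_per_pool = n_ranks // n_pools
    return rank // ranks_per_pool


def pools_stay_on_one_node(n_nodes: int, cores_per_node: int, n_pools: int) -> bool:
    """Return True if no pool is split across two or more nodes."""
    n_ranks = n_nodes * cores_per_node
    if n_ranks % n_pools != 0:
        raise ValueError("the number of ranks must be divisible by the number of pools")
    nodes_per_pool: dict[int, set[int]] = {}
    for rank in range(n_ranks):
        pool = pool_of_rank(rank, n_ranks, n_pools)
        nodes_per_pool.setdefault(pool, set()).add(node_of_rank(rank, cores_per_node))
    return all(len(nodes) == 1 for nodes in nodes_per_pool.values())


if __name__ == "__main__":
    # 2 nodes of 16 cores: one pool per node keeps each pool's FFT inside a node.
    print(pools_stay_on_one_node(n_nodes=2, cores_per_node=16, n_pools=2))
    # A single pool of 32 ranks necessarily spans both nodes.
    print(pools_stay_on_one_node(n_nodes=2, cores_per_node=16, n_pools=1))
```

If the cluster places ranks differently (for example round-robin over nodes), the first function no longer describes the layout and the check does not apply.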

On Friday, November 06, 2020 01:04 CET, Michal Krompiec via users <users at lists.quantum-espresso.org> wrote:

Dear Brad,
Fast communications here means InfiniBand or another RDMA interconnect. Make sure your MPI uses RDMA; I’ve seen systems where it isn’t enabled by default. That said, if you use k-point parallelization you can get away with gigabit Ethernet, as Paolo mentioned.
Best wishes,
Michal Krompiec
Merck KGaA
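
To get a feeling for why link speed matters for the distributed FFT, here is a rough back-of-the-envelope sketch in Python. It is a crude model, not a measurement: it assumes one all-to-all transpose per FFT, data spread evenly over the nodes, every rank exchanging an equal share with every other rank, 16 bytes per grid point (double-precision complex), and it ignores latency and protocol overhead entirely. The grid size in the example is hypothetical; the link speeds are the ones mentioned in this thread.

```python
BYTES_PER_POINT = 16  # one double-precision complex number


def off_node_bytes_per_transpose(grid_points: int, n_nodes: int) -> float:
    """Bytes each node sends to the other nodes in one all-to-all transpose.

    Each node holds grid_points / n_nodes points and, in this model, sends
    the fraction (n_nodes - 1) / n_nodes of them to other nodes.
    """
    return BYTES_PER_POINT * grid_points * (n_nodes - 1) / n_nodes**2


def transfer_seconds(n_bytes: float, link_gbit_per_s: float) -> float:
    """Pure bandwidth time for n_bytes over a link of the given speed."""
    bytes_per_second = link_gbit_per_s * 1e9 / 8
    return n_bytes / bytes_per_second


if __name__ == "__main__":
    grid_points = 100**3  # hypothetical FFT grid, for illustration only
    volume = off_node_bytes_per_transpose(grid_points, n_nodes=2)
    for speed in (1, 10, 25):  # Gbit/s
        seconds = transfer_seconds(volume, speed)
        print(f"{speed:>3} Gbit/s: {seconds * 1e3:.2f} ms per transpose")
```

A calculation performs a very large number of FFTs, so even a small cost per transpose adds up; with k-point parallelization across nodes this traffic stays inside each node.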

On Thu, Nov 5, 2020 at 11:40 PM Baer, Bradly via users <users at lists.quantum-espresso.org> wrote:
Paolo,

I believe the nodes I am using have gigabit connections. There are additional nodes that have 10 or 25 gigabit connections but I don't think I would land on one of them without specifically requesting them.  What communication speed would be appropriate for QE's needs?

I also did consider trying to manually set the parallelization but I don't currently know enough about SLURM to identify each node and ensure that all 16 cores assigned from a pool are on the same node.  I will keep it in mind though as a possible future solution.

Thanks,
Brad




________________________________
From: Paolo Giannozzi <p.giannozzi at gmail.com>
Sent: Thursday, November 5, 2020 3:54 PM
To: Baer, Bradly <bradly.b.baer at Vanderbilt.Edu>; Quantum ESPRESSO users Forum <users at lists.quantum-espresso.org>

Subject: Re: [QE-users] Running efficiently on multiple nodes

Are there fast communications between the two nodes? If not, the parallel distributed 3D FFT will be very slow (note the time taken by fft_scatt_yz). You might find it convenient to exploit k-point parallelization, which requires much less communication: for instance, "mpirun -n 32 pw.x -nk 2" (2 pools of 16 processors, each pool performing parallel FFT). You then have to find a way to place the first pool of 16 processors on node 1 and the second on node 2 (or vice versa), so that FFT parallelization happens inside a node and k-point parallelization across nodes.

Paolo
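
As an illustration of Paolo's suggestion, the following Python sketch assembles the command line for one pool per node. The helper name and its parameters are hypothetical; only "mpirun -n", "pw.x" and "-nk" come from the message above. The check on the number of k-points reflects the assumption that k-point pools only make sense when there are at least as many k-points as pools. The sketch builds the command but does not control where the processes actually land; that depends on the MPI launcher and the scheduler.

```python
def pw_command(n_nodes: int, cores_per_node: int, n_kpoints: int,
               executable: str = "pw.x") -> list[str]:
    """Build an mpirun command with one k-point pool per node."""
    if n_kpoints < n_nodes:
        raise ValueError("fewer k-points than pools: k-point parallelization "
                         "cannot keep every pool busy")
    n_ranks = n_nodes * cores_per_node
    # One pool per node: FFT parallelization inside a node,
    # k-point parallelization across nodes.
    return ["mpirun", "-n", str(n_ranks), executable, "-nk", str(n_nodes)]


if __name__ == "__main__":
    # 2 nodes of 16 cores and a hypothetical calculation with 4 k-points.
    print(" ".join(pw_command(n_nodes=2, cores_per_node=16, n_kpoints=4)))
```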

On Thu, Nov 5, 2020 at 7:29 PM Baer, Bradly via users <users at lists.quantum-espresso.org> wrote:
Paolo,

Thank you for your suggestion.  I will add recompiling to move to 6.6 to my to-do list.  For now, I corrected the pseudopotential files as you indicated and the calculation ran successfully.  It has become slightly faster, but it is still much slower than running on a single node (3 min 30 s vs 30 s).  Is there more that I should be doing to improve performance, or is my test problem too small to see the benefits of parallelization?

Thanks,
Brad




________________________________
From: users <users-bounces at lists.quantum-espresso.org> on behalf of Paolo Giannozzi <p.giannozzi at gmail.com>
Sent: Thursday, November 5, 2020 10:01 AM
To: Quantum ESPRESSO users Forum <users at lists.quantum-espresso.org>
Subject: Re: [QE-users] Running efficiently on multiple nodes

On Thu, Nov 5, 2020 at 3:05 PM Baer, Bradly <bradly.b.baer at vanderbilt.edu> wrote:

Pseudo file Ga.pbe-dn-kjpaw_psl.1.0.0.UPF has been fixed on the fly.
To avoid this message in the future, permanently fix
 your pseudo files following these instructions:
https://gitlab.com/QEF/q-e/blob/master/upftools/how_to_fix_upf.md

This is a possible source of trouble if the output directory is not visible to all processors. Please try one of the following:
- do what is suggested (or simply: edit Ga.pbe-dn-kjpaw_psl.1.0.0.UPF and replace all occurrences of "&" with "&amp;")
- get version 6.6, which reads the pseudopotential file on one processor and broadcasts its contents to all other processes
- get the development version, which in addition is not sensitive to the presence of nonstandard "&" in the files.

Paolo
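
The first option in the list above (replacing bare "&" characters) can be scripted. The Python sketch below is one possible way to do it, not the procedure from the linked instructions: it escapes only ampersands that are not already part of an XML entity, so running it twice does not produce "&amp;amp;". The function names are hypothetical, and it is worth keeping a backup copy of the pseudopotential file before rewriting it.

```python
import re
from pathlib import Path

# An ampersand that does not start a predefined or numeric XML entity.
_BARE_AMPERSAND = re.compile(
    r"&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)"
)


def escape_bare_ampersands(text: str) -> str:
    """Replace every bare '&' with '&amp;', leaving existing entities alone."""
    return _BARE_AMPERSAND.sub("&amp;", text)


def fix_upf_file(path: str) -> None:
    """Rewrite the file in place with bare ampersands escaped."""
    upf = Path(path)
    upf.write_text(escape_bare_ampersands(upf.read_text()))


if __name__ == "__main__":
    print(escape_bare_ampersands("Generated by A & B, see &amp; notes"))
```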


-Brad




________________________________
From: users <users-bounces at lists.quantum-espresso.org> on behalf of Paolo Giannozzi <p.giannozzi at gmail.com>
Sent: Thursday, November 5, 2020 2:33 AM
To: Quantum ESPRESSO users Forum <users at lists.quantum-espresso.org>
Subject: Re: [QE-users] Running efficiently on multiple nodes

On Wed, Nov 4, 2020 at 11:28 PM Baer, Bradly <bradly.b.baer at vanderbilt.edu> wrote:

Now that I have two nodes, the script for a single node results in a crash shortly after reading in the pseudopotentials.

Which version of QE are you using, which crash do you get, and with which executable?
Paolo
--
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222

_________________
Quantum ESPRESSO is supported by MaX (www.max-centre.eu)
users mailing list users at lists.quantum-espresso.org
https://lists.quantum-espresso.org/mailman/listinfo/users

