[QE-users] Finding the optimal parallelization parameters

Léon Luntadila Lufungula Luntadilatiti at hotmail.com
Tue Apr 18 11:28:52 CEST 2023


Dear QE users,



I need some help optimizing the different parallelization levels of my QE calculations. Unfortunately, our HPC center is going to start billing research groups for their calculations, so I'm currently working on making our QE calculations as efficient as possible to avoid large bills at the end of the year. Our HPC center has two clusters at our disposal, with different architectures and different billing rates, so I want to figure out which one to use and how many nodes to request per calculation.



The two clusters have the following architecture:
- Cluster 1 (Leibniz): 152 compute nodes containing 2 Xeon E5-2680v4 CPUs at 2.4 GHz (Broadwell), 14 cores each (28 cores per node in total)
- Cluster 2 (Vaughan): 152 compute nodes containing 2 AMD Epyc 7452 CPUs at 2.35 GHz (Rome), 32 cores each (64 cores per node in total)



From the QE documentation I've read that only a few parameters are important for the parallelization. These parameters are given below with their values for my systems:
- No. of k-points = 2
- 3rd dimension of the smooth FFT grid = 405
- 3rd dimension of the dense FFT grid = 720
- No. of KS states = 457



I am currently using rather arbitrary parallelization settings as I just request 8 nodes (of 28 cores) per calculation with k-point parallelization set to 2 (i.e., -nk 2) and using the serial algorithm for subspace diagonalization (i.e., -nd 1) to make the calculation complete within a reasonable timescale.



I've already read a lot about the parallelization implemented in QE, but I still have several questions relating to the different levels of parallelization:



k-point parallelization:

From what I understand, having only 2 k-points in my calculations means that I can subdivide the processors into at most 2 pools, since the number of pools cannot exceed the number of k-points, so that each pool of processors handles a single k-point. If I used more pools, this would be detrimental to performance, as multiple pools would handle a single k-point, resulting in heavy communication between these pools. Therefore, I am wondering whether it is also bad to request more than 2 nodes for my calculations, considering that I only have two k-points and subdivide my processors into 2 pools. Requesting more than 2 nodes would mean that every pool contains processors spread across multiple nodes, so each pool would require inter-node communication to do its computations, which would slow down the calculation.
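To make the pool layout concrete, here is a minimal Python sketch of the arithmetic. It assumes one MPI rank per core and that ranks fill the nodes contiguously; the function name pool_layout is hypothetical and not part of QE.

```python
def pool_layout(nodes, cores_per_node, npools):
    """Return (total MPI ranks, ranks per pool, nodes spanned by one pool).

    Assumes one MPI rank per core and contiguous (block) rank placement.
    """
    total = nodes * cores_per_node
    if total % npools != 0:
        raise ValueError("total ranks must be divisible by the number of pools")
    per_pool = total // npools
    # Ceiling division: a pool that only partly fills a node still uses it.
    nodes_per_pool = -(-per_pool // cores_per_node)
    return total, per_pool, nodes_per_pool


# Current setup: 8 Leibniz nodes (28 cores each) with -nk 2
print(*pool_layout(nodes=8, cores_per_node=28, npools=2))  # 224 112 4
# Alternative: 2 Leibniz nodes, so that each pool fits on one node
print(*pool_layout(nodes=2, cores_per_node=28, npools=2))  # 56 28 1
```

With 8 nodes each pool spans 4 nodes, whereas with 2 nodes each pool stays within a single node.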



FFT parallelization:

The pw.x documentation states that parallelization over PWs yields the best results when the number of processors in a pool is a divisor of the 3rd dimension of both the smooth (nr3s) and dense (nr3) FFT grids. Unfortunately, in my case the greatest common divisor of the two dimensions is 45, which is a poor match for the number of processors available per node on either cluster (28 and 64, respectively). Therefore, I was wondering whether it is okay to manually set the third dimensions with nr3=X and nr3s=X, so that the number of available processors is a common divisor of the third dimensions of both FFT grids?
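The divisor arithmetic can be checked with a short Python sketch. It only uses the grid dimensions and core counts quoted above; the helper name common_divisors is hypothetical.

```python
from math import gcd


def common_divisors(a, b):
    """Return all common divisors of a and b in ascending order."""
    g = gcd(a, b)
    return [d for d in range(1, g + 1) if g % d == 0]


nr3s, nr3 = 405, 720  # 3rd dimensions of the smooth and dense FFT grids
divs = common_divisors(nr3s, nr3)
print(gcd(nr3s, nr3))  # 45
print(divs)            # [1, 3, 5, 9, 15, 45]

# Largest common divisor that does not exceed the cores on one node
for cores in (28, 64):
    best = max(d for d in divs if d <= cores)
    print(cores, best)  # 28 15, then 64 45
```

Without changing the grids, the largest processor count per pool that divides both dimensions and fits on a single node is 15 on Leibniz and 45 on Vaughan.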



Bands and tasks parallelization:

If I'm not mistaken, I should not use band or task group parallelization, because band parallelization is only useful with hybrid functionals (which I don't use) and task group parallelization is only necessary when the number of processors exceeds the number of FFT planes (which is not the case here, unless I request an excessive number of nodes, which would already be detrimental due to inter-node communication).
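As a rough check of when the processor count per pool would exceed the number of FFT planes, the following Python sketch finds the smallest node count at which that happens. It assumes one MPI rank per core, 2 pools, and that the relevant plane count is the 3rd dimension of the smooth grid (405); the function name is hypothetical.

```python
def min_nodes_exceeding_planes(planes, cores_per_node, npools):
    """Smallest node count for which ranks per pool exceed the FFT planes."""
    nodes = 1
    while nodes * cores_per_node // npools <= planes:
        nodes += 1
    return nodes


planes = 405  # assumption: 3rd dimension of the smooth FFT grid
print(min_nodes_exceeding_planes(planes, 28, 2))  # 29 (Leibniz)
print(min_nodes_exceeding_planes(planes, 64, 2))  # 13 (Vaughan)
```

Under these assumptions, the current 8-node Leibniz setup (112 processors per pool) stays well below the 405 planes.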



OpenMP parallelization:

OpenMP cannot be used to coordinate multi-node jobs, so if I wanted to use this level of parallelization, I would have to make sure that the number of processors in a pool is lower than or equal to the number of processors on a single node, right?



Any help on the subject would be greatly appreciated!



Thanks in advance,

Léon Luntadila Lufungula

Structural Chemistry Group

University of Antwerp, Belgium
