[QE-users] efficient parallelization on a system without Infiniband
Ye Luo
xw111luoye at gmail.com
Wed May 27 16:47:30 CEST 2020
A 3.26x speedup seems plausible to me. The remaining gap can be caused by
load imbalance in the iterative solver among the 4 k-points.
Could you list the times in seconds with 1 node and with 4 nodes, i.e. the
numbers you used to calculate the 3.26x?
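(For reference, the speedup here is just the ratio of total wall times,

    speedup = T_wall(1 node) / T_wall(4 nodes),

e.g. taken from the final "PWSCF ... WALL" line of the pw.x output.)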
Could you also try diago_david_ndim=2 in the &ELECTRONS namelist and provide
the 1-node and 4-node times in seconds?
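As a minimal sketch of that change, assuming the input file is named scf.in
(only the relevant namelist line is shown):

    &ELECTRONS
      diago_david_ndim = 2   ! Davidson workspace dimension suggested above
    /

and the corresponding run on your 4 nodes x 56 tasks, with mpirun as a
placeholder for your launcher:

    mpirun -np 224 pw.x -npool 4 -in scf.in > scf.out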
In addition, you may try ELPA, which usually gives better performance than
ScaLAPACK.
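If your pw.x was not built with ELPA support, it has to be enabled at compile
time. A rough sketch for the autotools build (the flag names and paths below
are assumptions and vary between QE releases; check ./configure --help for
your version):

    # flag names are version-dependent; see ./configure --help
    ./configure --with-elpa-include=/path/to/elpa/include/modules \
                --with-elpa-lib=/path/to/elpa/lib/libelpa.a
    make pw

With an ELPA-enabled build, the diagonalization line in the output quoted
below should report ELPA instead of the scalapack distributed-memory
algorithm.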
Thanks,
Ye
===================
Ye Luo, Ph.D.
Computational Science Division & Leadership Computing Facility
Argonne National Laboratory
On Wed, May 27, 2020 at 9:27 AM Michal Krompiec <michal.krompiec at gmail.com>
wrote:
> Hello,
> How can I minimize inter-node MPI communication in a pw.x run? My
> system doesn't have InfiniBand, and inter-node MPI can easily become
> the bottleneck.
> Let's say I'm running a calculation with 4 k-points on 4 nodes, with
> 56 MPI tasks per node. I would then use -npool 4 to create 4 pools for
> the k-point parallelization. However, it seems that by default the
> diagonalization is not parallelized optimally (or is it?):
>   Subspace diagonalization in iterative solution of the eigenvalue problem:
>     one sub-group per band group will be used
>     scalapack distributed-memory algorithm (size of sub-group:  7*  7 procs)
> So far, the speedup on 4 nodes vs 1 node is 3.26x. Is this normal, or
> does it look like it can be improved?
>
> Best regards,
>
> Michal Krompiec
> Merck KGaA
> Southampton, UK