[QE-users] efficient parallelization on a system without Infiniband
Michal Krompiec
michal.krompiec at gmail.com
Wed May 27 16:26:49 CEST 2020
Hello,
How can I minimize inter-node MPI communication in a pw.x run? My
system doesn't have InfiniBand, so inter-node MPI can easily become
the bottleneck.
Let's say I'm running a calculation with 4 k-points on 4 nodes, with
56 MPI tasks per node. I would then use -npool 4 to create 4 pools for
the k-point parallelization. However, it seems that the subspace
diagonalization is not parallelized optimally by default (or is it?):
  Subspace diagonalization in iterative solution of the eigenvalue problem:
    one sub-group per band group will be used
    scalapack distributed-memory algorithm (size of sub-group: 7* 7 procs)
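For reference, I launch the job roughly like this (just a sketch; scf.in
and scf.out are placeholder file names, and I assume an OpenMPI-style
mpirun):

  # 4 nodes x 56 MPI tasks = 224 ranks; -npool 4 then gives 56 ranks per k-point pool
  mpirun -np 224 pw.x -npool 4 -inp scf.in > scf.out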
So far, the speedup on 4 nodes vs 1 node is 3.26x. Is that normal, or
does it look like it could be improved?
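In case it is relevant, this is the kind of variation I was thinking of
testing, assuming -ndiag controls the size of the ScaLAPACK sub-group
(the 7*7 = 49 out of 56 tasks per pool above seems to be the default
choice):

  # make the default 7x7 ScaLAPACK grid explicit, or shrink it to cut communication
  mpirun -np 224 pw.x -npool 4 -ndiag 49 -inp scf.in > scf.out
  mpirun -np 224 pw.x -npool 4 -ndiag 25 -inp scf.in > scf.out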
Best regards,
Michal Krompiec
Merck KGaA
Southampton, UK