[QE-users] k-points parallelization very slow
Christoph Wolf
wolf.christoph at qns.science
Fri Feb 12 05:00:40 CET 2021
Dear all,
I have been testing k-point parallelization, and I wonder whether the following results are normal or whether my cluster has a serious problem...
The system has 74 atoms and a 2x2x1 k-point grid, resulting in 4 k-points:
number of k points= 4 Fermi-Dirac smearing, width (Ry)= 0.0050
cart. coord. in units 2pi/alat
k( 1) = ( 0.0000000 0.0000000 0.0000000), wk = 0.2500000
k( 2) = ( 0.3535534 -0.3535534 0.0000000), wk = 0.2500000
k( 3) = ( 0.0000000 -0.7071068 0.0000000), wk = 0.2500000
k( 4) = ( -0.3535534 -0.3535534 0.0000000), wk = 0.2500000
1) run on 1 node x 32 CPUs with -nk 4
Parallel version (MPI), running on 32 processors
MPI processes distributed on 1 nodes
K-points division: npool = 4
R & G space division: proc/nbgrp/npool/nimage = 8
Fft bands division: nmany = 1
PWSCF : 5h42m CPU 6h 3m WALL
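For reference, I launched this run along these lines (scheduler details omitted; mpirun and the file names here are placeholders for my actual job script, with the input piped in on stdin, matching the "Reading input from standard input" line in the attached logs):

  mpirun -np 32 pw.x -nk 4 < scf.in > run1.out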
2) run on 4 nodes x 32 CPUs with -nk 4
Parallel version (MPI), running on 128 processors
MPI processes distributed on 4 nodes
K-points division: npool = 4
R & G space division: proc/nbgrp/npool/nimage = 32
Fft bands division: nmany = 1
PWSCF : 6h32m CPU 6h36m WALL
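Run 2 was launched the same way, just spread across 4 nodes (again with placeholder file names; the hostfile/scheduler setup is omitted):

  mpirun -np 128 pw.x -nk 4 < scf.in > run2.out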
I compiled pw.x with Intel 19 (MKL, MPI, and OpenMP). If I understand correctly, -nk parallelization should work well across nodes, since there is not much communication between the pools, but that does not seem to be the case for me at all...
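To put numbers on it: going from 32 to 128 cores took the wall time from 6h 3m to 6h36m, i.e. about 9% slower on 4x the cores. That is straight from the summary lines of the two attached logs, which can be pulled out with something like this (run1.out / run2.out are just my placeholder names for the two output files):

  grep 'PWSCF.*WALL' run1.out run2.out
  # run1.out:  PWSCF : 5h42m CPU 6h 3m WALL   (32 procs, 1 node)
  # run2.out:  PWSCF : 6h32m CPU 6h36m WALL   (128 procs, 4 nodes)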
detailed timing logs are attached!
TIA!
Chris
--
IBS Center for Quantum Nanoscience
Seoul, South Korea
-------------- next part --------------
Program PWSCF v.6.6 starts on 11Feb2021 at 19:02:15
This program is part of the open-source Quantum ESPRESSO suite
for quantum simulation of materials; please cite
"P. Giannozzi et al., J. Phys.:Condens. Matter 21 395502 (2009);
"P. Giannozzi et al., J. Phys.:Condens. Matter 29 465901 (2017);
URL http://www.quantum-espresso.org",
in publications or presentations arising from this work. More details at
http://www.quantum-espresso.org/quote
Parallel version (MPI), running on 32 processors
MPI processes distributed on 1 nodes
K-points division: npool = 4
R & G space division: proc/nbgrp/npool/nimage = 8
Fft bands division: nmany = 1
Waiting for input...
[...]
init_run : 24.31s CPU 24.95s WALL ( 1 calls)
electrons : 19680.09s CPU 20835.65s WALL ( 27 calls)
update_pot : 274.59s CPU 283.29s WALL ( 26 calls)
forces : 457.74s CPU 495.34s WALL ( 27 calls)
Called by init_run:
wfcinit : 11.02s CPU 11.23s WALL ( 1 calls)
potinit : 4.79s CPU 4.92s WALL ( 1 calls)
hinit0 : 6.12s CPU 6.23s WALL ( 1 calls)
Called by electrons:
c_bands : 14153.20s CPU 14557.65s WALL ( 461 calls)
sum_band : 2718.56s CPU 3086.52s WALL ( 461 calls)
v_of_rho : 1567.14s CPU 1672.25s WALL ( 481 calls)
newd : 886.93s CPU 1151.66s WALL ( 481 calls)
PAW_pot : 129.25s CPU 130.92s WALL ( 507 calls)
mix_rho : 190.61s CPU 195.43s WALL ( 461 calls)
Called by c_bands:
init_us_2 : 102.63s CPU 105.55s WALL ( 1952 calls)
cegterg : 12700.70s CPU 13083.44s WALL ( 943 calls)
Called by *egterg:
cdiaghg : 399.57s CPU 405.19s WALL ( 8740 calls)
h_psi : 8440.52s CPU 8656.67s WALL ( 8808 calls)
s_psi : 825.25s CPU 835.87s WALL ( 8808 calls)
g_psi : 48.34s CPU 49.24s WALL ( 7863 calls)
Called by h_psi:
h_psi:calbec : 769.39s CPU 778.94s WALL ( 8808 calls)
vloc_psi : 6737.98s CPU 6861.27s WALL ( 8808 calls)
add_vuspsi : 828.64s CPU 839.76s WALL ( 8808 calls)
General routines
calbec : 1017.32s CPU 1030.45s WALL ( 9998 calls)
fft : 1320.70s CPU 1357.02s WALL ( 12416 calls)
ffts : 24.62s CPU 24.91s WALL ( 1884 calls)
fftw : 6542.31s CPU 6663.22s WALL ( 1332404 calls)
interpolate : 71.88s CPU 73.11s WALL ( 962 calls)
Parallel routines
fft_scatt_xy : 1796.12s CPU 1829.54s WALL ( 1346704 calls)
fft_scatt_yz : 1960.95s CPU 1998.74s WALL ( 1346704 calls)
PWSCF : 5h42m CPU 6h 3m WALL
-------------- next part --------------
Program PWSCF v.6.6 starts on 11Feb2021 at 12:25:29
This program is part of the open-source Quantum ESPRESSO suite
for quantum simulation of materials; please cite
"P. Giannozzi et al., J. Phys.:Condens. Matter 21 395502 (2009);
"P. Giannozzi et al., J. Phys.:Condens. Matter 29 465901 (2017);
URL http://www.quantum-espresso.org",
in publications or presentations arising from this work. More details at
http://www.quantum-espresso.org/quote
Parallel version (MPI), running on 128 processors
MPI processes distributed on 4 nodes
K-points division: npool = 4
R & G space division: proc/nbgrp/npool/nimage = 32
Fft bands division: nmany = 1
Waiting for input...
Reading input from standard input
[...]
init_run : 28.22s CPU 28.53s WALL ( 1 calls)
electrons : 22733.20s CPU 22961.09s WALL ( 27 calls)
update_pot : 169.17s CPU 171.24s WALL ( 26 calls)
forces : 168.44s CPU 170.57s WALL ( 27 calls)
Called by init_run:
wfcinit : 1.87s CPU 1.92s WALL ( 1 calls)
potinit : 6.19s CPU 6.24s WALL ( 1 calls)
hinit0 : 3.18s CPU 3.30s WALL ( 1 calls)
Called by electrons:
c_bands : 18068.08s CPU 18219.06s WALL ( 454 calls)
sum_band : 2975.43s CPU 3010.91s WALL ( 454 calls)
v_of_rho : 375.99s CPU 380.09s WALL ( 475 calls)
newd : 188.59s CPU 198.55s WALL ( 475 calls)
PAW_pot : 457.31s CPU 461.28s WALL ( 501 calls)
mix_rho : 48.92s CPU 50.08s WALL ( 454 calls)
Called by c_bands:
init_us_2 : 26.53s CPU 27.06s WALL ( 1924 calls)
cegterg : 2422.03s CPU 2451.72s WALL ( 934 calls)
Called by *egterg:
cdiaghg : 394.88s CPU 397.92s WALL ( 8861 calls)
h_psi : 1204.15s CPU 1222.64s WALL ( 8927 calls)
s_psi : 213.33s CPU 215.27s WALL ( 8927 calls)
g_psi : 5.86s CPU 5.93s WALL ( 7991 calls)
Called by h_psi:
h_psi:calbec : 204.24s CPU 206.18s WALL ( 8927 calls)
vloc_psi : 770.69s CPU 784.95s WALL ( 8927 calls)
add_vuspsi : 214.20s CPU 216.29s WALL ( 8927 calls)
General routines
calbec : 269.24s CPU 271.84s WALL ( 10103 calls)
fft : 529.96s CPU 535.07s WALL ( 12264 calls)
ffts : 254.07s CPU 256.05s WALL ( 1858 calls)
fftw : 771.43s CPU 785.72s WALL ( 1301292 calls)
interpolate : 18.85s CPU 19.04s WALL ( 950 calls)
Parallel routines
fft_scatt_xy : 172.35s CPU 175.20s WALL ( 1315414 calls)
fft_scatt_yz : 960.05s CPU 971.32s WALL ( 1315414 calls)
PWSCF : 6h32m CPU 6h36m WALL
This run was terminated on: 19:02:09 11Feb2021