[QE-users] k-points parallelization very slow

Christoph Wolf wolf.christoph at qns.science
Fri Feb 12 05:00:40 CET 2021


Dear all,

I have been testing k-point parallelization, and I wonder whether the following results are normal or whether my cluster has a serious problem...

The system has 74 atoms and a 2x2x1 k-point grid, resulting in 4 k-points:

     number of k points=     4  Fermi-Dirac smearing, width (Ry)=  0.0050
                       cart. coord. in units 2pi/alat
        k(    1) = (   0.0000000   0.0000000   0.0000000), wk =   0.2500000
        k(    2) = (   0.3535534  -0.3535534   0.0000000), wk =   0.2500000
        k(    3) = (   0.0000000  -0.7071068   0.0000000), wk =   0.2500000
        k(    4) = (  -0.3535534  -0.3535534   0.0000000), wk =   0.2500000


1) Run on 1 node x 32 CPUs with -nk 4 (launch commands for both runs are sketched after run 2):
     Parallel version (MPI), running on    32 processors

     MPI processes distributed on     1 nodes
     K-points division:     npool     =       4
     R & G space division:  proc/nbgrp/npool/nimage =       8
     Fft bands division:     nmany     =       1

     PWSCF        :      5h42m CPU      6h 3m WALL


2) Run on 4 nodes x 32 CPUs with -nk 4:
     Parallel version (MPI), running on   128 processors

     MPI processes distributed on     4 nodes
     K-points division:     npool     =       4
     R & G space division:  proc/nbgrp/npool/nimage =      32
     Fft bands division:     nmany     =       1

     PWSCF        :      6h32m CPU      6h36m WALL
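
For reference, the two runs were launched roughly as follows (a sketch only: the input/output file names and the Intel MPI per-node option are placeholders, not copied verbatim from the job scripts):

     # run 1: 32 MPI ranks on one node, -nk 4 -> 4 k-point pools of 8 ranks each
     mpirun -np 32 pw.x -nk 4 < scf.in > run1.out

     # run 2: 128 MPI ranks over 4 nodes (32 per node), -nk 4 -> 4 k-point pools of 32 ranks each
     mpirun -np 128 -ppn 32 pw.x -nk 4 < scf.in > run2.out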

I compiled pw.x with the Intel 19 toolchain (MKL, Intel MPI, and OpenMP). If I understood correctly, k-point (-nk) parallelization should scale well across nodes, because the k-point pools need very little communication with each other, but that does not seem to be the case for me at all. Detailed timing logs are attached.
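
Since the build is MPI+OpenMP, I believe the jobs run with one OpenMP thread per MPI rank, i.e. an environment roughly like the one below (illustrative values, not copied from the job script; the I_MPI_DEBUG line is only a way to print the rank-to-node pinning at startup and double-check that ranks are placed as expected):

     export OMP_NUM_THREADS=1     # one OpenMP thread per rank, so 32 ranks fill the 32 cores of a node
     export I_MPI_DEBUG=4         # Intel MPI: print process pinning information at startup
     mpirun -np 128 -ppn 32 pw.x -nk 4 < scf.in > run2_pinning_check.out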

TIA!
Chris

-- 
IBS Center for Quantum Nanoscience
Seoul, South Korea
-------------- next part --------------

     Program PWSCF v.6.6 starts on 11Feb2021 at 19: 2:15 

     This program is part of the open-source Quantum ESPRESSO suite
     for quantum simulation of materials; please cite
         "P. Giannozzi et al., J. Phys.:Condens. Matter 21 395502 (2009);
         "P. Giannozzi et al., J. Phys.:Condens. Matter 29 465901 (2017);
          URL http://www.quantum-espresso.org", 
     in publications or presentations arising from this work. More details at
     http://www.quantum-espresso.org/quote

     Parallel version (MPI), running on    32 processors

     MPI processes distributed on     1 nodes
     K-points division:     npool     =       4
     R & G space division:  proc/nbgrp/npool/nimage =       8
     Fft bands division:     nmany     =       1
     Waiting for input...

[...] 
     init_run     :     24.31s CPU     24.95s WALL (       1 calls)
     electrons    :  19680.09s CPU  20835.65s WALL (      27 calls)
     update_pot   :    274.59s CPU    283.29s WALL (      26 calls)
     forces       :    457.74s CPU    495.34s WALL (      27 calls)

     Called by init_run:
     wfcinit      :     11.02s CPU     11.23s WALL (       1 calls)
     potinit      :      4.79s CPU      4.92s WALL (       1 calls)
     hinit0       :      6.12s CPU      6.23s WALL (       1 calls)

     Called by electrons:
     c_bands      :  14153.20s CPU  14557.65s WALL (     461 calls)
     sum_band     :   2718.56s CPU   3086.52s WALL (     461 calls)
     v_of_rho     :   1567.14s CPU   1672.25s WALL (     481 calls)
     newd         :    886.93s CPU   1151.66s WALL (     481 calls)
     PAW_pot      :    129.25s CPU    130.92s WALL (     507 calls)
     mix_rho      :    190.61s CPU    195.43s WALL (     461 calls)

     Called by c_bands:
     init_us_2    :    102.63s CPU    105.55s WALL (    1952 calls)
     cegterg      :  12700.70s CPU  13083.44s WALL (     943 calls)

     Called by *egterg:
     cdiaghg      :    399.57s CPU    405.19s WALL (    8740 calls)
     h_psi        :   8440.52s CPU   8656.67s WALL (    8808 calls)
     s_psi        :    825.25s CPU    835.87s WALL (    8808 calls)
     g_psi        :     48.34s CPU     49.24s WALL (    7863 calls)

     Called by h_psi:
     h_psi:calbec :    769.39s CPU    778.94s WALL (    8808 calls)
     vloc_psi     :   6737.98s CPU   6861.27s WALL (    8808 calls)
     add_vuspsi   :    828.64s CPU    839.76s WALL (    8808 calls)

     General routines
     calbec       :   1017.32s CPU   1030.45s WALL (    9998 calls)
     fft          :   1320.70s CPU   1357.02s WALL (   12416 calls)
     ffts         :     24.62s CPU     24.91s WALL (    1884 calls)
     fftw         :   6542.31s CPU   6663.22s WALL ( 1332404 calls)
     interpolate  :     71.88s CPU     73.11s WALL (     962 calls)
 
     Parallel routines
     fft_scatt_xy :   1796.12s CPU   1829.54s WALL ( 1346704 calls)
     fft_scatt_yz :   1960.95s CPU   1998.74s WALL ( 1346704 calls)
 
     PWSCF        :      5h42m CPU      6h 3m WALL




     Program PWSCF v.6.6 starts on 11Feb2021 at 12:25:29 

     This program is part of the open-source Quantum ESPRESSO suite
     for quantum simulation of materials; please cite
         "P. Giannozzi et al., J. Phys.:Condens. Matter 21 395502 (2009);
         "P. Giannozzi et al., J. Phys.:Condens. Matter 29 465901 (2017);
          URL http://www.quantum-espresso.org", 
     in publications or presentations arising from this work. More details at
     http://www.quantum-espresso.org/quote

     Parallel version (MPI), running on   128 processors

     MPI processes distributed on     4 nodes
     K-points division:     npool     =       4
     R & G space division:  proc/nbgrp/npool/nimage =      32
     Fft bands division:     nmany     =       1
     Waiting for input...
     Reading input from standard input

[...]

     init_run     :     28.22s CPU     28.53s WALL (       1 calls)
     electrons    :  22733.20s CPU  22961.09s WALL (      27 calls)
     update_pot   :    169.17s CPU    171.24s WALL (      26 calls)
     forces       :    168.44s CPU    170.57s WALL (      27 calls)

     Called by init_run:
     wfcinit      :      1.87s CPU      1.92s WALL (       1 calls)
     potinit      :      6.19s CPU      6.24s WALL (       1 calls)
     hinit0       :      3.18s CPU      3.30s WALL (       1 calls)

     Called by electrons:
     c_bands      :  18068.08s CPU  18219.06s WALL (     454 calls)
     sum_band     :   2975.43s CPU   3010.91s WALL (     454 calls)
     v_of_rho     :    375.99s CPU    380.09s WALL (     475 calls)
     newd         :    188.59s CPU    198.55s WALL (     475 calls)
     PAW_pot      :    457.31s CPU    461.28s WALL (     501 calls)
     mix_rho      :     48.92s CPU     50.08s WALL (     454 calls)

     Called by c_bands:
     init_us_2    :     26.53s CPU     27.06s WALL (    1924 calls)
     cegterg      :   2422.03s CPU   2451.72s WALL (     934 calls)

     Called by *egterg:
     cdiaghg      :    394.88s CPU    397.92s WALL (    8861 calls)
     h_psi        :   1204.15s CPU   1222.64s WALL (    8927 calls)
     s_psi        :    213.33s CPU    215.27s WALL (    8927 calls)
     g_psi        :      5.86s CPU      5.93s WALL (    7991 calls)

     Called by h_psi:
     h_psi:calbec :    204.24s CPU    206.18s WALL (    8927 calls)
     vloc_psi     :    770.69s CPU    784.95s WALL (    8927 calls)
     add_vuspsi   :    214.20s CPU    216.29s WALL (    8927 calls)

     General routines
     calbec       :    269.24s CPU    271.84s WALL (   10103 calls)
     fft          :    529.96s CPU    535.07s WALL (   12264 calls)
     ffts         :    254.07s CPU    256.05s WALL (    1858 calls)
     fftw         :    771.43s CPU    785.72s WALL ( 1301292 calls)
     interpolate  :     18.85s CPU     19.04s WALL (     950 calls)
 
     Parallel routines
     fft_scatt_xy :    172.35s CPU    175.20s WALL ( 1315414 calls)
     fft_scatt_yz :    960.05s CPU    971.32s WALL ( 1315414 calls)
 
     PWSCF        :      6h32m CPU      6h36m WALL

 
   This run was terminated on:  19: 2: 9  11Feb2021 






