[QE-developers] Different values in phonon calculations with different numbers of processors

Sat Oct 12 12:35:44 CEST 2024

Just a few quick impressions:
- the numerical noise in zero-frequency acoustic modes at Gamma is, 
well, numerical noise: the value of Acoustic Sum Rule violation may 
fluctuate
- the results with tr2_ph=10^-16 differ A LOT from those with 
tr2_ph=10^-14, so presumably the latter aren't converged at all. I don't 
see anything anomalous in the behavior of results with tr2_ph=10^-16 
(even with a number of processors that is presumably too large for a 
relatively small system like MoSe2). Unconverged calculations run with 
different numbers of processors may "un-converge" to different results, 
due to unavoidable numerical differences
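
As a toy illustration of such "unavoidable numerical differences" (a minimal sketch, not QE code; serial_sum and chunked_sum are hypothetical names): floating-point addition is not associative, so a sum that is split over a different number of processes and then reduced can round differently.

```python
# Illustrative only: mimics how a parallel reduction regroups a sum.
# serial_sum and chunked_sum are hypothetical helpers, not QE routines.

def serial_sum(values):
    """Left-to-right accumulation, as a single process would do it.

    An explicit loop is used because the builtin sum() may apply
    compensated summation in recent Python versions.
    """
    acc = 0.0
    for v in values:
        acc += v
    return acc


def chunked_sum(values, nchunks):
    """Split values into contiguous chunks, sum each, then sum the partials."""
    size = -(-len(values) // nchunks)  # ceiling division
    partials = [serial_sum(values[i:i + size])
                for i in range(0, len(values), size)]
    return serial_sum(partials)


values = [1e16, 1.0, -1e16, 1.0]
for n in (1, 2, 4):
    print(n, chunked_sum(values, n))
```

With these deliberately extreme numbers, one chunk gives 1.0 and two chunks give 0.0. In a real calculation the differences are many orders of magnitude smaller, but an unconverged iterative procedure can amplify them.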

Paolo

On 10/10/2024 22:47, Remi Leano wrote:
> 	
> Dear developers,
> 
> We have noticed the results from ph.x have a significant dependence on 
> the number of processors used in the calculation, which we would like to 
> share here. I have reproduced this on QE 7.1, 7.2, and 7.3.1 on two 
> separate machines (NERSC’s Perlmutter, and our university’s HPC 
> facility). NERSC’s Perlmutter has an AMD EPYC 7763 CPU, and our 
> university’s HPC facility runs on a Xeon Gold 6330 CPU with an x86_64 
> architecture. Our home HPC’s QE7.1 was compiled with Intel’s oneapi, 
> version 2021.4.0, and with mvapich2 2.3.6. On NERSC’s Perlmutter, their 
> QE7.1 seems to have been compiled with gcc-native 12.3 and Cray-mpich 
> 8.1.28 (I determined this while logged in to Perlmutter, but see also 
> https://docs.nersc.gov/development/compilers/base/).
> 
>  From what I have seen, differences are negligible in simple systems 
> such as Si. However, for a more complex system, such as MoSe2, the first 
> frequency in the output from ph.x can vary from -50.482019 cm-1 to 
> -60.136795 cm-1. When I run in serial, -60.136707 cm-1 is obtained for 
> the first frequency.
> 
> The last discussion of this parallelization issue seems to have been 
> on 21 Mar 2024 
> (https://www.mail-archive.com/users@lists.quantum-espresso.org/msg44272.html), 
> and prior to that on 16 Feb 2007, for version 3 
> (https://www.mail-archive.com/users@lists.quantum-espresso.org/msg10137.html). 
> In the last post it was mentioned that zero-frequency acoustic modes that 
> have non-zero frequencies from ph.x may vary by a few cm-1, but in this 
> case we have found some results which vary by 2 to 10 cm-1 for most 
> of the modes. Additionally, one of the low-lying frequencies becomes 
> an imaginary mode after the Acoustic Sum Rule is applied, which is not 
> observed when ph.x is run in serial.
> 
> I have provided details about the machines used at the end. All tests 
> are done on CPUs and with one thread.
> 
> Here is how I have obtained these results:
> 
>  1.
>     A scf calculation is done on the structure.
>  2.
>     The output of the scf calculation is copied into different
>     directories, one for each test of the number of processors, such
>     that all calculations have the same starting point for the ph.x
>     calculation.
>  3.
>     The batch script’s value for the number of processors for each test
>     is modified, holding all else fixed.
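
The three steps above can be sketched as a small script. This is only a sketch under stated assumptions: prepare_runs, the np_<N> directory names, the ph.in file name, and the "mpirun -np" launcher are hypothetical (a batch system may need srun or similar); only 'temp' as outdir and "-nk 1" come from the message.

```python
import shutil
from pathlib import Path


def prepare_runs(scf_dir, work_root, nprocs_list, launcher="mpirun -np"):
    """Copy the same scf output into one directory per processor count.

    Returns {nprocs: shell command} so that every ph.x run starts from
    identical scf data. Commands are only built here, not executed.
    """
    commands = {}
    for n in nprocs_list:
        run_dir = Path(work_root) / f"np_{n}"  # hypothetical naming scheme
        # outdir = 'temp' in the pw.x input, so the copy keeps that name
        shutil.copytree(scf_dir, run_dir / "temp", dirs_exist_ok=True)
        commands[n] = (
            f"cd {run_dir} && {launcher} {n} ph.x -nk 1 -in ph.in > ph.out"
        )
    return commands
```

Only the processor count changes between the generated commands; everything else is held fixed, as in the procedure described.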
> 
> 
> While there is also some variation in scf total energies depending on 
> the number of processors used, it seems within the amount of numerical 
> variation which is unavoidable and expected. So, for the purpose of this 
> test, the number of processors used for the scf is held fixed (56 
> processors) such that the variation can be attributed entirely to the 
> number of processors used in the ph.x step. My input files are below. 
> The data shown is from one pool (-nk 1). However, it was found when 
> comparing runs with different numbers of processors and/or numbers of 
> pools that the frequency results are determined by the number of 
> processors in the pools group (nproc). This was consistent between both 
> NERSC’s Perlmutter and our local HPC.
> 
> Values I obtain from ph.x for QE 7.1 on our university’s HPC, each 
> starting from exactly identical scf output files are in the table below.
> +------------+----------------------------------+--------------------------------+-----------------------------------+
> | nprocs     | First frequency in ph.out [cm-1] | [1,1][1,1] of dynamical matrix | First frequency in dyn.out [cm-1] |
> +------------+----------------------------------+--------------------------------+-----------------------------------+
> | 1 (Serial) | -50.483039                       | 0.42773666                     | 0.00                              |
> | 2          | -50.487262                       | 0.42773677                     | -0.00                             |
> | 4          | -50.482019                       | 0.42773639                     | -0.00                             |
> | 7          | -50.487262                       | 0.42773677                     | -0.00                             |
> | 8          | -50.482019                       | 0.42773639                     | -0.00                             |
> | 14         | -50.487262                       | 0.42773677                     | -0.00                             |
> | 28         | -60.136621                       | 0.41509106                     | -0.00                             |
> | 30         | -60.136257                       | 0.41509080                     | -0.00                             |
> | 32         | -50.482019                       | 0.42773639                     | -0.00                             |
> | 40         | -60.136605                       | 0.41509088                     | -0.00                             |
> | 50         | -60.136257                       | 0.41509080                     | -0.00                             |
> | 52         | -60.136795                       | 0.41509068                     | -0.00                             |
> | 54         | -60.136609                       | 0.33587825                     | -55.99                            |
> | 56         | -60.136707                       | 0.33587767                     | -56.00                            |
> +------------+----------------------------------+--------------------------------+-----------------------------------+
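
A comparison like the one in this table can be collected automatically. The following is a minimal sketch, assuming the usual layout of the frequency lines printed by ph.x (the sample line in the comment is illustrative, and frequencies_cm1 is a hypothetical helper name):

```python
import re

# Layout assumed for ph.x frequency lines, for example:
#   freq (    1) =      -1.513413 [THz] =     -50.482019 [cm-1]
FREQ_LINE = re.compile(
    r"freq\s*\(\s*(\d+)\)\s*=\s*(-?\d+\.\d+)\s*\[THz\]"
    r"\s*=\s*(-?\d+\.\d+)\s*\[cm-1\]"
)


def frequencies_cm1(text):
    """Return {mode index: frequency in cm-1} found in ph.out-style text.

    If a mode index appears more than once, the last occurrence wins.
    """
    return {int(m.group(1)): float(m.group(3))
            for m in FREQ_LINE.finditer(text)}
```

Calling frequencies_cm1 on the contents of each run's ph.out gives one dictionary per processor count, which can then be compared mode by mode.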
> 
> In many cases, the application of the Acoustic Sum Rule by dynmat.x 
> masks this issue. We noticed a difference in values for the first entry 
> of the dynamical matrix obtained in the 56 and 54 processor tests 
> compared to other runs, and thought it could be the origin of the 
> discrepancy. I edited this single value of dynamical matrix of the 56 
> processor case to match the value from the 2 processor run (0.42773677 
> cm-1), and an entirely different value not obtained from other tests, 
> -35.85 cm-1, was obtained for the first frequency in dyn.out. Therefore, 
> while the first entry of the dynamical matrix is one of the most 
> noticeable differences between the outputs from ph.x, there must be at 
> least one other significant difference elsewhere, which we have not been 
> able to pinpoint.
> 
> We tightened tr2_ph (threshold for self-consistency) from the default of 
> 1.0e-12 to 1.0e-16, as was suggested in the comments of the 21 Mar 2024 
> post, and we did find that this improved the results. With tr2_ph set 
> to 1e-16, there is still large variation in the ph.x 
> frequencies of the first three modes, around 9 cm-1 in some cases (54 
> processors versus 28 processors, for example). However, the variation in 
> the 4th and higher frequencies becomes much smaller, now only around 0.2 
> cm-1. When dynmat.x is run, the variation between jobs with differing 
> numbers of processors remains low, around 0.1 cm-1, and no imaginary 
> modes are obtained in this case. The details of these results are 
> provided below; these runs were conducted on our local HPC.
> 
> The following results are obtained with tr2_ph = 1e-14, and all 
> frequencies are reported in cm-1:
> +---------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+
> | nprocs  | 2                     | 28                    | 32                    | 54                    | 56                    |
> +---------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+
> | ph.out  | freq (1) = -50.487262 | freq (1) = -60.136621 | freq (1) = -50.482019 | freq (1) = -60.136609 | freq (1) = -60.136707 |
> |         | freq (2) = -41.007113 | freq (2) = -60.136621 | freq (2) = -41.006774 | freq (2) = -60.136609 | freq (2) = -60.136707 |
> |         | freq (3) = -41.007113 | freq (3) = -50.478600 | freq (3) = -41.006774 | freq (3) = -55.994131 | freq (3) = -55.995244 |
> |         | freq (4) = 15.484027  | freq (4) = 15.483725  | freq (4) = 15.484137  | freq (4) = -50.470542 | freq (4) = -50.477401 |
> |         | freq (5) = 15.484027  | freq (5) = 15.483725  | freq (5) = 15.484137  | freq (5) = 15.484171  | freq (5) = 18.326070  |
> |         | freq (6) = 105.412148 | freq (6) = 105.412121 | freq (6) = 105.412117 | freq (6) = 15.484171  | freq (6) = 18.326070  |
> +---------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+
> | dyn.out | mode (1) = -0.00      | mode (1) = -0.00      | mode (1) = -0.00      | mode (1) = -55.99     | mode (1) = -56.00     |
> |         | mode (2) = -0.00      | mode (2) = 0.00       | mode (2) = 0.00       | mode (2) = -0.00      | mode (2) = -0.00      |
> |         | mode (3) = 0.00       | mode (3) = 0.00       | mode (3) = 0.00       | mode (3) = -0.00      | mode (3) = -0.00      |
> |         | mode (4) = 15.48      | mode (4) = 15.48      | mode (4) = 15.48      | mode (4) = 0.00       | mode (4) = -0.00      |
> |         | mode (5) = 15.48      | mode (5) = 15.48      | mode (5) = 15.48      | mode (5) = 15.48      | mode (5) = 18.33      |
> |         | mode (6) = 105.41     | mode (6) = 105.41     | mode (6) = 105.41     | mode (6) = 15.48      | mode (6) = 18.33      |
> +---------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+
> 
> The following results are obtained with tr2_ph = 1e-16, and all 
> frequencies are reported in cm-1:
> +---------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+
> | nprocs  | 2                     | 28                    | 32                    | 54                    | 56                    |
> +---------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+
> | ph.out  | freq (1) = 4.232489   | freq (1) = 5.581179   | freq (1) = 4.232403   | freq (1) = -3.089218  | freq (1) = -3.089027  |
> |         | freq (2) = 4.232489   | freq (2) = 5.581179   | freq (2) = 4.232403   | freq (2) = -3.089218  | freq (2) = -3.089027  |
> |         | freq (3) = 5.875788   | freq (3) = 5.879649   | freq (3) = 5.874594   | freq (3) = 5.884004   | freq (3) = 5.879562   |
> |         | freq (4) = 108.212603 | freq (4) = 108.010685 | freq (4) = 108.212604 | freq (4) = 108.132624 | freq (4) = 108.132629 |
> |         | freq (5) = 108.272083 | freq (5) = 108.348110 | freq (5) = 108.272082 | freq (5) = 108.395919 | freq (5) = 108.395934 |
> |         | freq (6) = 108.272083 | freq (6) = 108.348110 | freq (6) = 108.272082 | freq (6) = 108.395919 | freq (6) = 108.395934 |
> +---------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+
> | dyn.out | mode (1) = -0.00      | mode (1) = -0.00      | mode (1) = -0.00      | mode (1) = 0.00       | mode (1) = -0.00      |
> |         | mode (2) = 0.00       | mode (2) = 0.00       | mode (2) = 0.00       | mode (2) = 0.00       | mode (2) = 0.00       |
> |         | mode (3) = 0.00       | mode (3) = 0.00       | mode (3) = 0.00       | mode (3) = 0.00       | mode (3) = 0.00       |
> |         | mode (4) = 108.21     | mode (4) = 108.01     | mode (4) = 108.21     | mode (4) = 108.13     | mode (4) = 108.13     |
> |         | mode (5) = 108.27     | mode (5) = 108.35     | mode (5) = 108.27     | mode (5) = 108.40     | mode (5) = 108.40     |
> |         | mode (6) = 108.27     | mode (6) = 108.35     | mode (6) = 108.27     | mode (6) = 108.40     | mode (6) = 108.40     |
> +---------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+
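
The variation quoted for tr2_ph = 1e-16 can be checked from the ph.out rows of this table. A minimal sketch, with the values copied from the table (only modes 1 and 4 are included; ph_out and spread are hypothetical names):

```python
# Frequencies in cm-1 from the tr2_ph = 1e-16 table, keyed by the number
# of processors. Only modes 1 and 4 are copied here.
ph_out = {
    2:  {1: 4.232489,  4: 108.212603},
    28: {1: 5.581179,  4: 108.010685},
    32: {1: 4.232403,  4: 108.212604},
    54: {1: -3.089218, 4: 108.132624},
    56: {1: -3.089027, 4: 108.132629},
}


def spread(mode):
    """Largest minus smallest value of one mode across processor counts."""
    values = [run[mode] for run in ph_out.values()]
    return max(values) - min(values)


for mode in (1, 4):
    print(f"mode {mode}: spread = {spread(mode):.6f} cm-1")
```

This gives about 8.67 cm-1 for mode 1 (28 versus 54 processors) and about 0.20 cm-1 for mode 4, consistent with the "around 9 cm-1" and "around 0.2 cm-1" figures in the message.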
> 
> The Se pseudopotential used in these jobs was generated using SG15 
> ONCVPSP scalar-relativistic version 3.3.1 from 
> http://quantum-simulation.org/potentials/sg15_oncv/upf, 
> modified to turn off non-linear core corrections and to use the LDA 
> functional. The .upf for this modified pseudopotential is available 
> here, for now: https://www.ocf.io/~rleano/docs/QE/Se_SG15-LDA.upf. 
> The Mo pseudopotential is LDA, standard accuracy, NC SR (ONCVPSP) 
> v0.4.1, obtained from PseudoDojo. The issue still persists if the more 
> readily obtained PseudoDojo pseudopotential (NC SR (ONCVPSP) v0.4.1, 
> LDA XC, standard accuracy) is used for Se.
> 
> Sincerely,
> Remi Leano
> PhD Candidate
> UC Merced
> 
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> 
> INPUT FILE FOR PW.X
> 
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> &control
> calculation = 'scf'
> pseudo_dir  = '.'
> tstress     = .true.
> etot_conv_thr   = 1.0D-6
> forc_conv_thr   = 1.0D-4
> outdir      = 'temp'
> /
> 
> &system
> ibrav       = 14
> celldm(1)   = 12.267656
> celldm(2)   = 1.0
> celldm(3)   = 1.848496
> celldm(4)   = 0.0
> celldm(5)   = 0.0
> celldm(6)   = -0.5
> nat         = 12
> ntyp        = 2
> occupations = 'fixed'
> ecutwfc     = 75.0
> /
> 
> &electrons
> conv_thr    = 1.0D-8
> /
> 
> 
> &ions
> /
> &cell
> press_conv_thr = 1.0D-2
> cell_dofree = '2Dxy'
> /
> 
> 
> ATOMIC_SPECIES
> Mo 95.95 Mo.UPF
> Se 78.971 Se_SG15-LDA.upf
> 
> 
> ATOMIC_POSITIONS {crystal}
> Mo        0.1666666558    0.3333333215    0.5000000000
> Se        0.3333333575    0.1666666737    0.3617656310
> Se        0.3333333575    0.1666666737    0.6382343690
> Mo        0.6666666600    0.3333333300    0.5000000000
> Se        0.8333333163    0.1666666737    0.3617656310
> Se        0.8333333163    0.1666666737    0.6382343690
> Mo        0.1666666558    0.8333333342    0.5000000000
> Se        0.3333333300    0.6666666600    0.3617655882
> Se        0.3333333300    0.6666666600    0.6382344118
> Mo        0.6666666685    0.8333333342    0.5000000000
> Se        0.8333333163    0.6666666325    0.3617656310
> Se        0.8333333163    0.6666666325    0.6382343690
> 
> 
> K_POINTS {automatic}
> 2 2 1 1 1 0
> 
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> 
> INPUT FILE FOR PH.X
> 
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> 
> &inputph
>    tr2_ph   = 1e-12
>    alpha_mix(1)  = 0.7
>    fildrho  = 'PH.drho'
>    trans    = .true.
>    ldisp    = .false.
>    fildyn   = 'PH.dyn1'
>    epsil    = .true.
>    lraman   = .true.
> /
> 0.0 0.0 0.0
> 
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> 
> INPUT FILE FOR DYNMAT.X
> 
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> 
> &input
>    fildyn = 'PH.dyn1'
>    asr = 'crystal'
>    filout = 'PH.modes.dat'
>    fileig = 'PH.eig.dat'
>    filxsf = 'PH.axsf'
> /
> 
> 
> 
> 

-- 
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 206, 33100 Udine Italy, +39-0432-558216