[QE-developers] Different values in phonon calculations with different numbers of processors
Paolo Giannozzi
paolo.giannozzi at uniud.it
Sat Oct 12 12:35:44 CEST 2024
Just a few quick impressions:
- the numerical noise in zero-frequency acoustic modes at Gamma is,
well, numerical noise: the value of Acoustic Sum Rule violation may
fluctuate
- the results with tr2_ph=10^-16 differ A LOT from those with
tr2_ph=10^-14, so presumably the latter aren't converged at all. I don't
see anything anomalous in the behavior of results with tr2_ph=10^-16
(even with a number of processors that is presumably too large for a
relatively small system like MoSe2). Unconverged calculations with
different numbers of processors may "un-converge" to different results,
due to unavoidable numerical differences.
Paolo
On 10/10/2024 22:47, Remi Leano wrote:
>
> Dear developers,
>
> We have noticed the results from ph.x have a significant dependence on
> the number of processors used in the calculation, which we would like to
> share here. I have reproduced this on QE 7.1, 7.2, and 7.3.1 on two
> separate machines (NERSC’s Perlmutter, and our university’s HPC
> facility). NERSC’s Perlmutter has an AMD EPYC 7763 CPU, and our
> university’s HPC facility runs on a Xeon Gold 6330 CPU with an x86_64
> architecture. Our home HPC’s QE 7.1 was compiled with Intel oneAPI,
> version 2021.4.0, and with MVAPICH2 2.3.6. On NERSC’s Perlmutter, their
> QE 7.1 seems to have been compiled with gcc-native 12.3 and Cray MPICH
> 8.1.28 (I determined this while logged in to Perlmutter, but see also
> https://docs.nersc.gov/development/compilers/base/).
>
> From what I have seen, differences are negligible in simple systems
> such as Si. However, for a more complex system, such as MoSe2, the first
> frequency in the output from ph.x can vary from -50.482019 cm-1 to
> -60.136795 cm-1. When I run in serial, -60.136707 cm-1 is obtained for
> the first frequency.
>
> The last discussion of this parallelization issue seems to have been
> on 21 Mar 2024
> (https://www.mail-archive.com/users@lists.quantum-espresso.org/msg44272.html),
> and before that on 16 Feb 2007, for version 3
> (https://www.mail-archive.com/users@lists.quantum-espresso.org/msg10137.html).
> In the more recent post it was mentioned that acoustic modes, which
> should have zero frequency but come out of ph.x with non-zero values,
> may vary by a few cm-1; in our case, however, some results vary by 2 to
> 10 cm-1 for most of the modes. Additionally, one of the low-lying
> frequencies becomes an imaginary mode after the Acoustic Sum Rule is
> applied, which is not observed when ph.x is run in serial.
>
> I have provided details about the machines used at the end. All tests
> are done on CPUs and with one thread.
>
> Here is how I have obtained these results:
>
> 1. An scf calculation is done on the structure.
> 2. The output of the scf calculation is copied into separate
>    directories, one for each number of processors tested, so that all
>    ph.x calculations start from the same point.
> 3. The number of processors in the batch script is changed for each
>    test, holding everything else fixed.
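A minimal Python sketch of the procedure above, for anyone who wants to repeat it. It assumes an `mpirun` launcher (on Perlmutter this would be `srun`), a hypothetical `scf/` directory holding both the scf `outdir` (`temp`) and the ph.x input `ph.in`, and hypothetical helper names (`ph_command`, `prepare_run`, `run_all`). `-nk` and `-inp` are the standard QE command-line options for the number of pools and the input file. By default it only prints the launch lines.

```python
import shutil
import subprocess
from pathlib import Path

# Hypothetical values: adjust to the actual scf directory and launcher.
SCF_DIR = Path("scf")  # assumed to hold the scf outdir ('temp') and ph.in
NPROCS = [1, 2, 4, 7, 8, 14, 28, 30, 32, 40, 50, 52, 54, 56]


def ph_command(nprocs: int, inp: str = "ph.in", npool: int = 1) -> list:
    """Launch line for one test; mpirun is an assumption (Perlmutter uses srun)."""
    return ["mpirun", "-np", str(nprocs), "ph.x", "-nk", str(npool), "-inp", inp]


def prepare_run(nprocs: int, root: Path = Path("runs"), scf_dir: Path = SCF_DIR) -> Path:
    """Copy the identical scf output into a fresh directory for one test."""
    run_dir = root / f"np{nprocs:02d}"
    shutil.copytree(scf_dir, run_dir, dirs_exist_ok=True)
    return run_dir


def run_all(dry_run: bool = True) -> None:
    """Run (or just print) one ph.x job per processor count, all else fixed."""
    for n in NPROCS:
        cmd = ph_command(n)
        if dry_run:
            print(" ".join(cmd))
            continue
        run_dir = prepare_run(n)
        with open(run_dir / "ph.out", "w") as out:
            subprocess.run(cmd, cwd=run_dir, stdout=out, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

Copying the whole scf directory per test keeps every ph.x run independent, so a crashed or restarted job cannot contaminate the others.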
>
>
> While there is also some variation in scf total energies depending on
> the number of processors used, it seems within the amount of numerical
> variation which is unavoidable and expected. So, for the purpose of this
> test, the number of processors used for the scf is held fixed (56
> processors) such that the variation can be attributed entirely to the
> number of processors used in the ph.x step. My input files are below.
> The data shown are from a single pool (-nk 1). However, comparing runs
> with different numbers of processors and/or pools, we found that the
> frequencies are determined by the number of processors in each pool.
> This was consistent between NERSC’s Perlmutter and our local HPC.
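A dependence on the number of processors per pool is consistent with the "unavoidable numerical differences" mentioned in the reply: floating-point addition is not associative, so a reduction whose partial sums are split differently across processes can round differently. The toy sketch below (hypothetical helper `partitioned_sum`, contrived values) shows the effect in isolation; it is not a model of how ph.x actually distributes its data.

```python
import math


def naive_sum(xs):
    """Left-to-right float addition (the built-in sum() compensates on Python 3.12+)."""
    total = 0.0
    for x in xs:
        total += x
    return total


def partitioned_sum(values, nchunks):
    """Sum contiguous chunks separately, then add the partial sums in order,
    mimicking a reduction over `nchunks` processes."""
    size = math.ceil(len(values) / nchunks)
    partials = [naive_sum(values[i:i + size]) for i in range(0, len(values), size)]
    return naive_sum(partials)


# Contrived values: in double precision, 1e17 + 1.0 rounds back to 1e17.
values = [1e17, 1.0, -1e17, 1.0]

for p in (1, 2, 4):
    print(f"{p} chunk(s): {partitioned_sum(values, p)}")
print(f"correctly rounded (math.fsum): {math.fsum(values)}")
```

Here one chunk gives 1.0, two chunks give 0.0, and `math.fsum` gives the exact 2.0. In a real code the differences are at the level of the last bits, but an iterative solver stopped before convergence can amplify them.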
>
> Values obtained from ph.x (QE 7.1, our university’s HPC), each run
> starting from identical scf output files, are in the table below.
> +------------+----------------------------------+--------------------------------+-----------------------------------+
> | nprocs     | First frequency in ph.out [cm-1] | [1,1][1,1] of dynamical matrix | First frequency in dyn.out [cm-1] |
> +------------+----------------------------------+--------------------------------+-----------------------------------+
> | 1 (serial) | -50.483039                       | 0.42773666                     | 0.00                              |
> | 2          | -50.487262                       | 0.42773677                     | -0.00                             |
> | 4          | -50.482019                       | 0.42773639                     | -0.00                             |
> | 7          | -50.487262                       | 0.42773677                     | -0.00                             |
> | 8          | -50.482019                       | 0.42773639                     | -0.00                             |
> | 14         | -50.487262                       | 0.42773677                     | -0.00                             |
> | 28         | -60.136621                       | 0.41509106                     | -0.00                             |
> | 30         | -60.136257                       | 0.41509080                     | -0.00                             |
> | 32         | -50.482019                       | 0.42773639                     | -0.00                             |
> | 40         | -60.136605                       | 0.41509088                     | -0.00                             |
> | 50         | -60.136257                       | 0.41509080                     | -0.00                             |
> | 52         | -60.136795                       | 0.41509068                     | -0.00                             |
> | 54         | -60.136609                       | 0.33587825                     | -55.99                            |
> | 56         | -60.136707                       | 0.33587767                     | -56.00                            |
> +------------+----------------------------------+--------------------------------+-----------------------------------+
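To make the pattern in the table easier to see, a small Python sketch (data copied from the table; the helper name is hypothetical) groups the processor counts by their first ph.out frequency. Runs with 2, 7 and 14 processors agree to all printed digits, as do 4, 8 and 32, and 30 and 50, while the overall spread is about 9.65 cm-1.

```python
from collections import defaultdict

# First frequency in ph.out [cm-1] for each number of processors (from the table).
first_freq = {
    1: -50.483039, 2: -50.487262, 4: -50.482019, 7: -50.487262,
    8: -50.482019, 14: -50.487262, 28: -60.136621, 30: -60.136257,
    32: -50.482019, 40: -60.136605, 50: -60.136257, 52: -60.136795,
    54: -60.136609, 56: -60.136707,
}


def group_by_value(results):
    """Map each distinct value to the sorted list of nprocs that produced it."""
    groups = defaultdict(list)
    for nprocs, freq in sorted(results.items()):
        groups[freq].append(nprocs)
    return dict(groups)


for freq, procs in sorted(group_by_value(first_freq).items()):
    print(f"{freq:11.6f} cm-1  <- nprocs {procs}")

spread = max(first_freq.values()) - min(first_freq.values())
print(f"spread of the first frequency: {spread:.6f} cm-1")
```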
>
> In many cases, the application of the Acoustic Sum Rule by dynmat.x
> masks this issue. We noticed that the first entry of the dynamical
> matrix in the 56- and 54-processor tests differs from the other runs,
> and thought it could be the origin of the discrepancy. I edited this
> single value in the 56-processor dynamical matrix to match the value
> from the 2-processor run (0.42773677), and dynmat.x then gave an
> entirely different first frequency in dyn.out, -35.85 cm-1, not
> obtained in any other test. Therefore, while the first entry of the
> dynamical matrix is one of the most noticeable differences between the
> ph.x outputs, there must be at least one other significant difference
> elsewhere, which we have not been able to pinpoint.
>
> We tightened tr2_ph (the threshold for self-consistency) from the
> default of 1.0e-12 to 1.0e-16, as suggested in the comments of the
> 21 Mar 2024 post, and found that this improved the results. With
> tr2_ph = 1e-16 there is still a large variation in the ph.x frequencies
> of the first three modes, around 9 cm-1 in some cases (54 versus 28
> processors, for example). However, the variation in the 4th and higher
> frequencies becomes much smaller, only around 0.2 cm-1. After dynmat.x,
> the variation between jobs with different numbers of processors remains
> low, around 0.1 to 0.2 cm-1, and no imaginary modes are obtained. The
> details of these results, all obtained on our local HPC, are below.
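As a loose analogy for why tightening tr2_ph can change the answer this much, the sketch below runs a linearly mixed fixed-point iteration, x <- x + alpha*(g(x) - x), on g = cos with alpha = 0.7 (the alpha_mix value in the ph.x input), stopping when the squared residual falls below a threshold. This is a toy under stated assumptions: ph.x mixes the induced potential with a Broyden-type scheme, and its convergence estimate is not exactly this residual. The only point is that the error left behind when the loop stops shrinks with the threshold.

```python
import math

DOTTIE = 0.7390851332151607  # fixed point of cos(x), used to measure the error


def fixed_point(g, x0, alpha, thr, max_iter=1000):
    """Linearly mixed iteration x <- x + alpha * (g(x) - x).

    Stops when the squared residual (g(x) - x)**2 drops below `thr`. This is
    only an analogy to ph.x's tr2_ph test, not its actual algorithm.
    """
    x = x0
    for it in range(max_iter):
        r = g(x) - x
        if r * r < thr:
            return x, it
        x += alpha * r
    raise RuntimeError("no convergence within max_iter")


for thr in (1e-6, 1e-16):
    x, it = fixed_point(math.cos, 1.0, alpha=0.7, thr=thr)
    print(f"thr = {thr:.0e}: {it} iterations, error = {abs(x - DOTTIE):.1e}")
```

With the loose threshold the iteration stops with an error of order 1e-4; with the tight one the error is below 1e-8.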
>
> The following results are obtained with tr2_ph = 1e-14, and all
> frequencies are reported in cm-1:
> +------------------+------------+------------+------------+------------+------------+
> | nprocs           | 2          | 28         | 32         | 54         | 56         |
> +------------------+------------+------------+------------+------------+------------+
> | ph.out freq (1)  | -50.487262 | -60.136621 | -50.482019 | -60.136609 | -60.136707 |
> | ph.out freq (2)  | -41.007113 | -60.136621 | -41.006774 | -60.136609 | -60.136707 |
> | ph.out freq (3)  | -41.007113 | -50.478600 | -41.006774 | -55.994131 | -55.995244 |
> | ph.out freq (4)  | 15.484027  | 15.483725  | 15.484137  | -50.470542 | -50.477401 |
> | ph.out freq (5)  | 15.484027  | 15.483725  | 15.484137  | 15.484171  | 18.326070  |
> | ph.out freq (6)  | 105.412148 | 105.412121 | 105.412117 | 15.484171  | 18.326070  |
> +------------------+------------+------------+------------+------------+------------+
> | dyn.out mode (1) | -0.00      | -0.00      | -0.00      | -55.99     | -56.00     |
> | dyn.out mode (2) | -0.00      | 0.00       | 0.00       | -0.00      | -0.00      |
> | dyn.out mode (3) | 0.00       | 0.00       | 0.00       | -0.00      | -0.00      |
> | dyn.out mode (4) | 15.48      | 15.48      | 15.48      | 0.00       | -0.00      |
> | dyn.out mode (5) | 15.48      | 15.48      | 15.48      | 15.48      | 18.33      |
> | dyn.out mode (6) | 105.41     | 105.41     | 105.41     | 15.48      | 18.33      |
> +------------------+------------+------------+------------+------------+------------+
>
> The following results are obtained with tr2_ph = 1e-16, and all
> frequencies are reported in cm-1:
> +------------------+------------+------------+------------+------------+------------+
> | nprocs           | 2          | 28         | 32         | 54         | 56         |
> +------------------+------------+------------+------------+------------+------------+
> | ph.out freq (1)  | 4.232489   | 5.581179   | 4.232403   | -3.089218  | -3.089027  |
> | ph.out freq (2)  | 4.232489   | 5.581179   | 4.232403   | -3.089218  | -3.089027  |
> | ph.out freq (3)  | 5.875788   | 5.879649   | 5.874594   | 5.884004   | 5.879562   |
> | ph.out freq (4)  | 108.212603 | 108.010685 | 108.212604 | 108.132624 | 108.132629 |
> | ph.out freq (5)  | 108.272083 | 108.348110 | 108.272082 | 108.395919 | 108.395934 |
> | ph.out freq (6)  | 108.272083 | 108.348110 | 108.272082 | 108.395919 | 108.395934 |
> +------------------+------------+------------+------------+------------+------------+
> | dyn.out mode (1) | -0.00      | -0.00      | -0.00      | 0.00       | -0.00      |
> | dyn.out mode (2) | 0.00       | 0.00       | 0.00       | 0.00       | 0.00       |
> | dyn.out mode (3) | 0.00       | 0.00       | 0.00       | 0.00       | 0.00       |
> | dyn.out mode (4) | 108.21     | 108.01     | 108.21     | 108.13     | 108.13     |
> | dyn.out mode (5) | 108.27     | 108.35     | 108.27     | 108.40     | 108.40     |
> | dyn.out mode (6) | 108.27     | 108.35     | 108.27     | 108.40     | 108.40     |
> +-----------------------+-----------------------+-----------------------+
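The two tables can also be compared programmatically. This sketch (ph.out values copied from the tables; function names hypothetical) computes, for each threshold, the spread of each mode over the processor counts, and, for each processor count, the largest shift when tr2_ph goes from 1e-14 to 1e-16. The shifts, several tens of cm-1, dwarf the spreads of about 0.2 cm-1 or less for mode 4 and above at 1e-16, which is the sense in which the 1e-14 results look unconverged.

```python
# ph.out frequencies [cm-1] of modes 1-6, copied from the two tables.
PH_FREQS = {
    "1e-14": {
        2: [-50.487262, -41.007113, -41.007113, 15.484027, 15.484027, 105.412148],
        28: [-60.136621, -60.136621, -50.478600, 15.483725, 15.483725, 105.412121],
        32: [-50.482019, -41.006774, -41.006774, 15.484137, 15.484137, 105.412117],
        54: [-60.136609, -60.136609, -55.994131, -50.470542, 15.484171, 15.484171],
        56: [-60.136707, -60.136707, -55.995244, -50.477401, 18.326070, 18.326070],
    },
    "1e-16": {
        2: [4.232489, 4.232489, 5.875788, 108.212603, 108.272083, 108.272083],
        28: [5.581179, 5.581179, 5.879649, 108.010685, 108.348110, 108.348110],
        32: [4.232403, 4.232403, 5.874594, 108.212604, 108.272082, 108.272082],
        54: [-3.089218, -3.089218, 5.884004, 108.132624, 108.395919, 108.395919],
        56: [-3.089027, -3.089027, 5.879562, 108.132629, 108.395934, 108.395934],
    },
}


def spread_per_mode(table):
    """Max minus min over processor counts, for each mode."""
    return [max(col) - min(col) for col in zip(*table.values())]


def max_shift(nprocs):
    """Largest change of any frequency when tr2_ph goes from 1e-14 to 1e-16."""
    old, new = PH_FREQS["1e-14"][nprocs], PH_FREQS["1e-16"][nprocs]
    return max(abs(a - b) for a, b in zip(old, new))


for thr, table in PH_FREQS.items():
    print(f"tr2_ph = {thr}: spread per mode", [round(s, 3) for s in spread_per_mode(table)])
for n in PH_FREQS["1e-14"]:
    print(f"nprocs = {n}: largest shift 1e-14 -> 1e-16 = {max_shift(n):.1f} cm-1")
```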
>
> The Se pseudopotential used in these jobs was generated using SG15
> ONCVPSP (scalar-relativistic, version 3.3.1) from
> http://quantum-simulation.org/potentials/sg15_oncv/upf, modified to
> turn off nonlinear core corrections and to use the LDA functional. The
> .upf file for this modified pseudopotential is available, for now, at
> https://www.ocf.io/~rleano/docs/QE/Se_SG15-LDA.upf. The Mo
> pseudopotential is the LDA, standard-accuracy, NC SR (ONCVPSP) v0.4.1
> one from PseudoDojo. The issue persists if the more readily available
> PseudoDojo NC SR (ONCVPSP) v0.4.1 LDA standard-accuracy pseudopotential
> is used for Se instead.
>
> Sincerely,
> Remi Leano
> PhD Candidate
> UC Merced
>
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>
> INPUT FILE FOR PW.X
>
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> &control
> calculation = 'scf'
> pseudo_dir = '.'
> tstress = .true.
> etot_conv_thr = 1.0D-6
> forc_conv_thr = 1.0D-4
> outdir = 'temp'
> /
>
> &system
> ibrav = 14
> celldm(1) = 12.267656
> celldm(2) = 1.0
> celldm(3) = 1.848496
> celldm(4) = 0.0
> celldm(5) = 0.0
> celldm(6) = -0.5
> nat = 12
> ntyp = 2
> occupations = 'fixed'
> ecutwfc = 75.0
> /
>
> &electrons
> conv_thr = 1.0D-8
> /
>
>
> &ions
> /
> &cell
> press_conv_thr = 1.0D-2
> cell_dofree = '2Dxy'
> /
>
>
> ATOMIC_SPECIES
> Mo 95.95 Mo.UPF
> Se 78.971 Se_SG15-LDA.upf
>
>
> ATOMIC_POSITIONS {crystal}
> Mo 0.1666666558 0.3333333215 0.5000000000
> Se 0.3333333575 0.1666666737 0.3617656310
> Se 0.3333333575 0.1666666737 0.6382343690
> Mo 0.6666666600 0.3333333300 0.5000000000
> Se 0.8333333163 0.1666666737 0.3617656310
> Se 0.8333333163 0.1666666737 0.6382343690
> Mo 0.1666666558 0.8333333342 0.5000000000
> Se 0.3333333300 0.6666666600 0.3617655882
> Se 0.3333333300 0.6666666600 0.6382344118
> Mo 0.6666666685 0.8333333342 0.5000000000
> Se 0.8333333163 0.6666666325 0.3617656310
> Se 0.8333333163 0.6666666325 0.6382343690
>
>
> K_POINTS {automatic}
> 2 2 1 1 1 0
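For readers reconstructing the cell: with ibrav = 14, pw.x builds the lattice vectors from celldm(1..3) and the cosines celldm(4) = cos(alpha) between b and c, celldm(5) = cos(beta) between a and c, and celldm(6) = cos(gamma) between a and b. A short sketch of that construction, applied to the celldm values above, gives a hexagonal in-plane cell (gamma = 120 degrees) with a of about 6.49 Å and c of about 12.0 Å; with 4 Mo and 8 Se atoms this corresponds to a 2x2 in-plane supercell of monolayer MoSe2. The function name and the Bohr-to-Å constant are assumptions of the sketch; the pw.x input documentation is the authority for the formula.

```python
import math

BOHR_TO_ANGSTROM = 0.529177210903  # CODATA 2018 Bohr radius in angstrom


def ibrav14_vectors(celldm):
    """Lattice vectors (bohr) for ibrav = 14 from celldm(1..6).

    celldm(4) = cos(alpha) (b,c), celldm(5) = cos(beta) (a,c),
    celldm(6) = cos(gamma) (a,b).
    """
    a = celldm[0]
    b, c = celldm[1] * a, celldm[2] * a
    cos_al, cos_be, cos_ga = celldm[3], celldm[4], celldm[5]
    sin_ga = math.sqrt(1.0 - cos_ga**2)
    v1 = (a, 0.0, 0.0)
    v2 = (b * cos_ga, b * sin_ga, 0.0)
    v3 = (
        c * cos_be,
        c * (cos_al - cos_be * cos_ga) / sin_ga,
        c * math.sqrt(1.0 + 2.0 * cos_al * cos_be * cos_ga
                      - cos_al**2 - cos_be**2 - cos_ga**2) / sin_ga,
    )
    return v1, v2, v3


celldm = [12.267656, 1.0, 1.848496, 0.0, 0.0, -0.5]  # from the &system namelist
v1, v2, v3 = ibrav14_vectors(celldm)
for name, v in (("v1", v1), ("v2", v2), ("v3", v3)):
    print(name, [round(x * BOHR_TO_ANGSTROM, 4) for x in v], "angstrom")
print("gamma =", round(math.degrees(math.acos(celldm[5])), 6), "degrees")
```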
>
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>
> INPUT FILE FOR PH.X
>
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>
> &inputph
> tr2_ph = 1e-12
> alpha_mix(1) = 0.7
> fildrho = 'PH.drho'
> trans = .true.
> ldisp = .false.
> fildyn = 'PH.dyn1'
> epsil = .true.
> lraman = .true.
> /
> 0.0 0.0 0.0
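When repeating the runs at several values of tr2_ph, generating the ph.x inputs from one template guarantees that nothing else changes between them. A minimal sketch, assuming the namelist above is used verbatim; ph.x reads the first line of its input as a title, so the title text here is arbitrary, and the helper names are hypothetical.

```python
# Template reproducing the ph.x input above; the first line is ph.x's title line.
PH_TEMPLATE = """phonons of MoSe2 at Gamma
&inputph
  tr2_ph = {tr2}
  alpha_mix(1) = 0.7
  fildrho = 'PH.drho'
  trans = .true.
  ldisp = .false.
  fildyn = 'PH.dyn1'
  epsil = .true.
  lraman = .true.
/
0.0 0.0 0.0
"""


def fortran_real(x):
    """Format a float with a Fortran double-precision exponent, e.g. 1.0d-16."""
    return f"{x:.1e}".replace("e", "d")


def ph_input(tr2):
    """Return the ph.x input text for a given tr2_ph threshold."""
    return PH_TEMPLATE.format(tr2=fortran_real(tr2))


def write_inputs(thresholds=(1e-12, 1e-14, 1e-16)):
    """Write one input file per threshold, e.g. ph_tr2_1.0d-16.in."""
    for tr2 in thresholds:
        with open(f"ph_tr2_{fortran_real(tr2)}.in", "w") as f:
            f.write(ph_input(tr2))


if __name__ == "__main__":
    print(ph_input(1e-16))
```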
>
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>
> INPUT FILE FOR DYNMAT.X
>
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>
> &input
> fildyn = 'PH.dyn1'
> asr = 'crystal'
> filout = 'PH.modes.dat'
> fileig = 'PH.eig.dat'
> filxsf = 'PH.axsf'
> /
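To illustrate why applying the ASR in dynmat.x can hide differences in the raw ph.x output, here is a toy sketch in the spirit of asr='simple' (not the projection used by asr='crystal' in the input above): each on-site block of a Gamma-point force-constant matrix is reset so that the sum over atoms vanishes, which pins the three translational modes to zero while the other modes keep whatever noise the matrix carries. Frequencies are reported as sign(w2)*sqrt(|w2|), the convention by which ph.x and dynmat.x print imaginary modes as negative numbers. The spring model, masses, noise level and units are hypothetical, and numpy is assumed to be available.

```python
import numpy as np


def impose_simple_asr(phi):
    """Reset each on-site block so that sum_j phi[i, j] = 0.

    phi has shape (nat, nat, 3, 3); phi[i, j, a, b] couples atom i along a
    with atom j along b. This mimics the idea of asr='simple' only; it is
    not the projection performed by asr='crystal'.
    """
    phi = phi.copy()
    for i in range(phi.shape[0]):
        phi[i, i] -= phi[i].sum(axis=0)  # new self term = -(sum of off-site blocks)
    return phi


def signed_frequencies(phi, masses):
    """sign(w2) * sqrt(|w2|) for the eigenvalues w2 of the mass-weighted matrix."""
    nat = len(masses)
    d = phi.transpose(0, 2, 1, 3).reshape(3 * nat, 3 * nat)
    m = np.repeat(np.asarray(masses, dtype=float), 3)
    w2 = np.linalg.eigvalsh(d / np.sqrt(np.outer(m, m)))
    return np.sign(w2) * np.sqrt(np.abs(w2))


def spring_phi(pos, k=1.0):
    """Toy translationally invariant force constants from central springs."""
    nat = len(pos)
    phi = np.zeros((nat, nat, 3, 3))
    for i in range(nat):
        for j in range(nat):
            if i != j:
                n = pos[j] - pos[i]
                n /= np.linalg.norm(n)
                block = k * np.outer(n, n)
                phi[i, j] -= block
                phi[i, i] += block
    return phi


rng = np.random.default_rng(0)
masses = [95.95, 78.971, 78.971]  # Mo, Se, Se (amu); toy units throughout
phi = spring_phi(rng.normal(size=(3, 3)))

# Symmetric "numerical noise": breaks the sum rule but keeps phi symmetric.
noise = rng.normal(scale=1e-3, size=(3, 3, 3, 3))
noise = noise + noise.transpose(1, 0, 3, 2)  # phi[i, j, a, b] == phi[j, i, b, a]
noise = noise + noise.transpose(0, 1, 3, 2)  # each 3x3 block symmetric
noisy = phi + noise
fixed = impose_simple_asr(noisy)

print("raw      :", np.round(signed_frequencies(noisy, masses), 4))
print("with ASR :", np.round(signed_frequencies(fixed, masses), 4))
```

After the correction only the three translational modes are forced to zero; the remaining low modes (rotations of this free toy cluster) and the higher ones still carry the noise, so the sum rule cleans up the acoustic branch without telling you whether the rest of the matrix is converged.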
>
>
>
>
> ________________________________________________
> The Quantum ESPRESSO community stands by the Ukrainian people
> and expresses its concerns about the devastating effects that
> the Russian military offensive has on their country and on the
> free and peaceful scientific, cultural, and economic cooperation
> amongst peoples.
> _______________________________________________
> developers mailing list
> developers at lists.quantum-espresso.org
> https://lists.quantum-espresso.org/mailman/listinfo/developers
--
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 206, 33100 Udine Italy, +39-0432-558216