[QE-developers] Different values in phonon calculations with different numbers of processors
Remi Leano
rleano at ucmerced.edu
Thu Oct 10 22:47:19 CEST 2024
Dear developers,
We have noticed the results from ph.x have a significant dependence on the number of processors used in the calculation, which we would like to share here. I have reproduced this on QE 7.1, 7.2, and 7.3.1 on two separate machines (NERSC’s Perlmutter, and our university’s HPC facility). NERSC’s Perlmutter has an AMD EPYC 7763 CPU, and our university’s HPC facility runs on a Xeon Gold 6330 CPU with an x86_64 architecture. Our home HPC’s QE7.1 was compiled with Intel’s oneapi, version 2021.4.0, and with mvapich2 2.3.6. On NERSC’s Perlmutter, their QE7.1 seems to have been compiled with gcc-native 12.3 and Cray-mpich 8.1.28 (I determined this while logged in to Perlmutter, but see also https://docs.nersc.gov/development/compilers/base/).
From what I have seen, differences are negligible in simple systems such as Si. However, for a more complex system, such as MoSe2, the first frequency in the output from ph.x can vary from -50.482019 cm-1 to -60.136795 cm-1. When I run in serial, -60.136707 cm-1 is obtained for the first frequency.
The last discussion of this parallelization issue seems to have been from 21 Mar 2024 (https://www.mail-archive.com/users@lists.quantum-espresso.org/msg44272.html), and prior to that from 16 Feb 2007 in version 3 (https://www.mail-archive.com/users@lists.quantum-espresso.org/msg10137.html). In the last post it was mentioned that zero-frequency acoustic modes that have non-zero frequencies from ph.x may vary by a few cm-1, but in this case we have found some results which have variation of 2 to 10 cm-1 for most of the modes. Additionally, one of the low-lying frequencies results in an imaginary mode after the Acoustic Sum Rule is applied, which is not observed when ph.x is run in serial.
I have provided details about the machines used at the end. All tests are done on CPUs and with one thread.
Here is how I have obtained these results:
1. A scf calculation is done on the structure.
2. The output of the scf calculation is copied into different directories, one for each test of the number of processors, such that all calculations have the same starting point for the ph.x calculation.
3. The batch script’s value for the number of processors for each test is modified, holding all else fixed.
While there is also some variation in scf total energies depending on the number of processors used, it seems within the amount of numerical variation which is unavoidable and expected. So, for the purpose of this test, the number of processors used for the scf is held fixed (56 processors) such that the variation can be attributed entirely to the number of processors used in the ph.x step. My input files are below. The data shown is from one pool (-nk 1). However, it was found when comparing runs with different numbers of processors and/or numbers of pools that the frequency results are determined by the number of processors in the pools group (nproc). This was consistent between both NERSC’s Perlmutter and our local HPC.
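The procedure above can be sketched as a small driver script. This is only a minimal sketch, not the batch script actually used for these tests: the `mpirun` launcher (on Perlmutter it would be `srun`), the input file name `ph.in`, and the directory names `ph_np<N>` are assumptions for illustration. Only ph.x, the `-nk 1` setting, and the processor counts come from the tests described here.

```python
import shutil
from pathlib import Path

# Processor counts tested (taken from the results reported in this post).
# Note: the serial result was obtained by running in serial, not via the launcher.
NPROCS = [1, 2, 4, 7, 8, 14, 28, 30, 32, 40, 50, 52, 54, 56]


def ph_command(nproc, npool=1, launcher="mpirun", ph_input="ph.in"):
    """Build the ph.x command line for a given number of MPI processes.

    The launcher and the input file name are hypothetical defaults.
    """
    return [launcher, "-np", str(nproc), "ph.x", "-nk", str(npool), "-i", ph_input]


def prepare_run_dir(scf_dir, nproc, root="."):
    """Copy the finished scf run into its own directory for one test,
    so every ph.x run starts from identical scf output."""
    dest = Path(root) / f"ph_np{nproc}"
    shutil.copytree(scf_dir, dest)
    return dest


if __name__ == "__main__":
    for n in NPROCS:
        # Only print the commands; launching is left to the batch system.
        print(" ".join(ph_command(n)))
```

Keeping the command construction separate from the directory copy makes it easy to check that the only thing changing between tests is the number of processes.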
Values I obtain from ph.x for QE 7.1 on our university’s HPC, each starting from exactly identical scf output files, are in the table below.
+------------+----------------------------------+--------------------------------+-----------------------------------+
| nprocs | First frequency in ph.out [cm-1] | [1,1][1,1] of dynamical matrix | First frequency in dyn.out [cm-1] |
+------------+----------------------------------+--------------------------------+-----------------------------------+
| 1 (Serial) | -50.483039 | 0.42773666 | 0.00 |
+------------+----------------------------------+--------------------------------+-----------------------------------+
| 2 | -50.487262 | 0.42773677 | -0.00 |
+------------+----------------------------------+--------------------------------+-----------------------------------+
| 4 | -50.482019 | 0.42773639 | -0.00 |
+------------+----------------------------------+--------------------------------+-----------------------------------+
| 7 | -50.487262 | 0.42773677 | -0.00 |
+------------+----------------------------------+--------------------------------+-----------------------------------+
| 8 | -50.482019 | 0.42773639 | -0.00 |
+------------+----------------------------------+--------------------------------+-----------------------------------+
| 14 | -50.487262 | 0.42773677 | -0.00 |
+------------+----------------------------------+--------------------------------+-----------------------------------+
| 28 | -60.136621 | 0.41509106 | -0.00 |
+------------+----------------------------------+--------------------------------+-----------------------------------+
| 30 | -60.136257 | 0.41509080 | -0.00 |
+------------+----------------------------------+--------------------------------+-----------------------------------+
| 32 | -50.482019 | 0.42773639 | -0.00 |
+------------+----------------------------------+--------------------------------+-----------------------------------+
| 40 | -60.136605 | 0.41509088 | -0.00 |
+------------+----------------------------------+--------------------------------+-----------------------------------+
| 50 | -60.136257 | 0.41509080 | -0.00 |
+------------+----------------------------------+--------------------------------+-----------------------------------+
| 52 | -60.136795 | 0.41509068 | -0.00 |
+------------+----------------------------------+--------------------------------+-----------------------------------+
| 54 | -60.136609 | 0.33587825 | -55.99 |
+------------+----------------------------------+--------------------------------+-----------------------------------+
| 56 | -60.136707 | 0.33587767 | -56.00 |
+------------+----------------------------------+--------------------------------+-----------------------------------+
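One way to see the pattern in the table above is to group the processor counts by the value of the first ph.out frequency. The following is a minimal sketch: the numbers are copied from the table, while the function name and the 0.1 cm-1 tolerance are arbitrary choices made for illustration.

```python
# First frequency in ph.out [cm-1] per number of processors, copied from the table.
FIRST_FREQ = {
    1: -50.483039, 2: -50.487262, 4: -50.482019, 7: -50.487262,
    8: -50.482019, 14: -50.487262, 28: -60.136621, 30: -60.136257,
    32: -50.482019, 40: -60.136605, 50: -60.136257, 52: -60.136795,
    54: -60.136609, 56: -60.136707,
}


def cluster_by_value(freqs, tol=0.1):
    """Group processor counts whose frequencies agree within `tol` cm-1.

    Each cluster is (reference frequency, [nproc, ...]); the reference is
    the frequency of the first processor count placed in that cluster.
    """
    clusters = []
    for nproc, freq in sorted(freqs.items()):
        for ref, members in clusters:
            if abs(freq - ref) <= tol:
                members.append(nproc)
                break
        else:
            # No existing cluster is close enough: start a new one.
            clusters.append((freq, [nproc]))
    return clusters


if __name__ == "__main__":
    for ref, members in cluster_by_value(FIRST_FREQ):
        print(f"{ref:12.6f} cm-1: nproc = {members}")
```

With this tolerance the tabulated runs fall into two groups, one near -50.48 cm-1 (1, 2, 4, 7, 8, 14, 32 processors) and one near -60.14 cm-1 (28, 30, 40, 50, 52, 54, 56 processors).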
In many cases, the application of the Acoustic Sum Rule by dynmat.x masks this issue. We noticed a difference in values for the first entry of the dynamical matrix obtained in the 56 and 54 processor tests compared to other runs, and thought it could be the origin of the discrepancy. I edited this single value of the dynamical matrix of the 56 processor case to match the value from the 2 processor run (0.42773677 cm-1), and an entirely different value not obtained from other tests, -35.85 cm-1, was obtained for the first frequency in dyn.out. Therefore, while the first entry of the dynamical matrix is one of the most noticeable differences between the outputs from ph.x, there must be at least one other significant difference elsewhere, which we have not been able to pinpoint.
We decreased tr2_ph (the threshold for self-consistency) from the default of 1.0e-12 to 1.0e-16, as was suggested in the comments of the 21 Mar 2024 post, and we did find that this improved the results. With tr2_ph = 1e-16, there is still large variation in the ph.x frequencies of the first three modes, around 9 cm-1 in some cases (54 processors versus 28 processors, for example). However, the variation in the 4th and higher frequencies becomes much smaller, now only around 0.2 cm-1. When dynmat is run, the variation between jobs with differing number of processors remains low, around 0.1 cm-1, and no imaginary modes are obtained in this case. The details of these results are provided below, and were conducted on our local HPC.
The following results are obtained with tr2_ph = 1e-12, and all frequencies are reported in cm-1:
+---------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+
| nprocs | 2 | 28 | 32 | 54 | 56 |
+---------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+
| ph.out | freq (1) = -50.487262 | freq (1) = -60.136621 | freq (1) = -50.482019 | freq (1) = -60.136609 | freq (1) = -60.136707 |
| | freq (2) = -41.007113 | freq (2) = -60.136621 | freq (2) = -41.006774 | freq (2) = -60.136609 | freq (2) = -60.136707 |
| | freq (3) = -41.007113 | freq (3) = -50.478600 | freq (3) = -41.006774 | freq (3) = -55.994131 | freq (3) = -55.995244 |
| | freq (4) = 15.484027 | freq (4) = 15.483725 | freq (4) = 15.484137 | freq (4) = -50.470542 | freq (4) = -50.477401 |
| | freq (5) = 15.484027 | freq (5) = 15.483725 | freq (5) = 15.484137 | freq (5) = 15.484171 | freq (5) = 18.326070 |
| | freq (6) = 105.412148 | freq (6) = 105.412121 | freq (6) = 105.412117 | freq (6) = 15.484171 | freq (6) = 18.326070 |
+---------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+
| dyn.out | mode (1) = -0.00 | mode (1) = -0.00 | mode (1) = -0.00 | mode (1) = -55.99 | mode (1) = -56.00 |
| | mode (2) = -0.00 | mode (2) = 0.00 | mode (2) = 0.00 | mode (2) = -0.00 | mode (2) = -0.00 |
| | mode (3) = 0.00 | mode (3) = 0.00 | mode (3) = 0.00 | mode (3) = -0.00 | mode (3) = -0.00 |
| | mode (4) = 15.48 | mode (4) = 15.48 | mode (4) = 15.48 | mode (4) = 0.00 | mode (4) = -0.00 |
| | mode (5) = 15.48 | mode (5) = 15.48 | mode (5) = 15.48 | mode (5) = 15.48 | mode (5) = 18.33 |
| | mode (6) = 105.41 | mode (6) = 105.41 | mode (6) = 105.41 | mode (6) = 15.48 | mode (6) = 18.33 |
+---------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+
The following results are obtained with tr2_ph = 1e-16, and all frequencies are reported in cm-1:
+---------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+
| nprocs | 2 | 28 | 32 | 54 | 56 |
+---------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+
| ph.out | freq (1) = 4.232489 | freq (1) = 5.581179 | freq (1) = 4.232403 | freq (1) = -3.089218 | freq (1) = -3.089027 |
| | freq (2) = 4.232489 | freq (2) = 5.581179 | freq (2) = 4.232403 | freq (2) = -3.089218 | freq (2) = -3.089027 |
| | freq (3) = 5.875788 | freq (3) = 5.879649 | freq (3) = 5.874594 | freq (3) = 5.884004 | freq (3) = 5.879562 |
| | freq (4) = 108.212603 | freq (4) = 108.010685 | freq (4) = 108.212604 | freq (4) = 108.132624 | freq (4) = 108.132629 |
| | freq (5) = 108.272083 | freq (5) = 108.348110 | freq (5) = 108.272082 | freq (5) = 108.395919 | freq (5) = 108.395934 |
| | freq (6) = 108.272083 | freq (6) = 108.348110 | freq (6) = 108.272082 | freq (6) = 108.395919 | freq (6) = 108.395934 |
+---------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+
| dyn.out | mode (1) = -0.00 | mode (1) = -0.00 | mode (1) = -0.00 | mode (1) = 0.00 | mode (1) = -0.00 |
| | mode (2) = 0.00 | mode (2) = 0.00 | mode (2) = 0.00 | mode (2) = 0.00 | mode (2) = 0.00 |
| | mode (3) = 0.00 | mode (3) = 0.00 | mode (3) = 0.00 | mode (3) = 0.00 | mode (3) = 0.00 |
| | mode (4) = 108.21 | mode (4) = 108.01 | mode (4) = 108.21 | mode (4) = 108.13 | mode (4) = 108.13 |
| | mode (5) = 108.27 | mode (5) = 108.35 | mode (5) = 108.27 | mode (5) = 108.40 | mode (5) = 108.40 |
| | mode (6) = 108.27 | mode (6) = 108.35 | mode (6) = 108.27 | mode (6) = 108.40 | mode (6) = 108.40 |
+---------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+
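The spread quoted for the tr2_ph = 1e-16 runs can be checked directly from the ph.out rows of the table above. This is a minimal sketch with the frequencies copied from that table; the function name is just an illustrative choice.

```python
# ph.out frequencies [cm-1] for tr2_ph = 1e-16, keyed by number of processors
# (copied from the table above, modes 1 to 6 in order).
PH_FREQS = {
    2:  [4.232489, 4.232489, 5.875788, 108.212603, 108.272083, 108.272083],
    28: [5.581179, 5.581179, 5.879649, 108.010685, 108.348110, 108.348110],
    32: [4.232403, 4.232403, 5.874594, 108.212604, 108.272082, 108.272082],
    54: [-3.089218, -3.089218, 5.884004, 108.132624, 108.395919, 108.395919],
    56: [-3.089027, -3.089027, 5.879562, 108.132629, 108.395934, 108.395934],
}


def spread_per_mode(freqs_by_nproc):
    """Return max - min over all processor counts, one value per mode."""
    columns = zip(*freqs_by_nproc.values())  # regroup the values mode by mode
    return [max(col) - min(col) for col in columns]


if __name__ == "__main__":
    for mode, spread in enumerate(spread_per_mode(PH_FREQS), start=1):
        print(f"mode {mode}: spread = {spread:.6f} cm-1")
```

For these numbers the spread of the first two modes is about 8.67 cm-1 (28 versus 54 processors), while for modes 4 to 6 it is between roughly 0.12 and 0.20 cm-1.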
The LDA pseudopotentials used in these jobs were generated using SG15 ONCVPSP scalar-relativistic version 3.3.1 from http://quantum-simulation.org/potentials/sg15_oncv/upf for Se, which was modified to turn off non-linear core corrections and to use the LDA functional. The .upf for this modified pseudopotential is available here, for now: https://www.ocf.io/~rleano/docs/QE/Se_SG15-LDA.upf. The Mo pseudopotential is LDA, standard accuracy, NC SR (ONCVPSP) v0.4.1 obtained from PseudoDojo. The issue still persists if the more readily obtained NC SR (ONCVPSP) v0.4.1, with LDA XC pseudopotential and standard accuracy obtained from PseudoDojo is used for Se.
Sincerely,
Remi Leano
PhD Candidate
UC Merced
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
INPUT FILE FOR PW.X
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
&control
calculation = 'scf'
pseudo_dir = '.'
tstress = .true.
etot_conv_thr = 1.0D-6
forc_conv_thr = 1.0D-4
outdir = 'temp'
/
&system
ibrav = 14
celldm(1) = 12.267656
celldm(2) = 1.0
celldm(3) = 1.848496
celldm(4) = 0.0
celldm(5) = 0.0
celldm(6) = -0.5
nat = 12
ntyp = 2
occupations = 'fixed'
ecutwfc = 75.0
/
&electrons
conv_thr = 1.0D-8
/
&ions
/
&cell
press_conv_thr = 1.0D-2
cell_dofree = '2Dxy'
/
ATOMIC_SPECIES
Mo 95.95 Mo.UPF
Se 78.971 Se_SG15-LDA.upf
ATOMIC_POSITIONS {crystal}
Mo 0.1666666558 0.3333333215 0.5000000000
Se 0.3333333575 0.1666666737 0.3617656310
Se 0.3333333575 0.1666666737 0.6382343690
Mo 0.6666666600 0.3333333300 0.5000000000
Se 0.8333333163 0.1666666737 0.3617656310
Se 0.8333333163 0.1666666737 0.6382343690
Mo 0.1666666558 0.8333333342 0.5000000000
Se 0.3333333300 0.6666666600 0.3617655882
Se 0.3333333300 0.6666666600 0.6382344118
Mo 0.6666666685 0.8333333342 0.5000000000
Se 0.8333333163 0.6666666325 0.3617656310
Se 0.8333333163 0.6666666325 0.6382343690
K_POINTS {automatic}
2 2 1 1 1 0
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
INPUT FILE FOR PH.X
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
&inputph
tr2_ph = 1e-12
alpha_mix(1) = 0.7
fildrho = 'PH.drho'
trans = .true.
ldisp = .false.
fildyn = 'PH.dyn1'
epsil = .true.
lraman = .true.
/
0.0 0.0 0.0
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
INPUT FILE FOR DYNMAT.X
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
&input
fildyn = 'PH.dyn1'
asr = 'crystal'
filout = 'PH.modes.dat'
fileig = 'PH.eig.dat'
filxsf = 'PH.axsf'
/