[QE-users] QE-GPU: Discrepancy in forces and problem in using OMP threading

Manish Kumar manish.kumar at acads.iiserpune.ac.in
Wed Mar 2 09:39:58 CET 2022


Dear all,

I am using QE-GPU compiled on a 48-core Intel(R) Xeon(R) Platinum 8268 CPU
@ 2.90GHz with four NVIDIA V100 GPU cards. To use all the CPU cores, I set
the OMP_NUM_THREADS environment variable in the Slurm script. The jobs are
run with "mpirun -np [nGPU] pw.x", where nGPU is the number of GPUs used.
Our system (130 electrons and 64 k-points; the input file is given below)
is comparable in size to some of the systems in J. Chem. Phys. 152, 154105
(2020); https://doi.org/10.1063/5.0005082.
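For completeness, the job is launched roughly as in the sketch below (a
minimal, hypothetical Slurm script: the Slurm options and file names are
placeholders, not the exact script used for the timings reported further
down; only the launch pattern is the point):
--------------------------------------------
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=4            # one MPI rank per GPU (nGPU)
#SBATCH --cpus-per-task=12    # share of the 48 cores given to each rank
#SBATCH --gres=gpu:4

# The thread count is what I vary between the runs tabulated below
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# nGPU MPI ranks, one per GPU card
mpirun -np 4 pw.x -in cofe2o4.scf.in > cofe2o4.scf.out
--------------------------------------------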

I have two issues/questions with QE-GPU:
1. The largest discrepancy in the atomic forces between the CPU and GPU
runs is 1.34x10^-4 Ry/Bohr (a sketch of how I compare the forces is given
right after this list). What level of discrepancy is acceptable?
2. I see a significant increase in CPU time when I use multiple OMP
threads for the SCF calculation, as shown in the timings below. Could you
please suggest a solution, or let me know if I am doing anything
incorrectly? Any help would be much appreciated.
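For issue 1, this is roughly how I extract and compare the forces (a
sketch only; scf_cpu.out and scf_gpu.out are hypothetical names for the
CPU and GPU pw.x outputs, and the field positions assume the standard
"Forces acting on atoms (cartesian axes, Ry/au):" block printed by pw.x):
--------------------------------------------
# pull the per-atom force components (columns 7-9 of each "atom ... force =" line)
grep -A 16 "Forces acting on atoms" scf_cpu.out | grep " force =" | awk '{print $7, $8, $9}' > f_cpu.dat
grep -A 16 "Forces acting on atoms" scf_gpu.out | grep " force =" | awk '{print $7, $8, $9}' > f_gpu.dat

# largest absolute difference of any force component, in Ry/Bohr
paste f_cpu.dat f_gpu.dat | awk '{for(i=1;i<=3;i++){d=$i-$(i+3); if(d<0)d=-d; if(d>m)m=d} } END{print "max |dF| =", m, "Ry/Bohr"}'
--------------------------------------------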
The details are as follows:

nGPU=1
--------------------------------
# of Threads    CPU Time (s)    Wall Time (s)
01               254.23           384.27
02               295.45           466.33
03               328.89           538.62
04               348.81           602.85
08               501.31           943.32
12               698.45          1226.86
16               836.71          1505.39
20               905.77          1645.66
24              1094.81          1973.97
28              1208.93          2278.81
32              1403.27          2570.51
36              1688.97          3068.91
40              1820.06          3306.49
44              1905.88          3603.96
48              2163.18          4088.75
--------------------------------

nGPU=2
--------------------------------
# of Threads    CPU Time (s)    Wall Time (s)
01               226.69           329.51
02               271.29           336.65
03               312.36           335.24
04               341.50           333.20
06               400.42           328.66
12               632.82           332.90
24               992.02           335.28
48              1877.65           438.40
--------------------------------

nGPU=4
--------------------------------
# of Threads    CPU Time (s)    Wall Time (s)
01               237.48           373.21
02               268.85           382.92
03               311.39           391.29
04               341.14           391.71
06               422.42           391.13
12               632.94           396.75
24               961.57           474.70
48              2509.10           894.79
--------------------------------

The input file is:
--------------------------------------------
&control
    calculation = 'scf',
    prefix = "cofe2o4"
    outdir = "./t"
    pseudo_dir = "./"
    tstress=.true.
    tprnfor=.true.
/
&system
    ibrav = 2,
     nat = 14,
     ntyp = 4,
    celldm(1) = 15.9647d0
    ecutwfc = 45
    ecutrho = 450
    nspin = 2
    starting_magnetization(1)= 1.0,
    starting_magnetization(3)=1.0,
    starting_magnetization(2)=-1.0,
    occupations = 'smearing',
    degauss = 0.005,
    smearing = 'mv'
    lda_plus_u = .true.,
    lda_plus_u_kind = 0,
    U_projection_type = 'atomic',
    Hubbard_U(1) = 3.5D0
    Hubbard_U(2) = 3.5D0
    Hubbard_U(3) = 3.0D0
/
&electrons
    mixing_mode = 'local-TF'
    mixing_beta = 0.2
    conv_thr = 1.D-7
    electron_maxstep = 250
    diagonalization ='david'
/
&IONS
/
ATOMIC_SPECIES
   Fe1   55.8450000000  Fe.pbe-sp-van_mit.UPF
   Fe2   55.8450000000  Fe.pbe-sp-van_mit.UPF
   Co   58.9332000000  Co.pbe-nd-rrkjus.UPF
    O   15.9994000000  O.pbe-rrkjus.UPF
ATOMIC_POSITIONS crystal
Fe1           0.0000000000        0.5000000000        0.5000000000
Fe1           0.5000000000        0.0000000000        0.5000000000
Co            0.5000000000        0.5000000000        0.0000000000
Co            0.5000000000        0.5000000000        0.5000000000
Fe2           0.1206093444        0.1206093444        0.1293906556
Fe2           0.8793906556        0.8793906556        0.8706093444
O             0.2489473315        0.2489473315        0.2660301248
O             0.2489473315        0.2489473315        0.7360752123
O            -0.2447080455        0.2661185400        0.7392947527
O             0.2447080455        0.7338814600        0.2607052473
O             0.2661185400        0.7552919545       -0.2607052473
O             0.7338814600        0.2447080455        0.2607052473
O             0.7510526685       -0.2489473315        0.2639247877
O             0.7510526685        0.7510526685        0.7339698752
K_POINTS (automatic)
7 7 7 0 0 0
-----------------------------------------------------------

Best regards
Manish Kumar
IISER Pune, India