[QE-users] QE-GPU: Discrepancy in forces and problem in using OMP threading

Filippo Spiga spiga.filippo at gmail.com
Fri Mar 4 10:33:39 CET 2022


Dear Manish,

when you use nGPU=4, the "# of Threads" column specify the aggregate number
of threads? Meaning, are you using OMP_NUM_TRHREADS=48 or
OMP_NUM_TRHREADS=48? From you email it is not clear and, if you
oversubscribe physical cores with threads or processes then performance is
not going to be great.
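
For reference, the total thread count on the node is (number of MPI ranks)
x OMP_NUM_THREADS, so with 4 ranks on a 48-core node you want 12 threads
per rank. A minimal sketch of how I would set this in a bash/Slurm script
(the variable names are illustrative, not taken from your setup):

NRANKS=4                                        # one MPI rank per GPU
NCORES=48                                       # physical cores on the node
export OMP_NUM_THREADS=$(( NCORES / NRANKS ))   # 12 threads per rank
# aggregate thread count; keep this <= NCORES to avoid oversubscription
echo "total threads: $(( NRANKS * OMP_NUM_THREADS ))"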

Also, you must manage bindings properly, otherwise MPI processes bound to a
GPU on another socket need to cross the slow CPU-to-CPU link. Have a look
at the '--map-by' option of mpirun. For 4 GPUs, using 4 MPI processes and
12 OpenMP threads each, your mpirun will look like this:

export OMP_NUM_THREADS=12
mpirun -np 4 --map-by ppr:4:node:PE=12 ./pw.x
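
If your mpirun is Open MPI, you can also add '--report-bindings' to print
the cores each rank is pinned to and check that every rank sits on the same
socket as its GPU. A sketch, assuming the same 4-rank / 12-thread layout as
above:

export OMP_NUM_THREADS=12
# --report-bindings prints the core binding of every MPI rank at startup
mpirun -np 4 --map-by ppr:4:node:PE=12 --report-bindings ./pw.x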

If you are running on an HPC system managed by someone else, try reaching
out to the User Support team for guidance on correct binding and environment
settings. What you are observing is very likely not related to QE-GPU but to
how you are running your calculations.

HTH

--
Filippo SPIGA ~ http://fspiga.github.io ~ skype: filippo.spiga


On Wed, 2 Mar 2022 at 08:40, Manish Kumar <
manish.kumar at acads.iiserpune.ac.in> wrote:

> Dear all,
>
> I am using QE-GPU on a node with a 48-core Intel(R) Xeon(R) Platinum 8268 CPU
> @ 2.90GHz and four NVIDIA V100 GPU cards. To use all the CPU cores, I set
> the OMP_NUM_THREADS variable in the Slurm script. The jobs are run with
> "mpirun -np [nGPU] pw.x", where nGPU is the number of GPUs used. Our
> system size (130 electrons and 64 k-points, the input file is given below)
> is comparable to some systems in J. Chem. Phys. 152, 154105 (2020);
> https://doi.org/10.1063/5.0005082.
>
> I have two issues/questions with QE-GPU:
> 1. The largest discrepancy in the atomic force between CPU and GPU is
> 1.34x10^-4 Ry/Bohr. What is the acceptable value for the discrepancy?
> 2. I am experiencing a significant increase in CPU time when I use
> multiple OMP threads for SCF calculations, as you can see below. Could you
> please suggest any solution to this and let me know if I am doing anything
> incorrectly? Any help would be much appreciated.
> The details are as follows:
>
> nGPU=1
> ------------------------------------------------
> # of Threads    CPU Time (s)    WALL Time (s)
> 01                   254.23           384.27
> 02                   295.45           466.33
> 03                   328.89           538.62
> 04                   348.81           602.85
> 08                   501.31           943.32
> 12                   698.45          1226.86
> 16                   836.71          1505.39
> 20                   905.77          1645.66
> 24                  1094.81          1973.97
> 28                  1208.93          2278.81
> 32                  1403.27          2570.51
> 36                  1688.97          3068.91
> 40                  1820.06          3306.49
> 44                  1905.88          3603.96
> 48                  2163.18          4088.75
> ------------------------------------------------
>
> nGPU=2
> ------------------------------------------------
> # of Threads    CPU Time (s)    WALL Time (s)
> 01                   226.69           329.51
> 02                   271.29           336.65
> 03                   312.36           335.24
> 04                   341.50           333.20
> 06                   400.42           328.66
> 12                   632.82           332.90
> 24                   992.02           335.28
> 48                  1877.65           438.40
> ------------------------------------------------
>
> nGPU=4
> ------------------------------------------------
> # of Threads    CPU Time (s)    WALL Time (s)
> 01                   237.48           373.21
> 02                   268.85           382.92
> 03                   311.39           391.29
> 04                   341.14           391.71
> 06                   422.42           391.13
> 12                   632.94           396.75
> 24                   961.57           474.70
> 48                  2509.10           894.79
> ------------------------------------------------
>
> The input file is:
> --------------------------------------------
> &control
>     calculation = 'scf',
>     prefix = "cofe2o4"
>     outdir = "./t"
>     pseudo_dir = "./"
>     tstress=.true.
>     tprnfor=.true.
> /
> &system
>     ibrav = 2,
>      nat = 14,
>      ntyp = 4,
>     celldm(1) = 15.9647d0
>     ecutwfc = 45
>     ecutrho = 450
>     nspin = 2
>     starting_magnetization(1)= 1.0,
>     starting_magnetization(3)=1.0,
>     starting_magnetization(2)=-1.0,
>     occupations = 'smearing',
>     degauss = 0.005,
>     smearing = 'mv'
>     lda_plus_u = .true.,
>     lda_plus_u_kind = 0,
>     U_projection_type = 'atomic',
>     Hubbard_U(1) = 3.5D0
>     Hubbard_U(2) = 3.5D0
>     Hubbard_U(3) = 3.0D0
> /
> &electrons
>     mixing_mode = 'local-TF'
>     mixing_beta = 0.2
>     conv_thr = 1.D-7
>     electron_maxstep = 250
>     diagonalization ='david'
> /
> &IONS
> /
> ATOMIC_SPECIES
>    Fe1   55.8450000000  Fe.pbe-sp-van_mit.UPF
>    Fe2   55.8450000000  Fe.pbe-sp-van_mit.UPF
>    Co   58.9332000000  Co.pbe-nd-rrkjus.UPF
>     O   15.9994000000  O.pbe-rrkjus.UPF
> ATOMIC_POSITIONS crystal
> Fe1           0.0000000000        0.5000000000        0.5000000000
> Fe1           0.5000000000        0.0000000000        0.5000000000
> Co            0.5000000000        0.5000000000        0.0000000000
> Co            0.5000000000        0.5000000000        0.5000000000
> Fe2           0.1206093444        0.1206093444        0.1293906556
> Fe2           0.8793906556        0.8793906556        0.8706093444
> O             0.2489473315        0.2489473315        0.2660301248
> O             0.2489473315        0.2489473315        0.7360752123
> O            -0.2447080455        0.2661185400        0.7392947527
> O             0.2447080455        0.7338814600        0.2607052473
> O             0.2661185400        0.7552919545       -0.2607052473
> O             0.7338814600        0.2447080455        0.2607052473
> O             0.7510526685       -0.2489473315        0.2639247877
> O             0.7510526685        0.7510526685        0.7339698752
> K_POINTS (automatic)
> 7 7 7 0 0 0
> -----------------------------------------------------------
>
> Best regards
> Manish Kumar
> IISER Pune, India