[QE-users] QE-GPU: Discrepancy in forces and problem in using OMP threading
Filippo Spiga
spiga.filippo at gmail.com
Fri Mar 4 10:34:43 CET 2022
Oops, typo while typing from the phone...
"are you using OMP_NUM_THREADS=48 or OMP_NUM_THREADS=12?"
(everything else is correct)
--
Filippo SPIGA ~ http://fspiga.github.io ~ skype: filippo.spiga
On Fri, 4 Mar 2022 at 09:33, Filippo Spiga <spiga.filippo at gmail.com> wrote:
> Dear Manish,
>
> when you use nGPU=4, does the "# of Threads" column specify the aggregate
> number of threads? Meaning, are you using OMP_NUM_THREADS=48 or
> OMP_NUM_THREADS=12? From your email it is not clear and, if you
> oversubscribe the physical cores with threads or processes, then
> performance is not going to be great.
>
> Also, you must manage bindings properly, otherwise MPI processes bound to
> a GPU on the other socket need to cross the awful CPU-to-CPU link. Have a
> look at the '--map-by' option of mpirun. For 4 GPUs, using 4 MPI processes
> and 12 OpenMP threads each, your mpirun will look like this:
>
> export OMP_NUM_THREADS=12
> mpirun -np 4 --map-by ppr:4:node:PE=12 ./pw.x
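>
> A fuller Slurm sketch along the same lines (just an illustration: it
> assumes Open MPI on a single node with 4 GPUs, and the GRES and input
> file names are placeholders to adapt to your site) would be:
>
> #!/bin/bash
> #SBATCH --nodes=1
> #SBATCH --ntasks-per-node=4      # one MPI rank per GPU
> #SBATCH --cpus-per-task=12       # 4 ranks x 12 cores = 48 physical cores
> #SBATCH --gres=gpu:4             # GRES name depends on the cluster
>
> export OMP_NUM_THREADS=12
> # one rank per GPU, 12 cores per rank, ranks pinned to their cores
> mpirun -np 4 --map-by ppr:4:node:PE=12 --bind-to core ./pw.x -input scf.in > scf.out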
>
> If you are running on an HPC system managed by someone else, try reaching
> out to the User Support team and get guidance on correct binding and
> environment settings. What you are observing is very likely not related to
> QE-GPU but to how you are running your calculations.
>
> HTH
>
> --
> Filippo SPIGA ~ http://fspiga.github.io ~ skype: filippo.spiga
>
>
> On Wed, 2 Mar 2022 at 08:40, Manish Kumar <
> manish.kumar at acads.iiserpune.ac.in> wrote:
>
>> Dear all,
>>
>> I am using QE-GPU compiled on a 48-core Intel(R) Xeon(R) Platinum 8268
>> CPU @ 2.90GHz machine with four NVIDIA V100 GPU cards. To use all the CPU
>> cores, I set the OMP_NUM_THREADS variable in the Slurm script. The jobs
>> are run with "mpirun -np [nGPU] pw.x", where nGPU refers to the number of
>> GPUs used. Our system size (130 electrons and 64 k-points; the input file
>> is given below) is comparable to some of the systems in J. Chem. Phys.
>> 152, 154105 (2020); https://doi.org/10.1063/5.0005082.
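>>
>> (A minimal sketch of what this amounts to, with a placeholder input file
>> name and OMP_NUM_THREADS varied as in the tables below:)
>>
>> export OMP_NUM_THREADS=12
>> mpirun -np 4 pw.x -input scf.in > scf.out    # -np equal to the number of GPUs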
>>
>> I have two issues/questions with QE-GPU:
>> 1. The largest discrepancy in the atomic forces between the CPU and GPU
>> runs is 1.34x10^-4 Ry/Bohr. What is an acceptable value for this
>> discrepancy?
>> 2. I am seeing a significant increase in CPU time when I use multiple
>> OMP threads for the SCF calculation, as shown below. Could you please
>> suggest a solution, and let me know if I am doing anything incorrectly?
>> Any help would be much appreciated.
>> The details are as follows:
>>
>> nGPU=1
>> ---------------------------------------------
>> # of Threads   CPU Time (s)   WALL Time (s)
>>      01           254.23          384.27
>>      02           295.45          466.33
>>      03           328.89          538.62
>>      04           348.81          602.85
>>      08           501.31          943.32
>>      12           698.45         1226.86
>>      16           836.71         1505.39
>>      20           905.77         1645.66
>>      24          1094.81         1973.97
>>      28          1208.93         2278.81
>>      32          1403.27         2570.51
>>      36          1688.97         3068.91
>>      40          1820.06         3306.49
>>      44          1905.88         3603.96
>>      48          2163.18         4088.75
>> ---------------------------------------------
>>
>> nGPU=2
>> ---------------------------------------------
>> # of Threads   CPU Time (s)   WALL Time (s)
>>      01           226.69          329.51
>>      02           271.29          336.65
>>      03           312.36          335.24
>>      04           341.50          333.20
>>      06           400.42          328.66
>>      12           632.82          332.90
>>      24           992.02          335.28
>>      48          1877.65          438.40
>> ---------------------------------------------
>>
>> nGPU=4
>> ---------------------------------------------
>> # of Threads   CPU Time (s)   WALL Time (s)
>>      01           237.48          373.21
>>      02           268.85          382.92
>>      03           311.39          391.29
>>      04           341.14          391.71
>>      06           422.42          391.13
>>      12           632.94          396.75
>>      24           961.57          474.70
>>      48          2509.10          894.79
>> ---------------------------------------------
>>
>> The input file is:
>> --------------------------------------------
>> &control
>> calculation = 'scf',
>> prefix = "cofe2o4"
>> outdir = "./t"
>> pseudo_dir = "./"
>> tstress=.true.
>> tprnfor=.true.
>> /
>> &system
>> ibrav = 2,
>> nat = 14,
>> ntyp = 4,
>> celldm(1) = 15.9647d0
>> ecutwfc = 45
>> ecutrho = 450
>> nspin = 2
>> starting_magnetization(1)= 1.0,
>> starting_magnetization(3)=1.0,
>> starting_magnetization(2)=-1.0,
>> occupations = 'smearing',
>> degauss = 0.005,
>> smearing = 'mv'
>> lda_plus_u = .true.,
>> lda_plus_u_kind = 0,
>> U_projection_type = 'atomic',
>> Hubbard_U(1) = 3.5D0
>> Hubbard_U(2) = 3.5D0
>> Hubbard_U(3) = 3.0D0
>> /
>> &electrons
>> mixing_mode = 'local-TF'
>> mixing_beta = 0.2
>> conv_thr = 1.D-7
>> electron_maxstep = 250
>> diagonalization ='david'
>> /
>> &IONS
>> /
>> ATOMIC_SPECIES
>> Fe1 55.8450000000 Fe.pbe-sp-van_mit.UPF
>> Fe2 55.8450000000 Fe.pbe-sp-van_mit.UPF
>> Co 58.9332000000 Co.pbe-nd-rrkjus.UPF
>> O 15.9994000000 O.pbe-rrkjus.UPF
>> ATOMIC_POSITIONS crystal
>> Fe1 0.0000000000 0.5000000000 0.5000000000
>> Fe1 0.5000000000 0.0000000000 0.5000000000
>> Co 0.5000000000 0.5000000000 0.0000000000
>> Co 0.5000000000 0.5000000000 0.5000000000
>> Fe2 0.1206093444 0.1206093444 0.1293906556
>> Fe2 0.8793906556 0.8793906556 0.8706093444
>> O 0.2489473315 0.2489473315 0.2660301248
>> O 0.2489473315 0.2489473315 0.7360752123
>> O -0.2447080455 0.2661185400 0.7392947527
>> O 0.2447080455 0.7338814600 0.2607052473
>> O 0.2661185400 0.7552919545 -0.2607052473
>> O 0.7338814600 0.2447080455 0.2607052473
>> O 0.7510526685 -0.2489473315 0.2639247877
>> O 0.7510526685 0.7510526685 0.7339698752
>> K_POINTS (automatic)
>> 7 7 7 0 0 0
>> -----------------------------------------------------------
>>
>> Best regards
>> Manish Kumar
>> IISER Pune, India
>> _______________________________________________
>> Quantum ESPRESSO is supported by MaX (www.max-centre.eu)
>> users mailing list users at lists.quantum-espresso.org
>> https://lists.quantum-espresso.org/mailman/listinfo/users
>
>