[QE-users] QE-GPU: Discrepancy in forces and problem in using OMP threading
Manish Kumar
manish.kumar at acads.iiserpune.ac.in
Mon Mar 14 18:51:19 CET 2022
Dear Filippo,
Thank you very much for your reply.
The "# of threads" is the value of OMP_NUM_TRHREADS. I used nGPU=4
and OMP_NUM_TRHREADS=48. I think the combination is not appropriate.
The OMP_NUM_TRHREADS value should not be higher than 12. Am I correct?
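If I understand the arithmetic correctly, on a single node (48 cores, 4 GPUs) the
settings would be roughly the sketch below, following the --map-by pattern you
suggested (please correct me if the numbers are wrong):
#-------------------------------------------------
# 48 physical cores per node / 4 MPI ranks (one per GPU) = 12 OpenMP threads per rank
export OMP_NUM_THREADS=12
mpirun -np 4 --map-by ppr:4:node:PE=12 pw.x -inp input.in
#-------------------------------------------------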
On one node, I am able to run the calculation. For a bigger system (388
atoms, 3604 electrons) I used multiple nodes (2 to 4 nodes, each with 4
GPUs). The calculation was killed during the force calculation with the
following error message:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Error in routine addusforce_gpu (1):
cannot allocate buffers
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
The SLURM script (for 2 nodes) for the above calculation is the following:
#-------------------------------------------------
#SBATCH --nodes=2
#SBATCH --gres=gpu:4
#SBATCH --ntasks=8
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=12
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
mpirun -np 8 pw.x -inp input.in
or
mpirun -np 8 --map-by ppr:4:node:PE=12 pw.x -inp input.in
#---------------------------------------------
I have not been able to work out the root cause of this error. Do you have
any suggestions to resolve it?
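One thing I have considered trying is restricting each MPI rank to its local GPU
with a small wrapper script. Below is only a sketch, assuming Open MPI (the
wrapper name and the use of CUDA_VISIBLE_DEVICES are my own idea, not something
taken from the QE documentation):
#-------------------------------------------------
#!/bin/bash
# bind_gpu.sh (hypothetical helper): expose exactly one GPU to each MPI rank.
# OMPI_COMM_WORLD_LOCAL_RANK is set by Open MPI to the rank's index on its node.
export CUDA_VISIBLE_DEVICES=$OMPI_COMM_WORLD_LOCAL_RANK
exec "$@"
#-------------------------------------------------
with the launch line changed to:
mpirun -np 8 --map-by ppr:4:node:PE=12 ./bind_gpu.sh pw.x -inp input.in
Would this be a reasonable thing to try?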
Also, I would appreciate your comments on the discrepancy between CPU and
GPU, which I mentioned in my previous email.
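For reference, I estimated the maximum force difference with a small script along
the lines below (only a sketch; cpu.out and gpu.out are placeholder names for the
two pw.x output files, and I assume the usual "atom ... force = fx fy fz" lines
printed when tprnfor=.true.):
#-------------------------------------------------
#!/bin/bash
# Compare the atomic forces printed by pw.x in the CPU and GPU runs.
# On the force lines, fields 7-9 hold fx, fy, fz (in Ry/Bohr).
paste <(grep 'atom.*force =' cpu.out) <(grep 'atom.*force =' gpu.out) |
awk 'function abs(x) { return x < 0 ? -x : x }
     { d = abs($7 - $16)
       if (abs($8 - $17) > d) d = abs($8 - $17)
       if (abs($9 - $18) > d) d = abs($9 - $18)
       if (d > max) max = d }
     END { printf "max |dF| = %.2e Ry/Bohr\n", max }'
#-------------------------------------------------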
Thank you in advance!
Best regards
Manish Kumar
IISER Pune, India
On Fri, Mar 4, 2022 at 3:05 PM Filippo Spiga <spiga.filippo at gmail.com>
wrote:
> Ops, typo while typing from the phone...
>
> "are you using OMP_NUM_THREADS=48 or OMP_NUM_THREADS=12?"
>
> (everything else is correct)
>
> --
> Filippo SPIGA ~ http://fspiga.github.io ~ skype: filippo.spiga
>
>
> On Fri, 4 Mar 2022 at 09:33, Filippo Spiga <spiga.filippo at gmail.com>
> wrote:
>
>> Dear Manish,
>>
>> when you use nGPU=4, does the "# of Threads" column specify the aggregate
>> number of threads? Meaning, are you using OMP_NUM_THREADS=48 or
>> OMP_NUM_THREADS=48? From your email it is not clear, and if you
>> oversubscribe physical cores with threads or processes then performance is
>> not going to be great.
>>
>> Also, you must manage bindings properly, otherwise MPI processes bound to a
>> GPU on another socket need to cross the awful CPU-to-CPU link. Have a look
>> at the '--map-by' option in mpirun. For 4 GPUs, using 4 MPI processes and 12
>> OpenMP threads, your mpirun will look like this:
>>
>> export OMP_NUM_THREADS=12
>> mpirun -np 4 --map-by ppr:4:node:PE=12 ./pw.x
>>
>> If you are running on an HPC system managed by someone else, try reaching
>> out to the User Support team for guidance on correct binding and environment.
>> What you are observing is very likely not related to QE-GPU but to how you
>> are running your calculations.
>>
>> HTH
>>
>> --
>> Filippo SPIGA ~ http://fspiga.github.io ~ skype: filippo.spiga
>>
>>
>> On Wed, 2 Mar 2022 at 08:40, Manish Kumar <
>> manish.kumar at acads.iiserpune.ac.in> wrote:
>>
>>> Dear all,
>>>
>>> I am using QE-GPU compiled on a 48-core Intel(R) Xeon(R) Platinum 8268
>>> CPU @ 2.90GHz with four NVIDIA V100 GPU cards. To use all the CPU cores, I
>>> set the OMP_NUM_THREADS variable in the SLURM script. The jobs are run
>>> with "mpirun -np [nGPU] pw.x", where nGPU refers to the number of GPUs
>>> used. Our system size (130 electrons and 64 k-points; the input file is
>>> given below) is comparable to some of the systems in J. Chem. Phys. 152,
>>> 154105 (2020); https://doi.org/10.1063/5.0005082.
>>>
>>> I have two issues/questions with QE-GPU:
>>> 1. The largest discrepancy in the atomic forces between the CPU and GPU
>>> runs is 1.34x10^-4 Ry/Bohr. What is an acceptable value for this
>>> discrepancy?
>>> 2. I see a significant increase in CPU time when I use multiple OMP
>>> threads for SCF calculations, as shown below. Could you please suggest a
>>> solution, and let me know if I am doing anything incorrectly? Any help
>>> would be much appreciated.
>>> The details are as follows:
>>>
>>> nGPU=1
>>> --------------------------------
>>> # of Threads    CPU Time (s)    WALL Time (s)
>>>           01          254.23           384.27
>>>           02          295.45           466.33
>>>           03          328.89           538.62
>>>           04          348.81           602.85
>>>           08          501.31           943.32
>>>           12          698.45          1226.86
>>>           16          836.71          1505.39
>>>           20          905.77          1645.66
>>>           24         1094.81          1973.97
>>>           28         1208.93          2278.81
>>>           32         1403.27          2570.51
>>>           36         1688.97          3068.91
>>>           40         1820.06          3306.49
>>>           44         1905.88          3603.96
>>>           48         2163.18          4088.75
>>> --------------------------------
>>>
>>> nGPU=2
>>> --------------------------------
>>> # of Threads    CPU Time (s)    WALL Time (s)
>>>           01          226.69           329.51
>>>           02          271.29           336.65
>>>           03          312.36           335.24
>>>           04          341.50           333.20
>>>           06          400.42           328.66
>>>           12          632.82           332.90
>>>           24          992.02           335.28
>>>           48         1877.65           438.40
>>> --------------------------------
>>>
>>> nGPU=4
>>> --------------------------------
>>> # of Threads    CPU Time (s)    WALL Time (s)
>>>           01          237.48           373.21
>>>           02          268.85           382.92
>>>           03          311.39           391.29
>>>           04          341.14           391.71
>>>           06          422.42           391.13
>>>           12          632.94           396.75
>>>           24          961.57           474.70
>>>           48         2509.10           894.79
>>> --------------------------------
>>>
>>> The input file is:
>>> --------------------------------------------
>>> &control
>>> calculation = 'scf',
>>> prefix = "cofe2o4"
>>> outdir = "./t"
>>> pseudo_dir = "./"
>>> tstress=.true.
>>> tprnfor=.true.
>>> /
>>> &system
>>> ibrav = 2,
>>> nat = 14,
>>> ntyp = 4,
>>> celldm(1) = 15.9647d0
>>> ecutwfc = 45
>>> ecutrho = 450
>>> nspin = 2
>>> starting_magnetization(1)= 1.0,
>>> starting_magnetization(3)=1.0,
>>> starting_magnetization(2)=-1.0,
>>> occupations = 'smearing',
>>> degauss = 0.005,
>>> smearing = 'mv'
>>> lda_plus_u = .true.,
>>> lda_plus_u_kind = 0,
>>> U_projection_type = 'atomic',
>>> Hubbard_U(1) = 3.5D0
>>> Hubbard_U(2) = 3.5D0
>>> Hubbard_U(3) = 3.0D0
>>> /
>>> &electrons
>>> mixing_mode = 'local-TF'
>>> mixing_beta = 0.2
>>> conv_thr = 1.D-7
>>> electron_maxstep = 250
>>> diagonalization ='david'
>>> /
>>> &IONS
>>> /
>>> ATOMIC_SPECIES
>>> Fe1 55.8450000000 Fe.pbe-sp-van_mit.UPF
>>> Fe2 55.8450000000 Fe.pbe-sp-van_mit.UPF
>>> Co 58.9332000000 Co.pbe-nd-rrkjus.UPF
>>> O 15.9994000000 O.pbe-rrkjus.UPF
>>> ATOMIC_POSITIONS crystal
>>> Fe1 0.0000000000 0.5000000000 0.5000000000
>>> Fe1 0.5000000000 0.0000000000 0.5000000000
>>> Co 0.5000000000 0.5000000000 0.0000000000
>>> Co 0.5000000000 0.5000000000 0.5000000000
>>> Fe2 0.1206093444 0.1206093444 0.1293906556
>>> Fe2 0.8793906556 0.8793906556 0.8706093444
>>> O 0.2489473315 0.2489473315 0.2660301248
>>> O 0.2489473315 0.2489473315 0.7360752123
>>> O -0.2447080455 0.2661185400 0.7392947527
>>> O 0.2447080455 0.7338814600 0.2607052473
>>> O 0.2661185400 0.7552919545 -0.2607052473
>>> O 0.7338814600 0.2447080455 0.2607052473
>>> O 0.7510526685 -0.2489473315 0.2639247877
>>> O 0.7510526685 0.7510526685 0.7339698752
>>> K_POINTS (automatic)
>>> 7 7 7 0 0 0
>>> -----------------------------------------------------------
>>>
>>> Best regards
>>> Manish Kumar
>>> IISER Pune, India
>>
>> _______________________________________________
>
> The Quantum ESPRESSO community stands by the Ukrainian people and
> expresses its concerns for the devastating effects that the Russian
> military offensive has on their country and on the free and peaceful
> scientific, cultural, and economic cooperation amongst peoples
> _______________________________________________
> Quantum ESPRESSO is supported by MaX (www.max-centre.eu)
> users mailing list users at lists.quantum-espresso.org
> https://lists.quantum-espresso.org/mailman/listinfo/users