[QE-users] QE-GPU: Discrepancy in forces and problem in using OMP threading
Filippo Spiga
spiga.filippo at gmail.com
Tue Mar 15 22:19:00 CET 2022
Independently of the presence of a GPU, it is good practice NOT to
oversubscribe physical cores.
So, as a made-up example, if your socket has 128 cores and you want to use 16
MPI processes, then the number of OpenMP threads per process is 8 (128/16). If
you specify more, you oversubscribe and, as a result, performance may suffer.
It is also good practice to have an MPI:GPU ratio of 1:1, or maybe 2:1, but
start with 1:1.
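
For your case (48 physical cores and 4 GPUs per node), a non-oversubscribed
1:1 layout would be 4 MPI processes with 12 OpenMP threads each (48/4 = 12).
A minimal sketch, assuming Slurm and Open MPI (adapt the input file name and
paths to your setup):

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=12
#SBATCH --gres=gpu:4

# 12 threads per MPI rank, 4 x 12 = 48 cores in total
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
mpirun -np 4 --map-by ppr:4:node:PE=12 pw.x -inp input.in
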
Regarding the discrepancy in the atomic forces, I will let the developers comment.
If you really believe it is a bug, open a bug report on GitLab at
https://gitlab.com/QEF/q-e/-/issues and provide everything needed to
reproduce the error.
HTH
--
Filippo SPIGA ~ http://fspiga.github.io ~ skype: filippo.spiga
On Mon, 14 Mar 2022 at 17:51, Manish Kumar <
manish.kumar at acads.iiserpune.ac.in> wrote:
> Dear Filippo,
>
> Thank you very much for your reply.
>
> The "# of threads" is the value of OMP_NUM_TRHREADS. I used nGPU=4
> and OMP_NUM_TRHREADS=48. I think the combination is not appropriate.
> The OMP_NUM_TRHREADS value should not be higher than 12. Am I correct?
>
> On one node, I am able to run the calculation. For a bigger system (388
> atoms, 3604 electrons) I used multiple nodes (2 to 4 nodes, each with 4
> GPUs). The calculation got killed during the force calculation with the
> following error message:
>
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> Error in routine addusforce_gpu (1):
> cannot allocate buffers
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>
> The Slurm script (for 2 nodes) for the above calculation is the following:
> #-------------------------------------------------
> #SBATCH --nodes=2
> #SBATCH --gres=gpu:4
> #SBATCH --ntasks=8
> #SBATCH --ntasks-per-node=4
> #SBATCH --cpus-per-task=12
>
> export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
>
> mpirun -np 8 pw.x -inp input.in
> or
> mpirun -np 8 --map-by ppr:4:node:PE=12 pw.x -inp input.in
> #---------------------------------------------
>
> I cannot solve or understand the root cause of this error. Do you have any
> suggestions to resolve it?
> Also, I would appreciate your comments on the discrepancy between CPU and
> GPU, which I mentioned in my previous email.
>
> Thank you in advance!
>
> Best regards
> Manish Kumar
> IISER Pune, India
>
>
> On Fri, Mar 4, 2022 at 3:05 PM Filippo Spiga <spiga.filippo at gmail.com>
> wrote:
>
>> Oops, typo while typing from the phone...
>>
>> "are you using OMP_NUM_THREADS=48 or OMP_NUM_THREADS=12?"
>>
>> (everything else is correct)
>>
>> --
>> Filippo SPIGA ~ http://fspiga.github.io ~ skype: filippo.spiga
>>
>>
>> On Fri, 4 Mar 2022 at 09:33, Filippo Spiga <spiga.filippo at gmail.com>
>> wrote:
>>
>>> Dear Manish,
>>>
>>> when you use nGPU=4, does the "# of Threads" column specify the aggregate
>>> number of threads? Meaning, are you using OMP_NUM_TRHREADS=48 or
>>> OMP_NUM_TRHREADS=48? From your email it is not clear and, if you
>>> oversubscribe physical cores with threads or processes, then performance is
>>> not going to be great.
>>>
>>> Also, you must manage bindings properly, otherwise MPI processes bound to a
>>> GPU on another socket need to cross the slow CPU-to-CPU link. Have a look
>>> at the '--map-by' option in mpirun. For 4 GPUs, using 4 MPI processes and 12
>>> OpenMP threads each, your mpirun will look like this:
>>>
>>> export OMP_NUM_THREADS=12
>>> mpirun -np 4 --map-by ppr:4:node:PE=12 ./pw.x
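>>>
>>> If you want to check where each rank actually lands, Open MPI's mpirun also
>>> has a '--report-bindings' flag that prints the core binding of every process
>>> at startup (this assumes you are launching with Open MPI), e.g.:
>>>
>>> mpirun -np 4 --map-by ppr:4:node:PE=12 --report-bindings ./pw.x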
>>>
>>> If you are running on an HPC system managed by someone else, try reaching
>>> out to the User Support team for guidance on correct binding and environment.
>>> What you are observing is very likely not related to QE-GPU but to how you
>>> are running your calculations.
>>>
>>> HTH
>>>
>>> --
>>> Filippo SPIGA ~ http://fspiga.github.io ~ skype: filippo.spiga
>>>
>>>
>>> On Wed, 2 Mar 2022 at 08:40, Manish Kumar <
>>> manish.kumar at acads.iiserpune.ac.in> wrote:
>>>
>>>> Dear all,
>>>>
>>>> I am using QE-GPU compiled on a 48-core Intel(R) Xeon(R) Platinum 8268
>>>> CPU @ 2.90GHz and four NVIDIA V100 GPU cards. To use all the CPU cores, I am
>>>> using the OMP_NUM_THREADS variable in the Slurm script. The jobs are run
>>>> with "mpirun -np [nGPU] pw.x", where nGPU refers to the number of GPUs
>>>> used. Our system size (130 electrons and 64 k-points; the input file is
>>>> given below) is comparable to some systems in J. Chem. Phys. 152, 154105
>>>> (2020); https://doi.org/10.1063/5.0005082.
>>>>
>>>> I have two issues/questions with QE-GPU:
>>>> 1. The largest discrepancy in the atomic force between CPU and GPU is
>>>> 1.34x10^-4 Ry/Bohr. What is the acceptable value for the discrepancy?
>>>> 2. I am experiencing a significant increase in CPU time when I use
>>>> multiple OMP threads for SCF calculations, as you can see below. Could you
>>>> please suggest any solution to this and let me know if I am doing anything
>>>> incorrectly? Any help would be much appreciated.
>>>> The details are as follows:
>>>>
>>>> nGPU=1
>>>> --------------------------------
>>>> # of Threads    CPU Time (s)    WALL Time (s)
>>>>      01             254.23           384.27
>>>>      02             295.45           466.33
>>>>      03             328.89           538.62
>>>>      04             348.81           602.85
>>>>      08             501.31           943.32
>>>>      12             698.45          1226.86
>>>>      16             836.71          1505.39
>>>>      20             905.77          1645.66
>>>>      24            1094.81          1973.97
>>>>      28            1208.93          2278.81
>>>>      32            1403.27          2570.51
>>>>      36            1688.97          3068.91
>>>>      40            1820.06          3306.49
>>>>      44            1905.88          3603.96
>>>>      48            2163.18          4088.75
>>>> --------------------------------
>>>>
>>>> nGPU=2
>>>> --------------------------------
>>>> # of Threads    CPU Time (s)    WALL Time (s)
>>>>      01             226.69           329.51
>>>>      02             271.29           336.65
>>>>      03             312.36           335.24
>>>>      04             341.50           333.20
>>>>      06             400.42           328.66
>>>>      12             632.82           332.90
>>>>      24             992.02           335.28
>>>>      48            1877.65           438.40
>>>> --------------------------------
>>>>
>>>> nGPU=4
>>>> --------------------------------
>>>> # of Threads    CPU Time (s)    WALL Time (s)
>>>>      01             237.48           373.21
>>>>      02             268.85           382.92
>>>>      03             311.39           391.29
>>>>      04             341.14           391.71
>>>>      06             422.42           391.13
>>>>      12             632.94           396.75
>>>>      24             961.57           474.70
>>>>      48            2509.10           894.79
>>>> --------------------------------
>>>>
>>>> The input file is:
>>>> --------------------------------------------
>>>> &control
>>>> calculation = 'scf',
>>>> prefix = "cofe2o4"
>>>> outdir = "./t"
>>>> pseudo_dir = "./"
>>>> tstress=.true.
>>>> tprnfor=.true.
>>>> /
>>>> &system
>>>> ibrav = 2,
>>>> nat = 14,
>>>> ntyp = 4,
>>>> celldm(1) = 15.9647d0
>>>> ecutwfc = 45
>>>> ecutrho = 450
>>>> nspin = 2
>>>> starting_magnetization(1)= 1.0,
>>>> starting_magnetization(3)=1.0,
>>>> starting_magnetization(2)=-1.0,
>>>> occupations = 'smearing',
>>>> degauss = 0.005,
>>>> smearing = 'mv'
>>>> lda_plus_u = .true.,
>>>> lda_plus_u_kind = 0,
>>>> U_projection_type = 'atomic',
>>>> Hubbard_U(1) = 3.5D0
>>>> Hubbard_U(2) = 3.5D0
>>>> Hubbard_U(3) = 3.0D0
>>>> /
>>>> &electrons
>>>> mixing_mode = 'local-TF'
>>>> mixing_beta = 0.2
>>>> conv_thr = 1.D-7
>>>> electron_maxstep = 250
>>>> diagonalization ='david'
>>>> /
>>>> &IONS
>>>> /
>>>> ATOMIC_SPECIES
>>>> Fe1 55.8450000000 Fe.pbe-sp-van_mit.UPF
>>>> Fe2 55.8450000000 Fe.pbe-sp-van_mit.UPF
>>>> Co 58.9332000000 Co.pbe-nd-rrkjus.UPF
>>>> O 15.9994000000 O.pbe-rrkjus.UPF
>>>> ATOMIC_POSITIONS crystal
>>>> Fe1 0.0000000000 0.5000000000 0.5000000000
>>>> Fe1 0.5000000000 0.0000000000 0.5000000000
>>>> Co 0.5000000000 0.5000000000 0.0000000000
>>>> Co 0.5000000000 0.5000000000 0.5000000000
>>>> Fe2 0.1206093444 0.1206093444 0.1293906556
>>>> Fe2 0.8793906556 0.8793906556 0.8706093444
>>>> O 0.2489473315 0.2489473315 0.2660301248
>>>> O 0.2489473315 0.2489473315 0.7360752123
>>>> O -0.2447080455 0.2661185400 0.7392947527
>>>> O 0.2447080455 0.7338814600 0.2607052473
>>>> O 0.2661185400 0.7552919545 -0.2607052473
>>>> O 0.7338814600 0.2447080455 0.2607052473
>>>> O 0.7510526685 -0.2489473315 0.2639247877
>>>> O 0.7510526685 0.7510526685 0.7339698752
>>>> K_POINTS (automatic)
>>>> 7 7 7 0 0 0
>>>> -----------------------------------------------------------
>>>>
>>>> Best regards
>>>> Manish Kumar
>>>> IISER Pune, India
>
> _______________________________________________
> The Quantum ESPRESSO community stands by the Ukrainian
> people and expresses its concerns about the devastating
> effects that the Russian military offensive has on their
> country and on the free and peaceful scientific, cultural,
> and economic cooperation amongst peoples
> _______________________________________________
> Quantum ESPRESSO is supported by MaX (www.max-centre.eu)
> users mailing list users at lists.quantum-espresso.org
> https://lists.quantum-espresso.org/mailman/listinfo/users