[QE-users] QE-GPU: Discrepancy in forces and problem in using OMP threading
Manish Kumar
manish.kumar at acads.iiserpune.ac.in
Mon Mar 14 18:51:19 CET 2022
Dear Filippo,
Thank you very much for your reply.
The "# of threads" is the value of OMP_NUM_TRHREADS. I used nGPU=4
and OMP_NUM_TRHREADS=48. I think the combination is not appropriate.
The OMP_NUM_TRHREADS value should not be higher than 12. Am I correct?
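If I understand the arithmetic correctly, on a single node (48 cores, 4 GPUs) the
settings would be roughly the sketch below, following the --map-by pattern you
suggested (please correct me if the numbers are wrong):
#-------------------------------------------------
# 48 physical cores per node / 4 MPI ranks (one per GPU) = 12 OpenMP threads per rank
export OMP_NUM_THREADS=12
mpirun -np 4 --map-by ppr:4:node:PE=12 pw.x -inp input.in
#-------------------------------------------------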
On one node, I am able to run the calculation. For a bigger system (388
atoms, 3604 electrons) I used multiple nodes (2 to 4 nodes, each with 4
GPUs). The calculation was killed during the force calculation with the
following error message:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Error in routine addusforce_gpu (1):
cannot allocate buffers
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
The SLURM script (for 2 nodes) for the above calculation is the following:
#-------------------------------------------------
#SBATCH --nodes=2
#SBATCH --gres=gpu:4
#SBATCH --ntasks=8
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=12
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
mpirun -np 8 pw.x -inp input.in
or
mpirun -np 8 --map-by ppr:4:node:PE=12 pw.x -inp input.in
#---------------------------------------------
I have not been able to work out the root cause of this error. Do you have
any suggestions to resolve it?
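One thing I have considered trying is restricting each MPI rank to its local GPU
with a small wrapper script. Below is only a sketch, assuming Open MPI (the
wrapper name and the use of CUDA_VISIBLE_DEVICES are my own idea, not something
taken from the QE documentation):
#-------------------------------------------------
#!/bin/bash
# bind_gpu.sh (hypothetical helper): expose exactly one GPU to each MPI rank.
# OMPI_COMM_WORLD_LOCAL_RANK is set by Open MPI to the rank's index on its node.
export CUDA_VISIBLE_DEVICES=$OMPI_COMM_WORLD_LOCAL_RANK
exec "$@"
#-------------------------------------------------
with the launch line changed to:
mpirun -np 8 --map-by ppr:4:node:PE=12 ./bind_gpu.sh pw.x -inp input.in
Would this be a reasonable thing to try?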
Also, I would appreciate your comments on the discrepancy between CPU and
GPU, which I mentioned in my previous email.
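For reference, I estimated the maximum force difference with a small script along
the lines below (only a sketch; cpu.out and gpu.out are placeholder names for the
two pw.x output files, and I assume the usual "atom ... force = fx fy fz" lines
printed when tprnfor=.true.):
#-------------------------------------------------
#!/bin/bash
# Compare the atomic forces printed by pw.x in the CPU and GPU runs.
# On the force lines, fields 7-9 hold fx, fy, fz (in Ry/Bohr).
paste <(grep 'atom.*force =' cpu.out) <(grep 'atom.*force =' gpu.out) |
awk 'function abs(x) { return x < 0 ? -x : x }
     { d = abs($7 - $16)
       if (abs($8 - $17) > d) d = abs($8 - $17)
       if (abs($9 - $18) > d) d = abs($9 - $18)
       if (d > max) max = d }
     END { printf "max |dF| = %.2e Ry/Bohr\n", max }'
#-------------------------------------------------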
Thank you in advance!
Best regards
Manish Kumar
IISER Pune, India
On Fri, Mar 4, 2022 at 3:05 PM Filippo Spiga <spiga.filippo at gmail.com>
wrote:
> Ops, typo while typing from the phone...
>
> "are you using OMP_NUM_THREADS=48 or OMP_NUM_THREADS=12?"
>
> (everything else is correct)
>
> --
> Filippo SPIGA ~ http://fspiga.github.io ~ skype: filippo.spiga
>
>
> On Fri, 4 Mar 2022 at 09:33, Filippo Spiga <spiga.filippo at gmail.com>
> wrote:
>
>> Dear Manish,
>>
>> when you use nGPU=4, does the "# of Threads" column specify the aggregate
>> number of threads? Meaning, are you using OMP_NUM_THREADS=48 or
>> OMP_NUM_THREADS=48? From your email it is not clear, and if you
>> oversubscribe physical cores with threads or processes then performance is
>> not going to be great.
>>
>> Also, you must manage bindings properly, otherwise MPI processes bound to a
>> GPU on another socket need to cross the awful CPU-to-CPU link. Have a look
>> at the '--map-by' option in mpirun. For 4 GPUs, using 4 MPI processes and 12
>> OpenMP threads, your mpirun will look like this:
>>
>> export OMP_NUM_THREADS=12
>> mpirun -np 4 --map-by ppr:4:node:PE=12 ./pw.x
>>
>> If you are running on an HPC system managed by someone else, try reaching
>> out to the User Support team for guidance on correct binding and environment.
>> What you are observing is very likely not related to QE-GPU but to how you
>> are running your calculations.
>>
>> HTH
>>
>> --
>> Filippo SPIGA ~ http://fspiga.github.io ~ skype: filippo.spiga
>>
>>
>> On Wed, 2 Mar 2022 at 08:40, Manish Kumar <
>> manish.kumar at acads.iiserpune.ac.in> wrote:
>>
>>> Dear all,
>>>
>>> I am using QE-GPU compiled on a 48-core Intel(R) Xeon(R) Platinum 8268
>>> CPU @ 2.90GHz with four NVIDIA V100 GPU cards. To use all the CPU cores, I
>>> set the OMP_NUM_THREADS variable in the SLURM script. The jobs are run
>>> with "mpirun -np [nGPU] pw.x", where nGPU refers to the number of GPUs
>>> used. Our system size (130 electrons and 64 k-points; the input file is
>>> given below) is comparable to some of the systems in J. Chem. Phys. 152,
>>> 154105 (2020); https://doi.org/10.1063/5.0005082.
>>>
>>> I have two issues/questions with QE-GPU:
>>> 1. The largest discrepancy in the atomic forces between the CPU and GPU
>>> runs is 1.34x10^-4 Ry/Bohr. What is an acceptable value for this
>>> discrepancy?
>>> 2. I see a significant increase in CPU time when I use multiple OMP
>>> threads for SCF calculations, as shown below. Could you please suggest a
>>> solution, and let me know if I am doing anything incorrectly? Any help
>>> would be much appreciated.
>>> The details are as follows:
>>>
>>> nGPU=1
>>> --------------------------------
>>> # of Threads    CPU Time (s)    WALL Time (s)
>>>           01          254.23           384.27
>>>           02          295.45           466.33
>>>           03          328.89           538.62
>>>           04          348.81           602.85
>>>           08          501.31           943.32
>>>           12          698.45          1226.86
>>>           16          836.71          1505.39
>>>           20          905.77          1645.66
>>>           24         1094.81          1973.97
>>>           28         1208.93          2278.81
>>>           32         1403.27          2570.51
>>>           36         1688.97          3068.91
>>>           40         1820.06          3306.49
>>>           44         1905.88          3603.96
>>>           48         2163.18          4088.75
>>> --------------------------------
>>>
>>> nGPU=2
>>> --------------------------------
>>> # of Threads    CPU Time (s)    WALL Time (s)
>>>           01          226.69           329.51
>>>           02          271.29           336.65
>>>           03          312.36           335.24
>>>           04          341.50           333.20
>>>           06          400.42           328.66
>>>           12          632.82           332.90
>>>           24          992.02           335.28
>>>           48         1877.65           438.40
>>> --------------------------------
>>>
>>> nGPU=4
>>> --------------------------------
>>> # of Threads    CPU Time (s)    WALL Time (s)
>>>           01          237.48           373.21
>>>           02          268.85           382.92
>>>           03          311.39           391.29
>>>           04          341.14           391.71
>>>           06          422.42           391.13
>>>           12          632.94           396.75
>>>           24          961.57           474.70
>>>           48         2509.10           894.79
>>> --------------------------------
>>>
>>> The input file is:
>>> --------------------------------------------
>>> &control
>>> calculation = 'scf',
>>> prefix = "cofe2o4"
>>> outdir = "./t"
>>> pseudo_dir = "./"
>>> tstress=.true.
>>> tprnfor=.true.
>>> /
>>> &system
>>> ibrav = 2,
>>> nat = 14,
>>> ntyp = 4,
>>> celldm(1) = 15.9647d0
>>> ecutwfc = 45
>>> ecutrho = 450
>>> nspin = 2
>>> starting_magnetization(1)= 1.0,
>>> starting_magnetization(3)=1.0,
>>> starting_magnetization(2)=-1.0,
>>> occupations = 'smearing',
>>> degauss = 0.005,
>>> smearing = 'mv'
>>> lda_plus_u = .true.,
>>> lda_plus_u_kind = 0,
>>> U_projection_type = 'atomic',
>>> Hubbard_U(1) = 3.5D0
>>> Hubbard_U(2) = 3.5D0
>>> Hubbard_U(3) = 3.0D0
>>> /
>>> &electrons
>>> mixing_mode = 'local-TF'
>>> mixing_beta = 0.2
>>> conv_thr = 1.D-7
>>> electron_maxstep = 250
>>> diagonalization ='david'
>>> /
>>> &IONS
>>> /
>>> ATOMIC_SPECIES
>>> Fe1 55.8450000000 Fe.pbe-sp-van_mit.UPF
>>> Fe2 55.8450000000 Fe.pbe-sp-van_mit.UPF
>>> Co 58.9332000000 Co.pbe-nd-rrkjus.UPF
>>> O 15.9994000000 O.pbe-rrkjus.UPF
>>> ATOMIC_POSITIONS crystal
>>> Fe1 0.0000000000 0.5000000000 0.5000000000
>>> Fe1 0.5000000000 0.0000000000 0.5000000000
>>> Co 0.5000000000 0.5000000000 0.0000000000
>>> Co 0.5000000000 0.5000000000 0.5000000000
>>> Fe2 0.1206093444 0.1206093444 0.1293906556
>>> Fe2 0.8793906556 0.8793906556 0.8706093444
>>> O 0.2489473315 0.2489473315 0.2660301248
>>> O 0.2489473315 0.2489473315 0.7360752123
>>> O -0.2447080455 0.2661185400 0.7392947527
>>> O 0.2447080455 0.7338814600 0.2607052473
>>> O 0.2661185400 0.7552919545 -0.2607052473
>>> O 0.7338814600 0.2447080455 0.2607052473
>>> O 0.7510526685 -0.2489473315 0.2639247877
>>> O 0.7510526685 0.7510526685 0.7339698752
>>> K_POINTS (automatic)
>>> 7 7 7 0 0 0
>>> -----------------------------------------------------------
>>>
>>> Best regards
>>> Manish Kumar
>>> IISER Pune, India
>>
>> _______________________________________________
>
> The Quantum ESPRESSO community stands by the Ukrainian people and
> expresses its concerns for the devastating effects that the Russian
> military offensive has on their country and on the free and peaceful
> scientific, cultural, and economic cooperation amongst peoples
> _______________________________________________
> Quantum ESPRESSO is supported by MaX (www.max-centre.eu)
> users mailing list users at lists.quantum-espresso.org
> https://lists.quantum-espresso.org/mailman/listinfo/users