<div dir="ltr"><div><div>Dear all,</div><div><br></div><div>I am using QE-GPU compiled on a machine with a 48-core Intel(R) Xeon(R) Platinum 8268 CPU @ 2.90GHz and four NVIDIA V100 GPU cards. To use all the CPU cores, I set the OMP_NUM_THREADS variable in the Slurm script. The jobs are run with "mpirun -np [nGPU] pw.x", where nGPU is the number of GPUs used. Our system size (130 electrons and 64 k-points; the input file is given below) is comparable to some of the systems in J. Chem. Phys. 152, 154105 (2020); <a href="https://doi.org/10.1063/5.0005082" target="_blank">https://doi.org/10.1063/5.0005082</a>.</div><div><br></div><div>I have two questions about QE-GPU:</div><div>1. The largest discrepancy in the atomic forces between the CPU and GPU runs is 1.34x10^-4 Ry/Bohr. What size of discrepancy is considered acceptable?</div><div>2. I see a significant increase in CPU time when I use multiple OpenMP threads for SCF calculations, as the timings below show. Could you please suggest a solution and let me know if I am doing anything incorrectly? 
Any help would be much appreciated.</div><div>The details are as follows:<br></div><div><br></div><div>nGPU=1<br></div><div>--------------------------------<br></div><div># of Threads                      CPU Time (s)                          WALL Time(s)<br></div><div>01                                           254.23                                    384.27<br>02                                           295.45                                    466.33<br>03                                           328.89                                    538.62<br>04                                           348.81                                    602.85<br>08                                           501.31                                    943.32<br>12                                           698.45                                    1226.86<br>16                                           836.71                                    1505.39<br>20                                           905.77                                    1645.66<br>24                                           1094.81                                   1973.97<br>28                                           1208.93                                   2278.81<br>32                                           1403.27                                   2570.51<br>36                                           1688.97                                   3068.91<br>40                                           1820.06                                   3306.49<br>44                                           1905.88                                   3603.96<br>48                                           2163.18                                   4088.75<br></div><div>--------------------------------<br></div><div><br></div><div>nGPU=2</div><div>--------------------------------<br></div><div># of Threads                      CPU Time (s)                          WALL Time(s)<br></div><div>01                           
                226.69                                    329.51<br>02                                           271.29                                    336.65<br>03                                           312.36                                    335.24<br>04                                           341.50                                    333.20<br>06                                           400.42                                    328.66<br>12                                           632.82                                    332.90<br>24                                           992.02                                    335.28<br>48                                           1877.65                                  438.40<br></div><div>--------------------------------<br></div><div><br></div><div>nGPU=4</div><div>--------------------------------</div><div># of Threads                      CPU Time (s)                          WALL Time(s)<br></div><div>01                                           237.48                                    373.21<br>02                                           268.85                                    382.92<br>03                                           311.39                                    391.29<br>04                                           341.14                                    391.71<br>06                                           422.42                                    391.13<br>12                                           632.94                                    396.75<br>24                                           961.57                                    474.70<br>48                                           2509.10                                  894.79<br></div><div>--------------------------------<br></div><div><br></div><div>The input file is:</div><div>--------------------------------------------<br></div><div>&control<br>    calculation = 'scf',<br>    prefix = "cofe2o4"<br>    outdir = "./t" 
<br>    pseudo_dir = "./"<br>    tstress=.true.<br>    tprnfor=.true.<br>/<br>&system<br>    ibrav = 2,<br>     nat = 14,<br>     ntyp = 4,<br>    celldm(1) = 15.9647d0<br>    ecutwfc = 45<br>    ecutrho = 450<br>    nspin = 2<br>    starting_magnetization(1)= 1.0,<br>    starting_magnetization(3)=1.0,<br>    starting_magnetization(2)=-1.0,<br>    occupations = 'smearing',<br>    degauss = 0.005,<br>    smearing = 'mv'<br>    lda_plus_u = .true.,<br>    lda_plus_u_kind = 0,<br>    U_projection_type = 'atomic',<br>    Hubbard_U(1) = 3.5D0<br>    Hubbard_U(2) = 3.5D0<br>    Hubbard_U(3) = 3.0D0<br>/<br>&electrons<br>    mixing_mode = 'local-TF'<br>    mixing_beta = 0.2<br>    conv_thr = 1.D-7<br>    electron_maxstep = 250<br>    diagonalization ='david'<br>/<br>&IONS<br>/<br>ATOMIC_SPECIES<br>   Fe1   55.8450000000  Fe.pbe-sp-van_mit.UPF<br>   Fe2   55.8450000000  Fe.pbe-sp-van_mit.UPF<br>   Co   58.9332000000  Co.pbe-nd-rrkjus.UPF<br>    O   15.9994000000  O.pbe-rrkjus.UPF<br>ATOMIC_POSITIONS crystal<br>Fe1           0.0000000000        0.5000000000        0.5000000000<br>Fe1           0.5000000000        0.0000000000        0.5000000000<br>Co            0.5000000000        0.5000000000        0.0000000000<br>Co            0.5000000000        0.5000000000        0.5000000000<br>Fe2           0.1206093444        0.1206093444        0.1293906556<br>Fe2           0.8793906556        0.8793906556        0.8706093444<br>O             0.2489473315        0.2489473315        0.2660301248<br>O             0.2489473315        0.2489473315        0.7360752123<br>O            -0.2447080455        0.2661185400        0.7392947527<br>O             0.2447080455        0.7338814600        0.2607052473<br>O             0.2661185400        0.7552919545       -0.2607052473<br>O             0.7338814600        0.2447080455        0.2607052473<br>O             0.7510526685       -0.2489473315        0.2639247877<br>O             0.7510526685        0.7510526685        
0.7339698752<br>K_POINTS (automatic)<br>7 7 7 0 0 0<br></div><div>-----------------------------------------------------------</div><div><br></div><div>Best regards</div><div>Manish Kumar</div><div>IISER Pune, India</div></div></div>