[QE-users] [QE-GPU] computing force and stress - time cost
Mohammad Moaddeli
mohammad.moaddeli at gmail.com
Mon Jan 4 18:22:55 CET 2021
Dear Paolo, Pietro, and Iurii,
Thanks for your comments and considerations.
i) The rest of the output file is missing because the job is still running.
Since the total force is 0.003, I guess it will end after 1 or 2 cycles.
ii) I used " mpirun -np 2 pw.x -input input.in > output.out" for running
the input file. Since I have two graphic cards installed on the mainboard,
I should use 2 threads to get the best performance (to the best of my
knowledge).
iii) The --enable-openmp option was passed to configure. The make.inc and
config.log files are attached.
iv) DFT+U with the 'ortho-atomic' Hubbard projection is required for
consistency with the subsequent hp.x calculation, so I think I have no
other choice.
v) A 5x5 graphene input file is used for a performance test:
mpirun -np N pw.x -input scf.in > scfN.out
where N = 1, 2, 3, 4, 8, 12, 16, and 24 (my system has 48 hardware threads).
Here are the results:
grep 'PWSCF :' *
scf01.out: PWSCF : 4m57.17s CPU 6m14.82s WALL
scf02.out: PWSCF : 5m30.54s CPU 6m11.66s WALL
scf03.out: PWSCF : 5m38.04s CPU 6m27.56s WALL
scf04.out: PWSCF : 5m23.23s CPU 6m 7.65s WALL
scf08.out: PWSCF : 5m48.06s CPU 6m46.10s WALL
scf12.out: PWSCF : 6m45.36s CPU 8m11.66s WALL
scf16.out: PWSCF : 6m37.10s CPU 8m 4.57s WALL
scf24.out: PWSCF : 9m 8.57s CPU 11m44.82s WALL
The input and output files are attached (test.rar).
Although this calculation is not large enough to truly evaluate the
performance, it shows that using more MPI processes results in longer CPU
and wall times.
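As a side note on points ii, iii, and v, here is a minimal sketch of the
combined MPI+OpenMP launch I have in mind, assuming 24 OpenMP threads per
MPI rank (2 ranks for the 2 GPUs, 48 hardware threads in total; the actual
optimum has to be tested):
export OMP_NUM_THREADS=24   # OpenMP threads per MPI rank (assumed value)
mpirun -np 2 pw.x -input input.in > output.out   # one MPI rank per GPU
Once a run completes, the time spent in the force and stress routines can
be checked from the final time report with, e.g. (assuming the standard
pw.x report labels):
grep -E 'forces|stress|PWSCF' output.out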
Any help will be greatly appreciated.
On Mon, Jan 4, 2021 at 2:37 PM Iurii TIMROV <iurii.timrov at epfl.ch> wrote:
> Dear Mohammad,
>
>
> > 2. You are using DFT+U, more precisely
> > U_projection_type = 'ortho-atomic'
> > The GPU acceleration of this functionality is limited and some portions
> > of the algorithm will still run on the CPU. In addition, I fear that the
> > evaluation of forces with that projection method scales pretty badly with
> > the number of atoms, but I let the experts (and developers) of this new
> > functionality comment further on this last point.
>
>
> Yes, indeed, in DFT+U with the 'ortho-atomic' Hubbard manifold the
> calculations of Hubbard forces and stress take much more time than with the
> 'atomic' Hubbard manifold. See Phys. Rev. B 102, 235159 (2020), in
> particular Appendix C.
>
>
> As Pietro said, the ortho-atomic Hubbard forces and stress were ported to
> the GPU version of QE but this is still not optimal and can be improved in
> the future.
>
>
> Greetings,
>
> Iurii
>
>
> --
> Dr. Iurii TIMROV
> Postdoctoral Researcher
> STI - IMX - THEOS and NCCR - MARVEL
> Swiss Federal Institute of Technology Lausanne (EPFL)
> CH-1015 Lausanne, Switzerland
> +41 21 69 34 881
> http://people.epfl.ch/265334
> ------------------------------
> From: users <users-bounces at lists.quantum-espresso.org> on behalf of
> Pietro Bonfa' <pietro.bonfa at unipr.it>
> Sent: Monday, January 4, 2021 11:52:31 AM
> To: users at lists.quantum-espresso.org
> Subject: Re: [QE-users] [QE-GPU] computing force and stress - time cost
>
> Dear Mohammad,
>
> Paolo is right, but a couple of comments can already be made:
>
> 1. you are not using OpenMP parallelism, and I believe you have more
> than 2 cores in your system. In order to achieve a decent speedup (at
> least in the SCF) it's mandatory to enable OpenMP and exploit the full
> power of the CPU.
>
> 2. You are using DFT+U, more precisely
>
> U_projection_type = 'ortho-atomic'
>
> The GPU acceleration of this functionality is limited and some portions
> of the algorithm will still run on the CPU. In addition, I fear that the
> evaluation of forces with that projection method scales pretty badly with
> the number of atoms, but I let the experts (and developers) of this new
> functionality comment further on this last point.
>
> Best regards and happy new year,
> Pietro
>
>
>
>
> On 1/4/21 10:07 AM, Paolo Giannozzi wrote:
> > The most important piece of information (the final time report) is not
> > contained in your 45Mb output.
> >
> > Paolo
> >
> > On Mon, Jan 4, 2021 at 6:33 AM Mohammad Moaddeli
> > <mohammad.moaddeli at gmail.com> wrote:
> >
> > Dear Pietro,
> >
> > It takes about 22 hours to perform one SCF run (about 2 hours of
> > diagonalization until convergence is achieved, and about 20 hours to
> > compute the force and stress).
> > Here is the google drive link containing input and output files:
> >
> > https://drive.google.com/file/d/1DFtLqFvrc8CFo1_q_jjnMFErvXpWEAHB/view?usp=sharing
> >
> > hp.x will be performed after vc-relax is done.
> >
> > Thanks in advance,
> > Mohammad
> >
> > On Mon, Jan 4, 2021 at 2:38 AM Pietro Bonfa' <pietro.bonfa at unipr.it> wrote:
> >
> > Dear Mohammad,
> >
> > the performance of the GPU code depends dramatically on the portions of
> > the computation that are still performed on the CPU. Only a portion of
> > all contributions to forces has been accelerated, and what is left out
> > may be optimized for MPI parallelism rather than OpenMP.
> >
> > That being said, the behavior that you report is definitely unusual.
> > Would you mind sharing the input and output files?
> >
> > Best regards,
> > Pietro
> >
> >
> >
> > On 1/3/21 8:52 AM, Mohammad Moaddeli wrote:
> > > Dear all,
> > >
> > > GPU-enabled QE v.6.7 is compiled on a VOLTA card. I am trying to run a
> > > vc-relax for a bulk containing 48 atoms. Although diagonalization
> > > (Davidson) is about 3x faster than on the CPU, it takes a lot of time
> > > (a couple of hours) to compute the force and stress. Is this something
> > > related to the code itself?
> > >
> > > Best,
> > >
> > > Mohammad Moaddeli
> > > ShirazU
> > >
> > > _______________________________________________
> > > Quantum ESPRESSO is supported by MaX (www.max-centre.eu)
> > > users mailing list users at lists.quantum-espresso.org
> > > https://lists.quantum-espresso.org/mailman/listinfo/users
> > >
> > _______________________________________________
> > Quantum ESPRESSO is supported by MaX (www.max-centre.eu)
> > users mailing list users at lists.quantum-espresso.org
> > https://lists.quantum-espresso.org/mailman/listinfo/users
> >
> >
> >
> >
> > --
> > Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
> > Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
> > Phone +39-0432-558216, fax +39-0432-558222
> >
> >
> > _______________________________________________
> > Quantum ESPRESSO is supported by MaX (www.max-centre.eu)
> > users mailing list users at lists.quantum-espresso.org
> > https://lists.quantum-espresso.org/mailman/listinfo/users
> >
> _______________________________________________
> Quantum ESPRESSO is supported by MaX (www.max-centre.eu)
> users mailing list users at lists.quantum-espresso.org
> https://lists.quantum-espresso.org/mailman/listinfo/users