[QE-users] Optimal pw command line for large systems and only Gamma point
Giuseppe Mattioli
giuseppe.mattioli at ism.cnr.it
Mon May 13 17:26:59 CEST 2024
Dear Antonio
> The actual time spent per scf cycle is about 33 minutes.
This is not so bad. :-)
> The relevant parameters in the input file are the following:
Some relevant parameters are not shown.
> input_dft= 'pz'
> ecutwfc= 25
Which kind of pseudopotential? You didn't set ecutrho...
What about ibrav and celldm?
I suppose that you really want to perform LDA calculations for some reason.
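Just as a reminder: ecutrho defaults to 4*ecutwfc, which is usually fine for
norm-conserving pseudopotentials, while with ultrasoft or PAW ones you should
raise it to roughly 8-12*ecutwfc. A minimal sketch (the actual values have to
be converged for your pseudopotentials):

   &SYSTEM
     ! ... other &SYSTEM entries as in your input ...
     ecutwfc = 25.0
     ecutrho = 200.0  ! only needed for ultrasoft/PAW; default 4*ecutwfc is fine for norm-conserving
   /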
> occupations= 'smearing'
> smearing= 'cold'
> degauss= 0.05 ! I know it's quite large, but necessary to
> stabilize the SCF at this preliminary stage (no geometry step done
> yet)
> mixing_beta= 0.4
If you want to stabilize the SCF it is better to use Gaussian
smearing and to reduce degauss (to 0.01) and mixing_beta (to 0.1 or
even 0.05~0.01). In the case of a relax calculation with a difficult
first step, try scf_must_converge=.false. together with a reasonable
electron_maxstep (30~50). It often helps when the SCF is not going
completely astray.
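In the input it would look something like this (only the relevant entries;
a sketch to be tuned to your case):

   &SYSTEM
     ! ... other &SYSTEM entries as in your input ...
     occupations = 'smearing'
     smearing    = 'gaussian'
     degauss     = 0.01
   /
   &ELECTRONS
     mixing_mode       = 'plain'
     mixing_beta       = 0.1
     electron_maxstep  = 50
     scf_must_converge = .false.
   /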
> nbnd= 2010
>
> diagonalization= 'ppcg'
Davidson (diagonalization='david') should be faster.
> And, if possible, also to reduce the number of nodes?
> Estimated total dynamical RAM > 1441.34 GB
You may try 7-8 nodes, according to this estimate.
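For example, assuming 8 nodes with 40 MPI tasks each (just a sketch: adapt
the launcher syntax and the task count to your machine, and keep -nd within
a few tens of processors, as Paolo suggested):

   mpirun -np 320 pw.x -nk 1 -nt 1 -nd 49 -inp qe.in > qe.out

With only the Gamma point, -nk 1 is the only sensible choice, so most of the
parallelization goes over plane waves; -nd 49 gives a square 7x7 ScaLAPACK grid.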
HTH
Giuseppe
Quoting Antonio Cammarata via users <users at lists.quantum-espresso.org>:
> I did some tests. For 1000 Si atoms, I use 2010 bands because I need
> to get the band gap value; moreover, being a cluster, the surface
> states of the truncated bonds might close the gap, especially at the
> first steps of the geometry optimization, so it is better if I use a
> few empty bands. I managed to run the calculation using 10 nodes and
> a maximum of 40 cores per node. My question now is: can you suggest
> optimal command-line options and/or input settings to speed up the
> calculation? And, if possible, also to reduce the number of nodes?
> The relevant parameters in the input file are the following:
>
> input_dft= 'pz'
> ecutwfc= 25
> occupations= 'smearing'
> smearing= 'cold'
> degauss= 0.05 ! I know it's quite large, but necessary to
> stabilize the SCF at this preliminary stage (no geometry step done
> yet)
> nbnd= 2010
>
> diagonalization= 'ppcg'
> mixing_mode= 'plain'
> mixing_beta= 0.4
>
> The actual time spent per scf cycle is about 33 minutes. I use QE v.
> 7.3 compiled with OpenMPI and ScaLAPACK. I have access to the Intel
> compilers too, but I did some tests and the difference is just tens
> of seconds. I have only the Gamma point; here is some
> info about the grid and the estimated RAM usage:
>
> Dense grid: 24616397 G-vectors FFT dimensions: ( 375, 375, 375)
> Dynamical RAM for wfc: 235.91 MB
> Dynamical RAM for wfc (w. buffer): 235.91 MB
> Dynamical RAM for str. fact: 0.94 MB
> Dynamical RAM for local pot: 0.00 MB
> Dynamical RAM for nlocal pot: 2112.67 MB
> Dynamical RAM for qrad: 0.80 MB
> Dynamical RAM for rho,v,vnew: 6.04 MB
> Dynamical RAM for rhoin: 2.01 MB
> Dynamical RAM for rho*nmix: 15.03 MB
> Dynamical RAM for G-vectors: 3.99 MB
> Dynamical RAM for h,s,v(r/c): 0.46 MB
> Dynamical RAM for <psi|beta>: 552.06 MB
> Dynamical RAM for wfcinit/wfcrot: 1305.21 MB
> Estimated static dynamical RAM per process > 2.31 GB
> Estimated max dynamical RAM per process > 3.60 GB
> Estimated total dynamical RAM > 1441.34 GB
>
> Thanks a lot in advance for your kind help.
>
> All the best
>
> Antonio
>
>
> On 10. 05. 24 12:01, Paolo Giannozzi wrote:
>> On 5/10/24 08:58, Antonio Cammarata via users wrote:
>>
>>> pw.x -nk 1 -nt 1 -nb 1 -nd 768 -inp qe.in > qe.out
>>
>> too many processors for linear-algebra parallelization. 1000 Si
>> atoms = 2000 bands (assuming an insulator with no spin
>> polarization). Use a few tens of processors at most
>>
>>> "some processors have no G-vectors for symmetrization".
>>
>> which sounds strange to me: with the Gamma point, symmetrization is
>> not even needed
>>
>>
>>> Dense grid: 30754065 G-vectors FFT dimensions: ( 400, 400, 400)
>>
>> This is what a 256-atom Si supercell with 30 Ry cutoff yields:
>>
>> Dense grid: 825897 G-vectors FFT dimensions: ( 162, 162, 162)
>>
>> I guess you may reduce the size of your supercell
>>
>> Paolo
>>
>>> Dynamical RAM for wfc: 153.50 MB
>>> Dynamical RAM for wfc (w. buffer): 153.50 MB
>>> Dynamical RAM for str. fact: 0.61 MB
>>> Dynamical RAM for local pot: 0.00 MB
>>> Dynamical RAM for nlocal pot: 1374.66 MB
>>> Dynamical RAM for qrad: 0.87 MB
>>> Dynamical RAM for rho,v,vnew: 5.50 MB
>>> Dynamical RAM for rhoin: 1.83 MB
>>> Dynamical RAM for rho*nmix: 9.78 MB
>>> Dynamical RAM for G-vectors: 2.60 MB
>>> Dynamical RAM for h,s,v(r/c): 0.25 MB
>>> Dynamical RAM for <psi|beta>: 552.06 MB
>>> Dynamical RAM for wfcinit/wfcrot: 977.20 MB
>>> Estimated static dynamical RAM per process > 1.51 GB
>>> Estimated max dynamical RAM per process > 2.47 GB
>>> Estimated total dynamical RAM > 1900.41 GB
>>>
>>> I managed to run the simulation with 512 atoms, cg diagonalization
>>> and 3 nodes on the same machine with command line
>>>
>>> pw.x -nk 1 -nt 1 -nd 484 -inp qe.in > qe.out
>>>
>>> Please, do you have any suggestions on how to set optimal
>>> parallelization parameters to avoid the memory issue and run the
>>> calculation? I am also planning to run simulations on nanoclusters
>>> with more than 1000 atoms.
>>>
>>> Thanks a lot in advance for your kind help.
>>>
>>> Antonio
>>>
>>>
>>
> --
> _______________________________________________
> Antonio Cammarata, PhD in Physics
> Associate Professor in Applied Physics
> Advanced Materials Group
> Department of Control Engineering - KN:G-204
> Faculty of Electrical Engineering
> Czech Technical University in Prague
> Karlovo Náměstí, 13
> 121 35, Prague 2, Czech Republic
> Phone: +420 224 35 5711
> Fax: +420 224 91 8646
> ORCID: orcid.org/0000-0002-5691-0682
> WoS ResearcherID: A-4883-2014
>
GIUSEPPE MATTIOLI
CNR - ISTITUTO DI STRUTTURA DELLA MATERIA
Via Salaria Km 29,300 - C.P. 10
I-00015 - Monterotondo Scalo (RM)
Mob (*preferred*) +39 373 7305625
Tel + 39 06 90672342 - Fax +39 06 90672316
E-mail: <giuseppe.mattioli at ism.cnr.it>