[QE-users] Optimal pw command line for large systems and only Gamma point
Giuseppe Mattioli
giuseppe.mattioli at ism.cnr.it
Mon May 13 17:26:59 CEST 2024
Dear Antonio
> The actual time spent per scf cycle is about 33 minutes.
This is not so bad. :-)
> The relevant parameters in the input file are the following:
Some relevant parameters are not shown.
> input_dft= 'pz'
> ecutwfc= 25
Which kind of pseudopotential? You didn't set ecutrho...
What about ibrav and celldm?
I suppose that you really want to perform LDA calculations for some reason.
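Just as a reminder: ecutrho defaults to 4*ecutwfc, which is usually fine for
norm-conserving pseudopotentials, while with ultrasoft or PAW ones you should
raise it to roughly 8-12*ecutwfc. A minimal sketch (the actual values have to
be converged for your pseudopotentials):

   &SYSTEM
     ! ... other &SYSTEM entries as in your input ...
     ecutwfc = 25.0
     ecutrho = 200.0  ! only needed for ultrasoft/PAW; default 4*ecutwfc is fine for norm-conserving
   /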
> occupations= 'smearing'
> smearing= 'cold'
> degauss= 0.05 ! I know it's quite large, but necessary to
> stabilize the SCF at this preliminary stage (no geometry step done
> yet)
> mixing_beta= 0.4
If you want to stabilize the SCF it is better to use Gaussian
smearing and to reduce degauss (to 0.01) and mixing_beta (to 0.1 or
even 0.05~0.01). In the case of a relax calculation with a difficult
first step, try scf_must_converge=.false. together with a reasonable
electron_maxstep (30~50). It often helps when the SCF is not going
completely astray.
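In the input it would look something like this (only the relevant entries;
a sketch to be tuned to your case):

   &SYSTEM
     ! ... other &SYSTEM entries as in your input ...
     occupations = 'smearing'
     smearing    = 'gaussian'
     degauss     = 0.01
   /
   &ELECTRONS
     mixing_mode       = 'plain'
     mixing_beta       = 0.1
     electron_maxstep  = 50
     scf_must_converge = .false.
   /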
> nbnd= 2010
>
> diagonalization= 'ppcg'
Davidson (diagonalization='david') should be faster.
> And, if possible, also to reduce the number of nodes?
> Estimated total dynamical RAM > 1441.34 GB
You may try 7-8 nodes, according to this estimate.
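For example, assuming 8 nodes with 40 MPI tasks each (just a sketch: adapt
the launcher syntax and the task count to your machine, and keep -nd within
a few tens of processors, as Paolo suggested):

   mpirun -np 320 pw.x -nk 1 -nt 1 -nd 49 -inp qe.in > qe.out

With only the Gamma point, -nk 1 is the only sensible choice, so most of the
parallelization goes over plane waves; -nd 49 gives a square 7x7 ScaLAPACK grid.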
HTH
Giuseppe
Quoting Antonio Cammarata via users <users at lists.quantum-espresso.org>:
> I did some tests. For 1000 Si atoms, I use 2010 bands because I need
> to get the band gap value; moreover, being a cluster, the surface
> states of the truncated bonds might close the gap, especially at the
> first steps of the geometry optimization, so it is better if I use a
> few empty bands. I managed to run the calculation using 10 nodes and
> a maximum of 40 cores per node. My question now is: can you suggest
> optimal command-line options and/or input settings to speed up the
> calculation? And, if possible, also to reduce the number of nodes?
> The relevant parameters in the input file are the following:
>
> input_dft= 'pz'
> ecutwfc= 25
> occupations= 'smearing'
> smearing= 'cold'
> degauss= 0.05 ! I know it's quite large, but necessary to
> stabilize the SCF at this preliminary stage (no geometry step done
> yet)
> nbnd= 2010
>
> diagonalization= 'ppcg'
> mixing_mode= 'plain'
> mixing_beta= 0.4
>
> The actual time spent per scf cycle is about 33 minutes. I use QE v.
> 7.3 compiled with OpenMPI and ScaLAPACK. I have access to the Intel
> compilers too, but I did some tests and the difference is just tens
> of seconds. I have only the Gamma point; here is some
> info about the grid and the estimated RAM usage:
>
> Dense grid: 24616397 G-vectors FFT dimensions: ( 375, 375, 375)
> Dynamical RAM for wfc: 235.91 MB
> Dynamical RAM for wfc (w. buffer): 235.91 MB
> Dynamical RAM for str. fact: 0.94 MB
> Dynamical RAM for local pot: 0.00 MB
> Dynamical RAM for nlocal pot: 2112.67 MB
> Dynamical RAM for qrad: 0.80 MB
> Dynamical RAM for rho,v,vnew: 6.04 MB
> Dynamical RAM for rhoin: 2.01 MB
> Dynamical RAM for rho*nmix: 15.03 MB
> Dynamical RAM for G-vectors: 3.99 MB
> Dynamical RAM for h,s,v(r/c): 0.46 MB
> Dynamical RAM for <psi|beta>: 552.06 MB
> Dynamical RAM for wfcinit/wfcrot: 1305.21 MB
> Estimated static dynamical RAM per process > 2.31 GB
> Estimated max dynamical RAM per process > 3.60 GB
> Estimated total dynamical RAM > 1441.34 GB
>
> Thanks a lot in advance for your kind help.
>
> All the best
>
> Antonio
>
>
> On 10. 05. 24 12:01, Paolo Giannozzi wrote:
>> On 5/10/24 08:58, Antonio Cammarata via users wrote:
>>
>>> pw.x -nk 1 -nt 1 -nb 1 -nd 768 -inp qe.in > qe.out
>>
>> too many processors for linear-algebra parallelization. 1000 Si
>> atoms = 2000 bands (assuming an insulator with no spin
>> polarization). Use a few tens of processors at most
>>
>>> "some processors have no G-vectors for symmetrization".
>>
>> which sounds strange to me: with the Gamma point, symmetrization is
>> not even needed
>>
>>
>>> Dense grid: 30754065 G-vectors FFT dimensions: ( 400, 400, 400)
>>
>> This is what a 256-atom Si supercell with 30 Ry cutoff yields:
>>
>> Dense grid: 825897 G-vectors FFT dimensions: ( 162, 162, 162)
>>
>> I guess you may reduce the size of your supercell
>>
>> Paolo
>>
>>> Dynamical RAM for wfc: 153.50 MB
>>> Dynamical RAM for wfc (w. buffer): 153.50 MB
>>> Dynamical RAM for str. fact: 0.61 MB
>>> Dynamical RAM for local pot: 0.00 MB
>>> Dynamical RAM for nlocal pot: 1374.66 MB
>>> Dynamical RAM for qrad: 0.87 MB
>>> Dynamical RAM for rho,v,vnew: 5.50 MB
>>> Dynamical RAM for rhoin: 1.83 MB
>>> Dynamical RAM for rho*nmix: 9.78 MB
>>> Dynamical RAM for G-vectors: 2.60 MB
>>> Dynamical RAM for h,s,v(r/c): 0.25 MB
>>> Dynamical RAM for <psi|beta>: 552.06 MB
>>> Dynamical RAM for wfcinit/wfcrot: 977.20 MB
>>> Estimated static dynamical RAM per process > 1.51 GB
>>> Estimated max dynamical RAM per process > 2.47 GB
>>> Estimated total dynamical RAM > 1900.41 GB
>>>
>>> I managed to run the simulation with 512 atoms, cg diagonalization
>>> and 3 nodes on the same machine with command line
>>>
>>> pw.x -nk 1 -nt 1 -nd 484 -inp qe.in > qe.out
>>>
>>> Please, do you have any suggestions on how to set optimal
>>> parallelization parameters to avoid the memory issue and run the
>>> calculation? I am also planning to run simulations on nanoclusters
>>> with more than 1000 atoms.
>>>
>>> Thanks a lot in advance for your kind help.
>>>
>>> Antonio
>>>
>>>
>>
> --
> _______________________________________________
> Antonio Cammarata, PhD in Physics
> Associate Professor in Applied Physics
> Advanced Materials Group
> Department of Control Engineering - KN:G-204
> Faculty of Electrical Engineering
> Czech Technical University in Prague
> Karlovo Náměstí, 13
> 121 35, Prague 2, Czech Republic
> Phone: +420 224 35 5711
> Fax: +420 224 91 8646
> ORCID: orcid.org/0000-0002-5691-0682
> WoS ResearcherID: A-4883-2014
>
GIUSEPPE MATTIOLI
CNR - ISTITUTO DI STRUTTURA DELLA MATERIA
Via Salaria Km 29,300 - C.P. 10
I-00015 - Monterotondo Scalo (RM)
Mob (*preferred*) +39 373 7305625
Tel + 39 06 90672342 - Fax +39 06 90672316
E-mail: <giuseppe.mattioli at ism.cnr.it>