[QE-users] Optimal pw command line for large systems and only Gamma point
Antonio Cammarata
cammaant at fel.cvut.cz
Fri May 10 12:18:23 CEST 2024
Thanks to both Paolo and Giuseppe for your answers.
I cannot reduce the size of the system, as I need to consider a large
nanocluster; also, as I mentioned, I will need to consider even larger
systems. None of the systems are periodic, so I will always have only
the Gamma point.
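(For reference, a minimal sketch of the k-point card for this case,
assuming the standard Gamma-only syntax, which as far as I understand
selects the real-wavefunction code path and roughly halves the
wavefunction memory with respect to a generic single k-point:

K_POINTS gamma
)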
I submitted the calculation on 12 nodes with 12 cores per node and this
time it did not stop with the usual memory error, so I believe this is
the right route to follow. Also, for the moment I used no
parallelization options on the command line, as suggested by Giuseppe.
The question now is how to find the optimal number of cores and the
right command-line options to speed up the calculation without running
into memory issues.
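As a concrete starting point to benchmark (only a sketch: 144 processes
is just 12 nodes x 12 cores as in the run above, -nd is kept to a few
tens following Paolo's advice, and -nk 1 since there is only the Gamma
point), something like:

mpirun -np 144 pw.x -nk 1 -nt 1 -nb 1 -nd 64 -inp qe.in > qe.out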
Thanks a lot in advance for your suggestions on this.
All the best
Antonio
On 10. 05. 24 at 12:01, Paolo Giannozzi wrote:
> On 5/10/24 08:58, Antonio Cammarata via users wrote:
>
>> pw.x -nk 1 -nt 1 -nb 1 -nd 768 -inp qe.in > qe.out
>
> too many processors for linear-algebra parallelization. 1000 Si atoms
> = 2000 bands (assuming an insulator with no spin polarization). Use a
> few tens of processors at most
>
>> "some processors have no G-vectors for symmetrization".
>
> which sounds strange to me: with the Gamma point symmetrization is not
> even needed
>
>
>> Dense grid: 30754065 G-vectors FFT dimensions: ( 400, 400, 400)
>
> This is what a 256-atom Si supercell with 30 Ry cutoff yields:
>
> Dense grid: 825897 G-vectors FFT dimensions: ( 162, 162, 162)
>
> I guess you may reduce the size of your supercell
>
> Paolo
>
>> Dynamical RAM for wfc: 153.50 MB
>> Dynamical RAM for wfc (w. buffer): 153.50 MB
>> Dynamical RAM for str. fact: 0.61 MB
>> Dynamical RAM for local pot: 0.00 MB
>> Dynamical RAM for nlocal pot: 1374.66 MB
>> Dynamical RAM for qrad: 0.87 MB
>> Dynamical RAM for rho,v,vnew: 5.50 MB
>> Dynamical RAM for rhoin: 1.83 MB
>> Dynamical RAM for rho*nmix: 9.78 MB
>> Dynamical RAM for G-vectors: 2.60 MB
>> Dynamical RAM for h,s,v(r/c): 0.25 MB
>> Dynamical RAM for <psi|beta>: 552.06 MB
>> Dynamical RAM for wfcinit/wfcrot: 977.20 MB
>> Estimated static dynamical RAM per process > 1.51 GB
>> Estimated max dynamical RAM per process > 2.47 GB
>> Estimated total dynamical RAM > 1900.41 GB
>>
>> I managed to run the simulation with 512 atoms, cg diagonalization
>> and 3 nodes on the same machine with command line
>>
>> pw.x -nk 1 -nt 1 -nd 484 -inp qe.in > qe.out
>>
>> Please, do you have any suggestion on how to set optimal
>> parallelization parameters to avoid the memory issue and run the
>> calculation? I am also planning to run simulations on nanoclusters
>> with more than 1000 atoms.
>>
>> Thanks a lot in advance for your kind help.
>>
>> Antonio
>>
>>
>
--
_______________________________________________
Antonio Cammarata, PhD in Physics
Associate Professor in Applied Physics
Advanced Materials Group
Department of Control Engineering - KN:G-204
Faculty of Electrical Engineering
Czech Technical University in Prague
Karlovo Náměstí, 13
121 35, Prague 2, Czech Republic
Phone: +420 224 35 5711
Fax: +420 224 91 8646
ORCID: orcid.org/0000-0002-5691-0682
ResearcherID: A-4883-2014