[QE-users] Optimal pw command line for large systems and only Gamma point

Antonio Cammarata cammaant at fel.cvut.cz
Fri May 10 08:58:54 CEST 2024


Dear all,

I have a silicon nanocluster of 1000 atoms, sampled with a 1 1 1 k-mesh 
(Gamma point only). I cannot get the calculation to run because of 
memory issues. I use a computational cluster with 128 cores per node and 
200 GB of RAM per node. I am using PWSCF v.7.3. In the input I set 
ecutwfc = 29 and cg diagonalization to save memory. Following

https://www.quantum-espresso.org/Doc/user_guide/node20.html

I tried several command line parameters, the last being

pw.x -nk 1 -nt 1 -nb 1 -nd 768 -inp qe.in > qe.out

for a run on 6 nodes. I tried up to 12 nodes, but with more than 7 nodes 
I get the warning message "some processors have no G-vectors for 
symmetrization". Here is some output that may be relevant to the issue:

      Dense  grid: 30754065 G-vectors     FFT dimensions: ( 400, 400, 400)
      Dynamical RAM for                 wfc:     153.50 MB
      Dynamical RAM for     wfc (w. buffer):     153.50 MB
      Dynamical RAM for           str. fact:       0.61 MB
      Dynamical RAM for           local pot:       0.00 MB
      Dynamical RAM for          nlocal pot:    1374.66 MB
      Dynamical RAM for                qrad:       0.87 MB
      Dynamical RAM for          rho,v,vnew:       5.50 MB
      Dynamical RAM for               rhoin:       1.83 MB
      Dynamical RAM for            rho*nmix:       9.78 MB
      Dynamical RAM for           G-vectors:       2.60 MB
      Dynamical RAM for          h,s,v(r/c):       0.25 MB
      Dynamical RAM for          <psi|beta>:     552.06 MB
      Dynamical RAM for      wfcinit/wfcrot:     977.20 MB
      Estimated static dynamical RAM per process >       1.51 GB
      Estimated max dynamical RAM per process >       2.47 GB
      Estimated total dynamical RAM >    1900.41 GB
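
As a rough check of these figures against the available hardware, the 
short Python sketch below multiplies them out. It assumes one MPI 
process per core (6 nodes x 128 cores = 768 processes, matching the 
command line above) and treats the per-process estimate as fixed, which 
is only approximately true because pw.x recomputes it whenever the 
number of processes changes. The variable names are illustrative.

```python
# Figures copied from the cluster description and the pw.x output above.
nodes = 6
cores_per_node = 128
ram_per_node_gb = 200.0
max_ram_per_process_gb = 2.47  # "Estimated max dynamical RAM per process" (a lower bound)

# Assumption: one MPI process per core, no OpenMP threads (-nt 1).
processes = nodes * cores_per_node

needed_per_node_gb = cores_per_node * max_ram_per_process_gb
needed_total_gb = processes * max_ram_per_process_gb
available_total_gb = nodes * ram_per_node_gb

# Largest number of processes per node that stays within the node's RAM,
# if the per-process estimate did not change with the process count.
max_procs_per_node = int(ram_per_node_gb // max_ram_per_process_gb)

print(f"MPI processes:          {processes}")
print(f"RAM needed per node:    {needed_per_node_gb:.2f} GB (available: {ram_per_node_gb:.0f} GB)")
print(f"RAM needed in total:    {needed_total_gb:.2f} GB (available: {available_total_gb:.0f} GB)")
print(f"Processes per node that fit: {max_procs_per_node}")
```

Under these assumptions a fully populated node would need roughly 316 GB 
against the 200 GB available, which is consistent with the reported 
total of about 1900 GB exceeding the 1200 GB of the 6 nodes.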

I managed to run the simulation with 512 atoms, cg diagonalization and 3 
nodes on the same machine with the command line

pw.x -nk 1 -nt 1 -nd 484 -inp qe.in > qe.out

Do you have any suggestions on how to set the parallelization 
parameters so that the calculation fits in memory and runs? I am also 
planning to run simulations on nanoclusters with more than 1000 atoms.

Thanks a lot in advance for your kind help.

Antonio


-- 
_______________________________________________
Antonio Cammarata, PhD in Physics
Associate Professor in Applied Physics
Advanced Materials Group
Department of Control Engineering - KN:G-204
Faculty of Electrical Engineering
Czech Technical University in Prague
Karlovo Náměstí, 13
121 35, Prague 2, Czech Republic
Phone: +420 224 35 5711
Fax:   +420 224 91 8646
ORCID: orcid.org/0000-0002-5691-0682
ResearcherID: A-4883-2014


