[QE-users] DFT+U+V neighbour allocation extremely slow and not parallelized

Timrov Iurii iurii.timrov at psi.ch
Mon May 5 16:01:29 CEST 2025


Dear Julien,

To solve both problems, I would define a new atomic type for the pair of atoms of interest and apply the inter-site V between these two "Hubbard atoms". All the other atoms will then be treated as non-Hubbard, and hence the bottleneck in alloc_neigh should disappear.
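For illustration, here is a minimal sketch of what this could look like in the input file; the element, masses, pseudopotential name, coordinates, manifold, atom indices, and V value are placeholders to be adapted to your actual system:

   ATOMIC_SPECIES
      Ti    47.867  Ti.pbe-spn-kjpaw_psl.1.0.0.UPF
      Ti1   47.867  Ti.pbe-spn-kjpaw_psl.1.0.0.UPF

   ATOMIC_POSITIONS (crystal)
      ...
      Ti1   0.2500  0.2500  0.2500
      Ti1   0.5000  0.5000  0.5000
      ...

   HUBBARD (ortho-atomic)
      V  Ti1-3d  Ti1-3d  323  324  0.5

Here Ti1 is a second atomic type pointing to the same pseudopotential as Ti (remember to increase ntyp by one in &SYSTEM), and only the two dimer atoms carry the Ti1 label in ATOMIC_POSITIONS. The integers 323 and 324 stand for the positions of these two atoms in the ATOMIC_POSITIONS list. Since only Ti1 manifolds appear in the HUBBARD card, only these two atoms are set up as Hubbard atoms.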

Greetings,
Iurii

----------------------------------------------------------
Dr. Iurii TIMROV
Tenure-track scientist
Laboratory for Materials Simulations (LMS)
Paul Scherrer Institut (PSI)
CH-5232 Villigen, Switzerland
+41 56 310 62 14
https://www.psi.ch/en/lms/people/iurii-timrov
________________________________
From: users <users-bounces at lists.quantum-espresso.org> on behalf of julien_barbaud at sjtu.edu.cn <julien_barbaud at sjtu.edu.cn>
Sent: Monday, May 5, 2025 01:10
To: users at lists.quantum-espresso.org <users at lists.quantum-espresso.org>
Subject: [QE-users] DFT+U+V neighbour allocation extremely slow and not parallelized


Dear QE users,



I am currently trying to run calculations on a 324-atom crystal system using DFT+U+V (I am running a recompiled version of QE 7.2 with an increased natx parameter). More specifically, I am applying only the V parameter, between two specific atomic orbitals that are supposed to form a polaronic dimer (after tuning, the Hubbard potential helps localize the polaronic charge on the dimer). This worked well on smaller systems.
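For concreteness, the Hubbard part of the input boils down to a HUBBARD card with a single inter-site term, something like the following (the species label, manifolds, atom indices, and V value here are illustrative placeholders rather than the real values):

   HUBBARD (ortho-atomic)
      V  Ti-3d  Ti-3d  37  112  0.5

that is, one V line coupling the orbitals of two specific atoms, identified by their positions in the ATOMIC_POSITIONS list.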

The issue I am facing with the large system is that the calculation takes a very long time to initialize (~20,000 seconds), while the iterations themselves are relatively fast (~3,000 seconds). It seems strange that the preliminary calculations take so much longer than the actual iterations. On smaller systems (96 atoms) I did not observe this trend: the bottleneck was the iterations, as expected, and there was barely any initialization time.

The most concerning aspect is that this bottleneck does not seem to be effectively distributed in parallel: no matter how many nodes I use, the pre-iteration calculations always take about 20,000 seconds, whereas the iteration time does decrease when I use more nodes.

The routine taking up all that time is the Hubbard routine "alloc_neigh". I thought this might be due to memory issues, but according to the Slurm "seff" utility the nodes only used about 30% of their memory during the job. I have also experimented with different I/O settings to try to reduce memory usage, without success.



Is there a way to speed up that part of the calculation, or at least to distribute it over several nodes? Am I doing something wrong in my input?

Additionally, it seems that the program is computing Hubbard projectors for every single atom of the species, even though I am applying a V parameter to only two hand-picked atoms, which seems quite wasteful. Is there a way to force the program to skip the Hubbard calculations for the atoms of that species which do not receive a V value?



I have attached the input and output files of an example job (the charge has been set to 0 in that particular one, but the same problem occurs when the polaronic charge is added).



Thanks in advance!

Julien

