[QE-users] Memory requirements of projwfc.x in k-resolved case
Thomas Brumme
thomas.brumme at uni-leipzig.de
Tue Aug 14 16:48:40 CEST 2018
Dear all,
OK, I did some small tests using a modified example 4 from the PP examples.
Essentially, instead of the band path given there I used the following
(random) path - in crystal_b the last column gives the number of points
between one special point and the next:
K_POINTS crystal_b
8
0 0 0 10
1 0 0 10
1 1 0 10
1 1 1 10
0 0 0 10
0 1 0 10
0 1 1 10
0 0 0 1
And I varied the pseudopotential between
Pt.pz-n-rrkjus_psl.0.1.UPF
Pt.pz-n-kjpaw_psl.0.1.UPF
Pt.rel-pz-n-rrkjus_psl.0.1.UPF
Pt.rel-pz-n-kjpaw_psl.0.1.UPF
Finally, I calculated the projected wave functions by using projwfc.x with:
&PROJWFC
prefix='Pt',
outdir='$TMP_DIR/'
ngauss = 0,
degauss = 0.01,
Emin = 8,
Emax = 40,
DeltaE = 0.01,
lsym = .false.,
kresolveddos = .true.,
filproj = 'pt.band.dat.proj',
/
or without the kresolveddos flag set to .true., i.e., with the last two
lines above (kresolveddos and filproj) deleted.
In the case of kresolveddos = .true. I always observe that the memory
used by one of the processes (4 in total) increases to nearly twice
that of the others. For example:
for paw (logged with top; columns: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND)
28710 tbrumme 20 0 877776 42480 22016 R 242.0 0.3 0:21.21 projwfc.x
28710 tbrumme 20 0 909748 74836 22592 R 98.0 0.5 0:25.11 projwfc.x
for rel-paw
28921 tbrumme 20 0 888844 52388 21380 R 227.5 0.3 0:35.94 projwfc.x
28921 tbrumme 20 0 920608 86028 22476 R 100.0 0.5 0:40.07 projwfc.x
for us
29285 tbrumme 20 0 870516 34372 21304 R 219.6 0.2 0:23.30 projwfc.x
29285 tbrumme 20 0 906888 71848 22272 R 98.0 0.4 0:25.95 projwfc.x
for rel-us
29102 tbrumme 20 0 878620 43500 21980 R 223.5 0.3 0:34.56 projwfc.x
29102 tbrumme 20 0 914472 79604 22324 R 102.0 0.5 0:39.10 projwfc.x
This also happens in a serial calculation, but it does not happen when
calculating with kresolveddos = .false. For the bands calculation I see
a maximum memory usage like this (for rel-paw):
28850 tbrumme 20 0 892820 51972 21256 R 178.4 0.3 1:55.15 pw.x
which is comparable to the memory usage before the sudden increase. The
estimated memory usage printed in the bands run tells me that I need at
most 7.72 MB per process and 30.89 MB in total for the US potentials.
The 34 MB shown above (before the increase) is already more than that
estimate - but OK, I know it is only an estimate, and the estimation was
improved in a recent commit. Yet, in the end, that one task uses even
twice the estimate. Judging from the PID, I think it is the master
process (the ionode?).
In my large calculation of MoS2 on MoS2, projwfc.x does not even reach
the point of writing the DOS per atom, i.e., the *.pdos_atm#* files, so
the crash must happen before that. One way of reducing the memory usage
would obviously be to reduce the number of k points, but apparently
reducing the number of energy points also helps. It turns out that
DeltaE crucially affects the memory used by that one process... So
while writing this email I found a solution - more or less.
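For what it's worth, a rough back-of-the-envelope estimate - assuming
the k-resolved pdos arrays are dimensioned roughly as (number of energy
points) x (number of projections) x (number of k points) in double
precision, which I have not checked in the source:

  number of energy points = (Emax - Emin) / DeltaE = (40 - 8) / 0.01 = 3200
  k points along the path = 7 * 10 + 1 = 71
  => roughly 3200 * 71 * 8 bytes ~ 1.8 MB per projection

So even a modest number of atomic projections quickly amounts to tens
of MB in this small Pt test, and for 151 k points and the hundreds of
projections of a 138-atom MoS2 cell this easily reaches several GB -
which would also fit with the extra memory showing up on only one
process, if these arrays are gathered there for writing.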
To cut a long story short:
If someone experiences the same problem, i.e., memory problems with
projwfc.x in the k-resolved case, try reducing the number of energy
points, i.e., increase DeltaE and/or narrow the Emin/Emax window.
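For example, something like the following - with DeltaE increased from
0.01 to 0.1, i.e., ten times fewer energy points; the value is of
course only an illustration and has to be chosen according to the
energy resolution you actually need:

&PROJWFC
   prefix = 'Pt',
   outdir = '$TMP_DIR/',
   ngauss = 0,
   degauss = 0.01,
   Emin = 8,
   Emax = 40,
   DeltaE = 0.1,
   lsym = .false.,
   kresolveddos = .true.,
   filproj = 'pt.band.dat.proj',
 /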
Cheerio
Thomas Brumme
On 08/14/18 12:03, Thomas Brumme wrote:
> Dear all,
>
> I'm struggling to project the wave functions onto atoms in the
> k-resolved case. The job always crashes because of the memory limit.
> The system itself is quite large - two layers of MoS2, but rotated,
> 138 atoms in total. The band structure calculation for 151 k points
> finished without problems using at most 1.72 GB RAM per core
> (100 cores in total). Starting the projwfc.x run with the same
> settings (100 cores, 2 GB RAM per core), the job is killed because it
> exceeds the memory limit. Increasing to 8 GB per core does not solve
> the problem.
>
> What are the exact memory requirements of projwfc.x in the k-resolved
> case? I read in the forums that it shouldn't need more than the
> corresponding scf or bands run, should it? Then why do those runs
> finish while projwfc.x does not? I'm using version 6.2.1 compiled
> with the old XML format (as I started the calculation before the new
> XML format existed and had to stop in between). Furthermore, the
> normal (scf and bands) runs are parallelized via the standard
> R & G space division on 100 cores. And I'm using the relativistic
> PBE PAW pseudopotentials of the pslibrary, with 55 Ry and 440 Ry
> cutoffs.
>
> Is the code reading in the wave functions of all k points at once,
> i.e., would it help to reduce the number of k points?
>
> Regards
>
> Thomas
>
--
Dr. rer. nat. Thomas Brumme
Wilhelm-Ostwald-Institute for Physical and Theoretical Chemistry
Leipzig University
Philipp-Rosenthal-Strasse 31
04103 Leipzig
Tel: +49 (0)341 97 36456
email: thomas.brumme at uni-leipzig.de