[QE-users] Memory requirements of projwfc.x in k-resolved case

Thomas Brumme thomas.brumme at uni-leipzig.de
Tue Aug 14 16:48:40 CEST 2018


Dear all,

OK, I did some small tests using a modified example 4 from the PP examples.
Essentially, instead of the band path given there I used a (random) path:

K_POINTS crystal_b
8
0 0 0 10
1 0 0 10
1 1 0 10
1 1 1 10
0 0 0 10
0 1 0 10
0 1 1 10
0 0 0 1

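If I count correctly, this path expands to 71 k points. A quick sketch of the counting, based on my reading of how pw.x handles crystal_b (the last number on each line is the number of points generated toward the next point; the final point contributes one point on its own):

```python
# Counting the k points generated by the K_POINTS crystal_b block above.
# Assumption (my reading of pw.x, not verified against the source): each
# line's last number is the number of points from that point toward the
# next one; the final point adds a single point.
divisions = [10, 10, 10, 10, 10, 10, 10]  # one entry per path segment
nkstot = sum(divisions) + 1               # + 1 for the final point
print(nkstot)  # 71
```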
And I changed the pseudo between

Pt.pz-n-rrkjus_psl.0.1.UPF
Pt.pz-n-kjpaw_psl.0.1.UPF

Pt.rel-pz-n-rrkjus_psl.0.1.UPF
Pt.rel-pz-n-kjpaw_psl.0.1.UPF

Finally, I calculated the projected wave functions by using projwfc.x with:

&PROJWFC
     prefix='Pt',
     outdir='$TMP_DIR/',
     ngauss = 0,
     degauss = 0.01,
     Emin = 8,
     Emax = 40,
     DeltaE = 0.01,
     lsym = .false.,
     kresolveddos = .true.,
     filproj = 'pt.band.dat.proj',
/

Or without kresolveddos set to true, i.e., deleting the last 2 lines
(kresolveddos and filproj) from the namelist above.

With kresolveddos = .true. I always observe that the memory used by one
of the processes (4 in total) increases to nearly twice that of the
others. For example:

for paw (logged with top)
28710 tbrumme   20   0  877776  42480  22016 R 242.0  0.3   0:21.21 
projwfc.x
28710 tbrumme   20   0  909748  74836  22592 R  98.0  0.5   0:25.11 
projwfc.x

for rel-paw
28921 tbrumme   20   0  888844  52388  21380 R 227.5  0.3   0:35.94 
projwfc.x
28921 tbrumme   20   0  920608  86028  22476 R 100.0  0.5   0:40.07 
projwfc.x

for us
29285 tbrumme   20   0  870516  34372  21304 R 219.6  0.2   0:23.30 
projwfc.x
29285 tbrumme   20   0  906888  71848  22272 R  98.0  0.4   0:25.95 
projwfc.x

for rel-us
29102 tbrumme   20   0  878620  43500  21980 R 223.5  0.3   0:34.56 
projwfc.x
29102 tbrumme   20   0  914472  79604  22324 R 102.0  0.5   0:39.10 
projwfc.x

This also happens in a serial calculation, but not with
kresolveddos = .false. For the bands calculation I see a maximum memory
usage like this (for rel-paw):
28850 tbrumme   20   0  892820  51972  21256 R 178.4  0.3   1:55.15 pw.x

which is comparable to the memory usage before the sudden increase. The
estimated memory usage printed in the bands run tells me that I will
need a maximum of 7.72 MB per process and 30.89 MB total for the us
potentials. The 34 MB given above (before the increase) is already more
than the estimate - but OK, I know that it's just an estimate, and the
estimation was improved in a recent commit. Yet, in the end, the one
task uses twice even this estimate. Judging from the PID, I think it is
the master process (ionode ?!).

In my large calculation of MoS2 on MoS2, projwfc.x does not even reach
the point of writing the DOS per atom, i.e., the *.pdos_atm#* files, so
the crash must happen before that. One way of reducing the memory usage
would obviously be to reduce the number of k points, but apparently
reducing the number of energy points also helps. It turns out that
DeltaE crucially affects the memory used by that one process... So
while writing this email I found a solution - more or less.
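To illustrate why DeltaE matters, here is a back-of-the-envelope estimate. I'm assuming (this is my guess, not read from the projwfc.x source) that the k-resolved pdos is kept as one double-precision entry per (energy point, projector, k point), and nkstot and nproj below are made-up values loosely matching the Pt test:

```python
# Rough memory estimate for a k-resolved pdos array; the storage layout
# is an assumption, not taken from the projwfc.x source.
def kresolved_pdos_mib(emin, emax, delta_e, nkstot, nproj):
    ne = round((emax - emin) / delta_e) + 1   # energy grid points
    return ne * nproj * nkstot * 8 / 1024**2  # 8 bytes per real -> MiB

# nkstot and nproj are guesses, not values printed by the code:
print(kresolved_pdos_mib(8.0, 40.0, 0.01, 71, 13))  # about 22.5 MiB
```

Halving DeltaE roughly doubles this, which would fit the observation that DeltaE drives the memory of the one process (presumably the one gathering the k-resolved data).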

To cut a long story short:

If someone experiences the same problem, i.e., memory problems with
projwfc.x, try increasing DeltaE (i.e., reducing the number of energy
points).

Cheerio

Thomas Brumme

On 08/14/18 12:03, Thomas Brumme wrote:
> Dear all,
>
> I'm struggling to project the wave functions on atoms in the 
> k-resolved case.
> The job always crashes because of the memory limit. The system itself
> is quite large - 2 layers of MoS2, but rotated, 138 atoms in total.
> The band structure calculation for 151 k points finished without
> problems, using at most 1.72 GB RAM per core (100 cores in total).
> Starting the projwfc.x run with the same settings (100 cores, 2 GB RAM
> per core), the job is killed because it exceeds the memory limit.
> Increasing to 8 GB per core does not solve the problem.
>
> What are the exact memory requirements of projwfc.x in the k-resolved
> case? I read in the forums that it shouldn't need more than the
> corresponding scf or bands run, should it? Then why do those runs
> finish while projwfc.x does not? I'm using version 6.2.1 compiled with
> the old XML format (as I started the calculation when the new XML was
> not there yet and had to stop in between).
> Furthermore, the normal (scf and bands) runs are parallelized via the
> standard R & G space division on 100 cores. Um, and I'm using the
> relativistic PBE PAW pseudos of the pslibrary, with 55 Ry and 440 Ry
> cutoffs.
>
> Is the code reading in the wave functions of all k points at once, 
> i.e., would
> it help to reduce the number of k points?
>
> Regards
>
> Thomas
>

-- 
Dr. rer. nat. Thomas Brumme
Wilhelm-Ostwald-Institute for Physical and Theoretical Chemistry
Leipzig University
Phillipp-Rosenthal-Strasse 31
04103 Leipzig

Tel:  +49 (0)341 97 36456

email: thomas.brumme at uni-leipzig.de
