[Pw_forum] Large difference between the CPU and wall time in the c_bands and sum_bands routines

Harsha Vardhan harshavardhan9510 at gmail.com
Thu Jun 29 20:03:57 CEST 2017


Dear Prof. Paolo and Prof. Lorenzo,

Thank you for your thoughtful replies. I have carefully examined the
program and run it with processor pools as well. It turns out that
memory swapping was the culprit: the total memory available to me was
32 GB, but the program was using over 90 GB of virtual memory, which
was the bottleneck.

Thankfully, I was able to reduce the memory usage of the program by setting
disk_io='high' and lowering mixing_ndim to 4, and the program now seems to
run fine.
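
For reference, the two changes correspond to the following input fragments
(only the parameters mentioned above are shown; disk_io belongs to the
&CONTROL namelist and mixing_ndim to &ELECTRONS, the rest of the input is
unchanged):

  &CONTROL
    disk_io = 'high'
  /
  &ELECTRONS
    mixing_ndim = 4
  /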

I would like to express my gratitude to you for helping me in this matter.

Yours Sincerely,
M Harshavardhan
Fourth Year Undergraduate
Engineering Physics
IIT Madras

On Thu, Jun 29, 2017 at 3:22 PM, Paolo Giannozzi <p.giannozzi at gmail.com>
wrote:

> Not sure it is an MPI problem: "fft_scatter" is where most of the
> communication takes place, but its wall and CPU times are not so
> different. I think it is a problem of "swapping": the code requires
> (much) more memory than is available, so it spends most of the time
> reading from disk the arrays it needs and writing to disk those it
> no longer needs. If disk_io='high', it might also be a problem
> of I/O.
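>
> A quick way to check whether a run is swapping (a minimal sketch; "pw.out"
> is just a placeholder for your output file name, and whether a RAM
> estimate is printed depends on the pw.x version):
>
>   grep -i "RAM" pw.out   # newer pw.x versions print a RAM estimate
>   free -h                # RAM and swap available and in use on the node
>   vmstat 5               # nonzero si/so columns mean active swapping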
>
> Paolo
>
> On Thu, Jun 29, 2017 at 10:48 AM, Lorenzo Paulatto
> <lorenzo.paulatto at impmc.upmc.fr> wrote:
> > [re-sending to mailing list as I answered privately by mistake]
> >
> > Hello,
> >
> > On 29/06/17 09:57, Harsha Vardhan wrote:
> >> I have observed that the c_bands and sum_bands routines are taking up
> >> a huge amount of wall time, as compared to the CPU time. I am
> >> attaching the time report for the completed calculation below:
> >>
> >
> > the high wall times indicate a lot of MPI communication, which means
> > that your simulation will probably run faster with fewer CPUs. Are you
> > using as many pools as possible? Pool parallelism requires less
> > communication. Here is an example of the syntax:
> >
> > mpirun -np 16 pw.x -npool 16 -in input
> >
> > The number of pools must not exceed the number of CPUs or the
> > number of k-points.
> >
> > Also, having npool = n_kpoints - small_number is not a good idea: most
> > pools will hold one k-point while small_number of them will hold two,
> > and the pools with two slow everyone down. It would be more efficient
> > to use fewer CPUs, i.e. npool = ncpus = n_kpoints/2.
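> >
> > For example (illustrative numbers only): with 16 k-points, npool=15
> > leaves 14 pools with one k-point each and one pool with two, so the
> > whole run waits for that single pool; npool=8 gives every pool exactly
> > two k-points and takes about the same wall time with half the CPUs.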
> >
> > If you are already at the maximum number of pools, you can try to reduce
> > the number of MPI processes and use OpenMP instead. Be sure that the code
> > is compiled with the --enable-openmp option and set the variable
> > OMP_NUM_THREADS to the ratio ncpus/n_mpi_processes, e.g. with 16 CPUs:
> >
> > export OMP_NUM_THREADS=4
> >
> > mpirun -x OMP_NUM_THREADS -np 4 pw.x -npool 4 -in input
> >
> >
> > Finally, are you sure that you need a 9x9x1 grid of k-points for an 8x8x1
> > supercell of graphene? This would be equivalent to using a 72x72x1 grid
> > in the unit cell, which is enormous.
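> >
> > (The 8x8x1 supercell has a Brillouin zone 8 times smaller in each
> > in-plane direction, so a 9x9x1 grid there samples it as densely as a
> > 9*8 = 72 point-per-direction grid samples the primitive cell.)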
> >
> >
> > hth
> >
> > --
> > Dr. Lorenzo Paulatto
> > IdR @ IMPMC -- CNRS & Université Paris 6
> > phone: +33 (0)1 442 79822 / skype: paulatz
> > www:   http://www-int.impmc.upmc.fr/~paulatto/
> > mail:  23-24/423 Boîte courrier 115, 4 place Jussieu 75252 Paris Cédex 05
> >
>
>
>
> --
> Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
> Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
> Phone +39-0432-558216, fax +39-0432-558222