[Pw_forum] Large difference between the CPU and wall time in the c_bands and sum_bands routines
Lorenzo Paulatto
lorenzo.paulatto at impmc.upmc.fr
Thu Jun 29 10:48:33 CEST 2017
[re-sending to mailing list as I answered privately by mistake]
Hello,
On 29/06/17 09:57, Harsha Vardhan wrote:
> I have observed that the c_bands and sum_bands routines are taking up
> a huge amount of wall time, as compared to the CPU time. I am
> attaching the time report for the completed calculation below:
>
the high wall times indicate a lot of MPI communication, which means
that your simulation will probably run faster with fewer CPUs. Are you
using as many pools as possible? Pool parallelism requires less
communication. Here is an example syntax:
mpirun -np 16 pw.x -npool 16 -in input
The number of pools must not be larger than the number of CPUs, nor
larger than the number of k-points.
Also, having npool = n_kpoints - small_number is not a good idea: most
pools will have one k-point while the remaining small_number will have
two, slowing everyone down (it would be more efficient to use fewer
CPUs, e.g. npool = ncpus = n_kpoints/2).
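To give a concrete, made-up example: with 18 k-points and npool=16,
fourteen pools get one k-point and two pools get two, so the wall time
is set by the two overloaded pools; npool=9, with exactly two k-points
per pool, would take roughly the same wall time on about half the
processors.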
If you are already at the maximum number of pools, you can try to
reduce the number of MPI processes and use OpenMP instead: make sure
that the code is compiled with the --enable-openmp option and set the
variable OMP_NUM_THREADS to the ratio ncpus/n_mpi_processes, e.g. with
16 CPUs:
export OMP_NUM_THREADS=4
mpirun -x OMP_NUM_THREADS -np 4 pw.x -npool 4 -in input
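(Side note: the -x option to export an environment variable is specific
to Open MPI; with other launchers the equivalent would be something
along these lines, untested sketches:
mpirun -genv OMP_NUM_THREADS 4 -np 4 pw.x -npool 4 -in input   # Intel MPI
export OMP_NUM_THREADS=4; srun -n 4 pw.x -npool 4 -in input    # Slurm
)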
Finally, are you sure that you need a 9x9x1 grid of k-points for an
8x8x1 supercell of graphene? This would be equivalent to using a
72x72x1 grid in the unit cell, which is enormous.
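For reference, the equivalence comes from 9 (grid divisions) x 8
(supercell repetitions) = 72 divisions of each in-plane reciprocal
lattice vector of the primitive cell; depending on what you are
computing, a much coarser grid in the supercell (say 2x2x1 or 3x3x1)
may well be enough.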
hth
--
Dr. Lorenzo Paulatto
IdR @ IMPMC -- CNRS & Université Paris 6
phone: +33 (0)1 442 79822 / skype: paulatz
www: http://www-int.impmc.upmc.fr/~paulatto/
mail: 23-24/423 Boîte courrier 115, 4 place Jussieu 75252 Paris Cédex 05