[Pw_forum] Large difference between the CPU and wall time in the c_bands and sum_bands routines
Lorenzo Paulatto
lorenzo.paulatto at impmc.upmc.fr
Thu Jun 29 10:48:33 CEST 2017
[re-sending to mailing list as I answered privately by mistake]
Hello,
On 29/06/17 09:57, Harsha Vardhan wrote:
> I have observed that the c_bands and sum_bands routines are taking up
> a huge amount of wall time, as compared to the CPU time. I am
> attaching the time report for the completed calculation below:
>
the high wall times indicate a lot of MPI communication, which means
that your simulation will probably run faster with fewer CPUs. Are you
using as many pools as possible? Pool parallelism requires less
communication. Here is an example syntax:
mpirun -np 16 pw.x -npool 16 -in input
The number of pools must not be larger than the number of CPUs, nor
larger than the number of k-points.
Also, having npool = n_kpoints - small_number is not a good idea: most
pools will have one k-point while the remaining small_number will have
two, slowing everyone down (it would be more efficient to use fewer
CPUs, e.g. npool = ncpus = n_kpoints/2).
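To give a concrete, made-up example: with 18 k-points and npool=16,
fourteen pools get one k-point and two pools get two, so the wall time
is set by the two overloaded pools; npool=9, with exactly two k-points
per pool, would take roughly the same wall time on about half the
processors.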
If you are already at the maximum number of pools, you can try to
reduce the number of MPI processes and use OpenMP instead: make sure
that the code is compiled with the --enable-openmp option and set the
variable OMP_NUM_THREADS to the ratio ncpus/n_mpi_processes, e.g. with
16 CPUs:
export OMP_NUM_THREADS=4
mpirun -x OMP_NUM_THREADS -np 4 pw.x -npool 4 -in input
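(Side note: the -x option to export an environment variable is specific
to Open MPI; with other launchers the equivalent would be something
along these lines, untested sketches:
mpirun -genv OMP_NUM_THREADS 4 -np 4 pw.x -npool 4 -in input   # Intel MPI
export OMP_NUM_THREADS=4; srun -n 4 pw.x -npool 4 -in input    # Slurm
)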
Finally, are you sure that you need a 9x9x1 grid of k-points for an
8x8x1 supercell of graphene? This would be equivalent to using a
72x72x1 grid in the unit cell, which is enormous.
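For reference, the equivalence comes from 9 (grid divisions) x 8
(supercell repetitions) = 72 divisions of each in-plane reciprocal
lattice vector of the primitive cell; depending on what you are
computing, a much coarser grid in the supercell (say 2x2x1 or 3x3x1)
may well be enough.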
hth
--
Dr. Lorenzo Paulatto
IdR @ IMPMC -- CNRS & Université Paris 6
phone: +33 (0)1 442 79822 / skype: paulatz
www: http://www-int.impmc.upmc.fr/~paulatto/
mail: 23-24/423 Boîte courrier 115, 4 place Jussieu 75252 Paris Cédex 05