[Pw_forum] PW.x homogeneous electric field berry phase calculation in trigonal cell
Louis Fry-Bouriaux
ellf at leeds.ac.uk
Tue Feb 14 02:45:05 CET 2017
Thanks Lorenzo, I hope so too. I think the best references are Examples 4 and 10. I have this tendency to just go ahead once I get something working, need to work on that :P
Indeed I have reproduced almost exactly what you described. Here is what I can confirm when using bp_c_phase (no electric field):
- All values of gdir work; only gdir=3 shows a notable improvement in performance.
- With gdir=3, scaling is good up to 4 processors; on 8 it is terrible and actually takes longer, with the WALL time notably larger than the CPU time.
- The call 'CALL mp_sum(aux_g(:), intra_bgrp_comm)' is made when gdir != 3.
My current understanding is that mp_sum sums the 'aux_g' array element-wise across processors, whereas for gdir=3 significantly less code is involved in building the array 'aux' that is finally used to build 'mat'. The array 'evc' holds the wavefunctions expanded in plane waves; 'evc' is used in many files, but since bp_c_phase is executed last, 'evc' has already been built and is only read in this file. Comparing the outputs, I notice that performance with gdir=3 is better for almost all routines. I will continue debugging tomorrow on the 8-processor machine, where the differences are much more noticeable. Do you think I should contact Paolo Giannozzi directly to better understand what is going on here?
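To check that I am picturing the data flow correctly, below is a small self-contained sketch of what I think the gdir != 3 branch boils down to: each processor holds only a slice of the plane-wave coefficients, accumulates a partial overlap matrix over its own G vectors, and mp_sum (an element-wise sum over intra_bgrp_comm, effectively an MPI_Allreduce) combines the pieces. All names and sizes here (nbnd, ngw, psi_k, psi_kq) are placeholders of my own, not the actual QE arrays:

  program overlap_sketch
    ! A sketch only: my reading of the gdir /= 3 data flow, not the actual QE
    ! code.  Each "processor" holds a slice of the plane-wave coefficients,
    ! accumulates a partial overlap mat(m,n) = sum_G conjg(psi_k(G,m))*psi_kq(G,n)
    ! over its own G vectors, and the cross-processor accumulation is what
    ! CALL mp_sum(..., intra_bgrp_comm) provides in the real code.
    implicit none
    integer, parameter :: dp = kind(1.d0)
    integer, parameter :: nbnd = 2, ngw = 6, nproc = 2   ! placeholder sizes
    complex(dp) :: psi_k(ngw,nbnd), psi_kq(ngw,nbnd)     ! placeholder wavefunctions
    complex(dp) :: mat(nbnd,nbnd)
    integer :: m, n, ip, glo, ghi

    call fill(psi_k)
    call fill(psi_kq)

    mat = (0.d0, 0.d0)
    do ip = 1, nproc                    ! pretend each pass is one processor's G-vector slice
       glo = (ip-1)*ngw/nproc + 1
       ghi = ip*ngw/nproc
       do n = 1, nbnd
          do m = 1, nbnd
             ! local partial sum; dot_product conjugates its first argument
             mat(m,n) = mat(m,n) + dot_product(psi_k(glo:ghi,m), psi_kq(glo:ghi,n))
          end do
       end do
    end do
    print *, 'mat(1,1) =', mat(1,1)

  contains
    subroutine fill(a)                  ! random complex test data
      complex(dp), intent(out) :: a(:,:)
      real(dp) :: re(size(a,1),size(a,2)), im(size(a,1),size(a,2))
      call random_number(re); call random_number(im)
      a = cmplx(re, im, dp)
    end subroutine fill
  end program overlap_sketch

If that picture is right, it is the combination step across processors that would stop scaling, which would fit what I see on 8 processors.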
Thanks so much!
Louis
________________________________
From: pw_forum-bounces at pwscf.org <pw_forum-bounces at pwscf.org> on behalf of Lorenzo Paulatto <lorenzo.paulatto at impmc.upmc.fr>
Sent: 13 February 2017 13:04:22
To: PWSCF Forum
Subject: Re: [Pw_forum] PW.x homogeneous electric field berry phase calculation in trigonal cell
On Monday, February 13, 2017 11:43:08 AM CET Louis Fry-Bouriaux wrote:
> Finally when you were talking about the bottleneck, I suppose you were
> talking about the efield code, my impression so far is this is not a
> problem using 4 processors, I will also test using 8 and compare the time
> taken. I have no idea how fast it 'should' be with proper parallelisation,
> assuming it is possible to parallelise.
When you increase the number of CPUs, you would expect the time to decrease
linearly; if beyond a certain number of CPUs it stops decreasing, or decreases
slower than linearly, there is a bottleneck. This will always happen
eventually, but with the berry phase / lelfield code it happens much sooner.
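As a purely illustrative check (made-up timings, not measurements), one can tabulate the speedup t1/tp and the efficiency (t1/tp)/p: linear scaling keeps the efficiency near 1, while a bottleneck shows up as the efficiency collapsing when the number of CPUs grows:

  program scaling_sketch
    ! Illustrative only: hypothetical wall times, not measured data.
    ! speedup = t1/tp, efficiency = speedup/p; ideal (linear) scaling keeps
    ! the efficiency near 1, a bottleneck makes it collapse as p grows.
    implicit none
    integer :: i
    integer :: p(3)  = (/ 2, 4, 8 /)
    real    :: t1    = 400.0                       ! hypothetical 1-CPU wall time (s)
    real    :: tp(3) = (/ 210.0, 120.0, 115.0 /)   ! hypothetical wall times (s)
    do i = 1, 3
       print '(a,i2,a,f6.2,a,f5.2)', 'p =', p(i), '   speedup =', t1/tp(i), &
             '   efficiency =', t1/tp(i)/p(i)
    end do
  end program scaling_sketch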
Thank you for reporting back! I hope this information will be useful to future users.
--
Dr. Lorenzo Paulatto
IdR @ IMPMC -- CNRS & Université Paris 6
phone: +33 (0)1 442 79822 / skype: paulatz
www: http://www-int.impmc.upmc.fr/~paulatto/
mail: 23-24/423 Boîte courrier 115, 4 place Jussieu 75252 Paris Cédex 05