<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">Dear Pascal,<br>
since you have more than one k-point, you could try to keep each
pool within one node, so that only the (much lighter) inter-pool
communication goes over InfiniBand; for instance, with 4 k-points
you may try 4 pools on 4 nodes (or possibly 2 pools on 2 nodes, 4
pools on 2 nodes, etc.).<br>
This kind of parallelization should scale quite well, provided your
system allows it (i.e. you have enough k-points and the problem fits
in the RAM available to each pool). Once the pools are in place, you
can further tune the run with the other parallelization levels, as
sketched below.<br>
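As a minimal sketch (assuming 32 MPI tasks over your 4 nodes, i.e. 8
cores per node, and an input file named pw.scf.in; the executable path,
launcher and file names are placeholders to adapt to your scheduler):<br>
<pre>
# 4 pools of 8 tasks each: each pool stays within one node, so the
# heavy plane-wave/FFT communication never crosses InfiniBand
mpirun -np 32 pw.x -npool 4 -input pw.scf.in > pw.scf.out

# once pools scale well, you can experiment on top of them with the
# other levels (task groups, band groups, images), e.g.
mpirun -np 32 pw.x -npool 4 -ntg 2 -input pw.scf.in > pw.scf.out
</pre>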
<br>
If you manage to do some scaling tests using the pools, could you
please report your results on this mailing list?<br>
<br>
Thanks, and best regards,<br>
<br>
Giovanni Pizzi<br>
<br>
<br>
On 02/05/2013 10:12 PM, Pascal Boulet wrote:<br>
</div>
<blockquote cite="mid:511175B3.5080508@univ-amu.fr" type="cite">Dear all,
<br>
<br>
I have a basic question about parallelism and scaling.
<br>
<br>
First, I am running calculations on a cubic system with 58 atoms
(alat = 19.5652 a.u.), 540 electrons (324 KS states) and a few
k-points (4x4x4 grid = 4 irreducible k-points), on 32 cores (4
nodes), but I can submit on many more.
<br>
<br>
I guess the best thing to do is to parallelize the calculation over
the bands, but maybe also over the FFTs. We have an InfiniBand
interconnect between the nodes.
<br>
<br>
What would you suggest as values for image/pools/ntg/bands?
<br>
<br>
I have made an SCF test calculation on 16 and 32 cores. For the SCF
cycle (13 steps) I get the following timings:
<br>
For 16 cores: total cpu time spent up to now is 22362.4 secs
<br>
For 32 cores: total cpu time spent up to now is 17932.6 secs
<br>
<br>
The speedup is "only" 25%. I would have expected a better speedup
for such a small number of cores. Am I wrong? What is your
experience?
<br>
<br>
(For additional information, if helpful: QE 5.0.1 has been compiled
with Open MPI, Intel 12.1 and FFTW 3.2.2.)
<br>
<br>
Thank you for your answers.
<br>
<br>
Regards,
<br>
Pascal
<br>
</blockquote>
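As a quick sanity check on the timings quoted above (assuming the two
runs are identical apart from the number of cores), the numbers work
out as follows:<br>
<pre>
speedup (16 -> 32 cores):  22362.4 / 17932.6  ~ 1.25x
ideal speedup:             32 / 16            = 2.00x
parallel efficiency:       1.25 / 2.00        ~ 62%
</pre>
so the "25%" corresponds to roughly 62% parallel efficiency going from
16 to 32 cores, which is the kind of figure the pool parallelization
above should help to improve.<br>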
<br>
<br>
<pre class="moz-signature" cols="72">--
Giovanni Pizzi
Post-doctoral Research Scientist
EPFL STI IMX THEOS
MXC 340 (Bâtiment MXC)
Station 12
CH-1015 Lausanne (Switzerland)
Phone: +41 21 69 31124</pre>
</body>
</html>