<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <div class="moz-cite-prefix">Dear Pascal,<br>

      since you have more than one k-point, you could try to keep each
      pool within a single node, so that only inter-pool communication
      goes over InfiniBand. For instance, with 4 k-points you could try
      4 pools on 4 nodes (or possibly 2 pools on 2 nodes, or 4 pools on
      2 nodes, etc.).<br>

      This kind of parallelization should scale quite well, provided
      your setup allows it (i.e. you have enough k-points and the
      calculation fits in RAM). You can then try to optimize further
      using the other parallelization options.<br>
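      <br>
      As a rough illustration of the layouts mentioned above, here is a
      minimal Python sketch (not part of Quantum ESPRESSO) that checks
      whether a given number of pools keeps each pool inside one node.
      The function name pool_layout and the input file name pw.in are
      hypothetical. It assumes 8 cores per node (32 cores on 4 nodes),
      that MPI ranks fill the nodes one after another, and that pools
      are made of consecutive ranks; please verify both assumptions for
      your launcher.<br>

```python
def pool_layout(total_cores, cores_per_node, n_kpoints, npool):
    """Summarise how MPI ranks would be split into k-point pools.

    Assumes ranks fill nodes consecutively and pools are consecutive
    blocks of ranks (an assumption, check it for your MPI launcher).
    """
    if total_cores % npool != 0:
        raise ValueError("total_cores must be a multiple of npool")
    cores_per_pool = total_cores // npool
    # Pools stay inside a node only if whole pools tile a node exactly.
    within_node = cores_per_node % cores_per_pool == 0
    # The pool holding the most k-points sets the pace (ceiling division).
    max_kpoints_per_pool = -(-n_kpoints // npool)
    # pw.in is a hypothetical input file name.
    command = f"mpirun -np {total_cores} pw.x -npool {npool} -input pw.in"
    return {
        "cores_per_pool": cores_per_pool,
        "within_node": within_node,
        "max_kpoints_per_pool": max_kpoints_per_pool,
        "command": command,
    }


if __name__ == "__main__":
    # The three layouts suggested above, with 4 k-points and 8 cores per node.
    for cores, npool in [(32, 4), (16, 2), (16, 4)]:
        print(pool_layout(cores, 8, 4, npool))
```

      With 2 pools on 4 nodes, for example, each pool would span two
      nodes, so intra-pool traffic would also cross the network.<br>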

      <br>

      If you manage to do some scaling tests using the pools, could you

      please report your results on this mailing list?<br>
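      <br>
      For such scaling tests, it may help to report speedup and parallel
      efficiency rather than raw times. The sketch below does this for
      the two timings quoted in your message (22362.4 s on 16 cores,
      17932.6 s on 32 cores). The function name is hypothetical, and it
      assumes the two reported times are directly comparable; wall-clock
      times would be the safer quantity to compare.<br>

```python
def speedup_and_efficiency(t_ref, t_new, cores_ref, cores_new):
    """Return (speedup, parallel efficiency) relative to a reference run."""
    speedup = t_ref / t_new
    # Ideal speedup is the ratio of core counts.
    ideal = cores_new / cores_ref
    return speedup, speedup / ideal


if __name__ == "__main__":
    # Timings taken from the quoted message: 16 cores vs 32 cores.
    s, e = speedup_and_efficiency(22362.4, 17932.6, 16, 32)
    print(f"speedup = {s:.3f}, efficiency = {e:.1%}")
```

      A speedup of about 1.25 when doubling the cores corresponds to a
      parallel efficiency of roughly 62%.<br>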

      <br>

      Thanks, and best regards,<br>

      <br>

      Giovanni Pizzi<br>

      <br>

      <br>

      On 02/05/2013 10:12 PM, pascal boulet wrote:<br>

    </div>

    <blockquote cite="mid:511175B3.5080508@univ-amu.fr" type="cite">Dear

      all,

      <br>

      <br>

      I have a basic question about parallelism and scaling.

      <br>

      <br>

      First, I am running calculations on a cubic system with 58 atoms

      <br>

      (alat=19.5652 a.u.), 540 electrons (324 KS states) and a few

      k-points

      <br>

      (4x4x4 grid=4 k-points), on 32 cores (4 nodes) but I can submit on

      many

      <br>

      more.

      <br>

      <br>

      I guess the best thing to do is to parallelize the calculation on

      the

      <br>

      bands but maybe also on the FFTs. We have an InfiniBand

      interconnection

      <br>

      network between the nodes.

      <br>

      <br>

      What would you suggest as values for image/pools/ntg/bands?

      <br>

      <br>

      I have made a SCF test calculation on 16 and 32 cores. For the SCF

      cycle

      <br>

      (13 steps) I get the following timing:

      <br>

      For 16 cores: total cpu time spent up to now is    22362.4 secs

      <br>

      For 32 cores: total cpu time spent up to now is    17932.6 secs

      <br>

      <br>

      The speedup is "only" 25%. I would have expected a better speedup

      for

      <br>

      such a small number of cores. Am I wrong? What is your experience?

      <br>

      <br>

      (For additional information, if helpful: QE 5.0.1 has been

      compiled with

      <br>

      Open MPI, Intel 12.1 and FFTW 3.2.2.)

      <br>

      <br>

      Thank you for your answers.

      <br>

      <br>

      Regards

      <br>

      Pascal

      <br>

      <br>

      <br>


    </blockquote>

    <br>

    <br>

    <pre class="moz-signature" cols="72">-- 

Giovanni Pizzi

Post-doctoral Research Scientist

EPFL STI IMX THEOS

MXC 340 (Bâtiment MXC)

Station 12

CH-1015 Lausanne (Switzerland)

Phone: +41 21 69 31124</pre>

  </body>

</html>