<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <p><font size="-1">ehm ... <br>

      </font></p>

    <p><font size="-1">I should have read the guide before answering,

        sorry Guido, <br>

      </font></p>

    <p><font size="-1">I would surely have been more helpful <br>

      </font></p>

    <p><font size="-1">by following the instructions that the CINECA module
        qe/6.3_knl prints when it is loaded, and by using the
        cpu-bind option of srun: <br>

      </font></p>

    <p><font size="-1">srun --cpu-bind=cores pw.x &lt; pwin &gt; pwout <br>

      </font></p>

    <p>With this, QE 6.3 also works smoothly for cpus-per-task=2 and
      ntasks-per-node=68 <br>

    </p>
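    <p>A rough sketch of how one could check such a layout before submitting.
      It assumes the node geometry mentioned in this thread (68 cores per KNL
      node, 4x hyperthreading); the helper name layout_cmd is made up for
      illustration, and the flags are the standard Slurm spellings
      (--cpu-bind on recent Slurm versions, --cpu_bind on older ones):</p>

```shell
#!/bin/sh
# Assumed node geometry, taken from the thread: 68 cores, 4x hyperthreading.
CORES_PER_NODE=68
HT_PER_CORE=4

# Hypothetical helper: print an srun line for a given layout,
# or fail if the layout asks for more logical CPUs than one node has.
# Arguments: tasks per node, cpus per task.
layout_cmd() {
  ntasks=$1
  cpus=$2
  logical=$((CORES_PER_NODE * HT_PER_CORE))
  requested=$((ntasks * cpus))
  if [ "$requested" -gt "$logical" ]; then
    echo "layout needs $requested logical CPUs, node has $logical" >&2
    return 1
  fi
  echo "srun --ntasks-per-node=$ntasks --cpus-per-task=$cpus --cpu-bind=cores pw.x"
}

# The layout reported above: 68 tasks per node, 2 cpus per task.
layout_cmd 68 2
```

    <p>The function only prints the command; the input and output
      redirections for pw.x still have to be appended when the line is
      actually run.</p>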

    <div class="moz-cite-prefix">Pietro <br>

    </div>

    <div class="moz-cite-prefix"><br>

    </div>

    <div class="moz-cite-prefix">On 31/01/19 14:41, Guido Fratesi wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:a383ceac-5067-4ddf-e1e6-10fff0d39fdf@unimi.it">Dear

      Pietro, Paolo and Davide,

      <br>

      <br>

      thank you for your hints. Indeed, by changing the number of CPUs
      the calculation *may* also converge with QE6.3. For example:

      <br>

      <br>

      2pools-x-34cpus-x-2omp (i.e. #MPIxOpenMP cores = #cpus)

      <br>

      2pools-x-8cpus-x-2omp

      <br>

      6pools-x-8cpus-x-2omp

      <br>

      <br>

      are OK, but

      <br>

      <br>

      2pools-x-68cpus-x-1omp (i.e. #MPIxOpenMP cores = #cpus)

      <br>

      <br>

      again fails to converge, although I'm not asking for more tasks

      than cpus (see Pietro's comment). Also, KNL nodes in A2 should

      support hyperthreading (4x)

<a class="moz-txt-link-freetext" href="https://wiki.u-gov.it/confluence/display/SCAIUS/UG3.1%3A+MARCONI+UserGuide#UG3.1:MARCONIUserGuide-SystemArchitecture">https://wiki.u-gov.it/confluence/display/SCAIUS/UG3.1%3A+MARCONI+UserGuide#UG3.1:MARCONIUserGuide-SystemArchitecture</a>

      so I would not expect that asking for a number of threads that is
      twice the number of allocated CPUs would be a problem, nor is it
      for QE6.0 or for the inputs with the molecule/surface.

      <br>

      <br>

      I thought this could be related to the size of the system, since I
      had no problems with the heavier molecule/surface case; however,
      the problem is also present for larger clean-Au(111) unit cells.

      <br>

      <br>

      I can now circumvent the issue, thank you. I'd also be curious to
      know what the reason is...

      <br>

      <br>

      Guido

      <br>


    </blockquote>

  </body>

</html>