<html>

  <head>

    <meta content="text/html; charset=windows-1252"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix">Dear Sergio,<br>

      <br>

      we are of course addressing all of these issues on different
      architectures from different vendors,<br>
      and here comes the point: architectures are not converging!<br>
      As you know, there are two basic designs, homogeneous and
      heterogeneous (a.k.a. accelerated),<br>
      with some vendors, like Intel, oscillating between the two (KNC
      heterogeneous, KNL homogeneous,<br>
      and the recently announced Stratix X FPGA heterogeneous again).<br>

      It is not easy to have a code that copes with all of them
      effectively, especially because<br>
      some of the best tools for new architectures (CUDA) are not
      standard. This is a real pity,<br>
      and it makes me complain to Nvidia all the time about their not
      supporting a standard paradigm<br>
      (yes, I know, there are OpenACC, the new OpenMP features, OpenCL ...
      but CUDA remains far more effective).<br>
      This is a sort of disruption for a community of developers like
      ours.<br>

      <br>

      Nevertheless, to reduce this complexity we recently encapsulated the
      two main computational kernels<br>
      (parallel FFT and linear algebra) into self-contained libraries
      (FFTXlib and LAXlib), including a small<br>
      app (please read the README files included in the two libraries) that
      allows one to experiment with and<br>
      tune all the parameters for parallelization, vectorization,
      tasking, etc.<br>

      <br>

      To play with the two libraries you need to know very little about
      the physics of QE,<br>
      so they are ideal for people like you who need to look into
      optimization.<br>
      In particular, any improvement in these two libraries is
      immediately transferred to the main QE<br>
      codes (and to other codes as well).<br>

      <br>

      If you want to know more about our next developments, we are
      working with<br>
      non-blocking MPI collectives and task-based parallelism to try to
      overlap<br>
      communication and computation within the FFT.<br>
      The most recent (not production) advancements in the FFT library
      can be found at:<br>

      <pre wrap=""><a class="moz-txt-link-freetext" href="https://github.com/fabioaffinito/FFTXlib">https://github.com/fabioaffinito/FFTXlib</a></pre>

      <br>

      <br>

      Another interesting exercise could be to review LAXlib, closely
      following<br>
      the advances in dense linear algebra algorithms promoted by
      Dongarra et al.:<br>

<a class="moz-txt-link-freetext" href="http://insidehpc.com/2016/10/jack-dongarra-presents-adaptive-linear-solvers-and-eigensolvers/">http://insidehpc.com/2016/10/jack-dongarra-presents-adaptive-linear-solvers-and-eigensolvers/</a><br>

      <br>

      As for programming paradigms, we are supporting open
      initiatives,<br>
      especially in close collaboration with BSC and with different
      standardization committees (like OpenMP),<br>
      as well as the recently announced effort promoted by AMD to
      open-source software and drivers<br>
      for heterogeneous architectures:

      <a class="moz-txt-link-freetext" href="https://radeonopencompute.github.io/">https://radeonopencompute.github.io/</a><br>

      <br>

      <br>

      best,<br>

      carlo<br>

      <br>

      <br>

      On 18/10/2016 16:14, Sérgio Caldas wrote:<br>

    </div>

    <blockquote

      cite="mid:8F7362D8-9CCA-409A-8837-F611DFC76180@gmail.com"

      type="cite">


      <div class=""><font class="" size="3">Hi!</font></div>

      <div class=""><font class="" size="3"><br class="">

        </font></div>

      <div class=""><font class="" size="3">I'm Sérgio Caldas, an MSc
          student in Informatics Engineering at the University of Minho,
          Braga, Portugal. <span class="">The key area of specialisation
            during my master's courses was parallel computing, with a
            strong focus on efficiency &amp; performance engineering on
            heterogeneous systems. My master's thesis
            applies these competences to computational physics:
            I’m supposed to help a senior physics researcher tune his
            work on the determination of electronic and optical
            properties of materials, using the Quantum Espresso tool on our
            departmental cluster. This cluster has nodes with several
            generations of dual multicore Xeons and some nodes with Xeon
            Phi (both KNC and KNL) and GPUs (both Fermi and Kepler, and
            soon Pascal). </span></font></div>

      <div class=""><span style="color: rgb(131, 17, 0); font-size:

          12pt; font-family: Calibri, Arial, Helvetica, sans-serif;"

          class=""><br class="">

        </span></div>

      <div class=""><font class="" size="3">I have some questions about
          QE, namely how far QE development has progressed in these areas
          (vectorisation, data/task parallelism on both
          shared/distributed memory, data locality). </font></div>

      <div class=""><font class="" size="3"><br class="">

        </font></div>

      <div class=""><font class="" size="3">For example:<br class="">
          <font class=""> - is QE<span class=""> already exploiting
              vector operations (AVX/AVX2 or AVX-512)?</span></font></font></div>

      <div class=""><font class="" size="3"><font class=""> - is </font><span
            class="">the tool ready for multicore / many-core devices?</span></font></div>

      <div class=""><font class="" size="3"> - how is work scheduled
          between multicore devices and accelerator devices, so
          that both types of devices are used simultaneously?</font></div>

      <div class=""><font class="" size="3"> - for distributed memory,
          does the tool already take advantage of low-latency
          interconnects, such as Myrinet or InfiniBand?</font></div>

      <div class=""><font class="" size="3"> - how can I get access to
          beta versions where these advanced capabilities are being
          explored?</font></div>

      <div class=""><font class="" size="3"> - do you have suggestions

          of areas that still need to be improved, so that I can address

          those areas and improve both the quality of my work and the

          overall QE performance?</font></div>

      <div class=""><font class="" size="3"><br class="">

        </font></div>

      <div class=""><font class="" size="3"><font class="">I would also

            be grateful if you could suggest documentation (preferably

            papers) to get some of these answers or any other

            documentation to complement my </font><font class="">knowledge</font><span

            class=""> on QE.</span></font></div>

      <div class=""><span class=""><font class="" size="3"><br class="">

          </font></span></div>

      <div class=""><span class=""><font class="" size="3"><font

              class="">Thanking you in advance, yours s</font><span

              class="">incerely</span></font></span></div>

      <div class=""><font class="" size="3">Sergio Caldas</font></div>

      <br>


    </blockquote>

    <br>

    <p><br>

    </p>

    <pre class="moz-signature" cols="72">-- 
Ph.D. Carlo Cavazzoni
SuperComputing Applications and Innovation Department
CINECA - Via Magnanelli 6/3, 40033 Casalecchio di Reno (Bologna)
Tel: +39 051 6171411  Fax: +39 051 6132198
<a class="moz-txt-link-abbreviated" href="http://www.cineca.it">www.cineca.it</a></pre>

  </body>

</html>