Dear Carlo, I use MKL both multithreaded and sequential. Talking with Ivan, an affinity issue is indeed possible.

Thank you very much,

Layla

2012/10/1 Carlo Cavazzoni <c.cavazzoni@cineca.it>:

On 01/10/2012 14:16, Layla Martin-Samos wrote:
<blockquote type="cite">Ciao Ivan, I explain myself very badly, first look at
this:<br>
<br>
computer mpi process threads ndiag complex/gamma_only
time for diaghg version Libs<br>
<div><br>
jade 32 1 1 complex
(cdiaghg) 27.44 s
4.3.2 sequential<br>
jade 32 1 1 complex
(cdiaghg) > 10 min
4.3.2 threads<br>
jade 32 1 1 complex
(cdiaghg) > 10 min
5.0.1 threads<br>
</div>
</blockquote>
<br></div>

This makes no sense; something really weird is happening. If Jade is Intel-based, then you have to link the multithreaded MKL, which, as far as I know (tested on our cluster), is very good: no performance loss of this magnitude has ever been reported or observed between the single-threaded and multithreaded libraries.
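
For reference, the two builds differ only in the MKL link line of make.sys. A minimal sketch, assuming an ifort/intel64 build and the usual split-library MKL layout (exact library names and paths depend on the MKL release installed on Jade):

    # sequential MKL: BLAS/LAPACK calls never spawn threads
    BLAS_LIBS = -L$(MKLROOT)/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm

    # multithreaded MKL: needs the Intel OpenMP runtime
    BLAS_LIBS = -L$(MKLROOT)/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm

With the threaded variant, each MPI process spawns OMP_NUM_THREADS (or MKL_NUM_THREADS) threads inside every BLAS/LAPACK call, and if neither variable is set MKL defaults to one thread per core; 32 MPI processes per node can then oversubscribe the cores badly, and a restrictive CPU set can pile all of those threads onto a single core.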

Print out the environment and, since Jade is an SGI machine, check the affinity and CPU sets.
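
One way to check both at once is to make every MPI rank print its CPU binding and the thread-related environment variables before the real run. A minimal sketch in C, assuming a Linux compute node (sched_getaffinity is Linux-specific) and an mpicc-style wrapper; the file name check_affinity.c is only for illustration:

    #define _GNU_SOURCE              /* for sched_getaffinity and CPU_* macros */
    #include <mpi.h>
    #include <sched.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        int rank, c;
        char host[256], cores[1024] = "", buf[16];
        const char *omp, *mkl;
        cpu_set_t set;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        gethostname(host, sizeof(host));

        /* Collect the list of cores this rank is allowed to run on. */
        CPU_ZERO(&set);
        sched_getaffinity(0, sizeof(set), &set);
        for (c = 0; c < CPU_SETSIZE; c++) {
            if (CPU_ISSET(c, &set)) {
                snprintf(buf, sizeof(buf), " %d", c);
                strncat(cores, buf, sizeof(cores) - strlen(cores) - 1);
            }
        }

        omp = getenv("OMP_NUM_THREADS");
        mkl = getenv("MKL_NUM_THREADS");
        printf("rank %3d on %s: cores =%s | OMP_NUM_THREADS=%s MKL_NUM_THREADS=%s\n",
               rank, host, cores, omp ? omp : "(unset)", mkl ? mkl : "(unset)");

        MPI_Finalize();
        return 0;
    }

Compile with mpicc check_affinity.c -o check_affinity and launch it exactly like the pw.x job: if every rank reports the same one or two cores, the threads of the multithreaded MKL are fighting over them, which would explain 27 s turning into more than 10 minutes.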

carlo

It is exactly the same job; only the libs have changed.

For BGQ I run with bg_size=128 and 4 threads. My concern is that diaghg is 3 times slower than on Jade, even with 4 times more MPI processes + threads. I was wondering whether it is worthwhile to use threads in the diagonalization, or whether the libs are simply less efficient? Who knows!
<div class="gmail_quote">2012/10/1 Ivan Girotto <span dir="ltr"><<a href="mailto:igirotto@ictp.it" target="_blank">igirotto@ictp.it</a>></span><br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"> Hi Layla,<br>
<br>
I have never tried with 1thread as it's not recommended on
BG/Q. At least 2threads x MPI process. <br>
On the other hand I have some doubt about the way you are
running the jobs? How big is the BG size? How many processes
are you running per node in each of the 2 cases?<br>
<br>
I have some doubt about the question itself too. You are
saying that you see a slow down comparing 4Threads Vs 1
Threads. But from the table below you report only data with
4 threads for BG/Q.<br>
Have you perhaps switched the headers of the two columns
Threads and Ndiag? <br>
<br>
It's expected that the SGI architecture based on Intel
processors is faster than BG/Q. <br>
<br>
Ivan
<div>
<div><br>
<br>
On 01/10/2012 13:17, Layla Martin-Samos wrote: </div>
</div>
<blockquote type="cite">
<div>

Dear all, I have made some test calculations on Fermi and Jade, for a 107-atom system with a 70 Ry wavefunction cutoff, 285 occupied bands and 1 k-point. What the results seem to show is that the multithreaded libs considerably slow down the diagonalization (diaghg is called 33 times in all the jobs and the final results are identical). The CINECA-compiled version gives identical times and results to 5.0.1. Note that Jade in sequential is faster than BGQ. I am continuing some other tests on Jade; unfortunately the runs sit a long time in the queue, the machine is full, and even for a 10 min job on 32 cores you wait more than 3 hours. As attachments I put the two make.sys files for Jade.

computer  MPI procs  threads  ndiag  complex/gamma_only   time for diaghg  version  libs

bgq       128        4        1      complex (cdiaghg)    69.28 s          5.0.1    threads
bgq       128        4        1      complex (cdiaghg)    69.14 s          4.3.2    threads

jade      32         1        1      complex (cdiaghg)    27.44 s          4.3.2    sequential
jade      32         1        1      complex (cdiaghg)    > 10 min         4.3.2    threads
jade      32         1        1      complex (cdiaghg)    > 10 min         5.0.1    threads

bgq       128        4        4      complex (cdiaghg)    310.52 s         5.0.1    threads

bgq       128        4        4      gamma (rdiaghg)      73.87 s          5.0.1    threads
bgq       128        4        4      gamma (rdiaghg)      73.71 s          4.3.2    threads

bgq       128        4        1      gamma (rdiaghg)      CRASH at 2nd it  5.0.1    threads
bgq       128        4        1      gamma (rdiaghg)      CRASH at 2nd it  4.3.2    threads

Did someone observe a similar behavior?

cheers,

Layla
<span><font color="#888888"> <br>
<pre cols="72">--
Ivan Girotto - <a href="mailto:igirotto@ictp.it" target="_blank">igirotto@ictp.it</a>
High Performance Computing Specialist
Information & Communication Technology Section
The Abdus Salam - <a href="http://www.ictp.it" target="_blank">www.ictp.it</a>
International Centre for Theoretical Physics
Strada Costiera, 11 - 34151 Trieste - IT
Tel <a href="tel:%2B39.040.2240.484" value="+390402240484" target="_blank">+39.040.2240.484</a>
Fax <a href="tel:%2B39.040.2240.249" value="+390402240249" target="_blank">+39.040.2240.249</a>
</pre>
</font></span></div>
</div></div><span class="HOEnZb"><font color="#888888"><pre cols="72">--
Ph.D. Carlo Cavazzoni
SuperComputing Applications and Innovation Department
CINECA - Via Magnanelli 6/3, 40033 Casalecchio di Reno (Bologna)
Tel: <a href="tel:%2B39%20051%206171411" value="+390516171411" target="_blank">+39 051 6171411</a> Fax: <a href="tel:%2B39%20051%206132198" value="+390516132198" target="_blank">+39 051 6132198</a>
<a href="http://www.cineca.it" target="_blank">www.cineca.it</a></pre>
</font></span></div>