[Q-e-developers] diagonalization with multithreaded libs slow: comparison between Jade (CINES) and Fermi (CINECA)

Carlo Cavazzoni c.cavazzoni at cineca.it
Mon Oct 1 14:32:56 CEST 2012


On 01/10/2012 14:16, Layla Martin-Samos wrote:
> Hi Ivan, I explained myself very badly; first look at this:
>
> computer  MPI procs  threads  ndiag  complex/gamma (routine)  time for diaghg  version  libs
>
> jade      32         1        1      complex (cdiaghg)        27.44 s          4.3.2    sequential
> jade      32         1        1      complex (cdiaghg)        > 10 min         4.3.2    threads
> jade      32         1        1      complex (cdiaghg)        > 10 min         5.0.1    threads

This makes no sense; something really weird is happening. If Jade is Intel
based, then you have to link the multithreaded MKL, which, as far as I know
(tested on our cluster), is very good: no performance loss of this magnitude
has ever been reported or measured between the single-threaded and
multithreaded versions.
Print out the environment and, since it is an SGI machine, check the affinity
and CPU sets.
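
For example, something along these lines, run from inside the batch job,
shows what each rank actually sees; a minimal Python sketch, assuming a
Linux node, using the standard OpenMP/MKL variable names rather than
anything taken from the actual Jade configuration:

    import os

    # Thread-count and binding variables that OpenMP and MKL honour; which of
    # these the batch system actually sets on Jade is an assumption to verify.
    for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "MKL_DYNAMIC",
                "KMP_AFFINITY", "OMP_PROC_BIND"):
        print(var, "=", os.environ.get(var, "<unset>"))

    # CPU set actually visible to this process (Linux only): if the MPI
    # launcher or an SGI cpuset pins each rank to a single core while the
    # threaded MKL spawns several threads, they all pile onto that one core.
    print("CPUs visible to this process:", sorted(os.sched_getaffinity(0)))

If OMP_NUM_THREADS / MKL_NUM_THREADS is left unset, or the CPU set gives each
rank a single core, the threaded MKL can oversubscribe badly, and that alone
could turn seconds into minutes.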


carlo

>
> it is exactly the same job, just the libs have changed.
>
> For BG/Q I run with bg_size=128 and 4 threads. My concern is that diaghg
> is 3 times slower than on Jade, even with 4 times more MPI tasks plus
> threads. I was wondering whether it is worthwhile to use threads in the
> diagonalization, or whether the libraries are simply less efficient. Who
> knows!
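
One quick way to separate the two, outside of QE, is to time the same kind of
generalized Hermitian problem that cdiaghg solves against the same threaded
library, once with 1 thread and once with 4. A rough SciPy sketch, assuming
NumPy/SciPy are linked to that library; the matrix size below is only an
illustrative guess at the subspace dimension, not the one from your run:

    import time
    import numpy as np
    from scipy.linalg import eigh

    n = 600                          # assumed subspace size, not from the actual run
    rng = np.random.default_rng(0)
    h = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    h = h + h.conj().T               # Hermitian "Hamiltonian"
    s = np.eye(n) + 0.01 * (h @ h.conj().T) / n   # Hermitian positive-definite "overlap"

    t0 = time.perf_counter()
    w, v = eigh(h, s)                # generalized problem H x = e S x, as in cdiaghg
    print("n = %d, eigh wall time = %.3f s" % (n, time.perf_counter() - t0))

Run it with OMP_NUM_THREADS=1 and then OMP_NUM_THREADS=4: if this standalone
solve also slows down dramatically with 4 threads, the problem is in the
library or the thread binding, not in the QE diagonalization.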
>
> 2012/10/1 Ivan Girotto <igirotto at ictp.it>
>
>     Hi Layla,
>
>     I have never tried with 1 thread, as it's not recommended on BG/Q:
>     at least 2 threads per MPI process.
>     On the other hand, I have some doubts about the way you are running
>     the jobs. How big is the BG size? How many processes are you
>     running per node in each of the two cases?
>
>     I also have some doubts about the question itself. You are saying
>     that you see a slowdown comparing 4 threads vs 1 thread, but in the
>     table below you report only data with 4 threads for BG/Q.
>     Have you perhaps swapped the headers of the Threads and Ndiag
>     columns?
>
>     It's expected that the SGI architecture based on Intel processors
>     is faster than BG/Q.
>
>     Ivan
>
>
>     On 01/10/2012 13:17, Layla Martin-Samos wrote:
>>     Dear all, I have made some test calculations on Fermi and Jade,
>>     for a 107-atom system with a 70 Ry wavefunction cutoff, 285
>>     occupied bands and 1 k-point. What the results seem to show is
>>     that the multithreaded libraries considerably slow down the
>>     diagonalization time (diaghg is called 33 times in all the jobs
>>     and the final results are identical). The version compiled by
>>     CINECA gives identical timings and results to 5.0.1. Note that
>>     Jade with the sequential libraries is faster than the BG/Q. I am
>>     continuing some other tests on Jade; unfortunately the runs sit
>>     in the queue for a long time, the machine is full and even a
>>     10-minute job with 32 cores waits more than 3 hours. I attach
>>     the two make.sys files for Jade.
>>
>>
>>     computer  MPI procs  threads  ndiag  complex/gamma (routine)  time for diaghg  version  libs
>>
>>     bgq       128        4        1      complex (cdiaghg)        69.28 s          5.0.1    threads
>>     bgq       128        4        1      complex (cdiaghg)        69.14 s          4.3.2    threads
>>
>>     jade      32         1        1      complex (cdiaghg)        27.44 s          4.3.2    sequential
>>     jade      32         1        1      complex (cdiaghg)        > 10 min         4.3.2    threads
>>     jade      32         1        1      complex (cdiaghg)        > 10 min         5.0.1    threads
>>
>>     bgq       128        4        4      complex (cdiaghg)        310.52 s         5.0.1    threads
>>
>>     bgq       128        4        4      gamma (rdiaghg)          73.87 s          5.0.1    threads
>>     bgq       128        4        4      gamma (rdiaghg)          73.71 s          4.3.2    threads
>>
>>     bgq       128        4        1      gamma (rdiaghg)          CRASH (2 it.)    5.0.1    threads
>>     bgq       128        4        1      gamma (rdiaghg)          CRASH (2 it.)    4.3.2    threads
>>
>>
>>     Did anyone observe a similar behavior?
>>
>>     cheers
>>
>>     Layla
>>
>>
>>
>>
>>
>>
>
>     -- 
>
>     Ivan Girotto - igirotto at ictp.it
>     High Performance Computing Specialist
>     Information & Communication Technology Section
>     The Abdus Salam International Centre for Theoretical Physics - www.ictp.it
>     Strada Costiera, 11 - 34151 Trieste - IT
>     Tel +39.040.2240.484
>     Fax +39.040.2240.249
>
>
>
>
>
>
> _______________________________________________
> Q-e-developers mailing list
> Q-e-developers at qe-forge.org
> http://qe-forge.org/mailman/listinfo/q-e-developers


-- 
Ph.D. Carlo Cavazzoni
SuperComputing Applications and Innovation Department
CINECA - Via Magnanelli 6/3, 40033 Casalecchio di Reno (Bologna)
Tel: +39 051 6171411  Fax: +39 051 6132198
www.cineca.it
