[Q-e-developers] diagonalization with multithreaded libs slow: comparison between Jade (CINES) and Fermi (CINECA)

Layla Martin-Samos lmartinsamos at gmail.com
Mon Oct 1 14:41:07 CEST 2012


Dear Carlo, I use MKL in both multithreaded and sequential builds. Talking with
Ivan, an affinity issue is indeed possible.
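
As a quick cross-check, I can keep the threaded MKL linked but force it down to a
single thread at run time, to see whether the slowdown actually follows the thread
count. A minimal sketch of what I mean (this assumes Intel MKL; the thread-control
routines are declared in mkl.h / mkl_service.h):

/* Sketch: run the threaded MKL with only one thread, so that any remaining
 * slowdown cannot come from the thread count itself (e.g. an affinity issue). */
#include <stdio.h>
#include <mkl.h>   /* mkl_get_max_threads(), mkl_set_num_threads() */

int main(void)
{
    printf("MKL max threads before: %d\n", mkl_get_max_threads());
    mkl_set_num_threads(1);   /* same effect as exporting MKL_NUM_THREADS=1 */
    printf("MKL max threads after : %d\n", mkl_get_max_threads());
    /* ... call the BLAS/LAPACK kernels of interest here ... */
    return 0;
}

Setting MKL_NUM_THREADS=1 in the job script should give the same behaviour
without recompiling.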

thank you very much

Layla

2012/10/1 Carlo Cavazzoni <c.cavazzoni at cineca.it>

>  On 01/10/2012 14:16, Layla Martin-Samos wrote:
>
> Ciao Ivan, I explained myself badly; first, look at this:
>
> computer   MPI processes   threads   ndiag   complex/gamma_only   time for diaghg   version   libs
>
> jade       32              1         1       complex (cdiaghg)    27.44 s           4.3.2     sequential
> jade       32              1         1       complex (cdiaghg)    > 10 min          4.3.2     threads
> jade       32              1         1       complex (cdiaghg)    > 10 min          5.0.1     threads
>
>
> This makes no sense; something really weird is happening. If Jade is Intel
> based, then you should link the multi-threaded MKL, which, as far as I know
> (tested on our cluster), is very good; no performance loss of this magnitude
> has ever been reported or measured between the single-threaded and
> multi-threaded libraries.
> Print out the environment and, since it is an SGI machine, check the affinity
> and the CPU sets.
>
>
> carlo
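
This is the kind of check I will try: a minimal hybrid MPI+OpenMP sketch (assuming
a Linux node with glibc's sched_getcpu) in which each rank reports the core every
one of its OpenMP threads is running on. If all the threads of a rank end up on
the same core, the threaded MKL calls are effectively serialized and can easily
become much slower than the sequential library.

/* Affinity check (illustrative sketch, not part of QE): each MPI rank prints
 * the core on which every one of its OpenMP threads is currently running. */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sched.h>   /* sched_getcpu(), glibc/Linux */
#include <mpi.h>
#include <omp.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const char *nt = getenv("OMP_NUM_THREADS");
    if (rank == 0)
        printf("OMP_NUM_THREADS = %s\n", nt ? nt : "(unset)");

    #pragma omp parallel
    {
        /* one line per (rank, thread) pair with its current core */
        printf("rank %d  thread %d/%d  on cpu %d\n",
               rank, omp_get_thread_num(), omp_get_num_threads(),
               sched_getcpu());
    }

    MPI_Finalize();
    return 0;
}

Built with mpicc -fopenmp and launched exactly like the pw.x job, it should show
immediately whether the CPU sets and the pinning on Jade are sane.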
>
>
>
> It is exactly the same job; only the libraries have changed.
>
> For BG/Q I run with bg_size=128 and 4 threads. My concern is that diaghg is 3
> times slower than on Jade, but with 4 times more MPI processes x threads. I was
> wondering whether it is worthwhile to use threads in the diagonalization, or
> whether the libraries are simply less efficient there. Who knows!
>
> 2012/10/1 Ivan Girotto <igirotto at ictp.it>
>
>>  Hi Layla,
>>
>> I have never tried with 1 thread, as it is not recommended on BG/Q: at least
>> 2 threads per MPI process.
>> On the other hand, I have some doubts about the way you are running the jobs.
>> How big is the BG size? How many processes are you running per node in each
>> of the two cases?
>>
>> I also have some doubts about the question itself. You say that you see a
>> slowdown comparing 4 threads vs 1 thread, but in the table below you report
>> only data with 4 threads for BG/Q.
>> Have you perhaps switched the headers of the two columns, threads and ndiag?
>>
>> It's expected that the SGI architecture based on Intel processors is
>> faster than BG/Q.
>>
>> Ivan
>>
>>
>> On 01/10/2012 13:17, Layla Martin-Samos wrote:
>>
>>  Dear all, I have made some test calculations on Fermi and Jade for a
>> 107-atom system, 70 Ry wavefunction cutoff, 285 occupied bands and 1 k-point.
>> What the results seem to show is that the multithreaded libraries considerably
>> slow down the diagonalization (diaghg is called 33 times in all the jobs and
>> the final results are identical). The version compiled by CINECA gives
>> identical times and results to 5.0.1. Note that Jade with the sequential
>> libraries is faster than BG/Q. I am continuing some other tests on Jade;
>> unfortunately the runs spend a long time in the queue: the machine is full,
>> and even for a 10-minute job with 32 cores you wait more than 3 hours. As
>> attachments I include the two make.sys files for Jade.
>>
>>
>> computer   MPI processes   threads   ndiag   complex/gamma_only   time for diaghg   version   libs
>>
>> bgq        128             4         1       complex (cdiaghg)    69.28 s           5.0.1     threads
>> bgq        128             4         1       complex (cdiaghg)    69.14 s           4.3.2     threads
>>
>> jade       32              1         1       complex (cdiaghg)    27.44 s           4.3.2     sequential
>> jade       32              1         1       complex (cdiaghg)    > 10 min          4.3.2     threads
>> jade       32              1         1       complex (cdiaghg)    > 10 min          5.0.1     threads
>>
>> bgq        128             4         4       complex (cdiaghg)    310.52 s          5.0.1     threads
>>
>> bgq        128             4         4       gamma (rdiaghg)      73.87 s           5.0.1     threads
>> bgq        128             4         4       gamma (rdiaghg)      73.71 s           4.3.2     threads
>>
>> bgq        128             4         1       gamma (rdiaghg)      CRASH 2 it        5.0.1     threads
>> bgq        128             4         1       gamma (rdiaghg)      CRASH 2 it        4.3.2     threads
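
To check whether the effect is reproducible outside QE, the corresponding LAPACK
call can be timed directly: cdiaghg/rdiaghg solve the generalized eigenproblem
H x = e S x, which in the serial path should end up in a ZHEGV/DSYGV-type driver.
Below is a minimal stand-alone sketch for the real (gamma) case using
LAPACKE_dsygvd; the matrix dimension and the random fill are placeholders, not
taken from the actual run, and the complex case would use LAPACKE_zhegvd instead.
Running it with OMP_NUM_THREADS=1 and 4, against the sequential and the threaded
library, should isolate the problem from the rest of pw.x.

/* Stand-alone timing of a generalized symmetric eigenproblem, roughly what
 * rdiaghg spends its time on.  Build e.g. with
 *   cc -fopenmp test.c -llapacke -llapack -lblas   (or the MKL link line)  */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>       /* omp_get_wtime() */
#include <lapacke.h>   /* LAPACKE_dsygvd() */

int main(void)
{
    const int n = 1140;   /* placeholder dimension, a few times the number of bands */
    double *h = malloc(sizeof(double) * n * n);
    double *s = malloc(sizeof(double) * n * n);
    double *w = malloc(sizeof(double) * n);

    /* random symmetric H; S = identity plus a small symmetric perturbation,
     * kept strictly diagonally dominant so that it is positive definite */
    for (int i = 0; i < n; i++)
        for (int j = 0; j <= i; j++) {
            double hij = (double)rand() / RAND_MAX;
            double sij = 0.5 * ((double)rand() / RAND_MAX) / n;
            h[i*n + j] = h[j*n + i] = hij;
            s[i*n + j] = s[j*n + i] = (i == j) ? 1.0 : sij;
        }

    double t0 = omp_get_wtime();
    /* itype=1: H x = lambda S x;  'V': eigenvectors;  'U': upper triangle */
    lapack_int info = LAPACKE_dsygvd(LAPACK_ROW_MAJOR, 1, 'V', 'U',
                                     n, h, n, s, n, w);
    double t1 = omp_get_wtime();

    printf("dsygvd: info = %d, lowest eigenvalue = %f, time = %.3f s\n",
           (int)info, w[0], t1 - t0);
    free(h); free(s); free(w);
    return 0;
}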
>>
>>
>> did someone observe a similar behavior?
>>
>> cheers
>>
>> Layla
>>
>>
>>
>>
>>
>>
>>
>>
>> --
>>
>> Ivan Girotto - igirotto at ictp.it
>> High Performance Computing Specialist
>> Information & Communication Technology Section
>> The Abdus Salam International Centre for Theoretical Physics - www.ictp.it
>> Strada Costiera, 11 - 34151 Trieste - IT
>> Tel +39.040.2240.484
>> Fax +39.040.2240.249
>>
>>
>> _______________________________________________
>> Q-e-developers mailing list
>> Q-e-developers at qe-forge.org
>> http://qe-forge.org/mailman/listinfo/q-e-developers
>>
>>
>
>
>
>
>
> --
> Ph.D. Carlo Cavazzoni
> SuperComputing Applications and Innovation Department
> CINECA - Via Magnanelli 6/3, 40033 Casalecchio di Reno (Bologna)
> Tel: +39 051 6171411  Fax: +39 051 6132198
> www.cineca.it
>
>
> _______________________________________________
> Q-e-developers mailing list
> Q-e-developers at qe-forge.org
> http://qe-forge.org/mailman/listinfo/q-e-developers
>
>

