[Q-e-developers] diagonalization with multithreaded libs slow: comparison between Jade (CINES) and Fermi (CINECA)

Layla Martin-Samos lmartinsamos at gmail.com
Mon Oct 1 14:21:17 CEST 2012


Thank you Fabio, I will try ELPA and let you know.

cheers

Layla

2012/10/1 Layla Martin-Samos <lmartinsamos at gmail.com>

> Ciao Ivan, I explained myself very badly; first look at this:
>
> computer  MPI procs  threads  ndiag  complex/gamma (routine)  time for diaghg  version  libs
>
> jade      32         1        1      complex (cdiaghg)        27.44 s          4.3.2    sequential
> jade      32         1        1      complex (cdiaghg)        > 10 min         4.3.2    threaded
> jade      32         1        1      complex (cdiaghg)        > 10 min         5.0.1    threaded
>
> It is exactly the same job; only the libraries have changed.
>
> For BG/Q I run with bg_size=128 and 4 threads. My concern is that diaghg is
> about 3 times slower than on Jade, even though BG/Q uses 4 times more MPI
> processes plus threads. I was wondering whether it is worthwhile to use
> threads in the diagonalization, or whether the threaded libs are just less
> efficient? Who knows!
>
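A minimal sketch of how this could be checked on Jade, assuming the "threads"
build links the multithreaded Intel MKL and that pw.x is launched through
mpirun (the launcher and file names below are placeholders, not taken from the
attached make.sys): pinning MKL and OpenMP to one thread per MPI rank makes
the threaded build run under the same conditions as the sequential one, so a
remaining gap in cdiaghg would point at the library itself rather than at
oversubscription of the node's cores.

  # Force one thread per MPI rank for both OpenMP and MKL.
  export OMP_NUM_THREADS=1
  export MKL_NUM_THREADS=1
  export MKL_DYNAMIC=FALSE

  # pw.in / pw.out are placeholder names; -ndiag 1 matches the runs
  # reported in the table above.
  mpirun -np 32 ./pw.x -ndiag 1 < pw.in > pw.out
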
> 2012/10/1 Ivan Girotto <igirotto at ictp.it>
>
>>  Hi Layla,
>>
>> I have never tried with 1 thread, as it's not recommended on BG/Q: use at
>> least 2 threads per MPI process.
>> On the other hand, I have some doubts about the way you are running the
>> jobs. How big is the BG size? How many processes are you running per node
>> in each of the two cases?
>>
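For reference, a hypothetical Fermi (BG/Q) job fragment showing where these
numbers are fixed; the exact LoadLeveler keywords and runjob options should be
checked against the Fermi user guide, and the values below are illustrative
only, not the settings actually used in the runs:

  # LoadLeveler directives: bg_size is the partition size in nodes
  # (16 cores per BG/Q node).
  # @ job_type = bluegene
  # @ bg_size  = 128
  # @ queue

  # runjob fixes the total number of MPI ranks, the ranks per node and
  # the OpenMP threads per rank; with bg_size=128 and only 128 ranks,
  # the --ranks-per-node value decides how many nodes are actually used.
  runjob --np 128 --ranks-per-node 16 \
         --envs OMP_NUM_THREADS=4 \
         : ./pw.x -ndiag 1 -in pw.in
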
>> I also have some doubts about the question itself. You are saying that you
>> see a slowdown when comparing 4 threads vs. 1 thread, but in the table below
>> you report only data with 4 threads for BG/Q.
>> Have you perhaps switched the headers of the two columns, threads and
>> ndiag?
>>
>> It's expected that the SGI architecture based on Intel processors is
>> faster than BG/Q.
>>
>> Ivan
>>
>>
>> On 01/10/2012 13:17, Layla Martin-Samos wrote:
>>
>> Dear all, I have made some test calculations on Fermi and Jade for a
>> 107-atom system with a 70 Ry wavefunction cutoff, 285 occupied bands and
>> 1 k-point. What the results seem to show is that the multithreaded
>> libraries considerably slow down the diagonalization (diaghg is called 33
>> times in all the jobs and the final results are identical). The version
>> compiled by CINECA gives identical times and results to 5.0.1. Note that
>> Jade with sequential libraries is faster than BG/Q. I am continuing some
>> other tests on Jade; unfortunately the runs spend a long time in the
>> queue, since the machine is full and even for a 10-minute job on 32 cores
>> you wait more than 3 hours. As attachments I include the two make.sys
>> files for Jade.
>>
>>
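The diaghg timings below presumably come from the per-routine clock report
that pw.x prints at the end of each run; a quick way to collect them from
every job (the output file name is a placeholder) is simply:

  # Each matching line reports the total CPU and wall time of the routine
  # together with the number of calls (33 in these runs).
  grep -E 'cdiaghg|rdiaghg' pw.out
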
>> computer  MPI procs  threads  ndiag  complex/gamma (routine)  time for diaghg  version  libs
>>
>> bgq       128        4        1      complex (cdiaghg)        69.28 s          5.0.1    threaded
>> bgq       128        4        1      complex (cdiaghg)        69.14 s          4.3.2    threaded
>>
>> jade      32         1        1      complex (cdiaghg)        27.44 s          4.3.2    sequential
>> jade      32         1        1      complex (cdiaghg)        > 10 min         4.3.2    threaded
>> jade      32         1        1      complex (cdiaghg)        > 10 min         5.0.1    threaded
>>
>> bgq       128        4        4      complex (cdiaghg)        310.52 s         5.0.1    threaded
>>
>> bgq       128        4        4      gamma (rdiaghg)          73.87 s          5.0.1    threaded
>> bgq       128        4        4      gamma (rdiaghg)          73.71 s          4.3.2    threaded
>>
>> bgq       128        4        1      gamma (rdiaghg)          crash (2 it.)    5.0.1    threaded
>> bgq       128        4        1      gamma (rdiaghg)          crash (2 it.)    4.3.2    threaded
>>
>>
>> Did anyone observe a similar behavior?
>>
>> cheers
>>
>> Layla
>>
>>
>>
>>
>>
>>
>>
>>
>> --
>>
>> Ivan Girotto - igirotto at ictp.it
>> High Performance Computing Specialist
>> Information & Communication Technology Section
>> The Abdus Salam International Centre for Theoretical Physics - www.ictp.it
>> Strada Costiera, 11 - 34151 Trieste - IT
>> Tel +39.040.2240.484
>> Fax +39.040.2240.249
>>
>>
>> _______________________________________________
>> Q-e-developers mailing list
>> Q-e-developers at qe-forge.org
>> http://qe-forge.org/mailman/listinfo/q-e-developers
>>
>>
>

