[Q-e-developers] diagonalization with multithreads libs slow: comparison between Jade (cines) and Fermi (cineca)

Ivan Girotto igirotto at ictp.it
Mon Oct 1 14:03:05 CEST 2012


Hi Layla,

I have never tried with 1 thread, as it is not recommended on BG/Q: use at 
least 2 threads per MPI process.
On the other hand, I have some doubts about the way you are running the 
jobs. How big is the BG partition? How many processes are you running per 
node in each of the two cases?

I also have some doubts about the question itself. You say that you see a 
slowdown comparing 4 threads vs. 1 thread, but in the table below you 
report only data with 4 threads for BG/Q.
Have you perhaps swapped the headers of the two columns, Threads and Ndiag?

It's expected that the SGI architecture based on Intel processors is 
faster than BG/Q.
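
For reference, this is roughly how the thread count and the diagonalization
group are set on the two machines (a sketch only; the exact runjob/mpirun
options, rank placement, and file names depend on the local scheduler
configuration and are assumptions here, not taken from Layla's job scripts):

```shell
# BG/Q (Fermi): 128 MPI ranks, 4 OpenMP threads each; -ndiag 4 asks pw.x
# to use a 2x2 ScaLAPACK grid for cdiaghg/rdiaghg.
runjob --np 128 --ranks-per-node 16 \
       --envs OMP_NUM_THREADS=4 \
       : ./pw.x -ndiag 4 -in pw.in > pw.out

# Jade (SGI/Intel): 32 MPI ranks, serial diagonalization (-ndiag 1),
# forcing single-threaded libraries for the comparison run.
export OMP_NUM_THREADS=1
mpirun -np 32 ./pw.x -ndiag 1 -in pw.in > pw.out
```

With settings like these, the threads/ndiag mix-up Layla may have hit is
easy to check: OMP_NUM_THREADS controls the library threading, while
-ndiag controls only the size of the diagonalization process grid.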

Ivan

On 01/10/2012 13:17, Layla Martin-Samos wrote:
> Dear all, I have made some test calculations on Fermi and Jade for a 
> 107-atom system, 70 Ry wavefunction cutoff, 285 occupied bands and 1 
> k-point. The results seem to show that the multithreaded libraries 
> considerably slow down the diagonalization (diaghg is called 33 times 
> in all the jobs and the final results are identical). The version 
> compiled by CINECA gives identical times and results to 5.0.1.  Note 
> that Jade in sequential is faster than BG/Q. I am continuing some 
> other tests on Jade; unfortunately the runs stay a long time in the 
> queue, the machine is full, and even for a 10-minute job with 32 cores 
> you wait more than 3 hours. As an attachment I include the two 
> make.sys files for Jade.
>
>
> computer   MPI procs   threads   ndiag   complex/gamma_only   time for diaghg   version   libs
>
> bgq        128         4         1       complex (cdiaghg)    69.28 s           5.0.1     threads
> bgq        128         4         1       complex (cdiaghg)    69.14 s           4.3.2     threads
>
> jade        32         1         1       complex (cdiaghg)    27.44 s           4.3.2     sequential
> jade        32         1         1       complex (cdiaghg)    > 10 min          4.3.2     threads
> jade        32         1         1       complex (cdiaghg)    > 10 min          5.0.1     threads
>
> bgq        128         4         4       complex (cdiaghg)    310.52 s          5.0.1     threads
>
> bgq        128         4         4       gamma (rdiaghg)      73.87 s           5.0.1     threads
> bgq        128         4         4       gamma (rdiaghg)      73.71 s           4.3.2     threads
>
> bgq        128         4         1       gamma (rdiaghg)      CRASH 2 it        5.0.1     threads
> bgq        128         4         1       gamma (rdiaghg)      CRASH 2 it        4.3.2     threads
>
>
> did someone observe a similar behavior?
>
> cheers
>
> Layla
>
>
>
>
>
>
> _______________________________________________
> Q-e-developers mailing list
> Q-e-developers at qe-forge.org
> http://qe-forge.org/mailman/listinfo/q-e-developers

-- 

Ivan Girotto - igirotto at ictp.it
High Performance Computing Specialist
Information & Communication Technology Section
The Abdus Salam International Centre for Theoretical Physics - www.ictp.it
Strada Costiera, 11 - 34151 Trieste - IT
Tel +39.040.2240.484
Fax +39.040.2240.249

