[Q-e-developers] diagonalization with multithreads libs slow: comparison between Jade (cines) and Fermi (cineca)

Layla Martin-Samos lmartinsamos at gmail.com
Mon Oct 1 14:16:40 CEST 2012


Ciao Ivan, I explained myself very badly; first, look at this:

computer   MPI procs   threads   ndiag   complex/gamma_only    time for diaghg   version   libs
jade       32          1         1       complex (cdiaghg)     27.44 s           4.3.2     sequential
jade       32          1         1       complex (cdiaghg)     > 10 min          4.3.2     threads
jade       32          1         1       complex (cdiaghg)     > 10 min          5.0.1     threads

It is exactly the same job; only the libraries have changed.
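
For context, the difference between the two jade builds is essentially the
BLAS/LAPACK link line in make.sys. The fragment below is only a sketch
assuming Intel MKL, using the standard MKL library names; the actual link
lines are in the attached make.sys files:

    # make.sys fragment -- sequential MKL (the 27 s case above)
    BLAS_LIBS = -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm

    # make.sys fragment -- threaded MKL (the > 10 min cases above):
    # same interface layer, but the OpenMP threading layer and the
    # Intel OpenMP runtime (-liomp5) are linked in
    BLAS_LIBS = -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm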

For BG/Q I ran with bg_size=128 and 4 threads. My concern is that diaghg is
about 3 times slower than on jade, even though BG/Q uses 4 times as many MPI
processes, each with 4 threads. I was wondering whether it is worthwhile to
use threads in the diagonalization at all, or whether the threaded libraries
are simply less efficient. Who knows!
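
One thing worth ruling out on jade is oversubscription: if the threaded
libraries start one thread per core inside each of the 32 MPI processes,
32 x N threads end up fighting for the same cores. A minimal check, as a
sketch (the environment variables are the standard OpenMP/MKL ones; scf.in
is a placeholder for the actual input file):

    # pin each MPI process to a single library thread; if the threaded
    # build then matches the sequential one (~27 s), the slowdown is
    # oversubscription rather than the libraries themselves
    export OMP_NUM_THREADS=1
    export MKL_NUM_THREADS=1
    mpirun -np 32 ./pw.x -ndiag 1 < scf.in > scf.out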

2012/10/1 Ivan Girotto <igirotto at ictp.it>

>  Hi Layla,
>
> I have never tried with 1 thread, as it's not recommended on BG/Q: use at
> least 2 threads per MPI process.
> On the other hand, I have some doubts about the way you are running the
> jobs. How big is the BG size? How many processes are you running per node
> in each of the two cases?
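>
> Just so we mean the same thing, here is a minimal sketch of a BG/Q
> submission, assuming LoadLeveler and the standard runjob launcher (the
> numbers are illustrative, not taken from your actual job):
>
>     # @ job_type = bluegene
>     # @ bg_size  = 128        # number of BG/Q nodes allocated
>     # @ queue
>     # 128 ranks on 128 nodes = 1 rank per node, 4 OpenMP threads each
>     runjob --np 128 --ranks-per-node 1 --envs OMP_NUM_THREADS=4 : ./pw.x < scf.in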
>
> I also have some doubts about the question itself. You say that you see a
> slowdown comparing 4 threads vs. 1 thread, but in the table below you
> report only 4-thread data for BG/Q.
> Have you perhaps swapped the headers of the two columns, threads and
> ndiag?
>
> It's expected that the SGI architecture based on Intel processors is
> faster than BG/Q.
>
> Ivan
>
>
> On 01/10/2012 13:17, Layla Martin-Samos wrote:
>
> Dear all, I have made some test calculations on Fermi and Jade, for a
> 107-atom system with a 70 Ry wavefunction cutoff, 285 occupied bands and 1
> k-point. What the results seem to show is that diagonalization with
> multithreaded libraries is considerably slower (diaghg is called 33 times
> in all the jobs, and the final results are identical). The version
> compiled by CINECA gives timings and results identical to 5.0.1. Note that
> jade with sequential libraries is faster than BG/Q. I am continuing some
> other tests on jade; unfortunately the runs spend a long time in the
> queue, since the machine is full, and even a 10-minute job on 32 cores
> waits more than 3 hours. I attach the two make.sys files for jade.
>
>
> computer   MPI procs   threads   ndiag   complex/gamma_only    time for diaghg    version   libs
>
> bgq        128         4         1       complex (cdiaghg)     69.28 s            5.0.1     threads
> bgq        128         4         1       complex (cdiaghg)     69.14 s            4.3.2     threads
>
> jade       32          1         1       complex (cdiaghg)     27.44 s            4.3.2     sequential
> jade       32          1         1       complex (cdiaghg)     > 10 min           4.3.2     threads
> jade       32          1         1       complex (cdiaghg)     > 10 min           5.0.1     threads
>
> bgq        128         4         4       complex (cdiaghg)     310.52 s           5.0.1     threads
>
> bgq        128         4         4       gamma (rdiaghg)       73.87 s            5.0.1     threads
> bgq        128         4         4       gamma (rdiaghg)       73.71 s            4.3.2     threads
>
> bgq        128         4         1       gamma (rdiaghg)       CRASH at iter. 2   5.0.1     threads
> bgq        128         4         1       gamma (rdiaghg)       CRASH at iter. 2   4.3.2     threads
>
>
> Has anyone observed a similar behavior?
>
> cheers
>
> Layla
>
>
>
> --
>
> Ivan Girotto - igirotto at ictp.it
> High Performance Computing Specialist
> Information & Communication Technology Section
> The Abdus Salam International Centre for Theoretical Physics - www.ictp.it
> Strada Costiera, 11 - 34151 Trieste - IT
> Tel +39.040.2240.484
> Fax +39.040.2240.249
>
>
> _______________________________________________
> Q-e-developers mailing list
> Q-e-developers at qe-forge.org
> http://qe-forge.org/mailman/listinfo/q-e-developers
>
>

