[Q-e-developers] diagonalization with multithreads libs slow: comparison between Jade (cines) and Fermi (cineca)
Layla Martin-Samos
lmartinsamos at gmail.com
Mon Oct 1 14:16:40 CEST 2012
Ciao Ivan, I explained myself very badly; first look at this:
computer  MPI procs  threads  ndiag  complex/gamma_only  time for diaghg  version  libs
jade      32         1        1      complex (cdiaghg)   27.44 s          4.3.2    sequential
jade      32         1        1      complex (cdiaghg)   > 10 min         4.3.2    threads
jade      32         1        1      complex (cdiaghg)   > 10 min         5.0.1    threads
It is exactly the same job; only the libraries have changed.
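For reference, the only difference between the two Jade builds should be
the BLAS/LAPACK link line in make.sys. A minimal sketch of the two variants,
assuming the Jade builds link Intel MKL; the exact lines are in the attached
make.sys files:

    # sequential MKL (the 27.44 s case)
    BLAS_LIBS = -lmkl_intel_lp64 -lmkl_sequential -lmkl_core

    # threaded MKL (the "> 10 min" cases)
    BLAS_LIBS = -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread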
For BG/Q I run with bg_size=128 and 4 threads. My concern is that diaghg on
BG/Q is about 3 times slower than on Jade, despite using 4 times as many
MPI processes plus threads. I was wondering whether it is worthwhile to use
threads in the diagonalization at all, or whether the threaded libraries
are simply less efficient. Who knows!
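A quick cross-check (a sketch, not something I have run: it assumes the
threaded Jade build links Intel MKL, job.in stands for the real input file,
and mpirun stands for whatever launcher Jade actually uses) would be to
keep the threaded libraries but pin them to a single thread at runtime. If
diaghg is still slow, the threaded library itself, and not thread
oversubscription, is the problem:

    # run the threaded-libs build, but force 1 thread per MPI process
    export OMP_NUM_THREADS=1
    export MKL_NUM_THREADS=1
    mpirun -np 32 ./pw.x -ndiag 1 -in job.in > job.out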
2012/10/1 Ivan Girotto <igirotto at ictp.it>
> Hi Layla,
>
> I have never tried with 1 thread, as it is not recommended on BG/Q: use at
> least 2 threads per MPI process.
> On the other hand, I have some doubts about the way you are running the
> jobs. How big is the BG size? How many processes are you running per node
> in each of the two cases?
>
> I also have some doubts about the question itself. You say that you see a
> slowdown comparing 4 threads vs. 1 thread, but in the table below you
> report only data with 4 threads for BG/Q.
> Have you perhaps swapped the headers of the Threads and Ndiag columns?
>
> It's expected that the SGI architecture based on Intel processors is
> faster than BG/Q.
>
> Ivan
>
>
> On 01/10/2012 13:17, Layla Martin-Samos wrote:
>
> Dear all, I have made some test calculations on Fermi and Jade for a
> 107-atom system, with a 70 Ry wavefunction cutoff, 285 occupied bands and
> 1 k-point. What the results seem to show is that diagonalization with the
> multithreaded libraries is considerably slower (diaghg is called 33 times
> in all the jobs and the final results are identical). The version compiled
> by CINECA gives identical times and results to 5.0.1. Note that Jade with
> sequential libraries is faster than BG/Q. I am continuing some other tests
> on Jade; unfortunately the runs sit a long time in the queue, as the
> machine is full, and even a 10-minute job on 32 cores waits more than 3
> hours. As attachments I include the two make.sys files for Jade.
>
>
> computer  MPI procs  threads  ndiag  complex/gamma_only  time for diaghg  version  libs
>
> bgq       128        4        1      complex (cdiaghg)   69.28 s          5.0.1    threads
> bgq       128        4        1      complex (cdiaghg)   69.14 s          4.3.2    threads
>
> jade      32         1        1      complex (cdiaghg)   27.44 s          4.3.2    sequential
> jade      32         1        1      complex (cdiaghg)   > 10 min         4.3.2    threads
> jade      32         1        1      complex (cdiaghg)   > 10 min         5.0.1    threads
>
> bgq       128        4        4      complex (cdiaghg)   310.52 s         5.0.1    threads
>
> bgq       128        4        4      gamma (rdiaghg)     73.87 s          5.0.1    threads
> bgq       128        4        4      gamma (rdiaghg)     73.71 s          4.3.2    threads
>
> bgq       128        4        1      gamma (rdiaghg)     CRASH (2 it.)    5.0.1    threads
> bgq       128        4        1      gamma (rdiaghg)     CRASH (2 it.)    4.3.2    threads
>
>
> Did anyone observe a similar behavior?
>
> cheers
>
> Layla
>
>
> --
>
> Ivan Girotto - igirotto at ictp.it
> High Performance Computing Specialist
> Information & Communication Technology Section
> The Abdus Salam International Centre for Theoretical Physics - www.ictp.it
> Strada Costiera, 11 - 34151 Trieste - IT
> Tel +39.040.2240.484
> Fax +39.040.2240.249
>
>
> _______________________________________________
> Q-e-developers mailing list
> Q-e-developers at qe-forge.org
> http://qe-forge.org/mailman/listinfo/q-e-developers
>
>