Hi Layla,

I have never tried with 1 thread, as it's not recommended on BG/Q: use at
least 2 threads per MPI process.
On the other hand, I have some doubts about the way you are running the
jobs. How large is the BG partition (the bg_size)? How many processes are
you running per node in each of the two cases?

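For reference, this is roughly how I launch such a case with runjob on
Fermi (a sketch only: the 16-ranks-per-node mapping and the pw.in/pw.out
file names are placeholder assumptions, adjust them to your job script):

    # hypothetical BG/Q launch: 128 MPI ranks, 4 OpenMP threads per rank;
    # with 16 ranks per node this fills 8 nodes (64 hardware threads each)
    runjob --np 128 --ranks-per-node 16 \
           --envs OMP_NUM_THREADS=4 \
           : ./pw.x -ndiag 1 -in pw.in > pw.out
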
I have some doubts about the question itself too. You say that you see a
slowdown when comparing 4 threads vs. 1 thread, but in the table below you
report only data with 4 threads for BG/Q.
Have you perhaps switched the headers of the two columns, threads and
ndiag?

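Just to be sure we mean the same thing by those two columns: the thread
count is the per-rank OMP_NUM_THREADS value taken from the environment,
while ndiag is the number of MPI tasks in the linear-algebra group, passed
on the pw.x command line. A sketch (generic mpirun launcher and input file
name assumed):

    # threads: OpenMP threads per MPI rank, set in the environment
    export OMP_NUM_THREADS=4
    # ndiag: MPI tasks used for the parallel diagonalization,
    # set with the -ndiag (alias -northo) option of pw.x
    mpirun -np 32 ./pw.x -ndiag 4 -in pw.in > pw.out
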
It's expected that the SGI architecture, based on Intel processors, is
faster than BG/Q.

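One guess about the Jade numbers: if the threaded library spawns multiple
threads for each of the 32 MPI processes on an already fully packed node,
the cores get oversubscribed, and that alone could turn 27 s into minutes.
A quick check worth trying (a sketch, assuming Intel MKL is the threaded
library on Jade):

    # force the threaded libraries down to one thread per MPI rank;
    # if the time drops back to ~27 s, the slowdown was oversubscription
    export OMP_NUM_THREADS=1
    export MKL_NUM_THREADS=1
    mpirun -np 32 ./pw.x -ndiag 1 -in pw.in > pw.out
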
Ivan

On 01/10/2012 13:17, Layla Martin-Samos wrote:
> Dear all, I have made some test calculations on Fermi and Jade, for a
> 107-atom system, 70 Ry wavefunction cutoff, 285 occupied bands and 1
> k-point. What the results seem to show is that diagonalization with the
> multithreaded libraries is considerably slower (diaghg is called 33
> times in all the jobs and the final results are identical). The version
> compiled by CINECA gives identical times and results to 5.0.1. Note
> that Jade in sequential is faster than BG/Q. I am continuing some other
> tests on Jade; unfortunately the runs spend a long time in the queue,
> the machine is full, and even for a 10-minute job on 32 cores you wait
> more than 3 hours. As an attachment I put the two make.sys files for
> Jade.
>
> computer  MPI procs  threads  ndiag  complex/gamma_only  time for diaghg  version  libs
>
> bgq       128        4        1      complex (cdiaghg)   69.28 s          5.0.1    threads
> bgq       128        4        1      complex (cdiaghg)   69.14 s          4.3.2    threads
>
> jade       32        1        1      complex (cdiaghg)   27.44 s          4.3.2    sequential
> jade       32        1        1      complex (cdiaghg)   > 10 min         4.3.2    threads
> jade       32        1        1      complex (cdiaghg)   > 10 min         5.0.1    threads
>
> bgq       128        4        4      complex (cdiaghg)   310.52 s         5.0.1    threads
>
> bgq       128        4        4      gamma   (rdiaghg)   73.87 s          5.0.1    threads
> bgq       128        4        4      gamma   (rdiaghg)   73.71 s          4.3.2    threads
>
> bgq       128        4        1      gamma   (rdiaghg)   CRASH at it. 2   5.0.1    threads
> bgq       128        4        1      gamma   (rdiaghg)   CRASH at it. 2   4.3.2    threads
>
> Did someone observe a similar behavior?
>
> Cheers,
>
> Layla

--
Ivan Girotto - igirotto@ictp.it
High Performance Computing Specialist
Information & Communication Technology Section
The Abdus Salam International Centre for Theoretical Physics - www.ictp.it
Strada Costiera, 11 - 34151 Trieste - IT
Tel +39.040.2240.484
Fax +39.040.2240.249