Dear all,

I have run some test calculations on Fermi (BGQ) and Jade for a 107-atom system: 70 Ry wavefunction cutoff, 285 occupied bands, and 1 k-point. What the results seem to show is that linking multithreaded libraries considerably slows down the diagonalization (diaghg is called 33 times in all jobs, and the final results are identical). The version compiled by CINECA gives identical timings and results to 5.0.1. Note that jade with sequential libraries is faster than BGQ. I am continuing other tests on jade, but unfortunately the runs spend a long time in the queue: the machine is full, and even a 10-minute job on 32 cores waits more than 3 hours. As attachments I put the two make.sys files for jade; for readers without the attachments, a sketch of the typical difference between the two builds follows.
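I cannot quote the attached make.sys files here, so the following is only an illustrative sketch: assuming an MKL-based build on jade (my assumption, not confirmed from the attachments), the sequential-vs-threaded difference usually comes down to which MKL threading layer is linked in BLAS_LIBS/LAPACK_LIBS:

    # sequential build: serial MKL threading layer (paths are illustrative)
    BLAS_LIBS = -L$(MKLROOT)/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core

    # threaded build: threaded MKL layer plus the Intel OpenMP runtime
    BLAS_LIBS = -L$(MKLROOT)/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread

A threaded QE build also typically carries -openmp in FFLAGS/LDFLAGS and -D__OPENMP in DFLAGS; with such a build, OMP_NUM_THREADS controls how many threads the library uses inside the diagonalization.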
computer  MPI procs  threads  ndiag  complex/gamma_only  time for diaghg   version  libs
bgq       128        4        1      complex (cdiaghg)   69.28 s           5.0.1    threaded
bgq       128        4        1      complex (cdiaghg)   69.14 s           4.3.2    threaded
jade      32         1        1      complex (cdiaghg)   27.44 s           4.3.2    sequential
jade      32         1        1      complex (cdiaghg)   > 10 min          4.3.2    threaded
jade      32         1        1      complex (cdiaghg)   > 10 min          5.0.1    threaded
bgq       128        4        4      complex (cdiaghg)   310.52 s          5.0.1    threaded
bgq       128        4        4      gamma (rdiaghg)     73.87 s           5.0.1    threaded
bgq       128        4        4      gamma (rdiaghg)     73.71 s           4.3.2    threaded
bgq       128        4        1      gamma (rdiaghg)     CRASH at 2nd it.  5.0.1    threaded
bgq       128        4        1      gamma (rdiaghg)     CRASH at 2nd it.  4.3.2    threaded
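For clarity on how the table columns map onto the runs, here is a minimal sketch of the jade sequential case (the mpirun invocation is my assumption; the actual job scripts may use a different launcher):

    export OMP_NUM_THREADS=1                        # "threads" column
    mpirun -np 32 pw.x -ndiag 1 -input pw.in > pw.out   # 32 MPI processes, "ndiag" column

Here -ndiag sets the number of processors in the linear-algebra group; with -ndiag 1 each cdiaghg/rdiaghg call is a serial (LAPACK) diagonalization, so the timing of that routine depends entirely on the sequential vs. threaded choice of library.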
Did someone observe a similar behavior?

Cheers,

Layla