[Q-e-developers] Scalability of CP on BGQ (FERMI)
Filippo Spiga
spiga.filippo at gmail.com
Tue Jul 31 06:54:58 CEST 2012
Dear Carlo,
On Jul 30, 2012, at 6:14 PM, Carlo Cavazzoni wrote:
> Number of  | Number of     | sec/      | OpenMP  | command line
> real cores | virtual cores | iteration | threads | parameters
>       4096 |          8192 |       231 |       4 | -nbgrp 2 -ntg 4 -ndiag 256
>       8192 |         16384 |       160 |       4 | -nbgrp 4 -ntg 4 -ndiag 1024
>      16384 |         32768 |       131 |       4 | -nbgrp 4 -ntg 4 -ndiag 1024
>      32768 |         65536 |        86 |       4 | -nbgrp 8 -ntg 4 -ndiag 2048
While benchmarking GPU PWscf on medium/large systems (>500 atoms), I found several spots in the PW code where adding OpenMP would improve the performance of those sections by a factor of at least 2. I haven't committed anything yet. However, it would be interesting to evaluate the OpenMP efficiency/scalability. I see you ran the tests with 4 OpenMP threads (I assume 8 MPI tasks per A2 chip, 2 OpenMP threads per physical core, and 2 GByte of RAM per task. Is that correct?). What about 8 OpenMP threads, or 16? Is it worth going beyond 4 OpenMP threads?
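To put numbers on the scalability question, here is a minimal sketch that derives speedup and parallel efficiency from the timings in your table, taking the 4096-core run as the baseline. It also checks the node layout I am guessing at. The function names are mine, and the 16 compute cores and 16 GByte of RAM per node are my assumptions about the BGQ node, not something taken from your message.

```python
# (real cores, seconds per iteration), copied from the quoted table
runs = [(4096, 231.0), (8192, 160.0), (16384, 131.0), (32768, 86.0)]


def strong_scaling(runs):
    """Return (cores, speedup, efficiency) relative to the first run."""
    base_cores, base_time = runs[0]
    rows = []
    for cores, secs in runs:
        speedup = base_time / secs      # measured gain over the baseline
        ideal = cores / base_cores      # gain expected from perfect scaling
        rows.append((cores, speedup, speedup / ideal))
    return rows


def node_layout(mpi_per_node, omp_threads, cores_per_node=16, ram_gb_per_node=16):
    """Threads per physical core and GByte per MPI task.

    cores_per_node and ram_gb_per_node are assumed values for one BGQ node.
    """
    threads_per_core = mpi_per_node * omp_threads / cores_per_node
    ram_per_task = ram_gb_per_node / mpi_per_node
    return threads_per_core, ram_per_task


if __name__ == "__main__":
    for cores, speedup, eff in strong_scaling(runs):
        print(f"{cores:6d} cores  speedup {speedup:5.2f}  efficiency {eff:6.1%}")
    print(node_layout(mpi_per_node=8, omp_threads=4))
```

With these timings the 32768-core run is about 2.7 times faster than the 4096-core run on 8 times the cores, so roughly 34% efficiency relative to that baseline. Note that the command-line parameters change between rows, so this compares best-found configurations and not a pure scaling of a single setup.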
> (Volunteer are welcome too!)
I am more than happy to help (-:
F.
--
Mr. Filippo SPIGA (穗安駒), HPC and GPU Technologist <spiga.filippo_at_gmail.com>
website: http://filippospiga.me ~ skype: filippo.spiga
«Nobody will drive us out of Cantor's paradise.» ~ David Hilbert