[Pw_forum] k-points parallelization in pwscf 4.2.1
sclauzer at sissa.it
Mon Feb 14 18:13:30 CET 2011
On 02/14/2011 04:21 PM, Davide Sangalli wrote:
> OK. I think it could be a "memory - cache" related problem.
> I did the same test with lower cut-off (still 6 CPUs).
> Now my serial run used 0.2 % of the memory and my parallel around 1.3%
> The parallelization over the fft grid is still faster, but the kpts
> parallelization now is faster than the serial run.
> Serial:                   PWSCF : 1m 3.14s CPU time, 1m 4.05s WALL time
> fft grid parallelization: PWSCF : 15.46s CPU time, 16.36s WALL time
> kpts parallelization:     PWSCF : 35.98s CPU time, 36.47s WALL time
> Thank you and best regards,
Maybe I didn't explain myself clearly. I was referring to the complete
timings printed at the end of each run, including the separate
contribution from each subroutine.
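
For reference, the wall times quoted above can be turned into speedup and
parallel efficiency with a few lines of Python. This is only a sketch: the
numbers are copied from the quoted message, the 6 CPUs are taken from the
same message, and the helper names (to_seconds, speedup, efficiency) are
made up for illustration. It says nothing about where the time goes, which
is why the per-subroutine timings are still needed.

```python
# Wall times copied from the quoted message (lower cut-off test, 6 CPUs).
# Helper names are illustrative only; they are not part of PWscf.

def to_seconds(minutes, seconds):
    """Convert a 'Xm Y.YYs' timing into seconds."""
    return 60.0 * minutes + seconds

def speedup(t_serial, t_parallel):
    """Ratio of serial to parallel wall time."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, ncpu):
    """Speedup divided by the number of CPUs (1.0 would be ideal)."""
    return speedup(t_serial, t_parallel) / ncpu

NCPU = 6
wall = {
    "serial": to_seconds(1, 4.05),
    "fft grid": to_seconds(0, 16.36),
    "kpts": to_seconds(0, 36.47),
}

for label in ("fft grid", "kpts"):
    s = speedup(wall["serial"], wall[label])
    e = efficiency(wall["serial"], wall[label], NCPU)
    print(f"{label:10s} speedup {s:.2f}x  efficiency {e:.0%}")
```

With these inputs the fft grid run comes out a little under 4x faster than
the serial one and the kpts run a little under 2x, both well below the
ideal 6x.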
> On 02/14/2011 02:49 PM, Gabriele Sclauzero wrote:
>> Dear Davide,
>> it might be a memory-contention problem, since CPU cache sizes are
>> of the order of a few MB. Please provide the detailed timings at the
>> end of the runs (they are the first thing one should look at in order
>> to interpret this kind of speedup test).
>> Next time please take a few seconds to sign your post using full name
>> and affiliation.
>> On 14 Feb 2011, at 12:18, Davide Sangalli wrote:
>>> Thank you for the answer.
>>> I did a check to be sure, but these jobs use only a few MB of memory.
>>> The serial run uses just 2.5% of my node memory (so around 15% for the
>>> run on 6 CPUs).
>>> It does not seem to me that this could be the problem.
>>> Moreover, in the fft parallelization the memory was not distributed.
>>> Is it possible that pwscf is not properly compiled?
>>> Is there any other check that you would suggest to do?
>>> Best regards,
>>> Largest allocated arrays       est. size (Mb)   dimensions
>>>    Kohn-Sham Wavefunctions         5.68 Mb      (   6422,   58)
>>>    NL pseudopotentials            13.33 Mb      (   6422,  136)
>>>    Each V/rho on FFT grid          7.81 Mb      ( 512000)
>>>    Each G-vector array             1.76 Mb      ( 230753)
>>>    G-vector shells                 0.09 Mb      (  12319)
>>> Largest temporary arrays       est. size (Mb)   dimensions
>>>    Auxiliary wavefunctions        22.73 Mb      (   6422,  232)
>>>    Each subspace H/S matrix        0.82 Mb      (    232,  232)
>>>    Each <psi_i|beta_j> matrix      0.12 Mb      (    136,   58)
>>>    Arrays for rho mixing          62.50 Mb      ( 512000,    8)
>>> writing wfc files to a dedicated directory
>>> On 02/14/2011 11:34 AM, Paolo Giannozzi wrote:
>>>> Davide Sangalli wrote:
>>>>> What could my problem be?
>>>> the only reason I can think of is that k-point parallelization doesn't
>>>> (and cannot) distribute memory, so the total memory requirement will
>>>> be npools*(size of serial execution). If you run on the same node six
>>>> instances of a large executable, memory conflicts may slow down more
>>>> than parallelization can speed up.
>>> Pw_forum mailing list
>>> Pw_forum at pwscf.org
>> Gabriele Sclauzero, EPFL SB ITP CSEA
>> PH H2 462, Station 3, CH-1015 Lausanne
> Davide Sangalli
> MDM labs, IMM, CNR.
> Agrate (MI) Italy
Gabriele Sclauzero, EPFL SB ITP CSEA
PH H2 462, Station 3, CH-1015 Lausanne
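
A rough sketch of the memory argument made earlier in the thread, in
Python. It rests on two assumptions that are not stated in the pw.x output:
that the complex arrays use 16 bytes per element, and that "Mb" means 2^20
bytes. With those assumptions the sizes in the quoted memory report are
reproduced from the dimensions alone. The second helper applies Paolo
Giannozzi's statement that k-point parallelization needs
npools*(size of serial execution); both function names are hypothetical.

```python
# Assumptions (not taken from the pw.x output): 16 bytes per complex
# element and 1 Mb = 2**20 bytes. Function names are illustrative only.

BYTES_PER_MB = 1024 ** 2
COMPLEX_BYTES = 16  # double-precision complex, assumed

def array_mb(dims, bytes_per_element=COMPLEX_BYTES):
    """Estimated size in Mb of an array with the given dimensions."""
    n_elements = 1
    for d in dims:
        n_elements *= d
    return n_elements * bytes_per_element / BYTES_PER_MB

def pooled_memory(serial_amount, npools):
    """Total memory when every k-point pool holds a full serial copy."""
    return npools * serial_amount

# Dimensions copied from the quoted memory report.
arrays = {
    "Kohn-Sham Wavefunctions": (6422, 58),
    "NL pseudopotentials": (6422, 136),
    "Auxiliary wavefunctions": (6422, 232),
    "Arrays for rho mixing": (512000, 8),
}
for name, dims in arrays.items():
    print(f"{name:26s} {array_mb(dims):6.2f} Mb")

# Serial run at 2.5% of node memory, six pools on the same node.
print(f"six pools: {pooled_memory(2.5, 6):.1f}% of node memory")
```

Applied to the lower cut-off test, a serial run at 0.2% of the node memory
gives about 1.2% for six pools, close to the 1.3% reported. Note that the
individual arrays are already larger than a CPU cache of a few MB, which is
the contention scenario suggested above.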