[Pw_forum] k-points parallelization in pwscf 4.2.1
Gabriele Sclauzero
sclauzer at sissa.it
Mon Feb 14 18:13:30 CET 2011
On 02/14/2011 04:21 PM, Davide Sangalli wrote:
> OK. I think it could be a memory/cache-related problem.
>
> I did the same test with a lower cut-off (still 6 CPUs).
> Now my serial run used 0.2% of the memory and my parallel run around 1.3%.
>
> The parallelization over the FFT grid is still faster, but the k-point
> parallelization is now faster than the serial run.
> Serial:                     PWSCF : 1m 3.14s CPU time, 1m 4.05s WALL time
> FFT-grid parallelization:   PWSCF :   15.46s CPU time,   16.36s WALL time
> k-point parallelization:    PWSCF :   35.98s CPU time,   36.47s WALL time
>
> Thank you and best regards,
> Davide
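
For reference, a minimal Python sketch of the speedup implied by the wall times quoted above (the 6-CPU count and the timings are taken from the message; nothing here is part of pwscf itself):

    # Speedup and parallel efficiency from the quoted wall times.
    serial_wall = 64.05   # seconds, serial run
    fft_wall    = 16.36   # seconds, FFT-grid parallelization on 6 CPUs
    kpts_wall   = 36.47   # seconds, k-point (pools) parallelization on 6 CPUs
    ncpu        = 6

    for label, wall in [("FFT grid", fft_wall), ("k-points", kpts_wall)]:
        speedup = serial_wall / wall
        efficiency = speedup / ncpu
        print(f"{label:10s}: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")

This gives roughly 3.9x (65% efficiency) for the FFT-grid run and 1.8x (29%) for the k-point run, which is the gap the discussion below tries to explain.
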
Maybe I didn't explain myself clearly. I was referring to the complete
timings, including the separate contributions from all the subroutines.
GS
>
>
> On 02/14/2011 02:49 PM, Gabriele Sclauzero wrote:
>> Dear Davide,
>>
>> it might be a memory-contention problem, since CPU cache sizes are
>> of the order of a few MB. Please provide the detailed timings at the
>> end of the runs (they are the first thing one should look at in order
>> to interpret this kind of speedup test).
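
As an aside, one quick way to pull those per-routine timings out of an output file is sketched below in Python; it only assumes, as for the totals quoted above, that each timing line contains both "CPU" and "WALL", and the file name pw.out is a placeholder:

    # Collect the timing lines from a pw.x output file so they can be posted.
    def timing_lines(path="pw.out"):
        with open(path) as f:
            return [line.rstrip() for line in f if "CPU" in line and "WALL" in line]

    for line in timing_lines():
        print(line)
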
>>
>> Next time, please take a few seconds to sign your post with your full
>> name and affiliation.
>>
>> Regards,
>>
>> GS
>>
>>
>> On 14 Feb 2011, at 12:18, Davide Sangalli wrote:
>>
>>> Thank you for the answer.
>>>
>>> I did a check to be sure, but these jobs use only a few MB of memory.
>>> The serial run uses just 2.5% of my node memory (so around 15% for the
>>> run on 6 CPUs).
>>> It does not seem to me that this could be the problem.
>>> Moreover, in the FFT parallelization the memory was not distributed
>>> either.
>>>
>>> Is it possible that pwscf is not properly compiled?
>>> Is there any other check that you would suggest?
>>>
>>> Best regards,
>>> Davide
>>>
>>> *************************************
>>> Largest allocated arrays est. size (Mb) dimensions
>>> Kohn-Sham Wavefunctions 5.68 Mb ( 6422, 58)
>>> NL pseudopotentials 13.33 Mb ( 6422, 136)
>>> Each V/rho on FFT grid 7.81 Mb ( 512000)
>>> Each G-vector array 1.76 Mb ( 230753)
>>> G-vector shells 0.09 Mb ( 12319)
>>> Largest temporary arrays est. size (Mb) dimensions
>>> Auxiliary wavefunctions 22.73 Mb ( 6422, 232)
>>> Each subspace H/S matrix 0.82 Mb ( 232, 232)
>>> Each <psi_i|beta_j> matrix 0.12 Mb ( 136, 58)
>>> Arrays for rho mixing 62.50 Mb ( 512000, 8)
>>> writing wfc files to a dedicated directory
>>>
>>>
>>> On 02/14/2011 11:34 AM, Paolo Giannozzi wrote:
>>>> Davide Sangalli wrote:
>>>>
>>>>> What could my problem be?
>>>> the only reason I can think of is that k-point parallelization doesn't
>>>> (and cannot) distribute memory, so the total memory requirement will
>>>> be npools*(size of serial execution). If you run six instances of a
>>>> large executable on the same node, memory conflicts may slow things
>>>> down more than parallelization can speed them up.
>>>>
>>>> P.
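
To make this scaling concrete, here is a small back-of-the-envelope sketch in Python using the array estimates from the run quoted above; it is only a lower bound, since the report lists just the largest arrays:

    # k-point (pool) parallelization replicates data across pools, so the memory
    # needed on one node grows roughly as npools * (size of serial execution).
    largest_arrays_mb = [5.68, 13.33, 7.81, 1.76, 0.09,   # largest allocated arrays
                         22.73, 0.82, 0.12, 62.50]        # largest temporary arrays

    serial_estimate = sum(largest_arrays_mb)   # ~115 MB per process
    for npools in (1, 6):
        print(f"npools = {npools}: at least {npools * serial_estimate:.0f} MB on the node")

Six pools on one node then need close to 0.7 GB just for these arrays, consistent with the serial footprint scaling up by roughly a factor of six (2.5% to about 15% of node memory) reported earlier in the thread.
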
>>
>>
>> Gabriele Sclauzero, EPFL SB ITP CSEA
>> PH H2 462, Station 3, CH-1015 Lausanne
>>
>>
>
>
> Davide Sangalli
> MDM labs, IMM, CNR.
> Agrate (MI) Italy
>
>
--
Gabriele Sclauzero, EPFL SB ITP CSEA
PH H2 462, Station 3, CH-1015 Lausanne