[Pw_forum] density files
Silviu Zilberman
silviu at Princeton.EDU
Thu Feb 9 00:46:03 CET 2006
Axel,
Thank you so much for the tip; the '-assume buffered_io' flag did the
trick. Now all the restart info is dumped in ~53 seconds when I use all 4
processors on each node. Using only 3 processors per node (at a fixed
total number of processors) gives only a very minor further improvement.
Thanks! Silviu.
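
(For reference, a sketch of where such a flag could go in a QE-style
make.sys for the Compaq/DEC Fortran compiler; the exact variable names
and the other flags here are illustrative and depend on your
configuration:

    # make.sys fragment (illustrative) -- add the flag to the Fortran flags
    FFLAGS   = -O2 -assume buffered_io
    F90FLAGS = $(FFLAGS) $(FDFLAGS) $(IFLAGS) $(MODFLAGS)

After editing make.sys, recompile so the generated code actually uses
buffered i/o.)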
Axel Kohlmeyer wrote:
> On Wed, 8 Feb 2006, Silviu Zilberman wrote:
>
> hi,
>
> SZ> Paolo Giannozzi wrote:
> SZ> > On Wednesday 08 February 2006 12:18, Silviu Zilberman wrote:
> SZ> >
> SZ> >
> SZ> >> I have been running calculations on Lemieux, which is an alpha cluster
> SZ> >> supercomputer in Pittsburgh. For some reason that is still mysterious
> SZ> >> to me, writing these density files to the scratch space took a very
> SZ> >> long time, ~30 (!) minutes for a 68MB file.
> SZ> >>
> SZ> >
> SZ> > maybe you should rename your machine "Lepire" :-)
>
> paolo, you should be careful with remarks like this.
> people in pittsburgh take sports _very_ seriously and
> don't like others making fun of the names of their idol
> from the pittsburgh penguins. you may get away with it
> since the steelers just won the superbowl last weekend...
>
>
> SZ> > The charge density file should be written only when the wavefunctions
> SZ> > are written, every "isave" steps and at the end of the run. If it is written
> SZ> > at each time step, this is definitely wrong.
> SZ> >
> SZ> The charge density is written only every isave steps, and I set it to a
> SZ> very large number to avoid this time-consuming i/o. But even if I write
> SZ> it just once at the end of the calculation, it would still require ~90
> SZ> minutes for 3 files, which is completely crazy, given that the maximum
> SZ> time allocated per job is 12 hours on this supercomputer.
>
> lemieux is an alpha and the dec compiler has the unfortunate property
> of doing synchronous i/o by default. this will have a disastrous effect
> on a networked filesystem used by (too) many users.
> please try compiling with '-assume buffered_io' and let me know
> if that helps.
>
> SZ> > The charge density in a parallel calculation is collected to a single node
> SZ> > and written from there. Since it is not wise to collect it into a single
> SZ> > array, each slice from each processor is collected and written. Maybe
> SZ> > this algorithm is not optimal (maybe it is even "pessimal"). You should
> SZ> > try to understand where exactly the machine spends all this time and
> SZ> > why.
>
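
(An illustrative Fortran/MPI sketch of the slice-by-slice pattern Paolo
describes -- gather one processor's slice at a time to the writing node,
so the full density never sits in a single array. This is not the actual
PWscf/CP routine; the sizes, file name, and program name are made up:

    program write_density_sketch
      use mpi
      implicit none
      integer, parameter :: nloc = 1000      ! local slice size (made up)
      integer :: me, nproc, ierr, ip
      integer :: status(MPI_STATUS_SIZE)
      real(8) :: rholoc(nloc), buf(nloc)

      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, me, ierr)
      call MPI_Comm_size(MPI_COMM_WORLD, nproc, ierr)
      rholoc = dble(me)                      ! dummy density data

      if (me == 0) then
         open(unit=10, file='rho.dat', form='unformatted')
         write(10) rholoc                    ! node 0 writes its own slice
         do ip = 1, nproc-1                  ! then one slice per remote node
            call MPI_Recv(buf, nloc, MPI_DOUBLE_PRECISION, ip, ip, &
                          MPI_COMM_WORLD, status, ierr)
            write(10) buf
         end do
         close(10)
      else
         call MPI_Send(rholoc, nloc, MPI_DOUBLE_PRECISION, 0, me, &
                       MPI_COMM_WORLD, ierr)
      end if
      call MPI_Finalize(ierr)
    end program write_density_sketch

With synchronous i/o, each write(10) has to reach the network filesystem
before the next slice is received, which is one way the timings above
could balloon.)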
> another recommendation from the PSC staff is to use only 3 processors per
> node for the actual job (which generally is the performance limit for
> memory-bandwidth-consuming jobs like DFT/PW/PP codes) so that there is
> some cpu capacity left for asynchronous operation (e.g. kernel i/o and
> the MPI and NFS threads).
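
(One generic way to do this, assuming an MPICH-style mpirun with a
machine file and 4-cpu nodes; the hostnames are hypothetical and the
actual launcher on Lemieux may differ: list each node three times, e.g.

    # machines.txt -- 3 MPI tasks per 4-cpu node
    node01
    node01
    node01
    node02
    node02
    node02

    $ mpirun -np 6 -machinefile machines.txt ./cp.x < cp.in > cp.out

so the fourth cpu on each node stays free for the system threads.)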
>
> regards,
> axel.
>
> SZ> >
> SZ> I may do it, but for the time being, these files are not very useful to
> SZ> me. I can change the code to respect the disk_io parameter again and
> SZ> avoid writing these files altogether. However, I would first like to
> SZ> know whether there was some reasoning behind dumping these files by
> SZ> default, without user control over it.
> SZ>
> SZ> Thanks, Silviu.