[Pw_forum] density files
Axel Kohlmeyer
akohlmey at vitae.cmm.upenn.edu
Wed Feb 8 23:01:18 CET 2006
On Wed, 8 Feb 2006, Silviu Zilberman wrote:
hi,
SZ> Paolo Giannozzi wrote:
SZ> > On Wednesday 08 February 2006 12:18, Silviu Zilberman wrote:
SZ> >
SZ> >
SZ> >> I have been running calculations on Lemieux, which is an Alpha-cluster
SZ> >> supercomputer in Pittsburgh. For some reason that is still mysterious
SZ> >> to me, writing these density files to the scratch space took a very
SZ> >> long time, ~30 (!) minutes for a 68MB file.
SZ> >>
SZ> >
SZ> > maybe you should rename your machine "Lepire" :-)
paolo, you should be careful with remarks like this.
people in pittsburgh take sports _very_ seriously and
don't like others making fun of the name of their idol
from the pittsburgh penguins. you may get away with it
since the steelers just won the super bowl last weekend...
SZ> > The charge density file should be written only when the wavefunctions
SZ> > are written, every "isave" steps and at the end of the run. If it is written
SZ> > at each time step, this is definitely wrong.
SZ> >
SZ> The charge density is written only every isave steps, and I set it to a
SZ> very large number to avoid this time-consuming i/o. But even if I write
SZ> it just once at the end of the calculation, it would still require ~90
SZ> minutes for 3 files, which is completely crazy, given that the maximum
SZ> time allocated per job is 12 hours on this supercomputer.
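(side note: 'isave' is set in the &control namelist of the cp.x input;
a minimal fragment, with a made-up value, would look like

   &control
      calculation = 'cp'
      isave       = 1000   ! dump restart files (incl. density) every 1000 steps
   /

if i remember the input syntax correctly.)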
lemieux is an alpha and the dec compiler has the unfortunate property
of doing synchronous i/o by default. this will have a disastrous effect
on a networked filesystem used by (too) many users.
please try compiling with '-assume buffered_io' and let me know
if that helps.
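if you build with the usual espresso make.sys, that means adding the
flag to the fortran compiler flags and recompiling, i.e. something
like (the other flags here are just placeholders):

  FFLAGS = -O2 -assume buffered_io

followed by a 'make clean; make cp' (or 'make pw') to pick it up.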
SZ> > The charge density in a parallel calculation is collected to a single node
SZ> > and written from there. Since it is not wise to collect it into a single
SZ> > array, each slice from each processor is collected and written. Maybe
SZ> > this algorithm is not optimal (maybe it is even "pessimal"). You should
SZ> > try to understand where exactly the machine spends all this time and
SZ> > why.
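[for the curious: the collect-and-write pattern described above boils
down to something like the following toy fortran/mpi sketch -- not the
actual pwscf routine, and the array size and file name are made up:

  program write_rho_slices
    use mpi
    implicit none
    integer, parameter :: np = 1000            ! grid points per slice (made up)
    integer :: ierr, me, nproc, ip, stat(MPI_STATUS_SIZE)
    real(8) :: slice(np), buf(np)

    call MPI_Init(ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, me, ierr)
    call MPI_Comm_size(MPI_COMM_WORLD, nproc, ierr)
    slice = dble(me)                           ! stand-in for this node's share of rho(r)

    if (me == 0) then
       open (10, file='rho.dat', form='unformatted')
       write (10) slice                        ! node 0 writes its own slice first,
       do ip = 1, nproc-1                      ! then receives and writes the others
          call MPI_Recv(buf, np, MPI_DOUBLE_PRECISION, ip, 0, &
                        MPI_COMM_WORLD, stat, ierr)
          write (10) buf                       ! one at a time, so the full density
       end do                                  ! never sits in a single array
       close (10)
    else
       call MPI_Send(slice, np, MPI_DOUBLE_PRECISION, 0, 0, MPI_COMM_WORLD, ierr)
    end if
    call MPI_Finalize(ierr)
  end program write_rho_slices

each write hits the disk with only one slice's worth of data, which is
exactly where slow synchronous i/o hurts the most.]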
another recommendation from the PSC staff is to use only 3 processors per
node for the actual job (which generally is the performance limit for
memory-bandwidth-consuming jobs like DFT/PW/PP codes anyway), so that
there is some cpu capacity left for asynchronous operations (e.g. kernel
i/o and the MPI and NFS threads).
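with a plain mpich-style mpirun you get that by listing each 4-cpu node
only three times in the machinefile (hostnames made up):

  node01
  node01
  node01
  node02
  node02
  node02

  mpirun -np 6 -machinefile machines.txt ./cp.x < cp.in > cp.out

on lemieux itself the equivalent option of the local batch/launcher
setup applies.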
regards,
axel.
SZ> >
SZ> I may do it, but for the time being, these files are not very useful to
SZ> me. I can change the code to again respect the disk_io parameter and
SZ> avoid writing these files altogether. However, I would like to know
SZ> first if there was some reasoning behind dumping these files by default,
SZ> without user control over it.
SZ>
SZ> Thanks, Silviu.
--
=======================================================================
Axel Kohlmeyer akohlmey at cmm.chem.upenn.edu http://www.cmm.upenn.edu
Center for Molecular Modeling -- University of Pennsylvania
Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
tel: 1-215-898-1582, fax: 1-215-573-6233, office-tel: 1-215-898-5425
=======================================================================
If you make something idiot-proof, the universe creates a better idiot.