[Pw_forum] Restarting phonon calculation with images, possibility of changing the number of images
xw111luoye at gmail.com
Fri Sep 2 18:26:47 CEST 2016
It could work, but it would not really solve your problem in the best way.
ph.x first decides how to distribute the q+irr calculations among the
images, without knowing whether an individual q+irr is already done.
Only then does it start the calculations, checking along the way whether
each one has already been completed.
So it is highly probable that you would still end up with one image
finishing much earlier than the others.
I noticed that the speed can differ a lot between different q points; you
can check the number of k points used for each q.
The speed of different irreps belonging to the same q is similar.
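As a quick way to see those per-q differences, one can grep the ph.x output for the k-point count it prints at each q. The excerpt below is a fabricated two-q stand-in, since no real output file is available in this thread; the file name and exact formatting are assumptions.

```shell
# Toy stand-in for a ph.x output file (real file name will differ);
# the "number of k points=" lines are the ones of interest.
cat > ph.out.sample <<'EOF'
     Calculation of q =    0.0000000   0.0000000   0.0000000
     number of k points=    16
     Calculation of q =    0.5000000   0.0000000   0.0000000
     number of k points=    64
EOF
# print one line per q: more k points usually means a slower q
awk '/number of k points/ {print "q #" ++n ": " $NF " k points"}' ph.out.sample
```

A q with four times as many symmetry-reduced k points will, roughly, take four times as long per irrep, which explains the load imbalance between images.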
To use my resources more efficiently, I prefer the GRID way of computing:
break the whole calculation up by q and then distribute the irreps among
the images.
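The GRID approach described here can be scripted rather than written by hand; the sketch below generates one ph.x input per q point using `start_q`/`last_q`. The prefix, the 2x2x1 q grid, and the file names are illustrative assumptions, not values from this thread.

```shell
#!/bin/sh
# Sketch only: one ph.x input file per q point (GRID parallelization).
# prefix 'mysystem', the 2x2x1 q grid, and nq=4 are assumed for illustration.
nq=4
for iq in $(seq 1 "$nq"); do
  cat > "ph.q${iq}.in" <<EOF
phonons, one q at a time
&INPUTPH
  prefix  = 'mysystem'
  fildyn  = 'dynmat'
  ldisp   = .true.
  nq1 = 2, nq2 = 2, nq3 = 1
  start_q = ${iq}
  last_q  = ${iq}
  recover = .true.
/
EOF
done
ls ph.q*.in
```

Each input can then be submitted as an independent job; within one q, `start_irr`/`last_irr` can split the irreps further.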
Ye Luo, Ph.D.
Leadership Computing Facility
Argonne National Laboratory
2016-09-02 10:40 GMT-05:00 Thomas Brumme <thomas.brumme at mpsd.mpg.de>:
> OK, I think I found a possibility which does not involve
> writing input files like in the GRID example (i.e. finding out
> which q points and representations finished and which
> didn't, which can be quite tedious), but maybe someone
> can confirm...
> I now have, e.g., four _ph folders. For two of the images the
> calculations are nearly finished, while the other two haven't
> finished a single calculation. Restarting with 4 images would
> result in 2 of them just waiting... However, if I could restart
> with more or fewer images, the work should be distributed more evenly.
> Let's say the original images 1 and 3 are finished and 0 and 2 are
> not, so "not / finished / not / finished"... In that case I could
> restart with only two images and the work would be evenly distributed.
> On the other hand, if I had something like
> "finished / finished / not / not",
> reducing the number of images to 2 would not solve the
> problem; but in the more general case with many more
> images, doubling the number of images could at least
> reduce the total number of CPUs which don't do anything.
> So, I need to create _ph folders for the number of images
> I want to use... Then I need to copy the .phsave directory
> of the original calculation into them in order to have the
> pattern files. Then I also copy all the dynmat files of all the
> original images into those directories. If I now restart,
> the phonon code should always recognize whether a calculation
> has already been done...
> Does this sound reasonable?
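The setup described above might be sketched as follows. The `outdir`, `prefix`, and `_ph` directory layout are assumptions about a typical ph.x run, and the guards let the sketch run even where some files are missing.

```shell
#!/bin/sh
# Sketch of the proposed restart setup; all names are assumptions.
outdir=./tmp            # assumed outdir of the original run
prefix=mysystem         # assumed prefix
new_images=2            # number of images for the restart

# 1) gather the dynmat.*.xml files of ALL original images into image 0
for f in "${outdir}"/_ph[1-9]*/"${prefix}.phsave"/dynmat.*.xml; do
  [ -e "$f" ] && cp "$f" "${outdir}/_ph0/${prefix}.phsave/"
done

# 2) replicate image 0's .phsave (pattern + dynmat files) for each
#    additional image of the restart
img=1
while [ "$img" -lt "$new_images" ]; do
  mkdir -p "${outdir}/_ph${img}/${prefix}.phsave"
  cp "${outdir}/_ph0/${prefix}.phsave/"*.xml \
     "${outdir}/_ph${img}/${prefix}.phsave/" 2>/dev/null || true
  img=$((img + 1))
done
```

On restart, every image would then find all dynmat.$iq.$ir.xml files in its own .phsave directory and could skip the corresponding irreps.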
> Kind regards
> On 09/02/2016 12:01 PM, Thomas Brumme wrote:
> > Dear all,
> > I have a question concerning the restart possibilities with image
> > parallelization in a phonon calculation.
> > I have the problem that for some of the images the calculation did not
> > converge. I know that I can achieve
> > convergence by reducing the mixing since I encountered the problem
> > before for exactly the same system.
> > Yet, now, as some of the images are finished with their task (or close
> > to it), I only have the possibility of either
> > using only one image and copying the dynmat.$iq.$ir.xml files to the
> > _ph0/*.phsave/ directory, or restarting using
> > the same number of images and living with the fact that some images will
> > do nothing...
> > Or is there a third possibility I don't know about? Wouldn't it be
> > better to first check what has already been done
> > and then distribute the work among the images? Or is this too hard to
> > code? (I haven't looked at this part
> > of the code yet.)
> > OK, I think I could also use some kind of GRID parallelization and
> > create some input files by hand, setting
> > the start_irr, start_q, and so on, but this is rather tedious since I
> > have a big system and a q-point grid...
> > So, again the (maybe stupid) question: Is there another possibility?
> > Regards
> > Thomas
> Dr. rer. nat. Thomas Brumme
> Max Planck Institute for the Structure and Dynamics of Matter
> Luruper Chaussee 149
> 22761 Hamburg
> Tel: +49 (0)40 8998 6557
> email: Thomas.Brumme at mpsd.mpg.de
> Pw_forum mailing list
> Pw_forum at pwscf.org