<div dir="ltr"><div><div><div><div>Hi Thomas,<br><br></div><div></div><div>It could work, but it would not really solve your problem in the best way.<br></div>ph.x first decides how to distribute the q+irr calculations among the images, without knowing whether an individual q+irr is already done.<br></div>Only then does it start the calculations, which includes checking whether each calculation has already been done.<br></div></div><div>So it is highly probable that you will still end up with one image finishing much earlier than the others.<br><br></div><div>I noticed that the speed can differ a lot between different q points; you can check the number of k points used for each q.<br></div><div>The speed of different irr belonging to the same q is similar.<br></div><div>To use my resources more efficiently, I prefer the GRID way of computing: break the whole calculation up by q and then distribute the irr among images.<br></div><div><br></div><div>Ye<br></div></div><div class="gmail_extra"><br clear="all"><div><div class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr">===================<br>

Ye Luo, Ph.D.<br>

Leadership Computing Facility<br>

Argonne National Laboratory</div></div></div>
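<div><br>A minimal sketch of the load-balance argument in this thread. It is not ph.x code: it assumes, purely for illustration, that the q+irr tasks are split into contiguous blocks of nearly equal size, one block per image, before any check of what is already done. The function names and the task lists are hypothetical.<br></div>

```python
def split_contiguous(n_tasks, n_images):
    """Split task indices 0..n_tasks-1 into n_images contiguous blocks.

    Assumption for illustration only: the split ignores whether a task
    is already finished, mirroring "distribute first, check later".
    """
    base, extra = divmod(n_tasks, n_images)
    # The first `extra` images get one task more than the rest.
    sizes = [base + 1] * extra + [base] * (n_images - extra)
    blocks = []
    start = 0
    for size in sizes:
        blocks.append(list(range(start, start + size)))
        start += size
    return blocks


def remaining_per_image(done, n_images):
    """Count the unfinished tasks each image still has to run."""
    blocks = split_contiguous(len(done), n_images)
    return [sum(1 for task in block if not done[task]) for block in blocks]


# Hypothetical run: 8 q+irr tasks, originally 4 images with 2 tasks each.
# Pattern "not / finished / not / finished":
alternating = [False, False, True, True, False, False, True, True]
print(remaining_per_image(alternating, 4))  # [2, 0, 2, 0]
print(remaining_per_image(alternating, 2))  # [2, 2]

# Pattern "finished / finished / not / not":
front_loaded = [True, True, True, True, False, False, False, False]
print(remaining_per_image(front_loaded, 4))  # [0, 0, 2, 2]
print(remaining_per_image(front_loaded, 2))  # [0, 4]
```

<div>Under this assumption, changing the number of images only helps when the finished tasks happen to be spread evenly over the new blocks. The count also treats every task as equally expensive, which does not hold when different q points use very different numbers of k points.<br></div>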

<br><div class="gmail_quote">2016-09-02 10:40 GMT-05:00 Thomas Brumme <span dir="ltr"><<a href="mailto:thomas.brumme@mpsd.mpg.de" target="_blank">thomas.brumme@mpsd.mpg.de</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">OK, I think I found a possibility, which does not involve<br>

writing input files like in the GRID example (i.e. finding out<br>

which q points and representations finished and which<br>

didn't which can be quite tedious) but maybe someone<br>

can confirm...<br>

<br>

I now have, e.g., four _ph folders. For two of the images the<br>

calculations are nearly finished, while the other two haven't<br>

finished a single calculation. Restarting with 4 images would<br>

result in 2 of them just waiting... However, if I could restart<br>

with more or fewer images, the work should be more evenly<br>

distributed.<br>

<br>

Let's say the original images 1 and 3 are finished and 0 and 2<br>

not, so "not / finished / not / finished"... In that case I could<br>

restart with only two images and the work would be evenly<br>

distributed.<br>

<br>

On the other hand, if I had something like:<br>

"finished / finished / not / not"<br>

reducing the number of images to 2 would not solve the<br>

problem, but in the more general case with many more<br>

images, doubling the number of images could at least<br>

reduce the total number of CPUs which don't do anything.<br>

<br>

So, I need to create _ph folders for the number of images<br>

I want to use... Then I need to copy the directory<br>

_ph0/$prefix.phsave/<br>

of the original calculation into them in order to have the<br>

patterns. Then I also copy all the dynmat files of all the<br>

original images into those directories. If I now restart<br>

the phonon code should always recognize if a calculation<br>

has already been done...<br>

<br>

Does this sound reasonable?<br>

<br>

Kind regards<br>

<span class="HOEnZb"><font color="#888888"><br>

Thomas<br>

</font></span><div class="HOEnZb"><div class="h5"><br>

<br>

On 09/02/2016 12:01 PM, Thomas Brumme wrote:<br>

> Dear all,<br>

><br>

> I have a question concerning the restart possibilities with image<br>

> parallelization in a phonon calculation.<br>

> I have the problem that for some of the images the calculation did not<br>

> converge. I know that I can achieve<br>

> convergence by reducing the mixing since I encountered the problem<br>

> before for exactly the same system.<br>

> Yet, now, as some of the images are finished with their task (or close<br>

> to), I have only the possibility of either<br>

> using only one image, copying the dynmat.$iq.$ir.xml files to the<br>

> _ph0/*.phsave/ directory, or to restart using<br>

> the same number of images and live with the fact that some images will<br>

> do nothing...<br>

> Or is there a third possibility I don't know? Wouldn't it be better to<br>

> first check what has already been done<br>

> and then distribute the work among the images? Or is this too hard to<br>

> code? (I haven't looked at this part<br>

> of the code yet)<br>

><br>

> OK, I think I could also use some kind of GRID parallelization and<br>

> create some input files by hand, setting<br>

> the start_irr, start_q, and so on, but this is rather tedious since I<br>

> have a big system and a q-point grid...<br>

> So, again the (maybe stupid) question: Is there another possibility?<br>

><br>

> Regards<br>

><br>

> Thomas<br>

><br>

><br>

<br>

--<br>

Dr. rer. nat. Thomas Brumme<br>

Max Planck Institute for the Structure and Dynamics of Matter<br>

Luruper Chaussee 149<br>

22761 Hamburg<br>

<br>

Tel:  <a href="tel:%2B49%20%280%2940%208998%206557" value="+494089986557">+49 (0)40 8998 6557</a><br>

<br>

email: <a href="mailto:Thomas.Brumme@mpsd.mpg.de">Thomas.Brumme@mpsd.mpg.de</a><br>

<br>


</div></div></blockquote></div><br></div>