<html>So you mean it's not normal that bands.x takes more than 7 hours? What's suspicious is that the reported actual CPU time is much less, only 16 minutes. What could be the problem?<br />Here's the output of a bands.x calculation:<br /><br /> Program BANDS v.5.1.2 starts on 5Dec2015 at 9:15:18<br /><br /> This program is part of the open-source Quantum ESPRESSO suite<br /> for quantum simulation of materials; please cite<br /> "P. Giannozzi et al., J. Phys.:Condens. Matter 21 395502 (2009);<br /> URL http://www.quantum-espresso.org",<br /> in publications or presentations arising from this work. More details at<br /> http://www.quantum-espresso.org/quote<br /><br /> Parallel version (MPI), running on 64 processors<br /> R & G space division: proc/nbgrp/npool/nimage = 64<br /><br /> Reading data from directory:<br /> ./tmp/Ni3HTP2.save<br /><br /> Info: using nr1, nr2, nr3 values from input<br /><br /> Info: using nr1s, nr2s, nr3s values from input<br /><br /> IMPORTANT: XC functional enforced from input :<br /> Exchange-correlation = SLA PW PBE PBE ( 1 4 3 4 0 0)<br /> Any further DFT definition will be discarded<br /> Please, verify this is what you really want<br /><br /> file H.pbe-rrkjus.UPF: wavefunction(s) 1S renormalized<br /> file C.pbe-rrkjus.UPF: wavefunction(s) 2S 2P renormalized<br /> file N.pbe-rrkjus.UPF: wavefunction(s) 2S renormalized<br /> file Ni.pbe-nd-rrkjus.UPF: wavefunction(s) 4S renormalized<br /> <br /> Parallelization info<br /> --------------------<br /> sticks: dense smooth PW G-vecs: dense smooth PW<br /> Min 588 588 151 92668 92668 12083<br /> Max 590 590 152 92671 92671 12086<br /> Sum 37643 37643 9677 5930831 5930831 773403<br /> <br /><br /> Check: negative/imaginary core charge= -0.000004 0.000000<br /><br /> negative rho (up, down): 2.225E-03 0.000E+00<br /> high-symmetry point: 0.0000 0.0000 0.4981 x coordinate 0.0000<br /> high-symmetry point: 0.3332 0.5780 0.4981 x coordinate 0.6672<br /> high-symmetry point: 0.5000 0.2890 
0.4981 x coordinate 1.0009<br /> high-symmetry point: 0.0000 0.0000 0.4981 x coordinate 1.5784<br /><br /> Plottable bands written to file bands.out.gnu<br /> Bands written to file bands.out<br /> <br /> BANDS : 0h16m CPU 7h38m WALL<br /><br /> <br /> This run was terminated on: 16:53:49 5Dec2015 <br /><br />=------------------------------------------------------------------------------=<br /> JOB DONE.<br />=------------------------------------------------------------------------------=<br /><br /><br /><br /><br />On Saturday, 5 December 2015 21:03 CET, stefano de gironcoli &lt;degironc@sissa.it&gt; wrote:<br /> <div class="moz-cite-prefix">The only parallelization that I see in bands.x is the basic one over R & G. If it is different from the parallelization used previously, you should use wf_collect.<br />The code computes the overlap between the orbitals at k and k+dk in order to decide how to connect them. It is an nbnd^2 operation done band by band; not very efficient, evidently, but it should not take hours.<br />You can use wf_collect=.true. and increase the number of processors.<br /> <br />stefano<br /><br /><br />On 05/12/2015 12:57, Maxim Skripnik wrote:</div><blockquote cite="mid:53cdc7c4e767a41e.5662d119@limbe.rz.uni-konstanz.de" type="cite">Thank you for the information. Yes, at the beginning of the pw.x output it says:<br /> Parallel version (MPI), running on 64 processors<br /> R & G space division: proc/nbgrp/npool/nimage = 64<br /><br />Is bands.x parallelized at all? If so, where can I find information on that? There's nothing mentioned in the documentation:<br /><a class="moz-txt-link-freetext" href="http://www.quantum-espresso.org/wp-content/uploads/Doc/pp_user_guide.pdf">http://www.quantum-espresso.org/wp-content/uploads/Doc/pp_user_guide.pdf</a><br /><a class="moz-txt-link-freetext" href="http://www.quantum-espresso.org/wp-content/uploads/Doc/INPUT_BANDS.html">http://www.quantum-espresso.org/wp-content/uploads/Doc/INPUT_BANDS.html</a><br /><br />What could be the reason for bands.x taking many hours to calculate the bands? The foregoing pw.x calculation has already determined the energy for each k-point along a path (Gamma -> K -> M -> Gamma). There are 61 k-points and 129 bands. So what is bands.x actually doing besides reformatting that data? The input file job.bands looks like this:<br /> &bands<br /> prefix = 'st1'<br /> outdir = './tmp'<br />/<br />The calculation is initiated by<br />mpirun -np 64 bands.x < job.bands<br /><br />Maxim Skripnik<br />Department of Physics<br />University of Konstanz<br /><br />On Saturday, 5 December 2015 02:37 CET, stefano de gironcoli <a class="moz-txt-link-rfc2396E" href="mailto:degironc@sissa.it">&lt;degironc@sissa.it&gt;</a> wrote:<br /> <div class="moz-cite-prefix">On 04/12/2015 22:53, Maxim Skripnik wrote:</div><blockquote cite="mid:bbc1c3d2d4083244.56620b59@limbe.rz.uni-konstanz.de" type="cite">Hello,<br /><br />I'm a bit confused by the parallelization scheme of QE. First of all, I run calculations on a cluster with usually 1 to 8 nodes, each of which has 16 cores. There is very good scaling of pw.x, e.g. for structural relaxation jobs. I do not specify any particular parallelization scheme as mentioned in the documentation, i.e. I start the calculations with<br />mpirun -np 128 pw.x < job.pw<br />on 8 nodes, 16 cores each. According to the documentation ni=1, nk=1 and nt=1. So in which respect are the calculations parallelized by default? Why do the calculations scale so well without specifying ni, nk, nt, nd?</blockquote> R and G parallelization is performed.<br />Wavefunctions' plane waves, density plane waves and slices of real-space objects are distributed across the 128 processors. A report of how this is done is given at the beginning of the output.<br />Did you have a look at it?<br /> <blockquote cite="mid:bbc1c3d2d4083244.56620b59@limbe.rz.uni-konstanz.de" type="cite">The second question is whether one can speed up bands.x calculations. Up to now I start these this way:<br />mpirun -np 64 bands.x < job.bands<br />on 4 nodes, 16 cores each. Does it make sense to define nb for bands.x? If yes, what would be reasonable values?</blockquote> Expect no gain; band parallelization is not implemented in bands.x.<br /><br />stefano<br /><br /><br /> <blockquote cite="mid:bbc1c3d2d4083244.56620b59@limbe.rz.uni-konstanz.de" type="cite">The systems of interest consist of typically ~50 atoms with periodic boundaries.<br /><br />Maxim Skripnik<br />Department of Physics<br />University of Konstanz<fieldset class="mimeAttachmentHeader"> </fieldset> <pre wrap="">_______________________________________________
Pw_forum mailing list
<a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:Pw_forum@pwscf.org">Pw_forum@pwscf.org</a>
<a moz-do-not-send="true" class="moz-txt-link-freetext" href="http://pwscf.org/mailman/listinfo/pw_forum">http://pwscf.org/mailman/listinfo/pw_forum</a></pre></blockquote><br /><br /><br /> </blockquote><br /><br /><br /> </html>