<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">You can try to figure out what's
      happening by looking at:<br>
<br>
      - the way the processors are split among the various parallelization
      levels (npool, nband, ntask, ndiag). This is written at the
      beginning of the output. OpenMP parallelization can also be
      enabled, but it does not always help.<br>
      - the dimensions of your system (number of bands, number of
      plane waves, FFT grid dimensions).<br>
      - the time spent in the different routines, including the
      parallel communication time. This is reported at the end of your
      output; the communication time depends on the speed and latency
      of the interconnection between processors.<br>
<br>
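      As a rough check on the timings quoted in the message below
      (11h37m on 1 processor, 5h29m on 4), here is a minimal Python
      sketch that computes the speedup and parallel efficiency and then
      extrapolates to 8 processors. The extrapolation assumes Amdahl's
      law, which is a simplifying model chosen for illustration and not
      something pw.x reports; it ignores communication, memory and I/O
      effects, and all function names are hypothetical.<br>
<pre wrap="">
```python
# Timings quoted in the original message, converted to minutes.
def to_minutes(hours, minutes):
    return 60 * hours + minutes

t1 = to_minutes(11, 37)  # 1 processor: 11h37m
t4 = to_minutes(5, 29)   # 4 processors: 5h29m

def speedup(t_serial, t_parallel):
    """How many times faster the parallel run is than the serial one."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, nproc):
    """Speedup divided by processor count (1.0 would be ideal scaling)."""
    return speedup(t_serial, t_parallel) / nproc

def amdahl_serial_fraction(t_serial, t_parallel, nproc):
    """Solve t_parallel / t_serial = s + (1 - s) / nproc for s.

    ASSUMPTION: Amdahl's law holds, i.e. a fixed fraction s of the
    work does not parallelize at all and the rest scales perfectly.
    """
    ratio = t_parallel / t_serial
    return (ratio - 1.0 / nproc) / (1.0 - 1.0 / nproc)

def amdahl_time(t_serial, s, nproc):
    """Run time predicted by Amdahl's law for nproc processors."""
    return t_serial * (s + (1.0 - s) / nproc)

s = amdahl_serial_fraction(t1, t4, 4)
t8 = amdahl_time(t1, s, 8)

print(f"speedup on 4 processors:    {speedup(t1, t4):.2f}")
print(f"efficiency on 4 processors: {efficiency(t1, t4, 4):.0%}")
print(f"fitted serial fraction:     {s:.2f}")
print(f"Amdahl estimate for 8:      {int(t8 // 60)}h{int(t8 % 60):02d}m")
```
</pre>
      With these numbers the speedup on 4 processors is only about 2.1
      (roughly 53% efficiency), and the model would put the 8-processor
      run at roughly four and a half hours. A run that takes longer
      than that suggests one of the other factors in this message
      (processor split, communication, memory or disk I/O) rather than
      plain poor scaling.<br>
      <br>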
      A concern in the calculation might be the available RAM. If
      the code starts swapping, it is going to get very slow.<br>
<br>
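      To get a feeling for whether memory could be the issue, a
      back-of-the-envelope estimate of the wavefunction storage can
      help. The Python sketch below assumes one double-precision
      complex number (16 bytes) per plane wave and band; the sizes used
      are hypothetical placeholders, not values from this calculation,
      and the real footprint is larger because the code holds several
      arrays of this size plus the FFT grids.<br>
<pre wrap="">
```python
BYTES_PER_COMPLEX = 16  # one double-precision complex number

def wavefunction_bytes(n_planewaves, n_bands, n_kpoints=1):
    """Bytes needed to hold one copy of the wavefunction coefficients.

    ASSUMPTION: one complex coefficient per plane wave, band and
    k-point kept in memory. Treat the result as a lower bound.
    """
    return BYTES_PER_COMPLEX * n_planewaves * n_bands * n_kpoints

def to_gib(n_bytes):
    """Convert a byte count to GiB."""
    return n_bytes / 2**30

# HYPOTHETICAL sizes for illustration only; take the real numbers of
# plane waves and bands from the beginning of your own output.
n_pw, n_bands = 200_000, 400

one_copy = wavefunction_bytes(n_pw, n_bands)
print(f"one set of wavefunctions: {to_gib(one_copy):.2f} GiB")
```
</pre>
      Comparing a few multiples of this figure with the RAM available
      on the machine gives a first hint of whether swapping is
      plausible.<br>
      <br>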
      Another concern is disk I/O, which is generally slow and even
      slower in parallel. Always use local scratch areas; never write to
      a remote disk.<br>
      If possible, don't write at all.<br>
<br>
stefano<br>
<br>
On 06/11/2015 22:18, Mofrad, Amir Mehdi (MU-Student) wrote:<br>
</div>
<blockquote
cite="mid:BN3PR01MB1352DE52F4ED5F426B955B54A7280@BN3PR01MB1352.prod.exchangelabs.com"
type="cite">
<meta http-equiv="Content-Type" content="text/html;
charset=windows-1252">
<style type="text/css" style="display:none;"><!-- P {margin-top:0;margin-bottom:0;} --></style>
<div id="divtagdefaultwrapper"
style="font-size:12pt;color:#000000;background-color:#FFFFFF;font-family:Calibri,Arial,Helvetica,sans-serif;">
<p>Dear all QE users and developers, <br>
</p>
<p><br>
</p>
        <p>I ran an SCF calculation on 1 processor, which took
          11h37m. When I ran it on 4 processors it took 5h29m. I'm now
          running the same calculation on 8 processors and it has
          already taken 5h17m. Isn't it supposed to take less than 5
          hours when I'm running it on 8 processors instead of 4
          processors? <br>
</p>
        <p>I used the following command for parallelization: "<b>mpirun
            -np 8 pw.x -inp Siliceous-SOD.in Siliceous_SOD8out
            &amp;&gt; Siliceous_SOD8.screen &lt;/dev/null &amp;</b>"</p>
        <p>Before that I used "<b>mpirun -np 4 pw.x &lt;inputfile&gt; output</b>"
          to parallelize; however, it took forever (as if it
          were idle).
<br>
</p>
        <p>At this stage I really need to run my calculations in parallel,
          and I don't know what the problem is. One thing I'm sure
          of is that OpenMP and MPI are completely and properly installed
          on my system.
<br>
</p>
<p><span> </span><br>
</p>
<p>Any help would be thoroughly appreciated.<br>
</p>
<p><br>
</p>
<div id="Signature">
<div id="divtagdefaultwrapper" style="font-size:12pt;
color:#000000; background-color:#FFFFFF;
font-family:Calibri,Arial,Helvetica,sans-serif">
<p>Amir M. Mofrad<span> <br>
</span></p>
<span></span>University of Missouri<br>
</div>
</div>
</div>
<br>
<br>
<pre wrap="">_______________________________________________
Pw_forum mailing list
<a class="moz-txt-link-abbreviated" href="mailto:Pw_forum@pwscf.org">Pw_forum@pwscf.org</a>
<a class="moz-txt-link-freetext" href="http://pwscf.org/mailman/listinfo/pw_forum">http://pwscf.org/mailman/listinfo/pw_forum</a></pre>
</blockquote>
<br>
</body>
</html>