Dear Expert Users and Developers of QE,

Could you please have a look at this thread?

Regards,
Bhamu

On Wed, Jul 31, 2019 at 6:08 PM Dr. K. C. Bhamu <kcbhamu85@gmail.com> wrote:

Dear QE users and developers,

Greetings!

I am looking for help with effective parallelization of a gamma-centered
calculation using qe-6.4.1, Intel MKL 2015, and either the external or the
internal FFTW3, on a cluster with 32 processors per node.
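In case the build details matter, a representative configure line for such a
build is sketched below (a generic example, not our exact command; the MKL
link line and paths are placeholders to be adapted to the local installation):

    # sketch: QE 6.4.1 against Intel MKL, with FFTW3 either from an external
    # library or from MKL's FFTW interface (adjust FFT_LIBS accordingly)
    ./configure MPIF90=mpiifort F90=ifort CC=icc \
        BLAS_LIBS="-L${MKLROOT}/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core" \
        FFT_LIBS="-lfftw3"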
The system is a binary compound: the first case has 128 atoms (1664.00
electrons) and the second has 250 atoms (3250.00 electrons).
The scf job for the 128-atom case runs well on 32 processors, but for the
250-atom case (all other parameters identical) we get the error appended at
the bottom of this email after the first iteration. If we use two nodes for
the second case, the CPU time becomes excessive (about five times that of the
first case).
Could someone please help me run these jobs with effective parallelization
for gamma-point calculations on 1/2/3/4.. nodes (32 processors per node)?
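For concreteness, the kind of launch line I have in mind is sketched below
(assuming an Intel MPI style mpirun; the flag values and file names are
placeholders, not what we actually ran). Since a gamma-only run has a single
k-point, pool parallelism cannot help (-nk 1), which leaves the plane-wave/FFT
distribution, the FFT task groups (-nt), and the ScaLAPACK diagonalization
group (-nd) as the levels to tune:

    # hypothetical two-node launch: 64 MPI ranks, 32 per node
    # -nk 1  : a single pool, since only the gamma point exists
    # -nt 4  : four FFT task groups to help the 3D-FFT scale
    # -nd 16 : a 4x4 ScaLAPACK grid for the subspace diagonalization
    mpirun -np 64 -ppn 32 pw.x -nk 1 -nt 4 -nd 16 -i scf.in > scf.out

Are these the right levels to tune for gamma-only calculations, and are there
recommended -nt/-nd values at these system sizes?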
The other useful information that may be required to diagnose the problem is:

     Parallel version (MPI), running on    32 processors

     MPI processes distributed on     1 nodes
     R & G space division:  proc/nbgrp/npool/nimage =      32
     Waiting for input...
     Reading input from standard input

     Current dimensions of program PWSCF are:
     Max number of different atomic species (ntypx) = 10
     Max number of k-points (npk) =  40000
     Max angular momentum in pseudopotentials (lmaxx) =  3

     gamma-point specific algorithms are used

     Subspace diagonalization in iterative solution of the eigenvalue problem:
     one sub-group per band group will be used
     scalapack distributed-memory algorithm (size of sub-group:  4*  4 procs)

     Parallelization info
     --------------------
     sticks:   dense  smooth     PW     G-vecs:    dense   smooth      PW
     Min         936     936    233             107112   107112   13388
     Max         937     937    236             107120   107120   13396
     Sum       29953   29953   7495            3427749  3427749  428575

     total cpu time spent up to now is      143.9 secs

and

     number of k points=     1
                       cart. coord. in units 2pi/alat
        k(    1) = (   0.0000000   0.0000000   0.0000000), wk =   2.0000000

     Dense  grid:  1713875 G-vectors     FFT dimensions: ( 216, 225, 216)

     Estimated max dynamical RAM per process >       1.01 GB

     Estimated total dynamical RAM >      64.62 GB

     Initial potential from superposition of free atoms

     starting charge 3249.86289, renormalised to 3250.00000
     Starting wfcs are 2125 randomized atomic wfcs

========== Below is the error for the 250-atom case run on 32 procs ==========

     Self-consistent Calculation

     iteration #  1     ecut=    80.00 Ry     beta= 0.70
     Davidson diagonalization with overlap

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 154663 RUNNING AT node:1
=   EXIT CODE: 9
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9)

On the other cluster (68 procs per node) I do not observe any error.

Please let me know if I need to provide any additional information.

Looking forward to hearing from the experts.

Regards,

K.C. Bhamu, Ph.D.
Postdoctoral Fellow
CSIR-NCL, Pune
India
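P.S. Exit code 9 means the ranks were killed with SIGKILL, and the output
above estimates more than 64 GB of total dynamical RAM on a single 32-rank
node, so I suspect the node simply runs out of memory (which might also
explain why the other cluster behaves differently). One workaround I am
considering is sketched below (assuming an Intel MPI style launcher; the rank
counts and file names are placeholders): under-populate the nodes so the
per-node memory footprint drops.

    # hypothetical: 64 ranks over 4 nodes (16 per node) instead of filling
    # 2 nodes, roughly halving the dynamical RAM required on each node
    mpirun -np 64 -ppn 16 pw.x -nk 1 -nd 16 -i scf.in > scf.out

Would this be a sensible way to relieve the memory pressure, or is there a
better option (for example, reducing diago_david_ndim or mixing_ndim in the
&ELECTRONS namelist)?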