[QE-users] how to do parallelization for a gamma centered calculation
Dr. K. C. Bhamu
kcbhamu85 at gmail.com
Wed Jul 31 14:38:19 CEST 2019
Dear QE users and developers,
Greetings!!
I am looking for help with effective parallelization of a Gamma-centered
calculation using qe-6.4.1 built with Intel MKL 2015 and either the external
or the internal FFTW3, on a cluster with 32 processors per node.
The system is a binary compound: the first case has 128 atoms (1664.00
electrons), and the second case has 250 atoms (3250.00 electrons).
The SCF job with 128 atoms runs well on 32 processors, but for the other
input (250 atoms, all other parameters the same) we get the error appended
at the bottom of this email after the first iteration.
If we use two nodes for the second case, the CPU time becomes excessive
(about five times that of the first case).
Could someone please help me run these jobs with effective parallelization
for Gamma-point calculations on 1/2/3/4... nodes (32 processors per node)?
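For reference, pw.x exposes its parallelization levels as command-line flags, and a sketch of the kind of invocation I am asking about follows. The flag values here are illustrative placeholders, not settings I have already tested; with a single Gamma point, -nk stays at 1, so only the linear-algebra (-ndiag) and task-group (-ntg) levels remain to tune:

```shell
# Sketch of a pw.x launch with explicit parallelization levels
# (illustrative values only; input file name is a placeholder).
NODES=2
PROCS_PER_NODE=32
NP=$((NODES * PROCS_PER_NODE))   # 64 MPI tasks in total

# -nk 1  : one k-point pool (only the Gamma point exists)
# -ntg 2 : split each FFT across 2 task groups
# -ndiag : processors for ScaLAPACK subspace diagonalization
CMD="mpirun -np $NP pw.x -nk 1 -ntg 2 -ndiag 16 -in scf.in"
echo "$CMD"
```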
Other information that may help diagnose the problem:
Parallel version (MPI), running on 32 processors
MPI processes distributed on 1 nodes
R & G space division: proc/nbgrp/npool/nimage = 32
Waiting for input...
Reading input from standard input
Current dimensions of program PWSCF are:
Max number of different atomic species (ntypx) = 10
Max number of k-points (npk) = 40000
Max angular momentum in pseudopotentials (lmaxx) = 3
gamma-point specific algorithms are used
Subspace diagonalization in iterative solution of the eigenvalue
problem:
one sub-group per band group will be used
scalapack distributed-memory algorithm (size of sub-group: 4* 4
procs)
Parallelization info
--------------------
sticks: dense smooth PW G-vecs: dense smooth PW
Min 936 936 233 107112 107112 13388
Max 937 937 236 107120 107120 13396
Sum 29953 29953 7495 3427749 3427749 428575
total cpu time spent up to now is 143.9 secs
and
number of k points= 1
cart. coord. in units 2pi/alat
k( 1) = ( 0.0000000 0.0000000 0.0000000), wk = 2.0000000
Dense grid: 1713875 G-vectors FFT dimensions: ( 216, 225, 216)
Estimated max dynamical RAM per process > 1.01 GB
Estimated total dynamical RAM > 64.62 GB
Initial potential from superposition of free atoms
starting charge 3249.86289, renormalised to 3250.00000
Starting wfcs are 2125 randomized atomic wfcs
========== Below is the error for the case with 250 atoms run on 32 processors ==========
Self-consistent Calculation
iteration # 1 ecut= 80.00 Ry beta= 0.70
Davidson diagonalization with overlap
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 154663 RUNNING AT node:1
= EXIT CODE: 9
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9)
On another cluster (68 processors per node) I do not observe this error.
Please let me know if I need to provide some additional information.
Looking forward to hearing from the experts.
Regards
K.C. Bhamu, Ph.D.
Postdoctoral Fellow
CSIR-NCL, Pune
India