[QE-users] how to do parallelization for a gamma centered calculation

Dr. K. C. Bhamu kcbhamu85 at gmail.com
Wed Jul 31 14:38:19 CEST 2019


Dear QE users and developers,

Greetings!!

I am looking for help with effective parallelization of a gamma-centered
calculation using qe-6.4.1 with Intel MKL 2015 and either the external or
the internal FFTW3, on a cluster with 32 processors per node.

The system is a binary compound: the first case has 128 atoms (1664.00
electrons) and the second has 250 atoms (3250.00 electrons).
The scf job for the 128-atom case runs well on 32 processors, but for the
250-atom case (all other parameters identical) we get the error appended at
the bottom of this email after the first iteration.
If we use two nodes for the second case, the CPU time is excessive (roughly
five times that of the first case).
Could someone please help me run these jobs with effective parallelization
for gamma k-point calculations on 1/2/3/4... nodes (32 procs per node)?
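
For reference, the launch command looks roughly like this (the file names,
process count, and the -nd/-nt values are placeholders I have been
experimenting with, not a known-good setting):

```shell
#!/bin/bash
# Sketch of a two-node (64-process) launch for the gamma-only run.
# With a single k-point, pool parallelism (-nk) cannot help, so all
# processes go to R & G space division; -nd sets the ScaLAPACK
# sub-group size for subspace diagonalization and -nt the number of
# FFT task groups.
mpirun -np 64 pw.x -nd 16 -nt 4 -i scf.in > scf.out
```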


Other information that may help diagnose the problem:
 Parallel version (MPI), running on    32 processors

     MPI processes distributed on     1 nodes
     R & G space division:  proc/nbgrp/npool/nimage =      32
     Waiting for input...
     Reading input from standard input

     Current dimensions of program PWSCF are:
     Max number of different atomic species (ntypx) = 10
     Max number of k-points (npk) =  40000
     Max angular momentum in pseudopotentials (lmaxx) =  3

     gamma-point specific algorithms are used

     Subspace diagonalization in iterative solution of the eigenvalue problem:
     one sub-group per band group will be used
     scalapack distributed-memory algorithm (size of sub-group:  4*  4 procs)

     Parallelization info
     --------------------
     sticks:   dense  smooth     PW     G-vecs:    dense   smooth      PW
     Min         936     936    233               107112   107112   13388
     Max         937     937    236               107120   107120   13396
     Sum       29953   29953   7495              3427749  3427749  428575
     total cpu time spent up to now is      143.9 secs

and

     number of k points=     1
                       cart. coord. in units 2pi/alat
        k(    1) = (   0.0000000   0.0000000   0.0000000), wk =   2.0000000

     Dense  grid:  1713875 G-vectors     FFT dimensions: ( 216, 225, 216)

     Estimated max dynamical RAM per process >       1.01 GB

     Estimated total dynamical RAM >      64.62 GB

     Initial potential from superposition of free atoms

     starting charge 3249.86289, renormalised to 3250.00000
     Starting wfcs are 2125 randomized atomic wfcs

========== Below is the error for the 250-atom case run on 32 procs ==========


     Self-consistent Calculation

     iteration #  1     ecut=    80.00 Ry     beta= 0.70
     Davidson diagonalization with overlap

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 154663 RUNNING AT node:1
=   EXIT CODE: 9
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9)
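
Dividing the estimated total dynamical RAM reported above over the 32 MPI
processes (simple arithmetic on the printed numbers; the per-node RAM of our
cluster is not shown in this output):

```shell
# Average dynamical RAM per MPI process for the 250-atom case, taken
# from the "Estimated total dynamical RAM >  64.62 GB" line above.
awk 'BEGIN { printf "%.2f GB per process\n", 64.62 / 32 }'
```

So on a single node the whole 64.62 GB estimate (a lower bound, plus MPI
buffers and OS overhead) has to fit in that node's memory, which makes me
suspect the signal-9 kill is the out-of-memory killer.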



On another cluster (68 procs per node) I do not observe this error.

Please let me know if I need to provide some additional information.

Looking forward to hearing from the experts.

Regards

K.C. Bhamu, Ph.D.
Postdoctoral Fellow
CSIR-NCL, Pune
India

