[QE-users] how to do parallelization for a gamma centered calculation
Dr. K. C. Bhamu
kcbhamu85 at gmail.com
Tue Aug 6 11:56:12 CEST 2019
Dear Expert Users and the Developers of QE,
Could you please have a look at this thread?
regards
Bhamu
On Wed, Jul 31, 2019 at 6:08 PM Dr. K. C. Bhamu <kcbhamu85 at gmail.com> wrote:
> Dear QE users and developers,
>
> Greetings!!
>
> I am looking for help with effective parallelization of a gamma-centered
> calculation with qe-6.4.1, using Intel MKL 2015 and either the external or
> the internal FFTW3, on a cluster with 32 processors per node.
>
> The system is a binary compound: the first case has 128 atoms (1664.00
> electrons) and the second has 250 atoms (3250.00 electrons).
> The scf job for the 128-atom case runs well on 32 processors, but for the
> 250-atom case (all other parameters the same) we get, after the first
> iteration, the error appended at the bottom of this email.
> If we use two nodes for the second case, the CPU time becomes excessive
> (about five times that of the first case).
> Could someone please help me run these jobs with effective parallelization
> for gamma-point calculations on 1/2/3/4... nodes (32 processors per node)?
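>
> For concreteness, here is a minimal sketch of the kind of launch line I have
> been trying; the counts (-nband 4, -ndiag 16) are only illustrative guesses
> for a 64-process run, not values I have validated for this system:
>
>     mpirun -np 64 pw.x -nband 4 -ndiag 16 < scf.in > scf.out
>
> Since there is only the Gamma point, k-point pools (-nk) cannot help here,
> so I assume the band groups (-nband) and the linear-algebra group (-ndiag)
> are the relevant knobs; please correct me if that assumption is wrong.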
>
>
> Other information that may be useful for diagnosing the problem:
> Parallel version (MPI), running on 32 processors
>
> MPI processes distributed on 1 nodes
> R & G space division: proc/nbgrp/npool/nimage = 32
> Waiting for input...
> Reading input from standard input
>
> Current dimensions of program PWSCF are:
> Max number of different atomic species (ntypx) = 10
> Max number of k-points (npk) = 40000
> Max angular momentum in pseudopotentials (lmaxx) = 3
>
> gamma-point specific algorithms are used
>
> Subspace diagonalization in iterative solution of the eigenvalue
> problem:
> one sub-group per band group will be used
> scalapack distributed-memory algorithm (size of sub-group: 4* 4
> procs)
>
> Parallelization info
> --------------------
> sticks: dense smooth PW G-vecs: dense smooth PW
> Min 936 936 233 107112 107112 13388
> Max 937 937 236 107120 107120 13396
> Sum 29953 29953 7495 3427749 3427749 428575
> total cpu time spent up to now is 143.9 secs
>
> and
>
> number of k points= 1
> cart. coord. in units 2pi/alat
> k( 1) = ( 0.0000000 0.0000000 0.0000000), wk = 2.0000000
>
> Dense grid: 1713875 G-vectors FFT dimensions: ( 216, 225, 216)
>
> Estimated max dynamical RAM per process > 1.01 GB
>
> Estimated total dynamical RAM > 64.62 GB
>
> Initial potential from superposition of free atoms
>
> starting charge 3249.86289, renormalised to 3250.00000
> Starting wfcs are 2125 randomized atomic wfcs
>
> ========== Below is the error for the case with 250 atoms run on 32 procs ==========
>
>
> Self-consistent Calculation
>
> iteration # 1 ecut= 80.00 Ry beta= 0.70
> Davidson diagonalization with overlap
>
>
> ===================================================================================
> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> = PID 154663 RUNNING AT node:1
> = EXIT CODE: 9
> = CLEANING UP REMAINING PROCESSES
> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>
> ===================================================================================
> APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9)
>
>
>
> On the other cluster (68 procs per node) I do not observe any error.
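>
> (A possible reading, in case it helps the diagnosis: the output above
> estimates more than 64.62 GB of total dynamical RAM, and in the 32-processor
> run all of that has to fit into the memory of a single node; if the node has
> less physical RAM than that, say 64 GB, then "Killed (signal 9)" would simply
> be the kernel's out-of-memory killer. Please correct me if this reading is
> wrong.)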
>
> Please let me know if I need to provide some additional information.
>
> Looking forward to hearing from the experts.
>
> Regards
>
> K.C. Bhamu, Ph.D.
> Postdoctoral Fellow
> CSIR-NCL, Pune
> India
>