[QE-users] Parallel computing of QE7.1 vc-relax crashes when using large number of processors

Xin Jin xin.tlg.jin at outlook.com
Mon Oct 31 07:46:35 CET 2022


Hello Paolo,

I think the problem happens before the start of the self-consistent iteration.

The last output in the .out file before the crash is:

"Smooth grid:   274793 G-vectors     FFT dimensions: (  81,  81, 81)"
"Estimated max dynamical RAM per process >     594.72 MB"
"Estimated total dynamical RAM >       9.29 GB"
"Initial potential from superposition of free atoms"
"starting charge     755.9699, renormalised to     756.0000"
"Starting wfcs are  702 randomized atomic wfcs"

Thank you.

Best regards,
Xin

On 30/10/2022 09:23, Paolo Giannozzi wrote:
> Do you get the message when the calculation starts, after initialization,
> after a few scf steps, after a few optimization steps, ...?
>
> Paolo
>
> On 28/10/2022 14:45, Xin Jin wrote:
>>
>> Dear Quantum Espresso Forum,
>>
>> I have encountered a problem with parallel computing in QE 7.1 for a
>> vc-relax calculation.
>>
>> I was trying to perform a vc-relax for a 3x3x3 BCC tungsten supercell.
>> The calculation runs fine in serial, and also in parallel as long as the
>> number of processors is smaller than 10.
>>
>> However, if the number of processors is larger than 10, I get the
>> following MPI error:
>> *** An error occurred in MPI_Comm_free
>> *** reported by process [3585895498,2]
>> *** on communicator MPI_COMM_WORLD
>> *** MPI_ERR_COMM: invalid communicator
>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>> ***    and potentially your MPI job)
>>
>> For parallel computing, I am using OpenMPI/3.1.4-gcccuda. (In addition,
>> it seems that if I use OpenMPI v4, the simulation is much slower than
>> with v3.)
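>>
>> For context, a typical launch command for this kind of run would look
>> something like the line below. This is only a sketch: the processor
>> count, the input/output file names, and the -nk/-nd values are
>> illustrative, not the exact ones used here.
>>
>>     mpirun -np 16 pw.x -nk 2 -nd 4 -inp W_relax.in > W_relax.out
>>
>> where -nk sets the number of k-point pools and -nd the size of the
>> linear-algebra (diagonalization) group.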
>>
>> Another thing is that if I decrease the size of the supercell, for
>> example to 2x2x2, there is no problem with the parallel run even if I
>> use more than 30 processors.
>>
>> Could you please help me look into this problem?
>>
>> The input for QE can be found below.
>>
>> Thank you in advance!
>>
>> Xin Jin
>>
>> &control
>>     calculation='vc-relax'
>>     restart_mode='from_scratch',
>>     prefix='W_relax',
>>     pseudo_dir="../../PP_files",
>>     outdir='./'
>> /
>>
>> &system
>>     ibrav= 0,
>>     celldm(1)=5.972,
>>     nat= 54,
>>     ntyp= 1,
>>     ecutwfc = 50,
>>     ecutrho = 500,
>>     occupations='smearing', smearing='mp', degauss=0.06
>> /
>>
>> &electrons
>>     diagonalization='david',
>>     conv_thr = 1.0d-8,
>>     mixing_beta = 0.5,
>> /
>>
>> &ions
>> /
>>
>> &cell
>>     press = 0.0,
>> /
>>
>> ATOMIC_SPECIES
>>  W  183.84 W.pbe-spn-kjpaw_psl.1.0.0.UPF
>>
>> CELL_PARAMETERS {alat}
>>    3.0  0.0  0.0
>>    0.0  3.0  0.0
>>    0.0  0.0  3.0
>>
>> ATOMIC_POSITIONS {alat}
>> W 0.00000 0.00000 0.00000
>> W 0.50000 0.50000 0.50000
>> W 1.00000 0.00000 0.00000
>> W 1.50000 0.50000 0.50000
>> W 2.00000 0.00000 0.00000
>> W 2.50000 0.50000 0.50000
>> W 0.00000 1.00000 0.00000
>> W 0.50000 1.50000 0.50000
>> W 1.00000 1.00000 0.00000
>> W 1.50000 1.50000 0.50000
>> W 2.00000 1.00000 0.00000
>> W 2.50000 1.50000 0.50000
>> W 0.00000 2.00000 0.00000
>> W 0.50000 2.50000 0.50000
>> W 1.00000 2.00000 0.00000
>> W 1.50000 2.50000 0.50000
>> W 2.00000 2.00000 0.00000
>> W 2.50000 2.50000 0.50000
>> W 0.00000 0.00000 1.00000
>> W 0.50000 0.50000 1.50000
>> W 1.00000 0.00000 1.00000
>> W 1.50000 0.50000 1.50000
>> W 2.00000 0.00000 1.00000
>> W 2.50000 0.50000 1.50000
>> W 0.00000 1.00000 1.00000
>> W 0.50000 1.50000 1.50000
>> W 1.00000 1.00000 1.00000
>> W 1.50000 1.50000 1.50000
>> W 2.00000 1.00000 1.00000
>> W 2.50000 1.50000 1.50000
>> W 0.00000 2.00000 1.00000
>> W 0.50000 2.50000 1.50000
>> W 1.00000 2.00000 1.00000
>> W 1.50000 2.50000 1.50000
>> W 2.00000 2.00000 1.00000
>> W 2.50000 2.50000 1.50000
>> W 0.00000 0.00000 2.00000
>> W 0.50000 0.50000 2.50000
>> W 1.00000 0.00000 2.00000
>> W 1.50000 0.50000 2.50000
>> W 2.00000 0.00000 2.00000
>> W 2.50000 0.50000 2.50000
>> W 0.00000 1.00000 2.00000
>> W 0.50000 1.50000 2.50000
>> W 1.00000 1.00000 2.00000
>> W 1.50000 1.50000 2.50000
>> W 2.00000 1.00000 2.00000
>> W 2.50000 1.50000 2.50000
>> W 0.00000 2.00000 2.00000
>> W 0.50000 2.50000 2.50000
>> W 1.00000 2.00000 2.00000
>> W 1.50000 2.50000 2.50000
>> W 2.00000 2.00000 2.00000
>> W 2.50000 2.50000 2.50000
>>
>> K_POINTS {automatic}
>> 4 4 4 0 0 0
>>
>


