[QE-users] Parallel computing of QE7.1 vc-relax crashes when using a large number of processors
Xin Jin
xin.tlg.jin at outlook.com
Mon Oct 31 07:46:35 CET 2022
Hello Paolo,
I think the problem happens before the first iteration of the
self-consistent calculation starts.
The last output in the .out file before the crash is:
"Smooth grid: 274793 G-vectors FFT dimensions: ( 81, 81, 81)"
"Estimated max dynamical RAM per process > 594.72 MB"
"Estimated total dynamical RAM > 9.29 GB"
"Initial potential from superposition of free atoms"
"starting charge 755.9699, renormalised to 756.0000"
"Starting wfcs are 702 randomized atomic wfcs"
Thank you.
Best regards,
Xin
On 30/10/2022 09:23, Paolo Giannozzi wrote:
> Do you get the message when the calculation starts, after
> initialization, after a few scf steps, after a few optimization
> steps, ...?
>
> Paolo
>
> On 28/10/2022 14:45, Xin Jin wrote:
>>
>> Dear Quantum Espresso Forum,
>>
>> I have encountered a problem with parallel vc-relax calculations in
>> QE 7.1.
>>
>> I am trying to perform a vc-relax calculation for a 3*3*3 BCC tungsten
>> supercell. The code works fine in serial, and also in parallel as long
>> as the number of processors is smaller than 10.
>>
>> However, if the number of processors is larger than 10, I get the
>> following MPI error:
>>
>> *** An error occurred in MPI_Comm_free
>> *** reported by process [3585895498,2]
>> *** on communicator MPI_COMM_WORLD
>> *** MPI_ERR_COMM: invalid communicator
>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>> ***    and potentially your MPI job)
>>
>> For parallel computing I am using the OpenMPI/3.1.4-gcccuda module.
>> (In addition, with OpenMPI v4 the simulation seems to run much more
>> slowly than with v3.)
>>
>> Another observation: if I decrease the size of the supercell, for
>> example to 2*2*2, parallel computing causes no problem even when I
>> use more than 30 processors.
>>
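>> Since the failure is in MPI_Comm_free, I imagine a minimal
>> communicator stress test could help tell an OpenMPI problem apart
>> from a pw.x one. A sketch using mpi4py (assuming it is installed;
>> run with e.g. "mpirun -np 16 python comm_test.py"):
>>
>>     # comm_test.py: repeatedly split COMM_WORLD into sub-communicators
>>     # and free them, loosely mimicking the communicator setup/teardown
>>     # that pw.x performs during initialization.
>>     from mpi4py import MPI
>>
>>     world = MPI.COMM_WORLD
>>     rank = world.Get_rank()
>>
>>     for _ in range(100):
>>         sub = world.Split(color=rank % 4, key=rank)
>>         sub.Free()
>>
>>     world.Barrier()
>>     if rank == 0:
>>         print("all sub-communicators created and freed cleanly")
>>
>> If this also fails with more than 10 ranks, the issue is more likely
>> in the MPI installation than in QE itself.
>>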
>> Could you help me look at this problem, please?
>>
>> The input for QE can be found below.
>>
>> Thank you in advance!
>>
>> Xin Jin
>>
>> &control
>>    calculation='vc-relax'
>>    restart_mode='from_scratch',
>>    prefix='W_relax',
>>    pseudo_dir="../../PP_files",
>>    outdir='./'
>> /
>>
>> &system
>>    ibrav= 0,
>>    celldm(1)=5.972,
>>    nat= 54,
>>    ntyp= 1,
>>    ecutwfc = 50,
>>    ecutrho = 500,
>>    occupations='smearing', smearing='mp', degauss=0.06
>> /
>>
>> &electrons
>>    diagonalization='david',
>>    conv_thr = 1.0d-8,
>>    mixing_beta = 0.5,
>> /
>>
>> &ions
>> /
>>
>> &cell
>>    press = 0.0,
>> /
>>
>> ATOMIC_SPECIES
>> W 183.84 W.pbe-spn-kjpaw_psl.1.0.0.UPF
>>
>> CELL_PARAMETERS {alat}
>> 3.0 0.0 0.0
>> 0.0 3.0 0.0
>> 0.0 0.0 3.0
>>
>> ATOMIC_POSITIONS {alat}
>> W 0.00000 0.00000 0.00000
>> W 0.50000 0.50000 0.50000
>> W 1.00000 0.00000 0.00000
>> W 1.50000 0.50000 0.50000
>> W 2.00000 0.00000 0.00000
>> W 2.50000 0.50000 0.50000
>> W 0.00000 1.00000 0.00000
>> W 0.50000 1.50000 0.50000
>> W 1.00000 1.00000 0.00000
>> W 1.50000 1.50000 0.50000
>> W 2.00000 1.00000 0.00000
>> W 2.50000 1.50000 0.50000
>> W 0.00000 2.00000 0.00000
>> W 0.50000 2.50000 0.50000
>> W 1.00000 2.00000 0.00000
>> W 1.50000 2.50000 0.50000
>> W 2.00000 2.00000 0.00000
>> W 2.50000 2.50000 0.50000
>> W 0.00000 0.00000 1.00000
>> W 0.50000 0.50000 1.50000
>> W 1.00000 0.00000 1.00000
>> W 1.50000 0.50000 1.50000
>> W 2.00000 0.00000 1.00000
>> W 2.50000 0.50000 1.50000
>> W 0.00000 1.00000 1.00000
>> W 0.50000 1.50000 1.50000
>> W 1.00000 1.00000 1.00000
>> W 1.50000 1.50000 1.50000
>> W 2.00000 1.00000 1.00000
>> W 2.50000 1.50000 1.50000
>> W 0.00000 2.00000 1.00000
>> W 0.50000 2.50000 1.50000
>> W 1.00000 2.00000 1.00000
>> W 1.50000 2.50000 1.50000
>> W 2.00000 2.00000 1.00000
>> W 2.50000 2.50000 1.50000
>> W 0.00000 0.00000 2.00000
>> W 0.50000 0.50000 2.50000
>> W 1.00000 0.00000 2.00000
>> W 1.50000 0.50000 2.50000
>> W 2.00000 0.00000 2.00000
>> W 2.50000 0.50000 2.50000
>> W 0.00000 1.00000 2.00000
>> W 0.50000 1.50000 2.50000
>> W 1.00000 1.00000 2.00000
>> W 1.50000 1.50000 2.50000
>> W 2.00000 1.00000 2.00000
>> W 2.50000 1.50000 2.50000
>> W 0.00000 2.00000 2.00000
>> W 0.50000 2.50000 2.50000
>> W 1.00000 2.00000 2.00000
>> W 1.50000 2.50000 2.50000
>> W 2.00000 2.00000 2.00000
>> W 2.50000 2.50000 2.50000
>>
>> K_POINTS {automatic}
>> 4 4 4 0 0 0
>>