[QE-users] Turbo_eels.x crashing
Iurii TIMROV
iurii.timrov at epfl.ch
Thu Oct 7 17:54:05 CEST 2021
Dear Elio,
The problem occurs when QE 6.7 is built with a GNU compiler, because of a bug in that version. This bug was fixed in QE 6.8. Please note that there is no problem in either version of QE when an Intel compiler is used.
So please use QE6.8.
Greetings,
Iurii
--
Dr. Iurii TIMROV
Senior Research Scientist
Theory and Simulation of Materials (THEOS)
Swiss Federal Institute of Technology Lausanne (EPFL)
CH-1015 Lausanne, Switzerland
+41 21 69 34 881
http://people.epfl.ch/265334
________________________________
From: Elio Physics <elio-physics at live.com>
Sent: Wednesday, October 6, 2021 9:22:14 PM
To: Iurii TIMROV; Quantum ESPRESSO users Forum
Subject: Re: Turbo_eels.x crashing
Dear Dr. Iurii,
Reducing the number of nodes and/or the k-points does not help. Commenting out the 'noncolin' and 'lspinorb' flags solves the issue, regardless of the number of k-points and the cutoff (ecut) values used.
Regards
________________________________
From: Iurii TIMROV <iurii.timrov at epfl.ch>
Sent: Wednesday, October 6, 2021 11:54 AM
To: Elio Physics <elio-physics at live.com>; Iurii TIMROV <iurii.timrov at epfl.ch>; Quantum ESPRESSO users Forum <users at lists.quantum-espresso.org>
Subject: Re: Turbo_eels.x crashing
Dear Elio,
> The problem is due to the fact that you use k-points pools for the pw.x run but you don't use k-points pools for the turbo_eels.x run. Please note that the two calculations must be parallelized in exactly the same way. So try the following:
> mpirun -np 64 pw.x -nk 2 < pwscf.in > pwscf.out
> mpirun -np 64 turbo_eels.x -nk 2 < eels.in > eels.out
Actually this is no longer needed. So the problem is somewhere else. Your example works for me with ifort (IFORT) 19.1.3.304 20200925 on my workstation with 8 cores.
In your files (PAl-turbo-eels.e1813454) it is written:
"Program received signal SIGSEGV: Segmentation fault - invalid memory reference."
"#4 0x40DD8C in lr_alloc_init_"
The subroutine "lr_alloc_init" allocates arrays and it seems that the code crashes at this stage.
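Because every MPI rank prints its own backtrace, the frames in the error file are interleaved and repeated. The following is a minimal Python sketch (not part of QE; the name `summarize_backtrace` is made up for illustration) that collapses such output into unique named frames with counts. It assumes the gfortran-style frame format `#N 0xADDR in symbol at file:line` seen in the log above.

```python
import re
from collections import Counter

# Matches gfortran-style frames such as
#   "#4 0x40DD8C in lr_alloc_init_"
#   "#5 0x404CFE in MAIN__ at lr_eels_main.f90:113"
FRAME = re.compile(
    r"#(?P<depth>\d+)[ \t]+(?P<addr>0x[0-9A-Fa-f]+)"
    r"(?:[ \t]+in[ \t]+(?P<symbol>\S+))?"
    r"(?:[ \t]+at[ \t]+(?P<location>\S+))?"
)

def summarize_backtrace(text):
    """Count how often each named frame (depth, symbol, location) appears."""
    counts = Counter()
    for match in FRAME.finditer(text):
        if match.group("symbol") is None:
            continue  # frames without a symbol name carry only a raw address
        key = (int(match.group("depth")),
               match.group("symbol"),
               match.group("location"))
        counts[key] += 1
    return counts

sample = """\
#3 0x2B22A90B1C76
#4 0x40DD8C in lr_alloc_init_
#4 0x40DD8C in lr_alloc_init_
#5 0x404CFE in MAIN__ at lr_eels_main.f90:113
"""

# Print the unique frames, innermost (lowest depth) first.
for (depth, symbol, location), n in sorted(
        summarize_backtrace(sample).items(), key=lambda item: item[0][0]):
    where = f" at {location}" if location else ""
    print(f"#{depth} {symbol}{where}  (seen {n}x)")
```

Applied to the full error file, this kind of summary makes it easier to see that all ranks fail in the same place, here `lr_alloc_init_` called from `lr_eels_main.f90:113`.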
> After reproducing the examples in the TDDFPT directory (copied the exact same input files), I realized that the source of the error is the spin-orbit coupling flags. The program exited normally when the SOC was not included. However, when the SOC is considered, the code crashes and exits
Can you try with SOC, but lowering the cutoff to e.g. 50 Ry and reducing the k-mesh to 6x6x1? Maybe there is not enough RAM in your case when SOC is included?
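To get a feeling for how much such a reduction could help, here is a back-of-the-envelope Python sketch. It assumes that the number of plane waves grows as ecutwfc^(3/2) at fixed cell volume and that the large arrays scale with the number of k-points in the full (unsymmetrized) mesh. This is only a crude scaling argument, not how QE actually reports memory, and the function name `relative_cost` is hypothetical.

```python
def relative_cost(ecut_old, ecut_new, kmesh_old, kmesh_new):
    """Rough new/old ratios for plane waves, k-points, and their product.

    Assumptions (not taken from QE itself):
      * plane waves per k-point scale as ecutwfc**1.5 at fixed volume
      * the k-point count is that of the full mesh, before symmetry reduction
    """
    pw_ratio = (ecut_new / ecut_old) ** 1.5
    nk_old = kmesh_old[0] * kmesh_old[1] * kmesh_old[2]
    nk_new = kmesh_new[0] * kmesh_new[1] * kmesh_new[2]
    nk_ratio = nk_new / nk_old
    return pw_ratio, nk_ratio, pw_ratio * nk_ratio

# Values from this thread: 90 Ry and 10x10x1 originally,
# 50 Ry and 6x6x1 as the suggested test.
pw_ratio, nk_ratio, total = relative_cost(90.0, 50.0, (10, 10, 1), (6, 6, 1))
print(f"plane waves: x{pw_ratio:.2f}, k-points: x{nk_ratio:.2f}, product: x{total:.2f}")
```

Note also that noncollinear calculations use two-component spinors, so the wavefunction arrays are roughly twice as large as in the collinear case at the same cutoff.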
Do you manage to run example 16, which includes SOC and uses NC PP?
Iurii
________________________________
From: users <users-bounces at lists.quantum-espresso.org> on behalf of Iurii TIMROV via users <users at lists.quantum-espresso.org>
Sent: Wednesday, October 6, 2021 2:10:14 PM
To: Elio Physics; Quantum ESPRESSO users Forum
Subject: Re: [QE-users] Turbo_eels.x crashing
Dear Elio,
The problem is due to the fact that you use k-points pools for the pw.x run but you don't use k-points pools for the turbo_eels.x run. Please note that the two calculations must be parallelized in exactly the same way. So try the following:
mpirun -np 64 pw.x -nk 2 < pwscf.in > pwscf.out
mpirun -np 64 turbo_eels.x -nk 2 < eels.in > eels.out
Other comments:
> wf_collect=.true.
This is no longer supported/needed in the latest versions of QE.
> occupations='fixed',
> smearing='mp'
Since the occupations are fixed, you don't need to specify smearing='mp'
> nbnd=55
There is no need to specify this: the turbo_eels.x code needs only the occupied KS states, and these are determined automatically by the pw.x code.
> conv_thr=1.D-8
I would recommend tightening it to a value between 1.D-10 and 1.D-15.
> Pd 106.42 Pd.rel-pbe-n-nc.UPF
> S 32.065 S.rel-pbe-n-nc.UPF
Norm-conserving pseudopotentials from the PSlibrary are not tested, as far as I know. So be careful when using these pseudopotentials. You can try to use fully-relativistic norm-conserving pseudopotentials from the PseudoDojo library.
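The input-file comments above can be turned into a small pre-flight check. This is a minimal Python sketch under stated assumptions: the function names `find_value` and `check_pw_input` are hypothetical, the regex-based parsing only handles simple `name=value` entries (it ignores Fortran comments and multi-value entries), and the 1e-10 threshold simply mirrors the recommendation above.

```python
import re

def find_value(text, name):
    """Return the raw value assigned to `name` in a namelist-style input, or None."""
    pattern = rf"\b{re.escape(name)}\s*=\s*([^,\s/]+)"
    match = re.search(pattern, text, flags=re.IGNORECASE)
    return match.group(1).strip("'\"") if match else None

def fortran_float(raw):
    """Convert a Fortran-style number such as 1.D-8 to a Python float."""
    return float(raw.lower().replace("d", "e"))

def check_pw_input(text):
    """Return a list of remarks based on the advice given in this thread."""
    notes = []
    if find_value(text, "wf_collect") is not None:
        notes.append("wf_collect is no longer needed in recent QE versions")
    if find_value(text, "occupations") == "fixed" and find_value(text, "smearing"):
        notes.append("smearing is not needed when occupations='fixed'")
    if find_value(text, "nbnd") is not None:
        notes.append("nbnd need not be set for a turbo_eels.x workflow")
    conv_thr = find_value(text, "conv_thr")
    if conv_thr is not None and fortran_float(conv_thr) > 1e-10:
        notes.append("conv_thr could be tightened to 1.D-10 or below")
    ecutwfc = find_value(text, "ecutwfc")
    if ecutwfc is not None and fortran_float(ecutwfc) <= 0.0:
        notes.append("ecutwfc must be positive")
    return notes

sample = """
&control
  wf_collect=.true.,
/
&system
  ecutwfc=90, nbnd=55, occupations='fixed', smearing='mp'
/
&electrons
  conv_thr=1.D-8,
/
"""

for note in check_pw_input(sample):
    print("-", note)
```

The check is deliberately conservative: it only flags the settings discussed here and does not validate the input as a whole.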
HTH
Greetings,
Iurii
________________________________
From: Elio Physics <elio-physics at live.com>
Sent: Wednesday, October 6, 2021 4:52:50 AM
To: Iurii TIMROV; Quantum Espresso users Forum
Subject: Re: Turbo_eels.x crashing
Dear Dr. Iurii,
Once again, thanks for your quick reply.
> Which version of Quantum ESPRESSO do you use? Can you try the latest version (v6.8)?
QE-6.7MaX
> Which compiler and libraries do you use?
F90 (mpif90 wrapper).
module load gcc-5.3.0 mpi/openmpi-3.0.0/gcc-5.3.0
module load lapack-3.7.1
> Do you use norm-conserving or ultrasoft pseudopotentials?
Norm-conserving with SOC
I have shared with you a folder with all input, output files and pseudopotentials.
Thank you for your help.
Regards
________________________________
From: Iurii TIMROV <iurii.timrov at epfl.ch>
Sent: Tuesday, October 5, 2021 6:16 AM
To: Elio Physics <elio-physics at live.com>; Quantum ESPRESSO users Forum <users at lists.quantum-espresso.org>
Subject: Re: Turbo_eels.x crashing
1. Which version of Quantum ESPRESSO do you use? Can you try the latest version (v6.8)?
2. Which compiler and libraries do you use?
3. Do you use norm-conserving or ultrasoft pseudopotentials?
4. Can you share all your input and output files (with the right cutoff etc.) and the pseudopotentials via some shared folder (e.g. Google Drive)? I will try to reproduce your problem using QE v6.8 and investigate what is happening.
Greetings,
Iurii
________________________________
From: Elio Physics <elio-physics at live.com>
Sent: Monday, October 4, 2021 10:55:47 PM
To: Iurii TIMROV; Quantum ESPRESSO users Forum
Subject: Re: Turbo_eels.x crashing
After reproducing the examples in the TDDFPT directory (copied the exact same input files), I realized that the source of the error is the spin-orbit coupling flags. The program exited normally when the SOC was not included. However, when the SOC is considered, the code crashes and exits with the same error:
"Backtrace for this error:
--------------------------------------------------------------------------
A process has executed an operation involving a call to the
"fork()" system call to create a child process. Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your job may hang, crash, or produce silent
data corruption. The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.
#4 0x40DD8C in lr_alloc_init_
#4 0x40DD8C in lr_alloc_init_
#4 0x40DD8C in lr_alloc_init_
#4 0x40DD8C in lr_alloc_init_
#5 0x404CFE in MAIN__ at lr_eels_main.f90:113
#5 0x404CFE in MAIN__ at lr_eels_main.f90:113
#4 0x40DD8C in lr_alloc_init_
#5 0x404CFE in MAIN__ at lr_eels_main.f90:113
#4 0x40DD8C in lr_alloc_init_
#5 0x404CFE in MAIN__ at lr_eels_main.f90:113
#5 0x404CFE in MAIN__ at lr_eels_main.f90:113
#5 0x404CFE in MAIN__ at lr_eels_main.f90:113
#4 0x40DD8C in lr_alloc_init_
#4 0x40DD8C in lr_alloc_init_
#5 0x404CFE in MAIN__ at lr_eels_main.f90:113
#5 0x404CFE in MAIN__ at lr_eels_main.f90:113
--------------------------------------------------------------------------
mpirun noticed that process rank 3 with PID 0 on node kcn159 exited on signal 11 (Segmentation fault)."
I checked whether the scf input file needs any additional flags beyond those of a usual scf calculation, but I could not find any. I do not really know what is going on.
Any clue would be highly appreciated.
Regards
________________________________
From: users <users-bounces at lists.quantum-espresso.org> on behalf of Elio Physics <elio-physics at live.com>
Sent: Monday, October 4, 2021 12:10 PM
To: Iurii TIMROV <iurii.timrov at epfl.ch>; Quantum ESPRESSO users Forum <users at lists.quantum-espresso.org>
Subject: Re: [QE-users] Turbo_eels.x crashing
The ecutwfc is not zero. It is 90 Ry. It was removed by mistake while I was deleting some of the input.
I will try to reproduce the examples given in the TDDFPT directory. If these don't work, I will get back to you.
Thank you for your time.
________________________________
From: Iurii TIMROV <iurii.timrov at epfl.ch>
Sent: Monday, October 4, 2021 11:55 AM
To: Elio Physics <elio-physics at live.com>; Quantum ESPRESSO users Forum <users at lists.quantum-espresso.org>
Subject: Re: Turbo_eels.x crashing
> ecutwfc=0
The kinetic-energy cutoff cannot be zero.
Try the QE input generator and then adjust the input to your needs (add noncolin=.true., lspinorb=.true. etc.):
https://www.materialscloud.org/work/tools/qeinputgenerator
HTH
Iurii
________________________________
From: Elio Physics <elio-physics at live.com>
Sent: Monday, October 4, 2021 3:29:17 PM
To: Iurii TIMROV; Quantum Espresso users Forum
Subject: Re: Turbo_eels.x crashing
Dear Dr. Timrov,
Thanks for your reply. The pw.x input is as follows:
&control
prefix='pds2'
calculation='scf',
restart_mode='from_scratch',
tstress=.true.,
tprnfor=.true.,
nstep=2000
pseudo_dir = '/fefs1/physics/eamoujaes/PSEUDO-CREATED',
wf_collect=.true.,
outdir='/',
etot_conv_thr=1.0D-5
forc_conv_thr=2.5D-4
/
&system
ibrav=4 ,A=4.376, C=18 nat=3, ntyp= 2, ecutwfc=0 , nbnd=55, occupations='smearing', noncolin=.true., lspinorb=.true., smearing='mp', degauss=0.005
/
&electrons
electron_maxstep=750
conv_thr=1.D-8,
mixing_beta=0.2D0,
mixing_mode='plain',
diago_david_ndim=2,
/
&ions
ion_dynamics ='bfgs'
/
&cell
cell_dynamics='bfgs'
press=0.0
cell_dofree='2Dxy'
/
ATOMIC_SPECIES
Pd 106.42 Pd.rel-pbe-n-nc.UPF
S 32.065 S.rel-pbe-n-nc.UPF
ATOMIC_POSITIONS crystal
.
.
.
K_POINTS {automatic}
10 10 1 0 0 0
________________________________
From: users <users-bounces at lists.quantum-espresso.org> on behalf of Iurii TIMROV via users <users at lists.quantum-espresso.org>
Sent: Monday, October 4, 2021 3:29 AM
To: Quantum Espresso users Forum <users at lists.quantum-espresso.org>
Subject: Re: [QE-users] Turbo_eels.x crashing
Dear Elie,
Can you please provide your input file for the pw.x code?
Greetings,
Iurii
________________________________
From: users <users-bounces at lists.quantum-espresso.org> on behalf of Elio Physics <Elio-Physics at live.com>
Sent: Sunday, October 3, 2021 12:21:47 AM
To: Quantum Espresso users Forum
Subject: [QE-users] Turbo_eels.x crashing
Dear all,
I am trying to calculate the absorption spectrum using turbo_eels.x. My system is metallic and has noncollinear spin-orbit coupling switched on, which means I can use neither turbo_davidson nor turbo_lanczos. My turbo_eels.x input is:
&lr_input
prefix='pds'
outdir='/'
restart_step=50
restart=.false.
/
&lr_control
calculator='lanczos'
itermax=500
q1=0.001
q2=0.000
q3=0.000
/
The code starts by doing an nscf calculation but, right after that, it crashes. Looking at the error file, I found the following:
program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0x2B22A841FB97
#1 0x2B22A841ED90
#2 0x2B22A906091F
#3 0x2B22A90B1C76
#4 0x40DD8C in lr_alloc_init_
#4 0x40DD8C in lr_alloc_init_
#4 0x40DD8C in lr_alloc_init_
#5 0x404CFE in MAIN__ at lr_eels_main.f90:113
#5 0x404CFE in MAIN__ at lr_eels_main.f90:113
.
.
.
#5 0x404CFE in MAIN__ at lr_eels_main.f90:113
#0 0x2AC6091BCB97
#1 0x2AC6091BBD90
#5 0x404CFE in MAIN__ at lr_eels_main.f90:113
#2 0x2AC609DFD91F
#3 0x2AC609E4EC76
#4 0x40DD8C in lr_alloc_init_
#5 0x404CFE in MAIN__ at lr_eels_main.f90:113
#5 0x404CFE in MAIN__ at lr_eels_main.f90:113
Can anyone please let me know how to circumvent this error?
Thanks in advance
Elie Moujaes
Federal University of Rondonia
Porto Velho
Brazil