[QE-users] Segmentation fault in qe-7.3 with DFT+U
Angus Gentles
Angus.Gentles at ams-osram.com
Wed Jun 5 10:05:52 CEST 2024
Dear all,
I am getting a segmentation fault when running DFT+U calculations; the error output is below.
[n3511-027:2614825] 127 more processes have sent help message help-mpi-btl-openib.txt / error in device init
[n3511-027:2614825] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[n3511-027:2614833:0:2614833] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xffffffff80a1adb8)
==== backtrace (tid:2614833) ====
0 /lib64/libucs.so.0(ucs_handle_error+0x2dc) [0x14bef008cedc]
1 /lib64/libucs.so.0(+0x2b0bc) [0x14bef008d0bc]
2 /lib64/libucs.so.0(+0x2b28a) [0x14bef008d28a]
3 /home/fs71766/waldhoer02/data/tools/vsc5/base_libs/install/openmpi-4.1.1/lib/libmpi.so.40(MPI_Bcast+0x58) [0x14bf04d4e598]
4 /gpfs/opt/sw/vsc4/VSC/x86_64/glibc-2.17/skylake/intel/oneapi/mkl/2022.0.1/lib/intel64/libmkl_blacs_openmpi_lp64.so.2(MKLMPI_Bcast+0xdd) [0x14bf0c5a6dfd]
5 /gpfs/opt/sw/vsc4/VSC/x86_64/glibc-2.17/skylake/intel/oneapi/mkl/2022.0.1/lib/intel64/libmkl_scalapack_lp64.so.2(PB_CpgemmMPI+0x1097) [0x14bf0c1bbad7]
6 /gpfs/opt/sw/vsc4/VSC/x86_64/glibc-2.17/skylake/intel/oneapi/mkl/2022.0.1/lib/intel64/libmkl_scalapack_lp64.so.2(pdgemm_+0xda7) [0x14bf0c21ff07]
7 /gpfs/opt/sw/vsc4/VSC/x86_64/glibc-2.17/skylake/intel/oneapi/mkl/2022.0.1/lib/intel64/libmkl_scalapack_lp64.so.2(pdlaed1_+0x7cd) [0x14bf0bcc6dbd]
8 /gpfs/opt/sw/vsc4/VSC/x86_64/glibc-2.17/skylake/intel/oneapi/mkl/2022.0.1/lib/intel64/libmkl_scalapack_lp64.so.2(pdlaed0_+0x9a1) [0x14bf0bcc6551]
9 /gpfs/opt/sw/vsc4/VSC/x86_64/glibc-2.17/skylake/intel/oneapi/mkl/2022.0.1/lib/intel64/libmkl_scalapack_lp64.so.2(pdstedc_+0x639) [0x14bf0bcd0f19]
10 /gpfs/opt/sw/vsc4/VSC/x86_64/glibc-2.17/skylake/intel/oneapi/mkl/2022.0.1/lib/intel64/libmkl_scalapack_lp64.so.2(mkl_pzheevd0_+0xf99) [0x14bf0bf377c9]
11 /gpfs/opt/sw/vsc4/VSC/x86_64/glibc-2.17/skylake/intel/oneapi/mkl/2022.0.1/lib/intel64/libmkl_scalapack_lp64.so.2(mkl_pzheevdm_+0xb99) [0x14bf0bf36379]
12 /gpfs/opt/sw/vsc4/VSC/x86_64/glibc-2.17/skylake/intel/oneapi/mkl/2022.0.1/lib/intel64/libmkl_scalapack_lp64.so.2(pzheevd_+0x3ca) [0x14bf0bf3553a]
13 pw.x() [0xbb9d5a]
14 pw.x() [0xb9bb82]
15 pw.x() [0x71de4b]
16 pw.x() [0x5a03a9]
17 pw.x() [0x5a47c4]
18 pw.x() [0x412a44]
19 pw.x() [0x41caf9]
20 pw.x() [0x4f955b]
21 pw.x() [0x40688c]
22 pw.x() [0x4065cd]
23 /lib64/libc.so.6(__libc_start_main+0xe5) [0x14bf033d8d85]
24 pw.x() [0x40660e]
=================================
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0x14bf033ecb4f in ???
#1 0x14bf04d4e598 in ompi_comm_invalid
at ../../../../ompi/communicator/communicator.h:341
#2 0x14bf04d4e598 in PMPI_Bcast
at /home/fs71766/waldhoer02/data/tools/vsc5/base_libs/build/openmpi-4.1.1/ompi/mpi/c/profile/pbcast.c:72
#3 0x14bf0c5a6dfc in ???
#4 0x14bf0c1bbad6 in ???
#5 0x14bf0c21ff06 in ???
#6 0x14bf0bcc6dbc in ???
#7 0x14bf0bcc6550 in ???
#8 0x14bf0bcd0f18 in ???
#9 0x14bf0bf377c8 in ???
#10 0x14bf0bf36378 in ???
#11 0x14bf0bf35539 in ???
#12 0xbb9d59 in __zhpev_module_MOD_pzheevd_drv
at /home/fs71287/gentles/tools/sc-qe-7.3/LAXlib/zhpev_drv.f90:1562
#13 0xb9bb81 in laxlib_pcdiaghg_
at /home/fs71287/gentles/tools/sc-qe-7.3/LAXlib/cdiaghg.f90:587
#14 0x71de4a in pcegterg_
at /home/fs71287/gentles/tools/sc-qe-7.3/KS_Solvers/Davidson/cegterg.f90:944
#15 0x5a03a8 in diag_bands_k
at /home/fs71287/gentles/tools/sc-qe-7.3/PW/src/c_bands.f90:1030
#16 0x5a03a8 in diag_bands_
at /home/fs71287/gentles/tools/sc-qe-7.3/PW/src/c_bands.f90:322
#17 0x5a47c3 in c_bands_
at /home/fs71287/gentles/tools/sc-qe-7.3/PW/src/c_bands.f90:132
#18 0x412a43 in electrons_scf_
at /home/fs71287/gentles/tools/sc-qe-7.3/PW/src/electrons.f90:689
#19 0x41caf8 in electrons_
at /home/fs71287/gentles/tools/sc-qe-7.3/PW/src/electrons.f90:192
#20 0x4f955a in run_pwscf_
at /home/fs71287/gentles/tools/sc-qe-7.3/PW/src/run_pwscf.f90:189
#21 0x40688b in pwscf
at /home/fs71287/gentles/tools/sc-qe-7.3/PW/src/pwscf.f90:85
#22 0x4065cc in main
at /home/fs71287/gentles/tools/sc-qe-7.3/PW/src/pwscf.f90:40
I have been running DFT+U calculations on 128-atom supercells of InGaAsSb with qe-7.3. I have seen reports of similar problems in a few previous versions, but I am not sure whether their fixes are still appropriate given the changes to the DFT+U code. It seems to be a problem with loading the environment. I am running on a supercomputer, using 128 processors. The input file is:
&CONTROL
calculation = 'vc-relax',
disk_io = 'low',
etot_conv_thr = 1d-05,
forc_conv_thr = 0.001,
outdir = './tmp_In0.0Ga1.0As0.75Sb0.25_4x4x4',
prefix = 'In0.0Ga1.0As0.75Sb0.25_4x4x4',
pseudo_dir = '/home/fs71287/gentles/data/pseudos/',
restart_mode = 'from_scratch',
verbosity = 'low',
/
&SYSTEM
celldm(1) = 43.6265,
degauss = 0.0001,
ecutwfc = 100,
ibrav = 2,
lspinorb = .TRUE.,
nat = 128,
nbnd = 1792,
noncolin = .TRUE.,
ntyp = 20,
occupations = 'smearing',
/
&ELECTRONS
conv_thr = 1d-05,
mixing_beta = 0.65,
/
&IONS
/
&CELL
cell_dofree = 'ibrav',
/
ATOMIC_SPECIES
As1 74.9216 As.pbe.NC-FR.standard.v0.4.UPF
As0 74.9216 As.pbe.NC-FR.standard.v0.4.UPF
As2 74.9216 As.pbe.NC-FR.standard.v0.4.UPF
As3 74.9216 As.pbe.NC-FR.standard.v0.4.UPF
As4 74.9216 As.pbe.NC-FR.standard.v0.4.UPF
Ga1 69.7230 Ga.pbe.NC-FR.standard.v0.4.UPF
Ga0 69.7230 Ga.pbe.NC-FR.standard.v0.4.UPF
Ga2 69.7230 Ga.pbe.NC-FR.standard.v0.4.UPF
Ga3 69.7230 Ga.pbe.NC-FR.standard.v0.4.UPF
Ga4 69.7230 Ga.pbe.NC-FR.standard.v0.4.UPF
In1 114.8180 In.pbe.NC-FR.standard.v0.4.UPF
In0 114.8180 In.pbe.NC-FR.standard.v0.4.UPF
In2 114.8180 In.pbe.NC-FR.standard.v0.4.UPF
In3 114.8180 In.pbe.NC-FR.standard.v0.4.UPF
In4 114.8180 In.pbe.NC-FR.standard.v0.4.UPF
Sb1 121.7600 Sb.pbe.NC-FR.standard.v0.4.UPF
Sb0 121.7600 Sb.pbe.NC-FR.standard.v0.4.UPF
Sb2 121.7600 Sb.pbe.NC-FR.standard.v0.4.UPF
Sb3 121.7600 Sb.pbe.NC-FR.standard.v0.4.UPF
Sb4 121.7600 Sb.pbe.NC-FR.standard.v0.4.UPF
ATOMIC_POSITIONS crystal
Ga4 0.000000 0.000000 0.000000
As4 0.062500 0.062500 0.062500
Ga3 0.000000 0.000000 0.250000
As4 0.062500 0.062500 0.312500
Ga4 0.000000 0.000000 0.500000
As4 0.062500 0.062500 0.562500
Ga4 0.000000 0.000000 0.750000
As4 0.062500 0.062500 0.812500
Ga4 0.000000 0.250000 0.000000
As4 0.062500 0.312500 0.062500
Ga4 0.000000 0.250000 0.250000
As4 0.062500 0.312500 0.312500
Ga2 0.000000 0.250000 0.500000
Sb4 0.062500 0.312500 0.562500
Ga2 0.000000 0.250000 0.750000
As4 0.062500 0.312500 0.812500
Ga4 0.000000 0.500000 0.000000
As4 0.062500 0.562500 0.062500
Ga4 0.000000 0.500000 0.250000
As4 0.062500 0.562500 0.312500
Ga1 0.000000 0.500000 0.500000
Sb4 0.062500 0.562500 0.562500
Ga3 0.000000 0.500000 0.750000
As4 0.062500 0.562500 0.812500
Ga4 0.000000 0.750000 0.000000
As4 0.062500 0.812500 0.062500
Ga2 0.000000 0.750000 0.250000
Sb4 0.062500 0.812500 0.312500
Ga2 0.000000 0.750000 0.500000
As4 0.062500 0.812500 0.562500
Ga4 0.000000 0.750000 0.750000
As4 0.062500 0.812500 0.812500
Ga4 0.250000 0.000000 0.000000
As4 0.312500 0.062500 0.062500
Ga3 0.250000 0.000000 0.250000
As4 0.312500 0.062500 0.312500
Ga4 0.250000 0.000000 0.500000
As4 0.312500 0.062500 0.562500
Ga4 0.250000 0.000000 0.750000
As4 0.312500 0.062500 0.812500
Ga3 0.250000 0.250000 0.000000
As4 0.312500 0.312500 0.062500
Ga4 0.250000 0.250000 0.250000
As4 0.312500 0.312500 0.312500
Ga2 0.250000 0.250000 0.500000
Sb4 0.312500 0.312500 0.562500
Ga2 0.250000 0.250000 0.750000
Sb4 0.312500 0.312500 0.812500
Ga4 0.250000 0.500000 0.000000
As4 0.312500 0.562500 0.062500
Ga4 0.250000 0.500000 0.250000
As4 0.312500 0.562500 0.312500
Ga1 0.250000 0.500000 0.500000
Sb4 0.312500 0.562500 0.562500
Ga2 0.250000 0.500000 0.750000
As4 0.312500 0.562500 0.812500
Ga4 0.250000 0.750000 0.000000
As4 0.312500 0.812500 0.062500
Ga2 0.250000 0.750000 0.250000
Sb4 0.312500 0.812500 0.312500
Ga2 0.250000 0.750000 0.500000
As4 0.312500 0.812500 0.562500
Ga4 0.250000 0.750000 0.750000
As4 0.312500 0.812500 0.812500
Ga4 0.500000 0.000000 0.000000
As4 0.562500 0.062500 0.062500
Ga3 0.500000 0.000000 0.250000
As4 0.562500 0.062500 0.312500
Ga4 0.500000 0.000000 0.500000
As4 0.562500 0.062500 0.562500
Ga4 0.500000 0.000000 0.750000
As4 0.562500 0.062500 0.812500
Ga3 0.500000 0.250000 0.000000
As4 0.562500 0.312500 0.062500
Ga4 0.500000 0.250000 0.250000
As4 0.562500 0.312500 0.312500
Ga2 0.500000 0.250000 0.500000
Sb4 0.562500 0.312500 0.562500
Ga1 0.500000 0.250000 0.750000
Sb4 0.562500 0.312500 0.812500
Ga4 0.500000 0.500000 0.000000
As4 0.562500 0.562500 0.062500
Ga3 0.500000 0.500000 0.250000
Sb4 0.562500 0.562500 0.312500
Ga0 0.500000 0.500000 0.500000
Sb4 0.562500 0.562500 0.562500
Ga2 0.500000 0.500000 0.750000
As4 0.562500 0.562500 0.812500
Ga4 0.500000 0.750000 0.000000
As4 0.562500 0.812500 0.062500
Ga1 0.500000 0.750000 0.250000
Sb4 0.562500 0.812500 0.312500
Ga2 0.500000 0.750000 0.500000
As4 0.562500 0.812500 0.562500
Ga4 0.500000 0.750000 0.750000
As4 0.562500 0.812500 0.812500
Ga4 0.750000 0.000000 0.000000
As4 0.812500 0.062500 0.062500
Ga3 0.750000 0.000000 0.250000
As4 0.812500 0.062500 0.312500
Ga4 0.750000 0.000000 0.500000
As4 0.812500 0.062500 0.562500
Ga4 0.750000 0.000000 0.750000
As4 0.812500 0.062500 0.812500
Ga3 0.750000 0.250000 0.000000
As4 0.812500 0.312500 0.062500
Ga4 0.750000 0.250000 0.250000
As4 0.812500 0.312500 0.312500
Ga2 0.750000 0.250000 0.500000
Sb4 0.812500 0.312500 0.562500
Ga1 0.750000 0.250000 0.750000
Sb4 0.812500 0.312500 0.812500
Ga4 0.750000 0.500000 0.000000
As4 0.812500 0.562500 0.062500
Ga3 0.750000 0.500000 0.250000
As4 0.812500 0.562500 0.312500
Ga1 0.750000 0.500000 0.500000
Sb4 0.812500 0.562500 0.562500
Ga2 0.750000 0.500000 0.750000
As4 0.812500 0.562500 0.812500
Ga4 0.750000 0.750000 0.000000
As4 0.812500 0.812500 0.062500
Ga2 0.750000 0.750000 0.250000
Sb4 0.812500 0.812500 0.312500
Ga2 0.750000 0.750000 0.500000
As4 0.812500 0.812500 0.562500
Ga4 0.750000 0.750000 0.750000
As4 0.812500 0.812500 0.812500
K_POINTS automatic
3 3 3 0 0 0
HUBBARD (atomic)
U In0-5p -2.24
U Ga0-4p -2.04
U As0-4p 4.02
U Sb0-5p 4.43
U In1-5p -2.2575000000000003
U Ga1-4p -2.39
U As1-4p 4.5424999999999995
U Sb1-5p 4.7825
U In2-5p -2.2750000000000004
U Ga2-4p -2.74
U As2-4p 5.0649999999999995
U Sb2-5p 5.135
U In3-5p -2.2925
U Ga3-4p -3.09
U As3-4p 5.5875
U Sb3-5p 5.4875
U In4-5p -2.31
U Ga4-4p -3.44
U As4-4p 6.11
U Sb4-5p 5.84
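As a side check on the input above, here is a small sketch for listing species that are declared in ATOMIC_SPECIES but never used in ATOMIC_POSITIONS. It is not part of the calculation; the helper name species_labels is hypothetical and the lines fed to it are only short excerpts of the cards above.

```python
def species_labels(card_lines):
    """Return the first token (the species label) of each non-empty line."""
    return [line.split()[0] for line in card_lines if line.strip()]


# Excerpt of ATOMIC_SPECIES (illustration only, not the full card).
declared = species_labels([
    "As4 74.9216 As.pbe.NC-FR.standard.v0.4.UPF",
    "Ga3 69.7230 Ga.pbe.NC-FR.standard.v0.4.UPF",
    "Ga4 69.7230 Ga.pbe.NC-FR.standard.v0.4.UPF",
    "In0 114.8180 In.pbe.NC-FR.standard.v0.4.UPF",
])

# Excerpt of ATOMIC_POSITIONS (first four atoms of the card above).
used = species_labels([
    "Ga4 0.000000 0.000000 0.000000",
    "As4 0.062500 0.062500 0.062500",
    "Ga3 0.000000 0.000000 0.250000",
    "As4 0.062500 0.062500 0.312500",
])

# Species that are declared but have no atom in the excerpt.
unused = sorted(set(declared) - set(used))
print(unused)
```

Run on the full cards, the same comparison would show which of the 20 declared species (and their Hubbard U entries) have no atoms in this cell.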
The output file gets through several optimisation steps and then stalls at an scf iteration, as below:
Number of occupied Hubbard levels = 615.3045
total cpu time spent up to now is 166039.0 secs
total energy = -22879.33720322 Ry
estimated scf accuracy < 0.00963404 Ry
iteration # 2 ecut= 100.00 Ry beta= 0.65
Davidson diagonalization with overlap
ethr = 5.38E-07, avg # of iterations = 6.0
total cpu time spent up to now is 172245.2 secs
total energy = -22879.33685019 Ry
estimated scf accuracy < 0.03584310 Ry
iteration # 3 ecut= 100.00 Ry beta= 0.65
Davidson diagonalization with overlap
I am relatively confident that this is not caused by the machine running out of memory, since I also get problems with smaller cells and with bands calculations. Any help is appreciated.
With kind regards,
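As a rough check on the memory remark above, the dense matrix handed to pzheevd can be sized by hand. This is only a sketch under an assumption: it takes the dimension of the reduced Davidson problem to be nbnd times a factor david_ndim (standing in for diago_david_ndim, which is not set in the input, so two values are tried). It covers a single complex matrix in total over all processes, not the wavefunctions or any work arrays.

```python
BYTES_PER_COMPLEX16 = 16  # one double-precision complex number


def reduced_matrix_bytes(nbnd: int, david_ndim: int) -> int:
    """Bytes for one dense complex square matrix of dimension nbnd * david_ndim."""
    dim = nbnd * david_ndim
    return dim * dim * BYTES_PER_COMPLEX16


# nbnd = 1792 is taken from the &SYSTEM namelist above;
# the david_ndim values are assumptions, not read from the run.
for ndim in (2, 4):
    size = reduced_matrix_bytes(1792, ndim)
    print(f"david_ndim={ndim}: {size / 2**20:.0f} MiB in total")
```

Under these assumptions the matrix itself stays below 1 GiB in total, which is small once distributed, although this says nothing about the memory used elsewhere in the run.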
Angus Gentles
ams-OSRAM
Institute of Microelectronics, TU Wien