[QE-users] Segmentation fault in qe-7.3 with DFT+U

Angus Gentles Angus.Gentles at ams-osram.com
Wed Jun 5 10:05:52 CEST 2024


Dear all,

I am getting a segmentation fault when running DFT+U calculations, as shown below:

[n3511-027:2614825] 127 more processes have sent help message help-mpi-btl-openib.txt / error in device init
[n3511-027:2614825] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[n3511-027:2614833:0:2614833] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xffffffff80a1adb8)
==== backtrace (tid:2614833) ====
 0  /lib64/libucs.so.0(ucs_handle_error+0x2dc) [0x14bef008cedc]
 1  /lib64/libucs.so.0(+0x2b0bc) [0x14bef008d0bc]
 2  /lib64/libucs.so.0(+0x2b28a) [0x14bef008d28a]
 3  /home/fs71766/waldhoer02/data/tools/vsc5/base_libs/install/openmpi-4.1.1/lib/libmpi.so.40(MPI_Bcast+0x58) [0x14bf04d4e598]
 4  /gpfs/opt/sw/vsc4/VSC/x86_64/glibc-2.17/skylake/intel/oneapi/mkl/2022.0.1/lib/intel64/libmkl_blacs_openmpi_lp64.so.2(MKLMPI_Bcast+0xdd) [0x14bf0c5a6dfd]
 5  /gpfs/opt/sw/vsc4/VSC/x86_64/glibc-2.17/skylake/intel/oneapi/mkl/2022.0.1/lib/intel64/libmkl_scalapack_lp64.so.2(PB_CpgemmMPI+0x1097) [0x14bf0c1bbad7]
 6  /gpfs/opt/sw/vsc4/VSC/x86_64/glibc-2.17/skylake/intel/oneapi/mkl/2022.0.1/lib/intel64/libmkl_scalapack_lp64.so.2(pdgemm_+0xda7) [0x14bf0c21ff07]
 7  /gpfs/opt/sw/vsc4/VSC/x86_64/glibc-2.17/skylake/intel/oneapi/mkl/2022.0.1/lib/intel64/libmkl_scalapack_lp64.so.2(pdlaed1_+0x7cd) [0x14bf0bcc6dbd]
 8  /gpfs/opt/sw/vsc4/VSC/x86_64/glibc-2.17/skylake/intel/oneapi/mkl/2022.0.1/lib/intel64/libmkl_scalapack_lp64.so.2(pdlaed0_+0x9a1) [0x14bf0bcc6551]
 9  /gpfs/opt/sw/vsc4/VSC/x86_64/glibc-2.17/skylake/intel/oneapi/mkl/2022.0.1/lib/intel64/libmkl_scalapack_lp64.so.2(pdstedc_+0x639) [0x14bf0bcd0f19]
10  /gpfs/opt/sw/vsc4/VSC/x86_64/glibc-2.17/skylake/intel/oneapi/mkl/2022.0.1/lib/intel64/libmkl_scalapack_lp64.so.2(mkl_pzheevd0_+0xf99) [0x14bf0bf377c9]
11  /gpfs/opt/sw/vsc4/VSC/x86_64/glibc-2.17/skylake/intel/oneapi/mkl/2022.0.1/lib/intel64/libmkl_scalapack_lp64.so.2(mkl_pzheevdm_+0xb99) [0x14bf0bf36379]
12  /gpfs/opt/sw/vsc4/VSC/x86_64/glibc-2.17/skylake/intel/oneapi/mkl/2022.0.1/lib/intel64/libmkl_scalapack_lp64.so.2(pzheevd_+0x3ca) [0x14bf0bf3553a]
13  pw.x() [0xbb9d5a]
14  pw.x() [0xb9bb82]
15  pw.x() [0x71de4b]
16  pw.x() [0x5a03a9]
17  pw.x() [0x5a47c4]
18  pw.x() [0x412a44]
19  pw.x() [0x41caf9]
20  pw.x() [0x4f955b]
21  pw.x() [0x40688c]
22  pw.x() [0x4065cd]
23  /lib64/libc.so.6(__libc_start_main+0xe5) [0x14bf033d8d85]
24  pw.x() [0x40660e]
=================================

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x14bf033ecb4f in ???
#1  0x14bf04d4e598 in ompi_comm_invalid
      at ../../../../ompi/communicator/communicator.h:341
#2  0x14bf04d4e598 in PMPI_Bcast
      at /home/fs71766/waldhoer02/data/tools/vsc5/base_libs/build/openmpi-4.1.1/ompi/mpi/c/profile/pbcast.c:72
#3  0x14bf0c5a6dfc in ???
#4  0x14bf0c1bbad6 in ???
#5  0x14bf0c21ff06 in ???
#6  0x14bf0bcc6dbc in ???
#7  0x14bf0bcc6550 in ???
#8  0x14bf0bcd0f18 in ???
#9  0x14bf0bf377c8 in ???
#10  0x14bf0bf36378 in ???
#11  0x14bf0bf35539 in ???
#12  0xbb9d59 in __zhpev_module_MOD_pzheevd_drv
      at /home/fs71287/gentles/tools/sc-qe-7.3/LAXlib/zhpev_drv.f90:1562
#13  0xb9bb81 in laxlib_pcdiaghg_
      at /home/fs71287/gentles/tools/sc-qe-7.3/LAXlib/cdiaghg.f90:587
#14  0x71de4a in pcegterg_
      at /home/fs71287/gentles/tools/sc-qe-7.3/KS_Solvers/Davidson/cegterg.f90:944
#15  0x5a03a8 in diag_bands_k
      at /home/fs71287/gentles/tools/sc-qe-7.3/PW/src/c_bands.f90:1030
#16  0x5a03a8 in diag_bands_
      at /home/fs71287/gentles/tools/sc-qe-7.3/PW/src/c_bands.f90:322
#17  0x5a47c3 in c_bands_
      at /home/fs71287/gentles/tools/sc-qe-7.3/PW/src/c_bands.f90:132
#18  0x412a43 in electrons_scf_
      at /home/fs71287/gentles/tools/sc-qe-7.3/PW/src/electrons.f90:689
#19  0x41caf8 in electrons_
      at /home/fs71287/gentles/tools/sc-qe-7.3/PW/src/electrons.f90:192
#20  0x4f955a in run_pwscf_
      at /home/fs71287/gentles/tools/sc-qe-7.3/PW/src/run_pwscf.f90:189
#21  0x40688b in pwscf
      at /home/fs71287/gentles/tools/sc-qe-7.3/PW/src/pwscf.f90:85
#22  0x4065cc in main
      at /home/fs71287/gentles/tools/sc-qe-7.3/PW/src/pwscf.f90:40

I have been running DFT+U calculations on 128-atom InGaAsSb supercells with qe-7.3. I have seen similar problems reported for previous versions, but I am not sure whether those fixes are still appropriate given the changes to the DFT+U code since then. It seems to be a problem with loading the environment. I am running on a supercomputer with 128 processors. The input file is:

&CONTROL
  calculation = 'vc-relax',
  disk_io = 'low',
  etot_conv_thr = 1d-05,
  forc_conv_thr = 0.001,
  outdir = './tmp_In0.0Ga1.0As0.75Sb0.25_4x4x4',
  prefix = 'In0.0Ga1.0As0.75Sb0.25_4x4x4',
  pseudo_dir = '/home/fs71287/gentles/data/pseudos/',
  restart_mode = 'from_scratch',
  verbosity = 'low',
/
&SYSTEM
  celldm(1) = 43.6265,
  degauss = 0.0001,
  ecutwfc = 100,
  ibrav = 2,
  lspinorb = .TRUE.,
  nat = 128,
  nbnd = 1792,
  noncolin = .TRUE.,
  ntyp = 20,
  occupations = 'smearing',
/
&ELECTRONS
  conv_thr = 1d-05,
  mixing_beta = 0.65,
/
&IONS
/
&CELL
  cell_dofree = 'ibrav',
/
ATOMIC_SPECIES
  As1  74.9216 As.pbe.NC-FR.standard.v0.4.UPF
  As0  74.9216 As.pbe.NC-FR.standard.v0.4.UPF
  As2  74.9216 As.pbe.NC-FR.standard.v0.4.UPF
  As3  74.9216 As.pbe.NC-FR.standard.v0.4.UPF
  As4  74.9216 As.pbe.NC-FR.standard.v0.4.UPF
  Ga1  69.7230 Ga.pbe.NC-FR.standard.v0.4.UPF
  Ga0  69.7230 Ga.pbe.NC-FR.standard.v0.4.UPF
  Ga2  69.7230 Ga.pbe.NC-FR.standard.v0.4.UPF
  Ga3  69.7230 Ga.pbe.NC-FR.standard.v0.4.UPF
  Ga4  69.7230 Ga.pbe.NC-FR.standard.v0.4.UPF
  In1  114.8180 In.pbe.NC-FR.standard.v0.4.UPF
  In0  114.8180 In.pbe.NC-FR.standard.v0.4.UPF
  In2  114.8180 In.pbe.NC-FR.standard.v0.4.UPF
  In3  114.8180 In.pbe.NC-FR.standard.v0.4.UPF
  In4  114.8180 In.pbe.NC-FR.standard.v0.4.UPF
  Sb1  121.7600 Sb.pbe.NC-FR.standard.v0.4.UPF
  Sb0  121.7600 Sb.pbe.NC-FR.standard.v0.4.UPF
  Sb2  121.7600 Sb.pbe.NC-FR.standard.v0.4.UPF
  Sb3  121.7600 Sb.pbe.NC-FR.standard.v0.4.UPF
  Sb4  121.7600 Sb.pbe.NC-FR.standard.v0.4.UPF
ATOMIC_POSITIONS crystal
  Ga4 0.000000 0.000000 0.000000
  As4 0.062500 0.062500 0.062500
  Ga3 0.000000 0.000000 0.250000
  As4 0.062500 0.062500 0.312500
  Ga4 0.000000 0.000000 0.500000
  As4 0.062500 0.062500 0.562500
  Ga4 0.000000 0.000000 0.750000
  As4 0.062500 0.062500 0.812500
  Ga4 0.000000 0.250000 0.000000
  As4 0.062500 0.312500 0.062500
  Ga4 0.000000 0.250000 0.250000
  As4 0.062500 0.312500 0.312500
  Ga2 0.000000 0.250000 0.500000
  Sb4 0.062500 0.312500 0.562500
  Ga2 0.000000 0.250000 0.750000
  As4 0.062500 0.312500 0.812500
  Ga4 0.000000 0.500000 0.000000
  As4 0.062500 0.562500 0.062500
  Ga4 0.000000 0.500000 0.250000
  As4 0.062500 0.562500 0.312500
  Ga1 0.000000 0.500000 0.500000
  Sb4 0.062500 0.562500 0.562500
  Ga3 0.000000 0.500000 0.750000
  As4 0.062500 0.562500 0.812500
  Ga4 0.000000 0.750000 0.000000
  As4 0.062500 0.812500 0.062500
  Ga2 0.000000 0.750000 0.250000
  Sb4 0.062500 0.812500 0.312500
  Ga2 0.000000 0.750000 0.500000
  As4 0.062500 0.812500 0.562500
  Ga4 0.000000 0.750000 0.750000
  As4 0.062500 0.812500 0.812500
  Ga4 0.250000 0.000000 0.000000
  As4 0.312500 0.062500 0.062500
  Ga3 0.250000 0.000000 0.250000
  As4 0.312500 0.062500 0.312500
  Ga4 0.250000 0.000000 0.500000
  As4 0.312500 0.062500 0.562500
  Ga4 0.250000 0.000000 0.750000
  As4 0.312500 0.062500 0.812500
  Ga3 0.250000 0.250000 0.000000
  As4 0.312500 0.312500 0.062500
  Ga4 0.250000 0.250000 0.250000
  As4 0.312500 0.312500 0.312500
  Ga2 0.250000 0.250000 0.500000
  Sb4 0.312500 0.312500 0.562500
  Ga2 0.250000 0.250000 0.750000
  Sb4 0.312500 0.312500 0.812500
  Ga4 0.250000 0.500000 0.000000
  As4 0.312500 0.562500 0.062500
  Ga4 0.250000 0.500000 0.250000
  As4 0.312500 0.562500 0.312500
  Ga1 0.250000 0.500000 0.500000
  Sb4 0.312500 0.562500 0.562500
  Ga2 0.250000 0.500000 0.750000
  As4 0.312500 0.562500 0.812500
  Ga4 0.250000 0.750000 0.000000
  As4 0.312500 0.812500 0.062500
  Ga2 0.250000 0.750000 0.250000
  Sb4 0.312500 0.812500 0.312500
  Ga2 0.250000 0.750000 0.500000
  As4 0.312500 0.812500 0.562500
  Ga4 0.250000 0.750000 0.750000
  As4 0.312500 0.812500 0.812500
  Ga4 0.500000 0.000000 0.000000
  As4 0.562500 0.062500 0.062500
  Ga3 0.500000 0.000000 0.250000
  As4 0.562500 0.062500 0.312500
  Ga4 0.500000 0.000000 0.500000
  As4 0.562500 0.062500 0.562500
  Ga4 0.500000 0.000000 0.750000
  As4 0.562500 0.062500 0.812500
  Ga3 0.500000 0.250000 0.000000
  As4 0.562500 0.312500 0.062500
  Ga4 0.500000 0.250000 0.250000
  As4 0.562500 0.312500 0.312500
  Ga2 0.500000 0.250000 0.500000
  Sb4 0.562500 0.312500 0.562500
  Ga1 0.500000 0.250000 0.750000
  Sb4 0.562500 0.312500 0.812500
  Ga4 0.500000 0.500000 0.000000
  As4 0.562500 0.562500 0.062500
  Ga3 0.500000 0.500000 0.250000
  Sb4 0.562500 0.562500 0.312500
  Ga0 0.500000 0.500000 0.500000
  Sb4 0.562500 0.562500 0.562500
  Ga2 0.500000 0.500000 0.750000
  As4 0.562500 0.562500 0.812500
  Ga4 0.500000 0.750000 0.000000
  As4 0.562500 0.812500 0.062500
  Ga1 0.500000 0.750000 0.250000
  Sb4 0.562500 0.812500 0.312500
  Ga2 0.500000 0.750000 0.500000
  As4 0.562500 0.812500 0.562500
  Ga4 0.500000 0.750000 0.750000
  As4 0.562500 0.812500 0.812500
  Ga4 0.750000 0.000000 0.000000
  As4 0.812500 0.062500 0.062500
  Ga3 0.750000 0.000000 0.250000
  As4 0.812500 0.062500 0.312500
  Ga4 0.750000 0.000000 0.500000
  As4 0.812500 0.062500 0.562500
  Ga4 0.750000 0.000000 0.750000
  As4 0.812500 0.062500 0.812500
  Ga3 0.750000 0.250000 0.000000
  As4 0.812500 0.312500 0.062500
  Ga4 0.750000 0.250000 0.250000
  As4 0.812500 0.312500 0.312500
  Ga2 0.750000 0.250000 0.500000
  Sb4 0.812500 0.312500 0.562500
  Ga1 0.750000 0.250000 0.750000
  Sb4 0.812500 0.312500 0.812500
  Ga4 0.750000 0.500000 0.000000
  As4 0.812500 0.562500 0.062500
  Ga3 0.750000 0.500000 0.250000
  As4 0.812500 0.562500 0.312500
  Ga1 0.750000 0.500000 0.500000
  Sb4 0.812500 0.562500 0.562500
  Ga2 0.750000 0.500000 0.750000
  As4 0.812500 0.562500 0.812500
  Ga4 0.750000 0.750000 0.000000
  As4 0.812500 0.812500 0.062500
  Ga2 0.750000 0.750000 0.250000
  Sb4 0.812500 0.812500 0.312500
  Ga2 0.750000 0.750000 0.500000
  As4 0.812500 0.812500 0.562500
  Ga4 0.750000 0.750000 0.750000
  As4 0.812500 0.812500 0.812500
K_POINTS automatic
  3 3 3 0 0 0

HUBBARD (atomic)
U In0-5p -2.24
U Ga0-4p -2.04
U As0-4p 4.02
U Sb0-5p 4.43
U In1-5p -2.2575000000000003
U Ga1-4p -2.39
U As1-4p 4.5424999999999995
U Sb1-5p 4.7825
U In2-5p -2.2750000000000004
U Ga2-4p -2.74
U As2-4p 5.0649999999999995
U Sb2-5p 5.135
U In3-5p -2.2925
U Ga3-4p -3.09
U As3-4p 5.5875
U Sb3-5p 5.4875
U In4-5p -2.31
U Ga4-4p -3.44
U As4-4p 6.11
U Sb4-5p 5.84
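The U values for species 1 to 3 appear to be evenly spaced between the species-0 and species-4 values (for example As: 4.02, 4.5425, 5.065, 5.5875, 6.11). This is an inferred pattern, not something stated above. A minimal sketch that regenerates the card under that assumption, printing fixed precision so the values do not carry floating-point tails such as 4.5424999999999995 (the names ENDPOINTS and hubbard_card are hypothetical):

```python
# Sketch: regenerate the HUBBARD (atomic) card by linear interpolation
# between the species-0 and species-4 U values. The interpolation is an
# ASSUMPTION inferred from the numbers in the card, not a stated method.

# element -> (manifold, U for species 0, U for species 4), in eV,
# copied from the card above.
ENDPOINTS = {
    "In": ("5p", -2.24, -2.31),
    "Ga": ("4p", -2.04, -3.44),
    "As": ("4p", 4.02, 6.11),
    "Sb": ("5p", 4.43, 5.84),
}
N_STEPS = 4  # species labels run from 0 to 4


def hubbard_card(endpoints, n_steps=N_STEPS):
    """Return the HUBBARD card as a list of lines, grouped by species index."""
    lines = ["HUBBARD (atomic)"]
    for i in range(n_steps + 1):
        for elem, (manifold, u0, u4) in endpoints.items():
            u = u0 + (u4 - u0) * i / n_steps
            # Fixed precision avoids repr tails like -2.2575000000000003
            lines.append(f"U {elem}{i}-{manifold} {u:.4f}")
    return lines


if __name__ == "__main__":
    print("\n".join(hubbard_card(ENDPOINTS)))
```

Running it prints 21 lines in the same order as the card above (In, Ga, As, Sb for each species index).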

The run completes several optimisation steps and then stalls during an SCF iteration, as shown below:

     Number of occupied Hubbard levels =  615.3045

     total cpu time spent up to now is   166039.0 secs

     total energy              =  -22879.33720322 Ry
     estimated scf accuracy    <       0.00963404 Ry

     iteration #  2     ecut=   100.00 Ry     beta= 0.65
     Davidson diagonalization with overlap
     ethr =  5.38E-07,  avg # of iterations =  6.0

     total cpu time spent up to now is   172245.2 secs

     total energy              =  -22879.33685019 Ry
     estimated scf accuracy    <       0.03584310 Ry

     iteration #  3     ecut=   100.00 Ry     beta= 0.65
     Davidson diagonalization with overlap

I am relatively confident that this is not a case of the job running out of memory. I also get the problem with smaller cells and in bands calculations. Any help is appreciated.
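A rough way to sanity-check the memory side is to estimate the size of the largest arrays from the input parameters. This is a back-of-the-envelope sketch under stated assumptions, not pw.x's own memory report: the plane-wave count uses the free-electron sphere estimate, the Davidson subspace is assumed to hold up to diago_david_ndim × nbnd vectors with diago_david_ndim = 2, and the reduced problem is assumed to use three complex matrices of that dimension. The function name estimate is hypothetical.

```python
import math

# Rough, per-k-point memory estimate. ASSUMPTIONS (not from the post):
# free-electron plane-wave count, Davidson subspace = DAVID_NDIM * NBND,
# three reduced complex matrices (Hamiltonian, overlap, eigenvectors).

CELLDM1 = 43.6265       # bohr, ibrav = 2 (FCC)
ECUTWFC = 100.0         # Ry
NBND = 1792
NPOL = 2                # noncolin = .TRUE. -> two spinor components
DAVID_NDIM = 2          # assumed value of diago_david_ndim
NPROC = 128
BYTES_PER_COMPLEX = 16  # double-precision complex
BOHR_TO_ANGSTROM = 0.529177


def estimate():
    volume = CELLDM1 ** 3 / 4.0                        # FCC cell volume, bohr^3
    gmax = math.sqrt(ECUTWFC)                          # |k+G|^2 <= ecutwfc (Ry units)
    npw = volume * gmax ** 3 / (6.0 * math.pi ** 2)    # plane waves inside the sphere
    wfc_bytes = NPOL * npw * NBND * BYTES_PER_COMPLEX  # wavefunctions, one k-point
    nvecx = DAVID_NDIM * NBND                          # reduced subspace dimension
    reduced_bytes = 3 * nvecx ** 2 * BYTES_PER_COMPLEX
    return {
        # Conventional zincblende constant, assuming "4x4x4" counts FCC
        # primitive cells (64 cells x 2 atoms = 128 atoms).
        "lattice_constant_angstrom": CELLDM1 / 4.0 * BOHR_TO_ANGSTROM,
        "npw": npw,
        "wfc_GB": wfc_bytes / 1e9,
        "wfc_per_proc_MB": wfc_bytes / NPROC / 1e6,
        "reduced_matrices_GB": reduced_bytes / 1e9,
    }


if __name__ == "__main__":
    for key, value in estimate().items():
        print(f"{key:>28s}: {value:,.3f}")
```

With the numbers from the input file this gives roughly 3.5 × 10^5 plane waves, about 20 GB for one k-point's wavefunctions (about 160 MB per process if spread evenly over 128 processes) and about 0.6 GB for the reduced Davidson matrices. These figures are per k-point and exclude Davidson work arrays, so the real total depends on how many k-points each pool keeps in memory.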


With kind regards,
Angus Gentles
ams-OSRAM
Institute of Microelectronics, TU Wien



