[Wannier] executing wannier90.x in parallel: Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Simon Imanuel Rombauer simon.rombauer at student.uni-augsburg.de
Wed Jul 5 13:01:29 CEST 2023


Dear all,

I have been trying for some time to get wannier90.x to run in parallel. I built it with the gfortran compiler and set COMMS = mpi and MPIF90 = mpif90 in the make.inc file. All tests pass when running in serial, but when running on 6 cores only 39 out of 62 tests pass. The error message is always the same (for instance, from testw90_lavo3_dissphere):

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0  0x7f1409423ad0 in ???
#1  0x7f1409422c35 in ???
#2  0x7f140904251f in ???
	at ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
#0  0x7f6f82623ad0 in ???
#1  0x7f6f82622c35 in ???
#2  0x7f6f8224251f in ???
	at ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
#0  0x7fb45be23ad0 in ???
#1  0x7fb45be22c35 in ???
#2  0x7fb45ba4251f in ???
	at ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
#3  0x56361c9b8470 in __w90_comms_MOD_comms_scatterv_cmplx_4
	at ../comms.F90:1247
#3  0x56366c90f470 in __w90_comms_MOD_comms_scatterv_cmplx_4
	at ../comms.F90:1247
#3  0x56143bd90470 in __w90_comms_MOD_comms_scatterv_cmplx_4
	at ../comms.F90:1247
#4  0x56361c8f73c0 in __w90_overlap_MOD_overlap_read
	at ../overlap.F90:203
#4  0x56366c84e3c0 in __w90_overlap_MOD_overlap_read
	at ../overlap.F90:203
#4  0x56143bccf3c0 in __w90_overlap_MOD_overlap_read
	at ../overlap.F90:203
#5  0x56361c89e80e in wannier
	at ../wannier_prog.F90:204
#6  0x56361c89fed0 in main
	at ../wannier_prog.F90:55
#5  0x56366c7f580e in wannier
	at ../wannier_prog.F90:204
#6  0x56366c7f6ed0 in main
	at ../wannier_prog.F90:55
#5  0x56143bc7680e in wannier
	at ../wannier_prog.F90:204
#6  0x56143bc77ed0 in main
	at ../wannier_prog.F90:55
#0  0x7fe9a9623ad0 in ???
#1  0x7fe9a9622c35 in ???
#2  0x7fe9a924251f in ???
	at ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
#3  0x560ec2af1470 in __w90_comms_MOD_comms_scatterv_cmplx_4
	at ../comms.F90:1247
#4  0x560ec2a303c0 in __w90_overlap_MOD_overlap_read
	at ../overlap.F90:203
#5  0x560ec29d780e in wannier
	at ../wannier_prog.F90:204
#6  0x560ec29d8ed0 in main
	at ../wannier_prog.F90:55
#0  0x7fcd60823ad0 in ???
#1  0x7fcd60822c35 in ???
#2  0x7fcd6044251f in ???
	at ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
#3  0x55b73e6b0470 in __w90_comms_MOD_comms_scatterv_cmplx_4
	at ../comms.F90:1247
#4  0x55b73e5ef3c0 in __w90_overlap_MOD_overlap_read
	at ../overlap.F90:203
#5  0x55b73e59680e in wannier
	at ../wannier_prog.F90:204
#6  0x55b73e597ed0 in main
	at ../wannier_prog.F90:55
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 3 with PID 0 on node simon-ubuntu exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

This error does not occur when running mpirun -np 6 wannier90.x -pp 'seedname'. The output (wout) files confirm that the calculations started in parallel on 6 cores.
Any ideas or input are highly appreciated!
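
For reference, the build settings mentioned above correspond to make.inc entries of roughly the following form. This is only a sketch: COMMS and MPIF90 are quoted from the description above, F90 = gfortran is assumed from the compiler choice, and all other variables (compiler flags, libraries) are omitted because they are not stated here.

```make
# Sketch of the relevant make.inc entries; not the complete file.
F90    = gfortran   # assumed from "installed it with the gfortran compiler"
COMMS  = mpi        # enables the MPI build, as stated above
MPIF90 = mpif90     # MPI compiler wrapper, as stated above
```
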

Best,
Simon Rombauer

Masters Student Physics
Experimentalphysik IV
University Augsburg
Germany


