[Pw_forum] Internal MPI error

alan chen chenhanghuipwscf at gmail.com
Tue Oct 2 17:16:57 CEST 2007


Dear Forum Members,
     We have encountered an Internal MPI error in a couple of jobs. The jobs
are of medium size (around 60 atoms in the unit cell). After running for some
time (one or two days), they were automatically killed with the following
error:

Fatal error in MPI_Alltoallv: Internal MPI error!, error stack:
MPI_Alltoallv(407).: MPI_Alltoallv(sbuf=0x2a96197010, scnts=0x7fbfffdaf0,
sdispls=0x7fbfffdb70, MPI_DOUBLE_COMPLEX, rbuf=0x2ad32ce010,
rcnts=0x7fbfffdbf0, rdispls=0x7fbfffdc70, MPI_DOUBLE_COMPLEX,
comm=0x84000002) failed
MPI_Waitall(242)..........................: MPI_Waitall(count=64,
req_array=0x3391b40, status_array=0x3391630) failed
MPIDI_CH3_Progress_wait(212)..............: an error occurred while handling
an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(413):
MPIDU_Socki_handle_read(633)..............: connection failure
(set=0,sock=8,errno=104:Connection reset by peer)
[cli_30]: aborting job:
Fatal error in MPI_Alltoallv: Other MPI error, error stack:
MPI_Alltoallv(407)........................: MPI_Alltoallv(sbuf=0x2a96197010,
scnts=0x7fbfffdaf0, sdispls=0x7fbfffdb70, MPI_DOUBLE_COMPLEX,
rbuf=0x2aac72a010, rcnts=0x7fbfffdbf0, rdispls=0x7fbfffdc70,
MPI_DOUBLE_COMPLEX, comm=0x84000002) failed
MPI_Waitall(242)..........................: MPI_Waitall(count=64,
req_array=0x3382f50, status_array=0x3263c60) failed
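
For reference, here is a minimal standalone test of the routine that fails
above. This is my own sketch, not code from pw.x; it uses MPI_DOUBLE (two
doubles per value) in place of the MPI_DOUBLE_COMPLEX in the error stack so
that it stays within the plain C bindings, and it only checks that
MPI_Alltoallv works at all between the same set of nodes:

/* Minimal MPI_Alltoallv test -- my own sketch, not from the PWscf source.
 * Each rank sends one "complex" value (two doubles) to every other rank. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, nprocs, i;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* two doubles per destination rank stand in for one MPI_DOUBLE_COMPLEX */
    double *sbuf = malloc(2 * nprocs * sizeof(double));
    double *rbuf = malloc(2 * nprocs * sizeof(double));
    int *scnts   = malloc(nprocs * sizeof(int));
    int *rcnts   = malloc(nprocs * sizeof(int));
    int *sdispls = malloc(nprocs * sizeof(int));
    int *rdispls = malloc(nprocs * sizeof(int));

    for (i = 0; i < nprocs; i++) {
        sbuf[2*i]   = (double) rank;    /* dummy "real part" */
        sbuf[2*i+1] = (double) i;       /* dummy "imaginary part" */
        scnts[i] = rcnts[i] = 2;        /* 2 doubles = 1 complex per rank */
        sdispls[i] = rdispls[i] = 2*i;  /* contiguous layout */
    }

    MPI_Alltoallv(sbuf, scnts, sdispls, MPI_DOUBLE,
                  rbuf, rcnts, rdispls, MPI_DOUBLE,
                  MPI_COMM_WORLD);

    if (rank == 0)
        printf("MPI_Alltoallv completed on %d ranks\n", nprocs);

    free(sbuf); free(rbuf); free(scnts); free(rcnts);
    free(sdispls); free(rdispls);
    MPI_Finalize();
    return 0;
}

Compile with mpicc and run with mpiexec -np <nprocs> on the same nodes as the
failing job; the test completes in moments, so it will not reproduce a failure
that takes days, but it can confirm that the MPI layer itself is sane.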

Has anyone encountered this problem before? How can I avoid it?
Thank you very much.

Hanghui