[Pw_forum] Could you please help me to cope with the error message

vega vegalew at hotmail.com
Mon Nov 10 20:56:46 CET 2008


Dear all,

I keep getting the following error message:

[node1][0,1,12][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed with errno=104
forrtl: error (78): process killed (SIGTERM)
Image              PC        Routine            Line        Source             
libblas.so.3       55A8D51C  Unknown               Unknown  Unknown
pw.x               081CBD7B  Unknown               Unknown  Unknown
pw.x               0823A95E  Unknown               Unknown  Unknown
pw.x               08239C4A  Unknown               Unknown  Unknown
pw.x               081DEDC9  Unknown               Unknown  Unknown
pw.x               081D4E9C  Unknown               Unknown  Unknown
Unknown            FFFFD060  Unknown               Unknown  Unknown
 
Stack trace terminated abnormally.
forrtl: error (78): process killed (SIGTERM)
Image              PC        Routine            Line        Source             
.                  FFFFE410  Unknown               Unknown  Unknown
mca_oob_tcp.so     55F911B4  Unknown               Unknown  Unknown
Unknown            00000001  Unknown               Unknown  Unknown
forrtl: error (78): process killed (SIGTERM)
Image              PC        Routine            Line        Source             
pw.x               0813EE72  Unknown               Unknown  Unknown
pw.x               0813E577  Unknown               Unknown  Unknown
 
Stack trace terminated abnormally.
forrtl: error (78): process killed (SIGTERM)
Image              PC        Routine            Line        Source             
.                  FFFFE410  Unknown               Unknown  Unknown
forrtl: error (78): process killed (SIGTERM)
Image              PC        Routine            Line        Source             
.                  FFFFE40E  Unknown               Unknown  Unknown
forrtl: error (78): process killed (SIGTERM)
Image              PC        Routine            Line        Source             
.                  FFFFE410  Unknown               Unknown  Unknown
forrtl: error (78): process killed (SIGTERM)
Image              PC        Routine            Line        Source             
.                  FFFFE40E  Unknown               Unknown  Unknown
forrtl: error (78): process killed (SIGTERM)
Image              PC        Routine            Line        Source             
libblas.so.3       55A8C50F  Unknown               Unknown  Unknown
pw.x               081CBD7B  Unknown               Unknown  Unknown
pw.x               0823A95E  Unknown               Unknown  Unknown
pw.x               08239C4A  Unknown               Unknown  Unknown
pw.x               081DEDC9  Unknown               Unknown  Unknown
pw.x               081D4E9C  Unknown               Unknown  Unknown
Unknown            FFFFD060  Unknown               Unknown  Unknown
 
Stack trace terminated abnormally.
forrtl: error (78): process killed (SIGTERM)
Image              PC        Routine            Line        Source             
.                  FFFFE410  Unknown               Unknown  Unknown
forrtl: error (78): process killed (SIGTERM)
Image              PC        Routine            Line        Source             
.                  FFFFE40E  Unknown               Unknown  Unknown
forrtl: error (78): process killed (SIGTERM)
Image              PC        Routine            Line        Source             
.                  FFFFE410  Unknown               Unknown  Unknown
forrtl: error (78): process killed (SIGTERM)
Image              PC        Routine            Line        Source             
libblas.so.3       55A8C50F  Unknown               Unknown  Unknown
pw.x               081CBD7B  Unknown               Unknown  Unknown
pw.x               0823A95E  Unknown               Unknown  Unknown
pw.x               08239C4A  Unknown               Unknown  Unknown
pw.x               081DEDC9  Unknown               Unknown  Unknown
pw.x               081D4E9C  Unknown               Unknown  Unknown
Unknown            FFFFD060  Unknown               Unknown  Unknown
 
Stack trace terminated abnormally.
forrtl: error (78): process killed (SIGTERM)
Image              PC        Routine            Line        Source             
.                  FFFFE410  Unknown               Unknown  Unknown
forrtl: error (78): process killed (SIGTERM)
Image              PC        Routine            Line        Source             
.                  FFFFE410  Unknown               Unknown  Unknown
Unknown            00000003  Unknown               Unknown  Unknown
forrtl: error (78): process killed (SIGTERM)
Image              PC        Routine            Line        Source             
.                  FFFFE410  Unknown               Unknown  Unknown
forrtl: error (78): process killed (SIGTERM)
Image              PC        Routine            Line        Source             
libblas.so.3       55A8C50B  Unknown               Unknown  Unknown
pw.x               081CBD7B  Unknown               Unknown  Unknown
pw.x               0823A95E  Unknown               Unknown  Unknown
pw.x               08239C4A  Unknown               Unknown  Unknown
pw.x               081DEDC9  Unknown               Unknown  Unknown
pw.x               081D4E9C  Unknown               Unknown  Unknown
Unknown            FFFFCDC0  Unknown               Unknown  Unknown
 
Stack trace terminated abnormally.
forrtl: error (78): process killed (SIGTERM)
Image              PC        Routine            Line        Source             
.                  FFFFE410  Unknown               Unknown  Unknown
[node1][0,1,12][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed with errno=104
forrtl: error (78): process killed (SIGTERM)
Image              PC        Routine            Line        Source             
.                  FFFFE410  Unknown               Unknown  Unknown
forrtl: error (78): process killed (SIGTERM)
Image              PC        Routine            Line        Source             
.                  FFFFE410  Unknown               Unknown  Unknown
forrtl: error (78): process killed (SIGTERM)
Image              PC        Routine            Line        Source             
libblas.so.3       55A8BF47  Unknown               Unknown  Unknown
pw.x               080EA567  Unknown               Unknown  Unknown
 
Stack trace terminated abnormally.
forrtl: error (78): process killed (SIGTERM)
Image              PC        Routine            Line        Source             
libblas.so.3       55A8BF3B  Unknown               Unknown  Unknown
pw.x               081E3C7B  Unknown               Unknown  Unknown
 
Stack trace terminated abnormally.
forrtl: error (78): process killed (SIGTERM)
Image              PC        Routine            Line        Source             
.                  FFFFE410  Unknown               Unknown  Unknown
[node8][0,1,23][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed with errno=104
forrtl: error (78): process killed (SIGTERM)
Image              PC        Routine            Line        Source             
.                  FFFFE410  Unknown               Unknown  Unknown
mpirun noticed that job rank 14 with PID 3519 on node node3 exited on signal 11 (Segmentation fault). 


I could relax 72 atoms successfully on my system using openmpi. But when I tried to relax 84 atoms, the error message above stopped my calculation. Then I tried mpich2 on the same system. With mpich2 I could relax 120 atoms, but the error message appeared again when I tried to relax 132 atoms. I have been stuck on this troublesome problem for quite a long time. Could someone give me some suggestions on how to cope with it?
To make my question easier to understand, the details of my system are as follows.

There are 8 nodes in my cluster, connected by Ethernet.

CPU         Intel Q6600
Memory      8 GB per node
Main board  Intel S3000AH
Hard disk   Seagate 750 GB (7200 rpm)
OS          Red Hat Enterprise Linux AS 4 Update 4
Fortran     Intel ifort 10.1.015
C           Intel icc 10.1.015
MPI         mpich2/openmpi
FFTW        fftw 2.1.5
MKL         10.0.1.014

Thank you for reading. Any hints will be deeply appreciated.

vega

=================================================================================
Vega Lew (weijia liu)
Ph.D. Candidate in Chemical Engineering
State Key Laboratory of Materials-oriented Chemical Engineering
College of Chemistry and Chemical Engineering
Nanjing University of Technology, 210009, Nanjing, Jiangsu, China