[Pw_forum] mpi problems

Carlo Nervi carlo.nervi at unito.it
Sat Mar 7 10:37:57 CET 2009


Dear All,
I have traced my compilation problems to MPI.
I installed the latest ifort, icc and MKL libraries from
Intel, as well as mpich2-v1.0.8, on my Gentoo Linux system.
It now compiles (apparently) correctly, but whenever I try
to run mpiexec it fails with an Abort message.
I also tried to run the tests with PARA_PREFIX="" (this
should run without mpiexec), but I got strange random
errors (see below).
I think I'm doing something wrong with MPI, perhaps wrong
compiler flags or include directories.
BTW, the single CPU is now running at 100%...
I would appreciate any suggestions...
Thanks,
   Carlo

./check-pw.x.j
Checking atom-lsda...passed
Checking atom-pbe...passed
Checking atom-sigmapbe...passed
Checking atom...passed
Checking berry...application called
MPI_Abort(MPI_COMM_WORLD, 0) - process 0[unset]: aborting
job: application called MPI_Abort(MPI_COMM_WORLD, 0) -
process 0 discrepancy in total energy detected
Reference:  -333.717527, You got:
discrepancy in number of scf iterations detected
Reference: 16, You got:
Checking berry, step 2 ...application called
MPI_Abort(MPI_COMM_WORLD, 0) - process 0[unset]: aborting
job:
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0
discrepancy in polarization detected
Reference: 0.29312, You got:
Checking electric0...application called
MPI_Abort(MPI_COMM_WORLD, 0) - process 0[unset]: aborting
job:
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0
discrepancy in total energy detected
Reference:   -62.950448, You got:     0.000000
discrepancy in number of scf iterations detected
Reference: 8, You got:
Checking electric1...application called
MPI_Abort(MPI_COMM_WORLD, 0) - process 0[unset]: aborting
job:
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0
discrepancy in total energy detected
Reference:   -62.950448, You got:     0.000000
discrepancy in number of scf iterations detected
Reference: 1, You got:
Checking electric2...application called
MPI_Abort(MPI_COMM_WORLD, 0) - process 0[unset]: aborting
job:
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0
discrepancy in total energy detected
Reference:   -63.066086, You got:     0.000000
discrepancy in number of scf iterations detected
Reference: 4, You got:
Checking eval_infix...application called
MPI_Abort(MPI_COMM_WORLD, 0) - process 0[unset]: aborting
job:
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0
discrepancy in total energy detected
Reference:   -15.794496, You got:     0.000000
discrepancy in number of scf iterations detected
Reference: 5, You got:
discrepancy in pressure detected
Reference: -30.30, You got:
Checking eval_infix, step 2 ...passed
Checking lattice-ibrav0-abc...application called
MPI_Abort(MPI_COMM_WORLD, 0) - process 0[unset]: aborting
job:
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0
discrepancy in total energy detected
Reference:    -2.232039, You got:     0.000000
discrepancy in number of scf iterations detected
Reference: 4, You got:
Checking lattice-ibrav0-cell_parameters+a...application
called MPI_Abort(MPI_COMM_WORLD, 0) - process 0[unset]:
aborting job:
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0
discrepancy in total energy detected
Reference:    -2.232039, You got:     0.000000
discrepancy in number of scf iterations detected
Reference: 4, You got:
Checking
lattice-ibrav0-cell_parameters+celldm...application called
MPI_Abort(MPI_COMM_WORLD, 0) - process 0[unset]: aborting
job:
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0
discrepancy in total energy detected
Reference:    -2.232039, You got:     0.000000
discrepancy in number of scf iterations detected
Reference: 4, You got:
Checking lattice-ibrav0-cell_parameters...application
called MPI_Abort(MPI_COMM_WORLD, 0) - process 0[unset]:
aborting job:
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0
discrepancy in total energy detected
Reference:    -2.232039, You got:     0.000000
discrepancy in number of scf iterations detected
Reference: 4, You got:
Checking lattice-ibrav1-kauto...application called
MPI_Abort(MPI_COMM_WORLD, 0) - process 0[unset]: aborting
job:
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0
discrepancy in total energy detected
Reference:    -2.231646, You got:
discrepancy in number of scf iterations detected
Reference: 4, You got:
Checking lattice-ibrav1...application called
MPI_Abort(MPI_COMM_WORLD, 0) - process 0[unset]: aborting
job:
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0
discrepancy in total energy detected
Reference:    -2.234237, You got:
discrepancy in number of scf iterations detected
Reference: 4, You got:
Checking lattice-ibrav10-kauto...application called
MPI_Abort(MPI_COMM_WORLD, 0) - process 0[unset]: aborting
job:
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0
discrepancy in total energy detected
Reference:    -2.231523, You got:     0.000000
discrepancy in number of scf iterations detected
Reference: 4, You got:
Checking lattice-ibrav10...application called
MPI_Abort(MPI_COMM_WORLD, 0) - process 0[unset]: aborting
job:
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0
discrepancy in total energy detected
Reference:    -2.237991, You got:     0.000000
discrepancy in number of scf iterations detected
Reference: 5, You got:
Checking lattice-ibrav11-kauto...application called
MPI_Abort(MPI_COMM_WORLD, 0) - process 0[unset]: aborting
job:
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0
discrepancy in total energy detected
Reference:    -2.231211, You got:
discrepancy in number of scf iterations detected
Reference: 4, You got:
Checking lattice-ibrav11...application called
MPI_Abort(MPI_COMM_WORLD, 0) - process 0[unset]: aborting
job:
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0
discrepancy in total energy detected
Reference:    -2.231893, You got:
discrepancy in number of scf iterations detected
Reference: 4, You got:
Checking lattice-ibrav12-kauto...application called
MPI_Abort(MPI_COMM_WORLD, 0) - process 0[unset]: aborting
job:
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0
discrepancy in total energy detected
Reference:    -2.231430, You got:
discrepancy in number of scf iterations detected
Reference: 4, You got:
Checking lattice-ibrav12...application called
MPI_Abort(MPI_COMM_WORLD, 0) - process 0[unset]: aborting
job:
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0
discrepancy in total energy detected
Reference:    -2.231539, You got:
discrepancy in number of scf iterations detected
Reference: 4, You got:
Checking lattice-ibrav13-kauto...application called
MPI_Abort(MPI_COMM_WORLD, 0) - process 0[unset]: aborting
job:
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0
discrepancy in total energy detected
Reference:    -2.231320, You got:
discrepancy in number of scf iterations detected
Reference: 4, You got:
Checking lattice-ibrav13...application called
MPI_Abort(MPI_COMM_WORLD, 0) - process 0[unset]: aborting
job:
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0
discrepancy in total energy detected
Reference:    -2.232363, You got:
discrepancy in number of scf iterations detected
Reference: 4, You got:
Checking lattice-ibrav14-kauto...application called
MPI_Abort(MPI_COMM_WORLD, 0) - process 0[unset]: aborting
job:
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0
discrepancy in total energy detected
Reference:    -2.231424, You got:     0.000000
discrepancy in number of scf iterations detected
Reference: 4, You got:
Checking lattice-ibrav14...application called
MPI_Abort(MPI_COMM_WORLD, 0) - process 0[unset]: aborting
job:
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0
discrepancy in total energy detected
Reference:    -2.232039, You got:     0.000000
discrepancy in number of scf iterations detected
Reference: 4, You got:
Checking lattice-ibrav2-kauto...application called
MPI_Abort(MPI_COMM_WORLD, 0) - process 0[unset]: aborting
job:
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0
discrepancy in total energy detected
Reference:    -2.234027, You got:
discrepancy in number of scf iterations detected
Reference: 4, You got:
Checking lattice-ibrav2...application called
MPI_Abort(MPI_COMM_WORLD, 0) - process 0[unset]: aborting
job:
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0
discrepancy in total energy detected
Reference:    -2.327848, You got:
discrepancy in number of scf iterations detected
Reference: 4, You got:
Checking lattice-ibrav3-kauto...application called
MPI_Abort(MPI_COMM_WORLD, 0) - process 0[unset]: aborting
job:
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0
discrepancy in total energy detected
Reference:    -2.231902, You got:
discrepancy in number of scf iterations detected
Reference: 4, You got:
Checking lattice-ibrav3...application called
MPI_Abort(MPI_COMM_WORLD, 0) - process 0[unset]: aborting
job:
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0
discrepancy in total energy detected
Reference:    -2.250115, You got:
discrepancy in number of scf iterations detected
Reference: 4, You got:
Checking lattice-ibrav4-kauto...application called
MPI_Abort(MPI_COMM_WORLD, 0) - process 0[unset]: aborting
job:
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0
discrepancy in total energy detected
Reference:    -2.231325, You got:
discrepancy in number of scf iterations detected
Reference: 4, You got:
Checking lattice-ibrav4...application called
MPI_Abort(MPI_COMM_WORLD, 0) - process 0[unset]: aborting
job:
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0
discrepancy in total energy detected
Reference:    -2.232944, You got:
discrepancy in number of scf iterations detected
Reference: 4, You got:
Checking lattice-ibrav5-kauto...application called
MPI_Abort(MPI_COMM_WORLD, 0) - process 0[unset]: aborting
job:
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0
discrepancy in total energy detected
Reference:    -2.231427, You got:
discrepancy in number of scf iterations detected
Reference: 4, You got:
Checking lattice-ibrav5...application called
MPI_Abort(MPI_COMM_WORLD, 0) - process 0[unset]: aborting
job:
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0
discrepancy in total energy detected
Reference:    -2.236472, You got:
discrepancy in number of scf iterations detected
Reference: 5, You got:
Checking lattice-ibrav6-kauto...application called
MPI_Abort(MPI_COMM_WORLD, 0) - process 0[unset]: aborting
job:
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0
discrepancy in total energy detected
Reference:    -2.231540, You got:
discrepancy in number of scf iterations detected
Reference: 4, You got:
Checking lattice-ibrav6...application called
MPI_Abort(MPI_COMM_WORLD, 0) - process 0[unset]: aborting
job:
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0
discrepancy in total energy detected
Reference:    -2.232909, You got:
discrepancy in number of scf iterations detected
Reference: 4, You got:
Checking lattice-ibrav7-kauto...application called
MPI_Abort(MPI_COMM_WORLD, 0) - process 0[unset]: aborting
job:
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0
discrepancy in total energy detected
Reference:    -2.231175, You got:
discrepancy in number of scf iterations detected
Reference: 4, You got:
Checking lattice-ibrav7...application called
MPI_Abort(MPI_COMM_WORLD, 0) - process 0[unset]: aborting
job:
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0
discrepancy in total energy detected
Reference:    -2.233682, You got:
discrepancy in number of scf iterations detected
Reference: 4, You got:
Checking lattice-ibrav8-kauto...application called
MPI_Abort(MPI_COMM_WORLD, 0) - process 0[unset]: aborting
job:
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0
discrepancy in total energy detected
Reference:    -2.231428, You got:
discrepancy in number of scf iterations detected
Reference: 4, You got:
Checking lattice-ibrav8...application called
MPI_Abort(MPI_COMM_WORLD, 0) - process 0[unset]: aborting
job:
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0
discrepancy in total energy detected
Reference:    -2.231982, You got:
discrepancy in number of scf iterations detected
Reference: 4, You got:
Checking lattice-ibrav9-kauto...application called
MPI_Abort(MPI_COMM_WORLD, 0) - process 0[unset]: aborting
job:
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0
discrepancy in total energy detected
Reference:    -2.231201, You got:
discrepancy in number of scf iterations detected
Reference: 4, You got:
Checking lattice-ibrav9...application called
MPI_Abort(MPI_COMM_WORLD, 0) - process 0[unset]: aborting
job:
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0
discrepancy in total energy detected
Reference:    -2.237083, You got:
discrepancy in number of scf iterations detected
Reference: 5, You got:
Checking lda+U-noU...application called
MPI_Abort(MPI_COMM_WORLD, 0) - process 0[unset]: aborting
job:
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0
discrepancy in total energy detected
Reference:  -174.824658, You got:
discrepancy in number of scf iterations detected
Reference: 9, You got:
Checking lda+U-user_ns...application called
MPI_Abort(MPI_COMM_WORLD, 0) - process 0[unset]: aborting
job:
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0
discrepancy in total energy detected
Reference:  -174.537417, You got:
discrepancy in number of scf iterations detected
Reference: 10, You got:
Checking lda+U...application called
MPI_Abort(MPI_COMM_WORLD, 0) - process 0[unset]: aborting
job:
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0
discrepancy in total energy detected
Reference:  -174.453376, You got:
discrepancy in number of scf iterations detected
Reference: 15, You got:
Checking lsda-cg...passed
Checking lsda-mixing_TF...passed
Checking lsda-mixing_localTF...passed
Checking lsda-mixing_ndim...passed
Checking lsda-nelup+neldw...passed
Checking lsda-tot_magnetization...passed
Checking lsda...passed
Checking lsda, step 2 ...passed
Checking md-pot_extrap1...1
FAILED with error condition!
Input: md-pot_extrap1.in, Output: md-pot_extrap1.out,
Reference: md-pot_extrap1.ref
Aborting
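
A long log like the one above is easier to read once it is
reduced to a list of which tests passed and which did not.
The following is only a minimal sketch in Python, not part
of the test suite: the function name summarize is
hypothetical, and it assumes every entry starts with
"Checking <name>..." followed by either "passed" or some
error text, possibly wrapped over several lines as in this
mail.

```python
import re


def summarize(log_text):
    """Return (passed, failed) lists of test names from check-style output.

    Assumption: each entry looks like "Checking <name>...<status>", where
    <status> starts with "passed" on success and anything else on failure.
    """
    # Undo mail line-wrapping so an entry split over lines is one string.
    flat = " ".join(log_text.split())
    passed, failed = [], []
    # Cut the text just before each "Checking " so every chunk is one entry.
    for chunk in re.split(r"(?=Checking )", flat):
        match = re.match(r"Checking\s+(.+?)\s*\.\.\.\s*(.*)", chunk)
        if not match:
            continue  # text before the first entry, e.g. the command line
        name, rest = match.group(1), match.group(2)
        if rest.startswith("passed"):
            passed.append(name)
        else:
            failed.append(name)
    return passed, failed
```

Calling summarize() on the saved output returns two lists
of test names. This only condenses the log and does not
say why a test aborted.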



--
------------------------------------------------------
Carlo Nervi carlo.nervi at unito.it Tel:+39 011 6707507/8
Fax: +39 011 6707855 - Dipartimento di Chimica IFM
via P. Giuria 7, 10125 Torino, Italy
http://lem.ch.unito.it/




