[Pw_forum] "MPI_COMM_RANK : Null communicator..." error through Platform LSF system
wangxinquan at tju.edu.cn
Wed Apr 9 05:50:07 CEST 2008
Dear users and developers,
Recently I ran a test on the Nankai Stars HPC. The error message
"MPI_COMM_RANK : Null communicator. Aborting program!" appeared when I ran
an scf calculation on 2 CPUs (2 nodes).
To solve this problem, I found some hints via Google, such as "please
make sure that you used the same version of MPI for compiling and running, and
included the corresponding header file mpi.h in your code."
(http://www.ncsa.edu/UserInfo/Resources/Hardware/XeonCluster/FAQ/XeonJobs.html)
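Following that first hint, I intend to check that pw.x was linked against the same
mpich_gm that mpirun.lsf starts at run time. A minimal sketch of such a check is below
(the pw.x path is just my own installation; whether the mpif90 wrapper accepts -show
depends on the MPI installation, and ldd shows nothing useful if MPI was linked statically):

ldd /nfs/s04r2p1/wangxq_tj/espresso-3.2.3/bin/pw.x | grep -i mpi   # MPI library pw.x is linked to, if dynamic
mpif90 -show                                                       # what the compiler wrapper actually links
which mpif90 mpirun mpirun.lsf                                     # which wrappers come first in PATH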
According to the pwscf mailing list, "dynamic port number used in mpi
intercommunication is not working. This is most probably an installation issue
regarding LSF." may be one possible cause.
(http://www.democritos.it/pipermail/pw_forum/2007-June/006689.html)
According to the pwscf manual, "Your machine might be configured so as to
disallow interactive execution" may be another cause.
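To separate a pwscf problem from an MPI/LSF problem, I could also compile a small MPI
test program that has nothing to do with pwscf and run it the same way. This is only a
sketch; hello_mpi.f90 is a throwaway file name I made up:

cat > hello_mpi.f90 << 'EOF'
program hello
  implicit none
  include 'mpif.h'
  integer :: ierr, rank, nproc
  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nproc, ierr)
  print *, 'Hello from rank ', rank, ' of ', nproc
  call MPI_Finalize(ierr)
end program hello
EOF
mpif90 -o hello_mpi hello_mpi.f90
mpirun -np 2 ./hello_mpi    # interactive test, if the cluster allows it

If this small program also reports "Null communicator", the problem is in the MPI/LSF
installation rather than in the pwscf code.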
My question is:
To solve the "MPI_COMM_RANK : Null communicator" problem, do I need to modify
the pwscf code, the mpich_gm code, or the LSF system?
Calculation Details are as follows:
---------------------------------------------------------------------------------
HPC background:
Nankai Stars (http://202.113.29.200/introduce.htm)
800 Xeon 3.06 GHz CPUs (400 nodes)
800 GB Memory
53 TB High-Speed Storage
Myrinet interconnect
Parallel jobs are run and debugged through the Platform LSF system.
mpich_gm driver: 1.2.6..13a
Espresso-3.2.3
---------------------------------------------------------------------------------
---------------------------------------------------------------------------------
Installation:
./configure CC=mpicc F77=mpif77 F90=mpif90
make all
---------------------------------------------------------------------------------
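After configure I also plan to verify that the parallel environment was actually detected.
If I understand the 3.2.3 build system correctly, the preprocessor flags end up in make.sys,
so something like the following should show whether MPI support was compiled in (the exact
flag and variable names may differ between versions, so treat this as a guess):

grep DFLAGS make.sys     # a parallel build should include -D__MPI (and -D__PARA)
grep MPIF90 make.sys     # should point to the mpich_gm wrapper, e.g. mpif90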
---------------------------------------------------------------------------------
Submit script :
#!/bin/bash
#BSUB -q normal
#BSUB -J test.icymoon
#BSUB -c 3:00
#BSUB -a "mpich_gm"
#BSUB -o %J.log
#BSUB -n 2
cd /nfs/s04r2p1/wangxq_tj
echo "test icymoon"
mpirun.lsf /nfs/s04r2p1/wangxq_tj/espresso-3.2.3/bin/pw.x \
  < /nfs/s04r2p1/wangxq_tj/cu.scf.in > cu.scf.out
echo "test icymoon end"
---------------------------------------------------------------------------------
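Before pointing the job at pw.x again, I may also submit a stripped-down job that only runs
the small test program through mpirun.lsf, to see whether the LSF/mpich_gm integration itself
is healthy. A sketch (hello_mpi is the throwaway binary compiled above, and its path is only
an example from my own home directory):

#!/bin/bash
#BSUB -q normal
#BSUB -J test.mpi
#BSUB -a "mpich_gm"
#BSUB -o %J.log
#BSUB -n 2
# if this also prints "MPI_COMM_RANK : Null communicator", the problem
# is in the MPI/LSF setup and not in the pwscf code
mpirun.lsf /nfs/s04r2p1/wangxq_tj/hello_mpi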
---------------------------------------------------------------------------------
Output file (%J.log):
...
The output (if any) follows:
test icymoon
0 - MPI_COMM_RANK : Null communicator
[0] Aborting program !
[0] Aborting program!
test icymoon end
---------------------------------------------------------------------------------
---------------------------------------------------------------------------------
<cu.scf.in>
&control
calculation='scf'
restart_mode='from_scratch',
pseudo_dir = '/nfs/s04r2p1/wangxq_tj/espresso-3.2.3/pseudo/',
outdir='/nfs/s04r2p1/wangxq_tj/',
prefix='cu'
/
&system
ibrav = 2, celldm(1) =6.73, nat= 1, ntyp= 1,
ecutwfc = 25.0, ecutrho = 300.0
occupations='smearing', smearing='methfessel-paxton', degauss=0.02
noncolin = .true.
starting_magnetization(1) = 0.5
angle1(1) = 90.0
angle2(1) = 0.0
/
&electrons
conv_thr = 1.0e-8
mixing_beta = 0.7
/
ATOMIC_SPECIES
Cu 63.55 Cu.pz-d-rrkjus.UPF
ATOMIC_POSITIONS
Cu 0.0 0.0 0.0
K_POINTS (automatic)
8 8 8 0 0 0
--------------------------------------------------------------------------------
---------------------------------------------------------------------------------
cu.scf.out
1 - MPI_COMM_RANK : Null communicator
[1] Aborting program !
[1] Aborting program!
TID HOST_NAME COMMAND_LINE STATUS TERMINATION_TIME
==== ========== ================ ======================= ===================
0001 node333 Exit (255) 04/08/2008 19:36:59
0002 node284 Exit (255) 04/08/2008 19:36:59
---------------------------------------------------------------------------------
Any help will be deeply appreciated!
Best regards,
=====================================
X.Q. Wang
wangxinquan at tju.edu.cn
School of Chemical Engineering and Technology
Tianjin University
92 Weijin Road, Tianjin, P. R. China
tel:86-22-27890268, fax: 86-22-27892301
=====================================