[QE-users] Issue with running parallel version of QE 6.3 on more than 16 CPUs
Martina Lessio
ml4132 at columbia.edu
Wed Aug 29 23:49:05 CEST 2018
Dear all,
I have been successfully using QE 5.4 for a while now, but I recently
decided to install the newest version, hoping that some issues I have been
experiencing with 5.4 would be resolved. However, I now have problems
running version 6.3 in parallel. In particular, if I run a sample
calculation (input file provided below) on more than 16 processors, the
calculation crashes after printing the line "Starting wfcs are random",
and the following error message is printed in the output file:
[compute-0-5.local:5241] *** An error occurred in MPI_Bcast
[compute-0-5.local:5241] *** on communicator MPI COMMUNICATOR 20 SPLIT FROM 18
[compute-0-5.local:5241] *** MPI_ERR_TRUNCATE: message truncated
[compute-0-5.local:5241] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
--------------------------------------------------------------------------
mpirun has exited due to process rank 16 with PID 5243 on
node compute-0-5.local exiting improperly. There are two reasons this could
occur:
1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.
2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"
This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[compute-0-5.local:05226] 1 more process has sent help message
help-mpi-errors.txt / mpi_errors_are_fatal
[compute-0-5.local:05226] Set MCA parameter "orte_base_help_aggregate" to 0
to see all help / error messages
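For completeness, this is essentially how I launch the job (the process
count and file names below are placeholders rather than the exact ones from
my submission script):

# Hypothetical launch line; pw.x is the QE plane-wave executable and
# MoTe2.in stands for the input file copied at the end of this email.
mpirun -np 24 pw.x -in MoTe2.in > MoTe2.out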
Note that I have been running QE 5.4 on 24 CPUs on this same cluster
without any issues. I am copying my input file at the end of this email.
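In case it is relevant, pw.x also accepts command-line flags that set the
parallelization levels, e.g. -nk for the number of k-point pools and -nd
for the size of the diagonalization group, so I could also try rerunning
with something like the following (the flag values are purely illustrative,
not settings I have already tested):

# Illustrative only: same 24 processes, split into 4 k-point pools,
# with subspace diagonalization restricted to a single process (-nd 1).
mpirun -np 24 pw.x -nk 4 -nd 1 -in MoTe2.in > MoTe2.out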
Any help with this would be greatly appreciated.
Thank you in advance.
All the best,
Martina
Martina Lessio
Department of Chemistry
Columbia University
Input file:
&control
calculation = 'relax'
restart_mode='from_scratch',
prefix='MoTe2_bulk_opt_1',
pseudo_dir = '/home/mlessio/espresso-5.4.0/pseudo/',
outdir='/home/mlessio/espresso-5.4.0/tempdir/'
/
&system
ibrav= 4, A=3.530, B=3.530, C=13.882, cosAB=-0.5, cosAC=0, cosBC=0,
nat= 6, ntyp= 2,
ecutwfc =60.
occupations='smearing', smearing='gaussian', degauss=0.01
nspin =1
/
&electrons
mixing_mode = 'plain'
mixing_beta = 0.7
conv_thr = 1.0d-10
/
&ions
/
ATOMIC_SPECIES
Mo 95.96 Mo_ONCV_PBE_FR-1.0.upf
Te 127.6 Te_ONCV_PBE_FR-1.1.upf
ATOMIC_POSITIONS {crystal}
Te 0.333333334 0.666666643 0.625000034
Te 0.666666641 0.333333282 0.375000000
Te 0.666666641 0.333333282 0.125000000
Te 0.333333334 0.666666643 0.874999966
Mo 0.333333334 0.666666643 0.250000000
Mo 0.666666641 0.333333282 0.750000000
K_POINTS {automatic}
8 8 2 0 0 0