[QE-users] Issue with running parallel version of QE 6.3 on more than 16 cpu
Paolo Giannozzi
p.giannozzi at gmail.com
Thu Aug 30 18:08:53 CEST 2018
Please report the exact conditions under which you are running the
24-processor case: something like
mpirun -np 24 pw.x -nk .. -nd .. -whatever_option
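For instance (an illustrative command only: the -nk / -nd values and the input
file name here are placeholders, not the actual settings used):
mpirun -np 24 pw.x -nk 4 -nd 1 -in MoTe2_bulk_opt_1.in > MoTe2_bulk_opt_1.out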
Paolo
On Wed, Aug 29, 2018 at 11:49 PM, Martina Lessio <ml4132 at columbia.edu>
wrote:
> Dear all,
>
> I have been successfully using QE 5.4 for a while now but recently decided
> to install the newest version hoping that some issues I have been
> experiencing with 5.4 would be resolved. However, I now have some issues
> when running version 6.3 in parallel. In particular, if I run a sample
> calculation (input file provided below) on more than 16 processors, the
> calculation crashes after printing the line "Starting wfcs are random", and
> the following error message is printed in the output file:
> [compute-0-5.local:5241] *** An error occurred in MPI_Bcast
> [compute-0-5.local:5241] *** on communicator MPI COMMUNICATOR 20 SPLIT
> FROM 18
> [compute-0-5.local:5241] *** MPI_ERR_TRUNCATE: message truncated
> [compute-0-5.local:5241] *** MPI_ERRORS_ARE_FATAL: your MPI job will now
> abort
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 16 with PID 5243 on
> node compute-0-5.local exiting improperly. There are two reasons this
> could occur:
>
> 1. this process did not call "init" before exiting, but others in
> the job did. This can cause a job to hang indefinitely while it waits
> for all processes to call "init". By rule, if one process calls "init",
> then ALL processes must call "init" prior to termination.
>
> 2. this process called "init", but exited without calling "finalize".
> By rule, all processes that call "init" MUST call "finalize" prior to
> exiting or it will be considered an "abnormal termination"
>
> This may have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
> --------------------------------------------------------------------------
> [compute-0-5.local:05226] 1 more process has sent help message
> help-mpi-errors.txt / mpi_errors_are_fatal
> [compute-0-5.local:05226] Set MCA parameter "orte_base_help_aggregate" to
> 0 to see all help / error messages
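(As the OpenMPI message above suggests, that MCA parameter can be passed
directly on the command line to see all of the aggregated error messages; an
illustrative invocation, reusing the hypothetical input file name from the
example earlier in this thread:
mpirun --mca orte_base_help_aggregate 0 -np 24 pw.x -in MoTe2_bulk_opt_1.in > MoTe2_bulk_opt_1.out
)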
>
>
> Note that I have been running QE 5.4 on 24 CPUs on this same computer
> cluster without any issue. I am copying my input file at the end of this
> email.
>
> Any help with this would be greatly appreciated.
> Thank you in advance.
>
> All the best,
> Martina
>
> Martina Lessio
> Department of Chemistry
> Columbia University
>
> *Input file:*
> &control
> calculation = 'relax'
> restart_mode='from_scratch',
> prefix='MoTe2_bulk_opt_1',
> pseudo_dir = '/home/mlessio/espresso-5.4.0/pseudo/',
> outdir='/home/mlessio/espresso-5.4.0/tempdir/'
> /
> &system
> ibrav= 4, A=3.530, B=3.530, C=13.882, cosAB=-0.5, cosAC=0, cosBC=0,
> nat= 6, ntyp= 2,
> ecutwfc =60.
> occupations='smearing', smearing='gaussian', degauss=0.01
> nspin =1
> /
> &electrons
> mixing_mode = 'plain'
> mixing_beta = 0.7
> conv_thr = 1.0d-10
> /
> &ions
> /
> ATOMIC_SPECIES
> Mo 95.96 Mo_ONCV_PBE_FR-1.0.upf
> Te 127.6 Te_ONCV_PBE_FR-1.1.upf
> ATOMIC_POSITIONS {crystal}
> Te 0.333333334 0.666666643 0.625000034
> Te 0.666666641 0.333333282 0.375000000
> Te 0.666666641 0.333333282 0.125000000
> Te 0.333333334 0.666666643 0.874999966
> Mo 0.333333334 0.666666643 0.250000000
> Mo 0.666666641 0.333333282 0.750000000
>
> K_POINTS {automatic}
> 8 8 2 0 0 0
>
>
--
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222