[QE-users] Issue with running parallel version of QE 6.3 on more than 16 cpu

Paolo Giannozzi p.giannozzi at gmail.com
Thu Aug 30 18:08:53 CEST 2018


Please report the exact conditions under which you are running the
24-processor case: something like
  mpirun -np 24 pw.x -nk .. -nd .. -whatever_option
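For example, a complete command line such as the following (the -nk/-nd
values and the input file name here are only placeholders, to show the
level of detail that helps):

  mpirun -np 24 pw.x -nk 4 -nd 1 -in MoTe2_bulk_opt_1.in > MoTe2_bulk_opt_1.out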

Paolo

On Wed, Aug 29, 2018 at 11:49 PM, Martina Lessio <ml4132 at columbia.edu>
wrote:

> Dear all,
>
> I have been successfully using QE 5.4 for a while now but recently decided
> to install the newest version, hoping that some issues I have been
> experiencing with 5.4 would be resolved. However, I now have some issues
> when running version 6.3 in parallel. In particular, if I run a sample
> calculation (input file provided below) on more than 16 processors, the
> calculation crashes after printing the line "Starting wfcs are random", and
> the following error message is printed in the output file:
> [compute-0-5.local:5241] *** An error occurred in MPI_Bcast
> [compute-0-5.local:5241] *** on communicator MPI COMMUNICATOR 20 SPLIT
> FROM 18
> [compute-0-5.local:5241] *** MPI_ERR_TRUNCATE: message truncated
> [compute-0-5.local:5241] *** MPI_ERRORS_ARE_FATAL: your MPI job will now
> abort
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 16 with PID 5243 on
> node compute-0-5.local exiting improperly. There are two reasons this
> could occur:
>
> 1. this process did not call "init" before exiting, but others in
> the job did. This can cause a job to hang indefinitely while it waits
> for all processes to call "init". By rule, if one process calls "init",
> then ALL processes must call "init" prior to termination.
>
> 2. this process called "init", but exited without calling "finalize".
> By rule, all processes that call "init" MUST call "finalize" prior to
> exiting or it will be considered an "abnormal termination"
>
> This may have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
> --------------------------------------------------------------------------
> [compute-0-5.local:05226] 1 more process has sent help message
> help-mpi-errors.txt / mpi_errors_are_fatal
> [compute-0-5.local:05226] Set MCA parameter "orte_base_help_aggregate" to
> 0 to see all help / error messages
>
>
> Note that I have been running QE 5.4 on 24 CPUs on this same computer
> cluster without any issue. I am copying my input file at the end of this
> email.
>
> Any help with this would be greatly appreciated.
> Thank you in advance.
>
> All the best,
> Martina
>
> Martina Lessio
> Department of Chemistry
> Columbia University
>
> *Input file:*
> &control
>     calculation = 'relax'
>     restart_mode='from_scratch',
>     prefix='MoTe2_bulk_opt_1',
>     pseudo_dir = '/home/mlessio/espresso-5.4.0/pseudo/',
>     outdir='/home/mlessio/espresso-5.4.0/tempdir/'
>  /
>  &system
>     ibrav= 4, A=3.530, B=3.530, C=13.882, cosAB=-0.5, cosAC=0, cosBC=0,
>     nat= 6, ntyp= 2,
>     ecutwfc =60.
>     occupations='smearing', smearing='gaussian', degauss=0.01
>     nspin =1
>  /
>  &electrons
>     mixing_mode = 'plain'
>     mixing_beta = 0.7
>     conv_thr =  1.0d-10
>  /
>  &ions
>  /
> ATOMIC_SPECIES
>  Mo  95.96 Mo_ONCV_PBE_FR-1.0.upf
>  Te  127.6 Te_ONCV_PBE_FR-1.1.upf
> ATOMIC_POSITIONS {crystal}
> Te     0.333333334         0.666666643         0.625000034
> Te     0.666666641         0.333333282         0.375000000
> Te     0.666666641         0.333333282         0.125000000
> Te     0.333333334         0.666666643         0.874999966
> Mo     0.333333334         0.666666643         0.250000000
> Mo     0.666666641         0.333333282         0.750000000
>
> K_POINTS {automatic}
>   8 8 2 0 0 0
>
>



-- 
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222

