[QE-users] Issue with running parallel version of QE 6.3 on more than 16 CPUs

Martina Lessio ml4132 at columbia.edu
Thu Aug 30 18:23:28 CEST 2018


Dear Paolo,

Apologies for not including those details. The sample error message
reported in my previous email came from a calculation run on 18 CPUs
(but I get similar messages when running on other CPU counts larger than
16), using the following submission command:
mpirun -np 18 pw.x < MoTe2_opt.in
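
For completeness, if I spelled out the parallelization levels explicitly in the way you describe, the command would look something like the line below (the -nk/-nd values and the output file name are placeholders for illustration only; I did not actually pass any of these options):
  mpirun -np 18 pw.x -nk 3 -nd 4 -in MoTe2_opt.in > MoTe2_opt.out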

Thank you in advance for your help.

All the best,
Martina

On Thu, Aug 30, 2018 at 12:09 PM Paolo Giannozzi <p.giannozzi at gmail.com>
wrote:

> Please report the exact conditions under which you are running the
> 24-processor case: something like
>   mpirun -np 24 pw.x -nk .. -nd .. -whatever_option
>
> Paolo
>
> On Wed, Aug 29, 2018 at 11:49 PM, Martina Lessio <ml4132 at columbia.edu>
> wrote:
>
>> Dear all,
>>
>> I have been successfully using QE 5.4 for a while now but recently
>> decided to install the newest version hoping that some issues I have been
>> experiencing with 5.4 would be resolved. However, I now have some issues
>> when running version 6.3 in parallel. In particular, when I run a sample
>> calculation (input file provided below) on more than 16 processors, the
>> calculation crashes right after printing the line "Starting wfcs are random",
>> and the following error message appears in the output file:
>> [compute-0-5.local:5241] *** An error occurred in MPI_Bcast
>> [compute-0-5.local:5241] *** on communicator MPI COMMUNICATOR 20 SPLIT
>> FROM 18
>> [compute-0-5.local:5241] *** MPI_ERR_TRUNCATE: message truncated
>> [compute-0-5.local:5241] *** MPI_ERRORS_ARE_FATAL: your MPI job will now
>> abort
>> --------------------------------------------------------------------------
>> mpirun has exited due to process rank 16 with PID 5243 on
>> node compute-0-5.local exiting improperly. There are two reasons this
>> could occur:
>>
>> 1. this process did not call "init" before exiting, but others in
>> the job did. This can cause a job to hang indefinitely while it waits
>> for all processes to call "init". By rule, if one process calls "init",
>> then ALL processes must call "init" prior to termination.
>>
>> 2. this process called "init", but exited without calling "finalize".
>> By rule, all processes that call "init" MUST call "finalize" prior to
>> exiting or it will be considered an "abnormal termination"
>>
>> This may have caused other processes in the application to be
>> terminated by signals sent by mpirun (as reported here).
>> --------------------------------------------------------------------------
>> [compute-0-5.local:05226] 1 more process has sent help message
>> help-mpi-errors.txt / mpi_errors_are_fatal
>> [compute-0-5.local:05226] Set MCA parameter "orte_base_help_aggregate" to
>> 0 to see all help / error messages
>>
>>
>> Note that I have been running QE 5.4 on 24 CPUs on this same computer
>> cluster without any issue. I am copying my input file at the end of this
>> email.
>>
>> Any help with this would be greatly appreciated.
>> Thank you in advance.
>>
>> All the best,
>> Martina
>>
>> Martina Lessio
>> Department of Chemistry
>> Columbia University
>>
>> *Input file:*
>> &control
>>     calculation = 'relax'
>>     restart_mode='from_scratch',
>>     prefix='MoTe2_bulk_opt_1',
>>     pseudo_dir = '/home/mlessio/espresso-5.4.0/pseudo/',
>>     outdir='/home/mlessio/espresso-5.4.0/tempdir/'
>>  /
>>  &system
>>     ibrav= 4, A=3.530, B=3.530, C=13.882, cosAB=-0.5, cosAC=0, cosBC=0,
>>     nat= 6, ntyp= 2,
>>     ecutwfc =60.
>>     occupations='smearing', smearing='gaussian', degauss=0.01
>>     nspin =1
>>  /
>>  &electrons
>>     mixing_mode = 'plain'
>>     mixing_beta = 0.7
>>     conv_thr =  1.0d-10
>>  /
>>  &ions
>>  /
>> ATOMIC_SPECIES
>>  Mo  95.96 Mo_ONCV_PBE_FR-1.0.upf
>>  Te  127.6 Te_ONCV_PBE_FR-1.1.upf
>> ATOMIC_POSITIONS {crystal}
>> Te     0.333333334         0.666666643         0.625000034
>> Te     0.666666641         0.333333282         0.375000000
>> Te     0.666666641         0.333333282         0.125000000
>> Te     0.333333334         0.666666643         0.874999966
>> Mo     0.333333334         0.666666643         0.250000000
>> Mo     0.666666641         0.333333282         0.750000000
>>
>> K_POINTS {automatic}
>>   8 8 2 0 0 0
>>
>>
>
>
>
> --
> Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
> Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
> Phone +39-0432-558216, fax +39-0432-558222
>
> _______________________________________________
> users mailing list
> users at lists.quantum-espresso.org
> https://lists.quantum-espresso.org/mailman/listinfo/users



-- 
Martina Lessio, Ph.D.
Frontiers of Science Lecturer in Discipline
Postdoctoral Research Scientist
Department of Chemistry
Columbia University