[QE-users] Issue with running parallel version of QE 6.3 on more than 16 cpu

Paolo Giannozzi p.giannozzi at gmail.com
Thu Aug 30 19:19:20 CEST 2018


In my opinion, if everything works on 16 processors or fewer and nothing works
on more than 16, there is something wrong with your MPI environment.
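
One way to check the MPI environment independently of QE is a minimal test
program that splits MPI_COMM_WORLD (as pw.x does for its pools) and then
broadcasts on the sub-communicator. The sketch below is only an illustration,
assuming an MPI C compiler wrapper (mpicc) and a launcher (mpirun) from the
same MPI installation used to build QE:

/* mpi_check.c -- minimal sketch of an MPI_Bcast test on a split communicator.
 * This is an illustration, not part of QE: it only mimics the communication
 * pattern that fails in the reported run (a broadcast on a communicator
 * obtained from MPI_Comm_split). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int world_rank, world_size, sub_rank;
    double buf[1024];
    MPI_Comm sub;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* Split the ranks into two groups, loosely mimicking QE's pools. */
    MPI_Comm_split(MPI_COMM_WORLD, world_rank % 2, world_rank, &sub);
    MPI_Comm_rank(sub, &sub_rank);

    /* The root of each sub-communicator fills the buffer and broadcasts it. */
    if (sub_rank == 0)
        for (int i = 0; i < 1024; i++)
            buf[i] = (double) i;
    MPI_Bcast(buf, 1024, MPI_DOUBLE, 0, sub);

    if (buf[1023] != 1023.0)
        printf("rank %d: broadcast data corrupted\n", world_rank);
    else if (world_rank == 0)
        printf("MPI_Bcast on a split communicator OK on %d ranks\n", world_size);

    MPI_Comm_free(&sub);
    MPI_Finalize();
    return 0;
}

Compile it with "mpicc mpi_check.c -o mpi_check" and launch it the same way as
pw.x, e.g. "mpirun -np 18 ./mpi_check". If this also fails above 16 processes,
the problem lies in the MPI installation (or in a mismatch between the MPI used
to compile and the one used to run), not in the QE input or build options.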

On Thu, Aug 30, 2018 at 6:49 PM, Martina Lessio <ml4132 at columbia.edu> wrote:

> Dear Paolo,
>
> Thanks for testing my input file. I guess this means there is something
> wrong with how I compiled the code, although it's hard to understand why
> the error only occurs when I submit my job on more than 16 processors.
>
> Thanks again for your time.
>
> All the best,
> Martina
>
> On Thu, Aug 30, 2018 at 12:43 PM Paolo Giannozzi <p.giannozzi at gmail.com>
> wrote:
>
>> It works for me on 18, 24, and 32 processors, at least with the development
>> version. I ran it on a 16-processor machine, but it doesn't matter how many
>> physical cores one has (the code knows nothing about the actual number of
>> cores, only about the number of MPI processes).
>>
>> Paolo
>>
>> On Thu, Aug 30, 2018 at 6:23 PM, Martina Lessio <ml4132 at columbia.edu>
>> wrote:
>>
>>> Dear Paolo,
>>>
>>> Apologies for not including those details. The sample error message
>>> reported in my previous email was the result of a calculation run on 18 CPUs
>>> (but I get similar messages when running on other CPU counts larger than
>>> 16) using the following submission command:
>>> mpirun -np 18 pw.x < MoTe2_opt.in
>>>
>>> Thank you in advance for your help.
>>>
>>> All the best,
>>> Martina
>>>
>>> On Thu, Aug 30, 2018 at 12:09 PM Paolo Giannozzi <p.giannozzi at gmail.com>
>>> wrote:
>>>
>>>> Please report the exact conditions under which you are running the
>>>> 24-processor case: something like
>>>>   mpirun -np 24 pw.x -nk .. -nd .. -whatever_option
>>>>
>>>> Paolo
>>>>
>>>> On Wed, Aug 29, 2018 at 11:49 PM, Martina Lessio <ml4132 at columbia.edu>
>>>> wrote:
>>>>
>>>>> Dear all,
>>>>>
>>>>> I have been successfully using QE 5.4 for a while now but recently
>>>>> decided to install the newest version hoping that some issues I have been
>>>>> experiencing with 5.4 would be resolved. However, I now have some issues
>>>>> when running version 6.3 in parallel. In particular, if I run a sample
>>>>> calculation (input file provided below) on more than 16 processors, the
>>>>> calculation crashes after printing the line "Starting wfcs are random", and
>>>>> the following error message is printed in the output file:
>>>>> [compute-0-5.local:5241] *** An error occurred in MPI_Bcast
>>>>> [compute-0-5.local:5241] *** on communicator MPI COMMUNICATOR 20 SPLIT FROM 18
>>>>> [compute-0-5.local:5241] *** MPI_ERR_TRUNCATE: message truncated
>>>>> [compute-0-5.local:5241] *** MPI_ERRORS_ARE_FATAL: your MPI job will
>>>>> now abort
>>>>> --------------------------------------------------------------------------
>>>>> mpirun has exited due to process rank 16 with PID 5243 on
>>>>> node compute-0-5.local exiting improperly. There are two reasons this
>>>>> could occur:
>>>>>
>>>>> 1. this process did not call "init" before exiting, but others in
>>>>> the job did. This can cause a job to hang indefinitely while it waits
>>>>> for all processes to call "init". By rule, if one process calls "init",
>>>>> then ALL processes must call "init" prior to termination.
>>>>>
>>>>> 2. this process called "init", but exited without calling "finalize".
>>>>> By rule, all processes that call "init" MUST call "finalize" prior to
>>>>> exiting or it will be considered an "abnormal termination"
>>>>>
>>>>> This may have caused other processes in the application to be
>>>>> terminated by signals sent by mpirun (as reported here).
>>>>> --------------------------------------------------------------------------
>>>>> [compute-0-5.local:05226] 1 more process has sent help message
>>>>> help-mpi-errors.txt / mpi_errors_are_fatal
>>>>> [compute-0-5.local:05226] Set MCA parameter "orte_base_help_aggregate"
>>>>> to 0 to see all help / error messages
>>>>>
>>>>>
>>>>> Note that I have been running QE 5.4 on 24 CPUs on this same computer
>>>>> cluster without any issue. I am copying my input file at the end of this
>>>>> email.
>>>>>
>>>>> Any help with this would be greatly appreciated.
>>>>> Thank you in advance.
>>>>>
>>>>> All the best,
>>>>> Martina
>>>>>
>>>>> Martina Lessio
>>>>> Department of Chemistry
>>>>> Columbia University
>>>>>
>>>>> *Input file:*
>>>>> &control
>>>>>     calculation = 'relax'
>>>>>     restart_mode='from_scratch',
>>>>>     prefix='MoTe2_bulk_opt_1',
>>>>>     pseudo_dir = '/home/mlessio/espresso-5.4.0/pseudo/',
>>>>>     outdir='/home/mlessio/espresso-5.4.0/tempdir/'
>>>>>  /
>>>>>  &system
>>>>>     ibrav= 4, A=3.530, B=3.530, C=13.882, cosAB=-0.5, cosAC=0, cosBC=0,
>>>>>     nat= 6, ntyp= 2,
>>>>>     ecutwfc =60.
>>>>>     occupations='smearing', smearing='gaussian', degauss=0.01
>>>>>     nspin =1
>>>>>  /
>>>>>  &electrons
>>>>>     mixing_mode = 'plain'
>>>>>     mixing_beta = 0.7
>>>>>     conv_thr =  1.0d-10
>>>>>  /
>>>>>  &ions
>>>>>  /
>>>>> ATOMIC_SPECIES
>>>>>  Mo  95.96 Mo_ONCV_PBE_FR-1.0.upf
>>>>>  Te  127.6 Te_ONCV_PBE_FR-1.1.upf
>>>>> ATOMIC_POSITIONS {crystal}
>>>>> Te     0.333333334         0.666666643         0.625000034
>>>>> Te     0.666666641         0.333333282         0.375000000
>>>>> Te     0.666666641         0.333333282         0.125000000
>>>>> Te     0.333333334         0.666666643         0.874999966
>>>>> Mo     0.333333334         0.666666643         0.250000000
>>>>> Mo     0.666666641         0.333333282         0.750000000
>>>>>
>>>>> K_POINTS {automatic}
>>>>>   8 8 2 0 0 0
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
>>>> Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
>>>> Phone +39-0432-558216, fax +39-0432-558222
>>>>
>>>
>>>
>>>
>>> --
>>> Martina Lessio, Ph.D.
>>> Frontiers of Science Lecturer in Discipline
>>> Postdoctoral Research Scientist
>>> Department of Chemistry
>>> Columbia University
>>>
>>>
>>
>>
>>
>> --
>> Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
>> Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
>> Phone +39-0432-558216, fax +39-0432-558222
>>
>
>
>
> --
> Martina Lessio, Ph.D.
> Frontiers of Science Lecturer in Discipline
> Postdoctoral Research Scientist
> Department of Chemistry
> Columbia University
>
>



-- 
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222