[QE-users] Issue with running parallel version of QE 6.3 on more than 16 CPUs

Martina Lessio ml4132 at columbia.edu
Thu Aug 30 19:32:00 CEST 2018


Thank you so much, Paolo. I will report your feedback to the IT staff in
charge of maintaining the cluster I use and see if they have any suggestions.

All the best,
Martina

On Thu, Aug 30, 2018 at 1:20 PM Paolo Giannozzi <p.giannozzi at gmail.com>
wrote:

> In my opinion, if everything works on 16 processors or fewer and nothing
> works on more than 16, there is something wrong with your MPI environment.
>
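(To test this hypothesis independently of QE, one option is a minimal
MPI broadcast program, compiled with the same MPI library used to build
pw.x. This is only a sketch; the file name bcast_test.c is an
illustrative choice, not something from the thread.

  /* bcast_test.c: broadcast one integer from rank 0 to every rank */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int rank, size, value = 0;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      if (rank == 0) value = 42;   /* root sets the payload */
      MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);

      printf("rank %d of %d received %d\n", rank, size, value);

      MPI_Finalize();
      return 0;
  }

Compiled and launched across the problematic process count, e.g.

  mpicc bcast_test.c -o bcast_test
  mpirun -np 18 ./bcast_test

a failure past 16 processes would point at the MPI setup rather than
at QE itself.)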
> On Thu, Aug 30, 2018 at 6:49 PM, Martina Lessio <ml4132 at columbia.edu>
> wrote:
>
>> Dear Paolo,
>>
>> Thanks for testing my input file. I guess this means there is something
>> wrong with how I compiled the code, although it's hard to understand why
>> the error only occurs when I submit my job on more than 16 processors.
>>
>> Thanks again for your time.
>>
>> All the best,
>> Martina
>>
>> On Thu, Aug 30, 2018 at 12:43 PM Paolo Giannozzi <p.giannozzi at gmail.com>
>> wrote:
>>
>>> It works for me on 18, 24, and 32 processors, at least with the
>>> development version. I ran it on a 16-processor machine, but it doesn't
>>> matter how many physical cores one has (the code knows nothing about the
>>> actual number of cores, only about the number of MPI processes).
>>>
>>> Paolo
>>>
>>> On Thu, Aug 30, 2018 at 6:23 PM, Martina Lessio <ml4132 at columbia.edu>
>>> wrote:
>>>
>>>> Dear Paolo,
>>>>
>>>> Apologies for not including those details. The sample error message
>>>> reported in my previous email came from a calculation run on 18 CPUs
>>>> (but I get similar messages with other CPU counts larger than 16),
>>>> using the following submission command:
>>>> mpirun -np 18 pw.x < MoTe2_opt.in
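(A note on the command above: some MPI implementations do not reliably
deliver input redirected with "<" to all processes of a parallel run.
QE executables can instead take the input file as a command-line
option, which may be worth trying here; the output file name below is
only an illustrative choice:

  mpirun -np 18 pw.x -i MoTe2_opt.in > MoTe2_opt.out

Both "-i" and "-inp" are accepted spellings of this option.)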
>>>>
>>>> Thank you in advance for your help.
>>>>
>>>> All the best,
>>>> Martina
>>>>
>>>> On Thu, Aug 30, 2018 at 12:09 PM Paolo Giannozzi <p.giannozzi at gmail.com>
>>>> wrote:
>>>>
>>>>> Please report the exact conditions under which you are running the
>>>>> 24-processor case: something like
>>>>>   mpirun -np 24 pw.x -nk .. -nd .. -whatever_option
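(For illustration only, a fully specified command of this form might
read as follows; the flag values are hypothetical, not taken from the
original report:

  mpirun -np 24 pw.x -nk 4 -nd 4 -i MoTe2_opt.in

where -nk sets the number of k-point pools and -nd the number of
processes used for subspace diagonalization.)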
>>>>>
>>>>> Paolo
>>>>>
>>>>> On Wed, Aug 29, 2018 at 11:49 PM, Martina Lessio <ml4132 at columbia.edu>
>>>>> wrote:
>>>>>
>>>>>> Dear all,
>>>>>>
>>>>>> I have been successfully using QE 5.4 for a while now, but I recently
>>>>>> decided to install the newest version, hoping that some issues I have
>>>>>> been experiencing with 5.4 would be resolved. However, I now have some
>>>>>> issues when running version 6.3 in parallel. In particular, if I run a
>>>>>> sample calculation (input file provided below) on more than 16
>>>>>> processors, the calculation crashes after printing the line "Starting
>>>>>> wfcs are random", and the following error message is printed in the
>>>>>> output file:
>>>>>> [compute-0-5.local:5241] *** An error occurred in MPI_Bcast
>>>>>> [compute-0-5.local:5241] *** on communicator MPI COMMUNICATOR 20
>>>>>> SPLIT FROM 18
>>>>>> [compute-0-5.local:5241] *** MPI_ERR_TRUNCATE: message truncated
>>>>>> [compute-0-5.local:5241] *** MPI_ERRORS_ARE_FATAL: your MPI job will
>>>>>> now abort
>>>>>>
>>>>>> --------------------------------------------------------------------------
>>>>>> mpirun has exited due to process rank 16 with PID 5243 on
>>>>>> node compute-0-5.local exiting improperly. There are two reasons this
>>>>>> could occur:
>>>>>>
>>>>>> 1. this process did not call "init" before exiting, but others in
>>>>>> the job did. This can cause a job to hang indefinitely while it waits
>>>>>> for all processes to call "init". By rule, if one process calls
>>>>>> "init",
>>>>>> then ALL processes must call "init" prior to termination.
>>>>>>
>>>>>> 2. this process called "init", but exited without calling "finalize".
>>>>>> By rule, all processes that call "init" MUST call "finalize" prior to
>>>>>> exiting or it will be considered an "abnormal termination"
>>>>>>
>>>>>> This may have caused other processes in the application to be
>>>>>> terminated by signals sent by mpirun (as reported here).
>>>>>>
>>>>>> --------------------------------------------------------------------------
>>>>>> [compute-0-5.local:05226] 1 more process has sent help message
>>>>>> help-mpi-errors.txt / mpi_errors_are_fatal
>>>>>> [compute-0-5.local:05226] Set MCA parameter
>>>>>> "orte_base_help_aggregate" to 0 to see all help / error messages
>>>>>>
>>>>>>
>>>>>> Note that I have been running QE 5.4 on 24 CPUs on this same computer
>>>>>> cluster without any issue. I am copying my input file at the end of this
>>>>>> email.
>>>>>>
>>>>>> Any help with this would be greatly appreciated.
>>>>>> Thank you in advance.
>>>>>>
>>>>>> All the best,
>>>>>> Martina
>>>>>>
>>>>>> Martina Lessio
>>>>>> Department of Chemistry
>>>>>> Columbia University
>>>>>>
>>>>>> Input file:
>>>>>> &control
>>>>>>     calculation = 'relax'
>>>>>>     restart_mode='from_scratch',
>>>>>>     prefix='MoTe2_bulk_opt_1',
>>>>>>     pseudo_dir = '/home/mlessio/espresso-5.4.0/pseudo/',
>>>>>>     outdir='/home/mlessio/espresso-5.4.0/tempdir/'
>>>>>>  /
>>>>>>  &system
>>>>>>     ibrav= 4, A=3.530, B=3.530, C=13.882, cosAB=-0.5, cosAC=0, cosBC=0,
>>>>>>     nat= 6, ntyp= 2,
>>>>>>     ecutwfc =60.
>>>>>>     occupations='smearing', smearing='gaussian', degauss=0.01
>>>>>>     nspin =1
>>>>>>  /
>>>>>>  &electrons
>>>>>>     mixing_mode = 'plain'
>>>>>>     mixing_beta = 0.7
>>>>>>     conv_thr =  1.0d-10
>>>>>>  /
>>>>>>  &ions
>>>>>>  /
>>>>>> ATOMIC_SPECIES
>>>>>>  Mo  95.96 Mo_ONCV_PBE_FR-1.0.upf
>>>>>>  Te  127.6 Te_ONCV_PBE_FR-1.1.upf
>>>>>> ATOMIC_POSITIONS {crystal}
>>>>>> Te     0.333333334         0.666666643         0.625000034
>>>>>> Te     0.666666641         0.333333282         0.375000000
>>>>>> Te     0.666666641         0.333333282         0.125000000
>>>>>> Te     0.333333334         0.666666643         0.874999966
>>>>>> Mo     0.333333334         0.666666643         0.250000000
>>>>>> Mo     0.666666641         0.333333282         0.750000000
>>>>>>
>>>>>> K_POINTS {automatic}
>>>>>>   8 8 2 0 0 0
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
>>>>> Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
>>>>> Phone +39-0432-558216, fax +39-0432-558222
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Martina Lessio, Ph.D.
>>>> Frontiers of Science Lecturer in Discipline
>>>> Postdoctoral Research Scientist
>>>> Department of Chemistry
>>>> Columbia University
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
>>> Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
>>> Phone +39-0432-558216, fax +39-0432-558222
>>>
>>
>>
>>
>> --
>> Martina Lessio, Ph.D.
>> Frontiers of Science Lecturer in Discipline
>> Postdoctoral Research Scientist
>> Department of Chemistry
>> Columbia University
>>
>>
>
>
>
> --
> Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
> Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
> Phone +39-0432-558216, fax +39-0432-558222
>



-- 
Martina Lessio, Ph.D.
Frontiers of Science Lecturer in Discipline
Postdoctoral Research Scientist
Department of Chemistry
Columbia University