[QE-users] Issue with running parallel version of QE 6.3 on more than 16 cpu

Paolo Giannozzi p.giannozzi at gmail.com
Thu Aug 30 18:43:02 CEST 2018


It works for me with 18, 24, and 32 MPI processes, at least with the
development version. I ran it on a 16-processor machine, but the number of
physical cores does not matter: the code knows nothing about the actual
number of cores, only about the number of MPI processes.

Paolo

On Thu, Aug 30, 2018 at 6:23 PM, Martina Lessio <ml4132 at columbia.edu> wrote:

> Dear Paolo,
>
> Apologies for not including those details. The sample error message
> reported in my previous email was the result of a calculation run on 18 CPUs
> (I get similar messages with any number of CPUs larger than 16), using the
> following submission command:
> mpirun -np 18 pw.x < MoTe2_opt.in
>
> Thank you in advance for your help.
>
> All the best,
> Martina
>
> On Thu, Aug 30, 2018 at 12:09 PM Paolo Giannozzi <p.giannozzi at gmail.com>
> wrote:
>
>> Please report the exact conditions under which you are running the
>> 24-processor case: something like
>>   mpirun -np 24 pw.x -nk .. -nd .. -whatever_option
>>
>> Paolo
>>
>> On Wed, Aug 29, 2018 at 11:49 PM, Martina Lessio <ml4132 at columbia.edu>
>> wrote:
>>
>>> Dear all,
>>>
>>> I have been successfully using QE 5.4 for a while now but recently
>>> decided to install the newest version, hoping that some issues I have been
>>> experiencing with 5.4 would be resolved. However, I now have some issues
>>> when running version 6.3 in parallel. In particular, if I run a sample
>>> calculation (input file provided below) on more than 16 processors, the
>>> calculation crashes after printing the line "Starting wfcs are random", and
>>> the following error message is printed in the output file:
>>> [compute-0-5.local:5241] *** An error occurred in MPI_Bcast
>>> [compute-0-5.local:5241] *** on communicator MPI COMMUNICATOR 20 SPLIT FROM 18
>>> [compute-0-5.local:5241] *** MPI_ERR_TRUNCATE: message truncated
>>> [compute-0-5.local:5241] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
>>> --------------------------------------------------------------------------
>>> mpirun has exited due to process rank 16 with PID 5243 on
>>> node compute-0-5.local exiting improperly. There are two reasons this
>>> could occur:
>>>
>>> 1. this process did not call "init" before exiting, but others in
>>> the job did. This can cause a job to hang indefinitely while it waits
>>> for all processes to call "init". By rule, if one process calls "init",
>>> then ALL processes must call "init" prior to termination.
>>>
>>> 2. this process called "init", but exited without calling "finalize".
>>> By rule, all processes that call "init" MUST call "finalize" prior to
>>> exiting or it will be considered an "abnormal termination"
>>>
>>> This may have caused other processes in the application to be
>>> terminated by signals sent by mpirun (as reported here).
>>> --------------------------------------------------------------------------
>>> [compute-0-5.local:05226] 1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal
>>> [compute-0-5.local:05226] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
>>>
>>>
>>> Note that I have been running QE 5.4 on 24 CPUs on this same computer
>>> cluster without any issue. I am copying my input file at the end of this
>>> email.
>>>
>>> Any help with this would be greatly appreciated.
>>> Thank you in advance.
>>>
>>> All the best,
>>> Martina
>>>
>>> Martina Lessio
>>> Department of Chemistry
>>> Columbia University
>>>
>>> *Input file:*
>>> &control
>>>     calculation = 'relax'
>>>     restart_mode='from_scratch',
>>>     prefix='MoTe2_bulk_opt_1',
>>>     pseudo_dir = '/home/mlessio/espresso-5.4.0/pseudo/',
>>>     outdir='/home/mlessio/espresso-5.4.0/tempdir/'
>>>  /
>>>  &system
>>>     ibrav= 4, A=3.530, B=3.530, C=13.882, cosAB=-0.5, cosAC=0, cosBC=0,
>>>     nat= 6, ntyp= 2,
>>>     ecutwfc =60.
>>>     occupations='smearing', smearing='gaussian', degauss=0.01
>>>     nspin =1
>>>  /
>>>  &electrons
>>>     mixing_mode = 'plain'
>>>     mixing_beta = 0.7
>>>     conv_thr =  1.0d-10
>>>  /
>>>  &ions
>>>  /
>>> ATOMIC_SPECIES
>>>  Mo  95.96 Mo_ONCV_PBE_FR-1.0.upf
>>>  Te  127.6 Te_ONCV_PBE_FR-1.1.upf
>>> ATOMIC_POSITIONS {crystal}
>>> Te     0.333333334         0.666666643         0.625000034
>>> Te     0.666666641         0.333333282         0.375000000
>>> Te     0.666666641         0.333333282         0.125000000
>>> Te     0.333333334         0.666666643         0.874999966
>>> Mo     0.333333334         0.666666643         0.250000000
>>> Mo     0.666666641         0.333333282         0.750000000
>>>
>>> K_POINTS {automatic}
>>>   8 8 2 0 0 0
>>>
>>>
>>> _______________________________________________
>>> users mailing list
>>> users at lists.quantum-espresso.org
>>> https://lists.quantum-espresso.org/mailman/listinfo/users
>>>
>>
>>
>>
>> --
>> Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
>> Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
>> Phone +39-0432-558216, fax +39-0432-558222
>>
>
>
>
> --
> Martina Lessio, Ph.D.
> Frontiers of Science Lecturer in Discipline
> Postdoctoral Research Scientist
> Department of Chemistry
> Columbia University
>
>



-- 
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222