[Pw_forum] problem with DFT+U

Sergi Vela sergi.vela at gmail.com
Wed Nov 23 16:13:16 CET 2016


Dear Paolo,

Unfortunately, there's not much to report so far. Many "relax" jobs for a
system of ca. 500 atoms (including Fe) fail with the same message Davide
reported a long time ago:
_________________

Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(2434)........: MPI_Bcast(buf=0x8b25e30, count=7220, MPI_DOUBLE_PRECISION, root=0, comm=0x84000007) failed
MPIR_Bcast_impl(1807)...:
MPIR_Bcast(1835)........:
I_MPIR_Bcast_intra(2016): Failure during collective
MPIR_Bcast_intra(1665)..: Failure during collective
_________________

It only occurs on some architectures. The same inputs work for me on 2
other machines, so it seems to be related to the compilation. The support
team of the HPC center I'm working at is trying to identify the problem.
It also seems to occur randomly, in the sense that some DFT+U calculations
of the same type (same cutoffs, pseudopotentials, system) run without any
problem at all.

I'll try to be more helpful next time, and I'll keep you updated.

Bests,
Sergi

2016-11-23 15:21 GMT+01:00 Paolo Giannozzi <p.giannozzi at gmail.com>:

> Thank you, but unless an example demonstrating the problem is provided, or
> at least some information on where this message comes from is supplied,
> there is close to nothing that can be done.
>
> Paolo
>
> On Wed, Nov 23, 2016 at 10:05 AM, Sergi Vela <sergi.vela at gmail.com> wrote:
>
>> Dear Colleagues,
>>
>> Just to report that I'm having exactly the same problem with DFT+U. The
>> same message appears at random, and only when I use the Hubbard term. I
>> was able to test versions 5.2 and 6.0, and it occurs in both.
>>
>> All my best,
>> Sergi
>>
>> 2015-07-16 18:43 GMT+02:00 Paolo Giannozzi <p.giannozzi at gmail.com>:
>>
>>> There are many well-known problems of DFT+U, but none that is known to
>>> crash jobs with an obscure message.
>>>
>>>> Rank 21 [Thu Jul 16 15:51:04 2015] [c4-2c0s15n2] Fatal error in PMPI_Bcast: Message truncated, error stack:
>>>> PMPI_Bcast(1615)..................: MPI_Bcast(buf=0x75265e0, count=160, MPI_DOUBLE_PRECISION, root=0, comm=0xc4000000) failed
>>>>
>>>
>>> This signals a mismatch between what is sent and what is received in a
>>> broadcast operation. This may be due to an obvious bug, which however
>>> should show up at the first iteration, not after XX. Apart from compiler
>>> or MPI library bugs, another reason is the one described in sec. 8.3 of
>>> the developer manual: different processes following different execution
>>> paths. From time to time, cases like this are found (the latest occurrence
>>> was in the band parallelization of exact exchange) and easily fixed.
>>> Unfortunately, finding them (that is, finding where this happens)
>>> typically requires painstaking parallel debugging.
>>>
>>> Paolo
>>> --
>>> Paolo Giannozzi, Dept. Chemistry&Physics&Environment,
>>> Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
>>> Phone +39-0432-558216, fax +39-0432-558222
>>>
>>> _______________________________________________
>>> Pw_forum mailing list
>>> Pw_forum at pwscf.org
>>> http://pwscf.org/mailman/listinfo/pw_forum
>>>
>>
>>
>>
>
>
>
> --
> Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
> Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
> Phone +39-0432-558216, fax +39-0432-558222
>
>
>
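To make the "different execution paths" mechanism from sec. 8.3 more concrete, here is a toy single-process Python model (no real MPI is used; ranks are simulated in a loop). It assumes, hypothetically, that each rank adds up the same three terms in a rank-dependent order and then picks the broadcast size by comparing the result with a threshold. Because floating-point addition is not associative, even and odd ranks land on opposite sides of the threshold and disagree on the element count. In a real MPI run, this kind of disagreement typically surfaces as broadcast errors like the ones quoted in this thread, or as a hang. The function names, the threshold and the summation order are illustrative assumptions and are not taken from the Quantum ESPRESSO source; the counts 160 and 7220 are only borrowed from the two error logs.

```python
# Toy, single-process model of a rank-dependent branch that desynchronizes
# a broadcast. No real MPI calls are made; all names are hypothetical.

PARTIAL_TERMS = [0.1, 0.2, 0.3]
THRESHOLD = 0.6            # hypothetical branch/convergence threshold
SMALL, LARGE = 160, 7220   # element counts borrowed from the two error logs


def local_sum(rank: int) -> float:
    """Sum the same terms, but in a rank-dependent order (hypothetical)."""
    terms = PARTIAL_TERMS if rank % 2 == 0 else PARTIAL_TERMS[::-1]
    total = 0.0
    for term in terms:
        total += term
    return total


def bcast_count(value: float) -> int:
    """Branch on a floating-point value to choose the broadcast size."""
    return LARGE if value > THRESHOLD else SMALL


def mismatched_ranks(counts: list[int], root: int = 0) -> list[int]:
    """Return the ranks whose expected count differs from what root sends."""
    return [r for r, c in enumerate(counts) if c != counts[root]]


def simulate(nranks: int, decide_on_root: bool) -> list[int]:
    values = [local_sum(r) for r in range(nranks)]
    if decide_on_root:
        # Remedy: every rank uses root's value, as if root had broadcast it
        # before the branch, so all ranks take the same path.
        values = [values[0]] * nranks
    counts = [bcast_count(v) for v in values]
    return mismatched_ranks(counts)


if __name__ == "__main__":
    print("divergent:", simulate(4, decide_on_root=False))    # → [1, 3]
    print("root decides:", simulate(4, decide_on_root=True))  # → []
```

One common remedy for this class of bug is the `decide_on_root=True` path: let a single process make the decision and broadcast it, so that every rank branches identically regardless of rounding differences.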

