[QE-developers] Bug in non self-consistent calculation

Thu Jun 13 17:19:41 CEST 2019

The convergence thresholds for the iterative diagonalization are different.
The scf calculation starts with a loose convergence criterion, then the
threshold is tightened as self-consistency is approached. The non-scf
calculation uses a much tighter convergence threshold from the beginning.
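To make the difference concrete, here is a toy sketch in Python. The update
rule, the numbers and the function names scf_thresholds/nscf_threshold are
hypothetical, chosen only to mimic the behavior described above; they are
not taken from the pw.x source.

```python
def scf_thresholds(drho_history, nelec, start=1e-2, floor=1e-13):
    """Hypothetical toy schedule (not QE source): the diagonalization
    threshold starts loose and is tightened as the density error decreases."""
    thr = start
    used = []
    for drho in drho_history:
        used.append(thr)  # threshold used in this scf iteration
        # tighten for the next iteration: never loosen, never go below floor
        thr = max(floor, min(thr, 0.1 * drho / max(1.0, nelec)))
    return used


def nscf_threshold(conv_thr, nelec):
    """Hypothetical nscf rule: one tight threshold from the very first step."""
    return 0.1 * min(1e-2, conv_thr / nelec)


# Toy density errors for five scf iterations (made-up values).
scf = scf_thresholds([1e-1, 1e-3, 1e-5, 1e-7, 1e-9], nelec=8)
print("scf :", scf)
print("nscf:", nscf_threshold(conv_thr=1e-8, nelec=8))
```

In this toy, the first scf diagonalizations are many orders of magnitude
looser than the single nscf one, so the two runs do not stress the
iterative solver in the same way.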

On Thu, Jun 13, 2019 at 4:05 PM Lorenzo Monacelli <mesonepigreco at gmail.com>
wrote:

> Thank you, Paolo, for the explanation. I misunderstood what that function
> was doing.
>
> Still, what worries me is my workaround. For band structures with many
> bands and many atoms, where the nscf calculation crashes with the S matrix
> error, I run the same input file with calculation = "scf" instead. It
> takes about 10 times longer (it needs almost 10 iterations to reach
> self-consistency), but it never hits the S matrix issue... this looks
> strange. As far as I understand, this numerical issue should affect the
> diagonalization in the same way in both the self-consistent and the
> non-self-consistent calculation.
>
> Bests,
>
> Lorenzo
>
>
> On 12/06/19 22:20, Paolo Giannozzi wrote:
>
> - The overlap matrix is not the identity matrix because the correction
> vectors are orthogonal neither to the trial vectors nor among themselves.
> - An iterative algorithm is not the best way to obtain many eigenvectors:
> it is designed to find a number of eigenvalues that is a small fraction of
> the matrix dimension. The more eigenvalues you ask for, the less convenient
> and the more unstable it becomes.
> - I am quite sure that there is no "true" bug (uninitialized variables and
> the like) and that the algorithm is "analytically" correct, so to speak.
> Under some circumstances, the overlap matrix has a very small negative
> eigenvalue. An "analytically computed" overlap matrix can't have a negative
> eigenvalue, by construction. One can find a workaround, but you need one
> for each of the many cases: real, hermitian, with serial or parallel
> subspace diagonalization, ... I would prefer to understand what triggers
> the appearance of a negative eigenvalue, but it is not that simple.
>
> Paolo
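A minimal numerical sketch of the last point, under assumptions: NumPy,
random complex vectors standing in for wavefunctions, and a deliberately
extreme, hypothetical case in which the correction vectors are almost
parallel to the trial vectors. In exact arithmetic the overlap matrix S is
positive definite; in double precision its smallest eigenvalues sit at
round-off level, so their sign, and hence the success of a Cholesky
factorization, is not guaranteed.

```python
import numpy as np

rng = np.random.default_rng(0)
npw, nvec = 200, 4  # hypothetical basis size and number of trial vectors


def crandn(*shape):
    """Complex Gaussian random array."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)


# Orthonormal trial vectors.
psi, _ = np.linalg.qr(crandn(npw, nvec))
# Hypothetical worst case: correction vectors almost parallel to the
# trial vectors (they differ only at the 1e-9 level).
corr = psi + 1e-9 * crandn(npw, nvec)

V = np.hstack([psi, corr])  # subspace basis, not orthonormal
S = V.conj().T @ V          # overlap matrix S_ij = <v_i|v_j>, Hermitian

# Exact arithmetic: all eigenvalues > 0. Double precision: the smallest
# ones are at round-off level and their sign is not guaranteed.
eig = np.linalg.eigvalsh(S)
print("eigenvalues of S:", eig)

try:
    np.linalg.cholesky(S)
    print("Cholesky succeeded this time")
except np.linalg.LinAlgError:
    print("Cholesky failed: S is not positive definite in floating point")

# One possible cure: orthonormalize the basis (here with QR) before building S.
Q, _ = np.linalg.qr(V)
S_orth = Q.conj().T @ Q
print("max |S_orth - I| =", np.abs(S_orth - np.eye(2 * nvec)).max())
```

Orthonormalizing is only one possible workaround; as noted above, a robust
fix would have to be repeated for each diagonalization variant.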
> On Wed, Jun 12, 2019 at 3:55 PM Lorenzo Monacelli <mesonepigreco at gmail.com>
> wrote:
>
>> Dear all,
>>
>> Thank you for your replies.
>>
>> I have two different versions on two different machines. The one I used
>> for the results I sent you was compiled with gfortran and the standard
>> LAPACK/BLAS provided by the Ubuntu 16.04 repositories. The other is
>> compiled with the Intel compiler and MKL and runs on a cluster. Both
>> experienced the issue quite randomly. I attach the make.inc of both
>> builds.
>>
>> The same issue, albeit with a different input file, also occurred with
>> the pw.x (v6.2) pre-installed on the Spanish MARENOSTRUM cluster, which I
>> assume was correctly compiled.
>>
>> I noticed that one way to reproduce the error is to ask for many bands in
>> the nscf calculation of a system with many atoms (and few symmetries) in
>> the cell (with 96 atoms it is almost impossible for me to run an nscf
>> calculation).
>>
>> Is it possible that the different behavior on different machines
>> actually suggests that the bug lies in some ill-initialized variable
>> (whose initialization may be left to the compiler)?
>>
>> Another question: how does cdiaghg work? I assumed that the S matrix
>> should be the identity for local norm-conserving pseudopotentials and GGA
>> xc functionals, but if I force it to be the identity at the beginning of
>> the subroutine, the code is no longer able to converge any calculation
>> (not even the scf, which otherwise works). I am a bit skeptical that this
>> is just a LAPACK or MPI error: why does an SCF with the same input (which
>> should solve the same problem as the nscf, but many times) work very well
>> (even with many atoms and many bands)?
>>
>> Bests,
>>
>> Lorenzo
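On the identity question: for norm-conserving pseudopotentials the S
operator in Hv = eSv is indeed the identity, but the matrix cdiaghg works on
is, presumably, the reduced subspace problem, whose overlap S_ij = <v_i|v_j>
is taken over non-orthogonal trial and correction vectors and is therefore
not the identity. The sketch below (NumPy, toy random matrices, hypothetical
sizes and names) solves the reduced problem through a Cholesky reduction and
compares it with forcing S = I.

```python
import numpy as np


def ritz_values(H_red, S_red):
    """Eigenvalues of H_red c = e S_red c via Cholesky reduction S_red = L L^H,
    the standard route for a Hermitian-definite generalized problem."""
    L = np.linalg.cholesky(S_red)     # raises LinAlgError if S_red is not positive definite
    Linv = np.linalg.inv(L)           # acceptable for a tiny subspace matrix
    A = Linv @ H_red @ Linv.conj().T  # equivalent standard problem A y = e y, y = L^H c
    return np.linalg.eigvalsh(A)


rng = np.random.default_rng(1)
n, m = 60, 6                          # hypothetical full and subspace dimensions
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
H = (X + X.conj().T) / 2              # toy Hermitian "Hamiltonian"; S operator = I here
V = rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))  # non-orthonormal basis

H_red = V.conj().T @ H @ V            # projected Hamiltonian
S_red = V.conj().T @ V                # projected overlap: NOT the identity

with_overlap = ritz_values(H_red, S_red)
forced_identity = np.linalg.eigvalsh(H_red)  # what "S = I" would give

# Reference: Ritz values from an orthonormal basis spanning the same subspace.
Q, _ = np.linalg.qr(V)
reference = np.linalg.eigvalsh(Q.conj().T @ H @ Q)

print("with S       :", with_overlap)
print("S forced to I:", forced_identity)
print("reference    :", reference)
```

Forcing S to the identity gives values that are not Ritz values of H at all,
which is consistent with the loss of convergence reported above.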
>>
>>
>> On 12/06/19 14:37, Paolo Giannozzi wrote:
>>
>> I was about to write the same, before noticing that the crash occurs
>> randomly (one run completes, a subsequent one doesn't). Unless some
>> regularity is found (that is: under conditions xyz, the code always
>> crashes), it will be impossible to locate the origin of the problem. Note
>> that the origin of the problem might well be in mathematical libraries, or
>> in MPI. I am 100% sure that in at least some cases diagonalization failures
>> were due to some misbehavior of mathematical libraries (but this was many
>> years ago, on machines that no longer exist). Also, a frequent source of
>> random crashes in parallel execution is explained in Sec. 7.3 of the
>> developer manual,
>> http://www.quantum-espresso.org/Doc/developer_man/developer_man.html#SECTION00080000000000000000
>>
>> Paolo
>>
>> On Wed, Jun 12, 2019 at 2:03 PM Davide Ceresoli <davide.ceresoli at cnr.it>
>> wrote:
>>
>>> Dear Lorenzo,
>>>      is your QE compiled with a decent compiler and with decent
>>> libraries?
>>> Your inputs work perfectly for me, with no crashes.
>>>
>>> HTH.
>>> D.
>>>
>>>
>>>
>>> On 6/12/19 12:29 PM, Lorenzo Monacelli wrote:
>>> > Dear QE developers,
>>> >
>>> > I think I found a bad bug in the non-self-consistent calculation of
>>> > pw.x.
>>> >
>>> > While the self-consistent calculation ends properly, the
>>> > non-self-consistent calculation crashes with the error:
>>> >
>>> >
>>> > %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>>> >       task #         0
>>> >       from cdiaghg : error #        40
>>> >       S matrix not positive definite
>>> > %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
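A note on "error # 40": if, as the message suggests, cdiaghg factorizes S
with a LAPACK Cholesky routine (?potrf) and reports its info code (an
assumption here, not checked against the source), then 40 would be the
order of the first leading minor of S that is not positive definite, i.e.
the point where the 40th basis vector becomes numerically dependent on the
previous ones. The toy function below (hypothetical name, plain NumPy, not
the LAPACK code) mimics that documented return convention.

```python
import numpy as np


def cholesky_info(S):
    """Unblocked lower Cholesky factorization S = L L^H of a Hermitian matrix.

    Mimics the return convention documented for LAPACK ?potrf: info = 0 on
    success, info = i (1-based) if the leading minor of order i is not
    positive definite. Toy code, not the LAPACK implementation."""
    S = np.asarray(S, dtype=complex)
    n = S.shape[0]
    L = np.zeros((n, n), dtype=complex)
    for j in range(n):
        d = S[j, j].real - np.sum(np.abs(L[j, :j]) ** 2)
        if not d > 0.0:  # also catches NaN
            return L, j + 1
        L[j, j] = np.sqrt(d)
        for i in range(j + 1, n):
            L[i, j] = (S[i, j] - np.sum(L[i, :j] * L[j, :j].conj())) / L[j, j]
    return L, 0


# Hypothetical basis whose third vector duplicates the first one.
e = np.eye(4, dtype=complex)
V = np.column_stack([e[:, 0], e[:, 1], e[:, 0], e[:, 2]])
L, info = cholesky_info(V.conj().T @ V)
print("info =", info)  # 3: the third vector is the first dependent one
```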
>>> >
>>> > I checked the cdiaghg subroutine: the S matrix should be the overlap
>>> > matrix of the generalized eigenvalue problem Hv = eSv.
>>> >
>>> > In the case of a local norm-conserving pseudopotential for hydrogen (my
>>> > calculation), I would expect it to be the identity. However, if I force
>>> > it to be the identity at the beginning of cdiaghg, the code says that it
>>> > is not able to converge the scf calculation either.
>>> >
>>> > I attach the input of the scf calculation (which converges) and that of
>>> > the non-self-consistent calculation (which produces this output).
>>> >
>>> > I also tried switching the diagonalization method to cg, as suggested
>>> > as a fix, but nothing changes.
>>> >
>>> > I also modified the cdiaghg subroutine to print the S matrix, which you
>>> > will find attached (it contains random numbers and seems to be
>>> > uninitialized).
>>> >
>>> > With both diagonalization methods, if I force S to be the identity
>>> > matrix, the code crashes saying that it was not able to converge.
>>> >
>>> > The problem seems to arise especially if I request more bands with the
>>> > nbnd flag in the &system namelist (but sometimes it occurs even if no
>>> > extra bands are requested).
>>> >
>>> > The QE version I used is the current develop branch on GitLab, but I
>>> > have seen the same error with 6.3 and 6.2 in other cases.
>>> >
>>> > If I run exactly the same input file as an scf calculation (instead of
>>> > an nscf), everything goes fine (same k points, same diagonalization,
>>> > same number of extra bands), but of course this is not what I want to
>>> > do...
>>> >
>>> > If I run the nscf calculation after an scf calculation with exactly the
>>> > same input (which works), the nscf calculation fails (so the crash is
>>> > not caused by a bad starting density).
>>> >
>>> > All this makes me think of a bug in the nscf calculation rather than a
>>> > wrong input.
>>> >
>>> > Best regards,
>>> >
>>> > Lorenzo Monacelli
>>> >
>>> >
>>> > P.S.
>>> >
>>> > In the attached files, pw_* are the nscf input and output, and scf* are
>>> > the scf input and output. I run
>>> >
>>> >
>>> >
>>> > _______________________________________________
>>> > developers mailing list
>>> > developers at lists.quantum-espresso.org
>>> > https://lists.quantum-espresso.org/mailman/listinfo/developers
>>> >
>>>
>>> --
>>> +--------------------------------------------------------------+
>>>    Davide Ceresoli
>>>    CNR Institute of Molecular Science and Technology (CNR-ISTM)
>>>    c/o University of Milan, via Golgi 19, 20133 Milan, Italy
>>>    Email: davide.ceresoli at cnr.it
>>>    Phone: +39-02-50314276, +39-347-1001570 (mobile)
>>>    Skype: dceresoli
>>>    Website: http://sites.google.com/site/dceresoli/
>>> +--------------------------------------------------------------+
>>>
>>
>>
>> --
>> Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
>> Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
>> Phone +39-0432-558216, fax +39-0432-558222
>>
>>
>>
>
>
> --
> Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
> Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
> Phone +39-0432-558216, fax +39-0432-558222
>
>
>

-- 
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222