[QE-developers] Bug in non self-consistent calculation

Thu Jun 13 16:04:22 CEST 2019

Thank you Paolo for the explanation. I misunderstood what that function 
was doing.

What still worries me is the workaround I use. When a band-structure 
calculation with many bands and many atoms crashes in nscf with the S 
matrix error, I run the same input file with calculation = "scf". It 
takes about 10 times longer (it needs almost 10 iterations to converge 
the self-consistency), but it never hits the S matrix issue. This looks 
strange: as far as I understand, a numerical issue of this kind should 
affect the diagonalization in the same way in both the self-consistent 
and the non-self-consistent calculation.
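
For reference, the mechanism Paolo describes in the quoted message below 
can be reproduced outside QE. What follows is a minimal Python/NumPy 
sketch, not QE code. It assumes that the generalized solver begins with a 
Cholesky factorization of S, as LAPACK's xHEGV-type drivers do. The 
vectors, the sizes and the helper names overlap and cholesky_ok are made 
up for illustration.

```python
import numpy as np


def overlap(basis):
    """Overlap matrix S_ij = <b_i|b_j> for the column vectors of `basis`."""
    return basis.conj().T @ basis


def cholesky_ok(s):
    """Return True if S passes a Cholesky factorization, False otherwise."""
    try:
        np.linalg.cholesky(s)
        return True
    except np.linalg.LinAlgError:
        return False


rng = np.random.default_rng(0)
n = 200

# Four orthonormal "trial" vectors (illustrative, not taken from pw.x).
trial = np.linalg.qr(rng.standard_normal((n, 4)))[0]

# Hypothetical "correction" vector lying almost entirely in the span of
# the trial vectors: the basis is then nearly linearly dependent.
correction = trial @ np.ones(4) + 1e-9 * rng.standard_normal(n)
basis = np.column_stack([trial, correction])

s = overlap(basis)

# Analytically the smallest eigenvalue is positive but tiny (well below
# 1e-12), so in floating point its sign is decided by rounding noise.
smallest = np.linalg.eigvalsh(s).min()
print("smallest eigenvalue of S:", smallest)
print("Cholesky succeeds on S:  ", cholesky_ok(s))

# A matrix that is indefinite by a small amount is rejected outright.
s_bad = np.array([[1.0, 1.0 + 1e-8],
                  [1.0 + 1e-8, 1.0]])
print("Cholesky succeeds on s_bad:", cholesky_ok(s_bad))
```

In this sketch S is not the identity simply because the added vector is 
not orthogonal to the others, and whether the factorization succeeds can 
depend on rounding alone. It says nothing about why nscf and scf would 
behave differently.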

Best,

Lorenzo

On 12/06/19 22:20, Paolo Giannozzi wrote:
> - The overlap matrix is not the identity matrix because correction 
> vectors are not orthogonal to the trial vectors, nor to each other.
> - An iterative algorithm is not the best way to solve for many 
> eigenvectors: it is designed to find a number of eigenvalues that is a 
> small fraction of the matrix dimension. The more eigenvalues, the less 
> convenient and the more unstable it becomes.
> - I am quite sure that there is no "true" bug (uninitialized variables 
> and the like) and that the algorithm is "analytically" correct, so to 
> speak. Under some circumstances, the overlap matrix has a very small 
> negative eigenvalue. An "analytically computed" overlap matrix can't 
> have a negative eigenvalue, by construction. One can find a 
> workaround, but you need one for each of the many cases: real, 
> hermitian, with serial or parallel subspace diagonalization, ... I 
> would prefer to understand what triggers the appearance of a negative 
> eigenvalue, but it is not that simple.
>
> Paolo
> On Wed, Jun 12, 2019 at 3:55 PM Lorenzo Monacelli 
> <mesonepigreco at gmail.com> wrote:
>
>     Dear all,
>
>     Thank you for your replies.
>
>     I have two different installations on two different machines. The
>     one whose results I sent you was compiled with gfortran and the
>     standard lapack/blas provided by the Ubuntu software repository
>     (16.04). The other is compiled with the Intel compiler and MKL and
>     runs on a cluster. Both of them hit the issue quite randomly. I
>     attach the make.inc of both builds.
>
>     The same issue, albeit with a different input file, also occurred
>     with the pw.x (v 6.2) pre-installed on the Spanish MARENOSTRUM
>     cluster, which I assume was compiled correctly.
>
>     I noticed that one way to reproduce the error is to ask for many
>     bands in the nscf calculation for a system with many atoms (and
>     few symmetries) in the cell (with 96 atoms it is almost impossible
>     for me to run an nscf calculation).
>
>     Could the different behavior on different machines suggest that
>     the bug lies in some badly initialized variable (one whose
>     initialization is perhaps left to the compiler)?
>
>     Another question: how does cdiaghg work? I assumed that the S
>     matrix should be the identity for local norm-conserving pseudos
>     and GGA xc functionals, but if I force it to be the identity at
>     the beginning of the subroutine the code is no longer able to
>     converge any calculation (even the scf, which currently works). I
>     am a bit skeptical that this is just an error in LAPACK
>     or MPI: why does an scf with the same input (which should solve the
>     same problem as the nscf, but many times) work very well (even
>     with many atoms and even if I ask for many bands)?
>
>     Best,
>
>     Lorenzo
>
>
>     On 12/06/19 14:37, Paolo Giannozzi wrote:
>>     I was about to write the same, before noticing that the crash
>>     occurs randomly (one run completes, a subsequent one doesn't).
>>     Unless some regularity is found (that is: under conditions xyz,
>>     the code always crashes) it will be impossible to locate the
>>     origin of the problem. Note that the origin of the problem might
>>     well be in mathematical libraries, or in MPI. I am 100% sure that
>>     in at least some cases diagonalization failures were due to some
>>     misbehavior of mathematical libraries (but this was many years
>>     ago, on machines that do not exist any longer). Also: a frequent
>>     source of random crashes in parallel execution is explained in
>>     sec.7.3 of the developer manual,
>>     http://www.quantum-espresso.org/Doc/developer_man/developer_man.html#SECTION00080000000000000000
>>
>>     Paolo
>>
>>     On Wed, Jun 12, 2019 at 2:03 PM Davide Ceresoli
>>     <davide.ceresoli at cnr.it> wrote:
>>
>>         Dear Lorenzo,
>>              is your QE compiled with a decent compiler and with
>>         decent libraries?
>>         Your inputs work perfectly for me, with no crashes.
>>
>>         HTH.
>>         D.
>>
>>
>>
>>         On 6/12/19 12:29 PM, Lorenzo Monacelli wrote:
>>         > Dear QE developers,
>>         >
>>         > I think I found a bad bug in the non-self-consistent calculation of pw.x.
>>         >
>>         > While the self-consistent calculation ends properly, running a
>>         > non-self-consistent calculation results in a crash with the error:
>>         >
>>         >
>>         > %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>>         >       task #         0
>>         >       from cdiaghg : error #        40
>>         >       S matrix not positive definite
>>         > %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>>         >
>>         > I checked the cdiaghg subroutine: the S matrix should be the overlap
>>         > matrix for the eigenvalue problem Hv = eSv.
>>         >
>>         > In the case of a local norm-conserving pseudo for hydrogen (my
>>         > calculation), I suppose it should be the identity. However, if I force it
>>         > to be the identity at the beginning of cdiaghg, the code reports that it
>>         > is not able to converge the scf calculation either.
>>         >
>>         > I attach the input of the scf calculation (which converges) and that of
>>         > the non-self-consistent calculation (which produces this output).
>>         >
>>         > I also tried switching the diagonalization method to cg, as suggested as
>>         > a fix, but nothing changes.
>>         >
>>         > I also modified the cdiaghg subroutine to print the S matrix, which you
>>         > find attached (random numbers; it seems to be uninitialized).
>>         >
>>         > With both diagonalization methods, if I force S to be the identity
>>         > matrix the code crashes, saying that it was not able to converge.
>>         >
>>         > The problem seems to arise especially if I request more bands with the
>>         > nbnd flag in system (but sometimes it occurs even if no extra band is
>>         > requested).
>>         >
>>         > The QE version I used is the current version in the develop branch on
>>         > gitlab, but I noticed the same error with 6.3 and 6.2 in other cases.
>>         >
>>         > If I run exactly the same input file as an scf calculation (instead of an
>>         > nscf), everything goes fine (same k-points, same diagonalization, same
>>         > number of extra bands), but this is not what I would like to do...
>>         >
>>         > If I run the nscf calculation after an scf calculation with exactly the
>>         > same input (which works), the nscf calculation fails (this means that the
>>         > crash is not caused by a bad starting point for the density).
>>         >
>>         > All this makes me really think of a bug in the nscf calculation, rather
>>         > than a wrong input.
>>         >
>>         > Best regards,
>>         >
>>         > Lorenzo Monacelli
>>         >
>>         >
>>         > P.S.
>>         >
>>         > In the attached file the pw_* are the nscf input and output, the scf*
>>         > are the scf input and output. I run
>>         >
>>         >
>>         >
>>         > _______________________________________________
>>         > developers mailing list
>>         > developers at lists.quantum-espresso.org
>>         <mailto:developers at lists.quantum-espresso.org>
>>         > https://lists.quantum-espresso.org/mailman/listinfo/developers
>>         >
>>
>>         -- 
>>         +--------------------------------------------------------------+
>>            Davide Ceresoli
>>            CNR Institute of Molecular Science and Technology (CNR-ISTM)
>>            c/o University of Milan, via Golgi 19, 20133 Milan, Italy
>>            Email: davide.ceresoli at cnr.it
>>            Phone: +39-02-50314276, +39-347-1001570 (mobile)
>>            Skype: dceresoli
>>            Website: http://sites.google.com/site/dceresoli/
>>         +--------------------------------------------------------------+
>>
>>
>>
>>     -- 
>>     Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
>>     Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
>>     Phone +39-0432-558216, fax +39-0432-558222
>>
>>
>
>
>