[QE-developers] Bug in non self-consistent calculation

Paolo Giannozzi p.giannozzi at gmail.com
Wed Jun 12 22:20:28 CEST 2019


- The overlap matrix is not the identity matrix because the correction vectors
are orthogonal neither to the trial vectors nor to each other.
- An iterative algorithm is not the best way to solve for many eigenvectors:
it is devised to find a number of eigenvalues that is a small fraction of
the matrix dimension. The more eigenvalues, the less convenient and the
more unstable it becomes.
- I am quite sure that there is no "true" bug (uninitialized variables and
the like) and that the algorithm is "analytically" correct, so to speak.
Under some circumstances, the overlap matrix has a very small negative
eigenvalue. An "analytically computed" overlap matrix can't have a negative
eigenvalue, by construction. One can find a workaround, but you need one
for each of the many cases: real, hermitian, with serial or parallel
subspace diagonalization, ... I would prefer to understand what triggers
the appearance of a negative eigenvalue, but it is not that simple.
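
To illustrate the first and third points, here is a minimal pure-Python
sketch (not code from QE; the helper names gram and cholesky_ok are made up
for this example). It builds the overlap matrix of a small reduced basis and
attempts a textbook Cholesky factorization, the kind of step that gives up
when an overlap matrix is not positive definite. The vectors are toy
assumptions: a trial vector plus a non-orthogonalized correction gives a
non-identity S, and two nearly parallel vectors give an S that is positive
definite on paper but singular in double precision.

```python
import math

def gram(vectors):
    """Overlap (Gram) matrix S[i][j] = <v_i|v_j> for real vectors."""
    return [[sum(a * b for a, b in zip(u, v)) for v in vectors]
            for u in vectors]

def cholesky_ok(s):
    """Return True if a plain Cholesky factorization of s succeeds."""
    n = len(s)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        pivot = s[j][j] - sum(l[j][k] ** 2 for k in range(j))
        if pivot <= 0.0:
            # A non-positive pivot means s is not positive definite
            # in the arithmetic actually being used.
            return False
        l[j][j] = math.sqrt(pivot)
        for i in range(j + 1, n):
            l[i][j] = (s[i][j]
                       - sum(l[i][k] * l[j][k] for k in range(j))) / l[j][j]
    return True

# Toy reduced basis: a trial vector plus a correction vector that has
# not been orthogonalized to it. S is not the identity, but it is fine.
trial = [1.0, 0.0, 0.0]
correction = [0.5, 1.0, 0.0]
s_good = gram([trial, correction])

# Two nearly parallel basis vectors. The exact determinant of S is 1e-18,
# but 1 + 1e-18 rounds to 1 in double precision, so S becomes singular.
nearly_parallel = [1.0, 1e-9, 0.0]
s_bad = gram([trial, nearly_parallel])

print(s_good, cholesky_ok(s_good))
print(s_bad, cholesky_ok(s_bad))
```

In this toy case rounding produces a zero pivot rather than a negative one;
either way, a factorization that requires strictly positive pivots fails
even though the exact matrix is positive definite.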

Paolo
On Wed, Jun 12, 2019 at 3:55 PM Lorenzo Monacelli <mesonepigreco at gmail.com>
wrote:

> Dear all,
>
> Thank you for your replies.
>
> I have two different versions on two different machines. The one whose
> results I sent you was compiled with gfortran and the standard LAPACK/BLAS
> provided by the Ubuntu software repository (16.04). The other is compiled
> with the Intel compiler and MKL and runs on a cluster. Both of them
> show the issue, apparently at random. I attach the make.inc of both
> compilations.
>
> The same issue, at least with a different input file, occurred with the
> pw.x (v 6.2) pre-installed on the Spanish MARENOSTRUM cluster, which
> I assume was correctly compiled.
>
> I noticed that a way to reproduce the error is to ask for many bands in
> the nscf calculation for a system with many atoms (and few symmetries) in
> the cell (with 96 atoms it is almost impossible for me to run a nscf
> calculation).
>
> Could the different behavior on different machines suggest that the bug
> lies in some ill-initialized variable (whose automatic initialization is
> perhaps left to the compiler)?
>
> Another question: how does cdiaghg work? I assumed that the S matrix
> should be the identity for local norm-conserving pseudopotentials and GGA xc
> functionals, but if I force it to be the identity at the beginning of the
> subroutine, the code is no longer able to converge any calculation (even
> the scf, which currently works). I am a bit skeptical that this is
> just an error of LAPACK or MPI: why does the scf with the same input (which
> should solve the same problem as the nscf, but many times) work very well
> (even with many atoms and even if I ask for many bands)?
>
> Bests,
>
> Lorenzo
>
>
> On 12/06/19 14:37, Paolo Giannozzi wrote:
>
> I was about to write the same, before noticing that the crash occurs
> randomly (one run completes, a subsequent one doesn't). Unless some
> regularity is found (that is: under conditions xyz, the code always
> crashes) it will be impossible to locate the origin of the problem. Note
> that the origin of the problem might well be in mathematical libraries, or
> in MPI. I am 100% sure that in at least some cases diagonalization failures
> were due to some misbehavior of mathematical libraries (but this was many
> years ago, on machines that do not exist any longer). Also: a frequent
> source of random crashes in parallel execution is explained in sec.7.3 of
> the developer manual,
> http://www.quantum-espresso.org/Doc/developer_man/developer_man.html#SECTION00080000000000000000
>
> Paolo
>
> On Wed, Jun 12, 2019 at 2:03 PM Davide Ceresoli <davide.ceresoli at cnr.it>
> wrote:
>
>> Dear Lorenzo,
>>      is your QE compiled with a decent compiler and with decent libraries?
>> Your inputs work perfectly for me, with no crashes.
>>
>> HTH.
>> D.
>>
>>
>>
>> On 6/12/19 12:29 PM, Lorenzo Monacelli wrote:
>> > Dear QE developers,
>> >
>> > I think I found a bad bug in the non self-consistent calculation of pw.x
>> >
>> > While the self-consistent calculation ends properly, running a non
>> > self-consistent calculation results in a crash with the error:
>> >
>> >
>> >  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>> >       task #         0
>> >       from cdiaghg : error #        40
>> >       S matrix not positive definite
>> >  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>> >
>> > I checked the cdiaghg subroutine: the S matrix should be the overlap
>> > matrix for the eigenvalue problem Hv = eSv.
>> >
>> > In the case of a local norm-conserving pseudopotential for hydrogen (my
>> > calculation), I suppose it should be the identity. However, if I force it
>> > to be the identity at the beginning of cdiaghg, the code reports that it
>> > cannot converge the scf calculation either.
>> >
>> > I attach the input of the scf calculation (which converges) and that of
>> > the non-self-consistent calculation (which produces this output).
>> >
>> > I also tried switching the diagonalization method to cg, as suggested as
>> > a fix, but nothing changes.
>> >
>> > I also modified the cdiaghg subroutine to print the S matrix, which you
>> > find attached (random numbers; it seems to be uninitialized).
>> >
>> > With both diagonalization methods, if I force S to be the identity
>> > matrix, the code crashes, saying that it was not able to converge.
>> >
>> > The problem seems to arise especially if I request more bands with the
>> > nbnd flag in system (but sometimes it occurs even if no extra band is
>> > requested).
>> >
>> > The QE version I used is the current version in the develop branch on
>> > gitlab, but I noticed the same error occurring also with 6.3 and 6.2 in
>> > other cases.
>> >
>> > If I run exactly the same input file as a scf calculation (instead of a
>> > nscf), everything goes fine (same k-points, same diagonalization, same
>> > number of extra bands), but this is not what I would like to do...
>> >
>> > If I run the nscf calculation after a scf calculation with exactly the
>> > same input (which works), the nscf calculation fails (this means that the
>> > crash is not caused by a bad starting point for the density).
>> >
>> > All this makes me really think of a bug in the nscf calculation, rather
>> > than a wrong input.
>> >
>> > Best regards,
>> >
>> > Lorenzo Monacelli
>> >
>> >
>> > P.S.
>> >
>> > In the attached files, the pw_* are the nscf input and output, the scf*
>> > are the scf input and output. I run
>> >
>> >
>> >
>> > _______________________________________________
>> > developers mailing list
>> > developers at lists.quantum-espresso.org
>> > https://lists.quantum-espresso.org/mailman/listinfo/developers
>> >
>>
>> --
>> +--------------------------------------------------------------+
>>    Davide Ceresoli
>>    CNR Institute of Molecular Science and Technology (CNR-ISTM)
>>    c/o University of Milan, via Golgi 19, 20133 Milan, Italy
>>    Email: davide.ceresoli at cnr.it
>>    Phone: +39-02-50314276, +39-347-1001570 (mobile)
>>    Skype: dceresoli
>>    Website: http://sites.google.com/site/dceresoli/
>> +--------------------------------------------------------------+
>>
>
>
> --
> Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
> Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
> Phone +39-0432-558216, fax +39-0432-558222
>
>


-- 
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222