[QE-developers] Bug in non self-consistent calculation
Lorenzo Monacelli
mesonepigreco at gmail.com
Thu Jun 13 16:04:22 CEST 2019
Thank you Paolo for the explanation. I misunderstood what that function
was doing.
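
For readers following the thread, here is a minimal sketch of why the S
matrix in the reduced problem Hv = eSv is not the identity, even with
norm-conserving pseudopotentials. It is a toy 2x2 example in plain Python,
not QE code: the operator, the basis vectors and the function name are all
hypothetical, chosen only for illustration. When the reduced basis is not
orthonormal, replacing S by the identity changes the eigenvalues, which is
consistent with the lost convergence described in the quoted messages below.

```python
import math


def generalized_eigenvalues_2x2(h, s):
    """Eigenvalues e of H v = e S v for real symmetric 2x2 H and S.

    Solves det(H - e S) = a e^2 + b e + c = 0, assuming S is positive definite.
    """
    a = s[0][0] * s[1][1] - s[0][1] ** 2
    b = -(h[0][0] * s[1][1] + h[1][1] * s[0][0] - 2.0 * h[0][1] * s[0][1])
    c = h[0][0] * h[1][1] - h[0][1] ** 2
    disc = math.sqrt(b * b - 4.0 * a * c)
    return sorted([(-b - disc) / (2.0 * a), (-b + disc) / (2.0 * a)])


# Toy operator A = diag(1, 3), represented in the non-orthogonal basis
# b1 = (1, 0), b2 = (1, 1):  H_ij = <b_i|A|b_j>,  S_ij = <b_i|b_j>.
h = [[1.0, 1.0], [1.0, 4.0]]
s = [[1.0, 1.0], [1.0, 2.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]

# With the real overlap matrix the spectrum of A (1 and 3) is recovered.
with_overlap = generalized_eigenvalues_2x2(h, s)
# Forcing S = identity gives different numbers (about 0.70 and 4.30).
forced_identity = generalized_eigenvalues_2x2(h, identity)

print(with_overlap, forced_identity)
```

The basis here is deliberately far from orthonormal so that the effect is
visible by eye; in an iterative diagonalization the deviation of S from the
identity can be much smaller, but it is still part of the problem being solved.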
Still, what worries me is my workaround. When an nscf band-structure
calculation with many bands and many atoms crashes with the S matrix
error, I run the same input file with calculation = "scf". It takes
about 10 times longer (it needs almost 10 iterations to reach
self-consistency), but it never hits the S matrix issue. This looks
strange: as far as I understand, this numerical issue should affect
the diagonalization in the same way in both the self-consistent and
the non-self-consistent calculation.
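
To make the "S matrix not positive definite" symptom concrete, here is a
minimal sketch in plain Python (toy 3-component vectors, not QE code; all
names are hypothetical). It assumes the solver begins by Cholesky-factorizing
S, as LAPACK's generalized Hermitian drivers do. An overlap matrix that is
positive definite analytically can lose that property in floating point when
one basis vector is almost a copy of another.

```python
import math


def overlap_matrix(vectors):
    """Gram matrix S[i][j] = <v_i|v_j> for a list of real vectors."""
    return [[sum(a * b for a, b in zip(vi, vj)) for vj in vectors]
            for vi in vectors]


def cholesky(s):
    """Lower-triangular L with L L^T = s; raises ValueError on a bad pivot."""
    n = len(s)
    low = [[0.0] * n for _ in range(n)]
    for j in range(n):
        pivot = s[j][j] - sum(low[j][k] ** 2 for k in range(j))
        if pivot <= 0.0:
            raise ValueError(f"S matrix not positive definite (pivot {j})")
        low[j][j] = math.sqrt(pivot)
        for i in range(j + 1, n):
            partial = sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = (s[i][j] - partial) / low[j][j]
    return low


def is_positive_definite(s):
    """True if the Cholesky factorization of s succeeds."""
    try:
        cholesky(s)
    except ValueError:
        return False
    return True


# Two orthonormal "trial" vectors (hypothetical toy data).
trial = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

# A "correction" vector that is not orthogonal to the trial vectors:
# S is not the identity, but it is well conditioned.
s_good = overlap_matrix(trial + [[0.5, 0.5, 1.0]])

# A correction vector that is almost a copy of the first trial vector.
# Analytically S is still positive definite, but 1 + 1e-18 rounds to 1.0
# in double precision, so the computed S is singular.
s_bad = overlap_matrix(trial + [[1.0, 0.0, 1e-9]])

print(is_positive_definite(s_good), is_positive_definite(s_bad))
```

The second case is deliberately extreme; it only shows the mechanism by which
rounding can turn a tiny positive eigenvalue into a zero or negative one.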
Bests,
Lorenzo
On 12/06/19 22:20, Paolo Giannozzi wrote:
> - The overlap matrix is not the identity matrix because correction
> vectors are not orthogonal to trial vectors, nor to each other.
> - An iterative algorithm is not the best way to solve for many
> eigenvectors: it is devised to find a number of eigenvalues that is a
> small fraction of the matrix dimension. The more eigenvalues, the less
> convenient and the more unstable it becomes.
> - I am quite sure that there is no "true" bug (uninitialized variables
> and the like) and that the algorithm is "analytically" correct, so to
> speak. Under some circumstances, the overlap matrix has a very small
> negative eigenvalue. An "analytically computed" overlap matrix can't
> have a negative eigenvalue, by construction. One can find a
> workaround, but you need one for each of the many cases: real,
> hermitian, with serial or parallel subspace diagonalization, ... I
> would prefer to understand what triggers the appearance of a negative
> eigenvalue, but it is not that simple.
>
> Paolo
> On Wed, Jun 12, 2019 at 3:55 PM Lorenzo Monacelli
> <mesonepigreco at gmail.com> wrote:
>
> Dear all,
>
> Thank you for your replies.
>
> I have two different versions on two different machines. The one
> whose results I sent you was compiled with gfortran and the standard
> LAPACK/BLAS provided by the Ubuntu (16.04) software repository.
> The other is compiled with the Intel compiler and MKL and runs on
> a cluster. Both of them experienced the issue quite randomly. I
> attach the make.inc of both compilations.
>
> The same issue, at least on a different input file, occurred
> with the pw.x (v6.2) pre-installed on the Spanish MareNostrum
> cluster, which I assume was correctly compiled.
>
> I noticed that a way to reproduce the error is to ask for many
> bands in the nscf calculation in a system with many atoms (and
> few symmetries) in the cell (with 96 atoms it is almost impossible
> for me to run an nscf calculation).
>
> Is it possible that the different behavior on different machines
> suggests that the bug lies in some uninitialized variable (whose
> automatic initialization is perhaps left to the compiler)?
>
> Another question: how does cdiaghg work? I assumed that the S
> matrix should be the identity for local norm-conserving pseudos
> and GGA xc functionals, but if I enforce it to be the identity at
> the beginning of the subroutine, the code is no longer able to
> converge any calculation (even the scf, which currently works). I
> am a bit skeptical that this is just an error of LAPACK
> or MPI: why does the scf with the same input (which should solve the
> same problem as the nscf, but many times) work very well (even
> with many atoms and even if I ask for many bands)?
>
> Bests,
>
> Lorenzo
>
>
> On 12/06/19 14:37, Paolo Giannozzi wrote:
>> I was about to write the same, before noticing that the crash
>> occurs randomly (one run completes, a subsequent one doesn't).
>> Unless some regularity is found (that is: under conditions xyz,
>> the code always crashes) it will be impossible to locate the
>> origin of the problem. Note that the origin of the problem might
>> well be in mathematical libraries, or in MPI. I am 100% sure that
>> in at least some cases diagonalization failures were due to some
>> misbehavior of mathematical libraries (but this was many years
>> ago, on machines that do not exist any longer). Also: a frequent
>> source of random crashes in parallel execution is explained in
>> sec.7.3 of the developer manual,
>> http://www.quantum-espresso.org/Doc/developer_man/developer_man.html#SECTION00080000000000000000
>>
>> Paolo
>>
>> On Wed, Jun 12, 2019 at 2:03 PM Davide Ceresoli
>> <davide.ceresoli at cnr.it> wrote:
>>
>> Dear Lorenzo,
>> is your QE compiled with a decent compiler and with
>> decent libraries?
>> Your inputs work perfectly for me, with no crashes.
>>
>> HTH.
>> D.
>>
>>
>>
>> On 6/12/19 12:29 PM, Lorenzo Monacelli wrote:
>> > Dear QE developers,
>> >
>> > I think I found a bad bug in the non-self-consistent calculation
>> > of pw.x.
>> >
>> > While the self-consistent calculation ends properly, running a
>> > non-self-consistent calculation results in a crash with the error:
>> >
>> >
>> > %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>> > task # 0
>> > from cdiaghg : error # 40
>> > S matrix not positive definite
>> > %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>> >
>> > I checked the cdiaghg subroutine: the S matrix should be the
>> > overlap matrix for the eigenvalue problem Hv = eSv.
>> >
>> > In the case of a local norm-conserving pseudopotential for
>> > hydrogen (my calculation), I supposed it should be the identity.
>> > However, if I enforce it to be the identity at the beginning of
>> > cdiaghg, the code says that it is not able to converge the scf
>> > calculation either.
>> >
>> > I attach the input of the scf calculation (that converges) and
>> > the one of the non-self-consistent calculation (that produces
>> > this output).
>> >
>> > I also tried switching the diagonalization method to cg, as
>> > suggested as a fix, but nothing changes.
>> >
>> > I also modified the cdiaghg subroutine to print the S matrix,
>> > which you find attached (random numbers; it seems to be
>> > uninitialized).
>> >
>> > With both diagonalization methods, if I enforce S to be the
>> > identity matrix, the code crashes saying that it was not able to
>> > converge.
>> >
>> > The problem seems to arise especially if I request more bands
>> > with the nbnd flag in system (but sometimes it occurs even if no
>> > extra band is requested).
>> >
>> > The QE version I used is the current version in the develop
>> > branch on GitLab, but I noticed the same error occurring also
>> > with 6.3 and 6.2 in other cases.
>> >
>> > If I run exactly the same input file as an scf calculation
>> > (instead of nscf), everything goes fine (same k-points, same
>> > diagonalization, same number of extra bands), but of course this
>> > is not what I would like to do...
>> >
>> > If I run the nscf calculation after an scf calculation with
>> > exactly the same input (that works), the nscf calculation fails
>> > (this means that the crash is not caused by a bad starting point
>> > for the density).
>> >
>> > All this makes me really think of a bug in the nscf calculation,
>> > rather than a wrong input.
>> >
>> > Best regards,
>> >
>> > Lorenzo Monacelli
>> >
>> >
>> > P.S.
>> >
>> > In the attached file the pw_* are the nscf input and output, the
>> > scf* are the scf input and output. I run
>> >
>> >
>> >
>> > _______________________________________________
>> > developers mailing list
>> > developers at lists.quantum-espresso.org
>> > https://lists.quantum-espresso.org/mailman/listinfo/developers
>> >
>>
>> --
>> +--------------------------------------------------------------+
>> Davide Ceresoli
>> CNR Institute of Molecular Science and Technology (CNR-ISTM)
>> c/o University of Milan, via Golgi 19, 20133 Milan, Italy
>> Email: davide.ceresoli at cnr.it
>> Phone: +39-02-50314276, +39-347-1001570 (mobile)
>> Skype: dceresoli
>> Website: http://sites.google.com/site/dceresoli/
>> +--------------------------------------------------------------+
>>
>>
>>
>> --
>> Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
>> Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
>> Phone +39-0432-558216, fax +39-0432-558222
>>
>>
>
>
>
> --
> Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
> Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
> Phone +39-0432-558216, fax +39-0432-558222