[QE-users] scf convergency depends on the number of mpi processes !!!

Tue Nov 29 06:01:25 CET 2022

Dear Paolo,

Hi.

Thank you for your comments.

If I understand your comments correctly, the "small numerical differences" 
may cause the system to converge to different metastable states in DFT+U, 
as in my case.
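As a toy illustration of this sensitivity (a minimal sketch, not DFT+U and not anything taken from the QE source), consider a linear-mixing fixed-point iteration on a bistable map. The function name, the tanh model, and the parameters beta and gain are all assumptions chosen for the example. Two starting points that differ only at the 1e-15 level end up in two different self-consistent solutions:

```python
import math

def mix_iterate(x0, beta=0.5, gain=3.0, n_iter=200):
    """Toy 'scf' loop: x <- (1 - beta) * x + beta * tanh(gain * x).

    For gain > 1 the map has an unstable fixed point at x = 0 and two
    stable fixed points at +x* and -x*.  Hypothetical model, for
    illustration only.
    """
    x = x0
    for _ in range(n_iter):
        x = (1.0 - beta) * x + beta * math.tanh(gain * x)
    return x

# Two starting points that differ only by round-off-sized amounts.
state_a = mix_iterate(+1e-15)
state_b = mix_iterate(-1e-15)

print("final state A:", state_a)
print("final state B:", state_b)
```

Both runs are "converged" in the sense that each final value reproduces itself under the map, yet they are different states. The perturbation grows by roughly a factor of two per iteration near the unstable point, so a difference in the last digit is enough to select the outcome.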

To address this, I compiled QE-7.1 with the following configuration:

./configure FC=ifort F77=ifort MPIF90=mpif90 CC=icc CXX=icpc F90FLAGS=-O0 FCFLAGS=-O0 FFLAGS=-O0 CFLAGS=-O0

With this build, the results for "-np 2" and "-np 8" become very similar 
(only slightly different), and so are perhaps acceptable.
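One way to quantify "slightly different" is to compare the final total energies of the two runs. The sketch below assumes the usual pw.x output line that begins with "!" and reports "total energy = ... Ry". The function names and the sample numbers are made up for illustration and are not results from these runs:

```python
import re

# pw.x marks the converged total energy with a leading "!".
ENERGY_RE = re.compile(r"^!\s+total energy\s+=\s+(-?\d+\.\d+)\s+Ry", re.MULTILINE)

def final_total_energy(text):
    """Return the last converged total energy (Ry) found in pw.x output text."""
    matches = ENERGY_RE.findall(text)
    if not matches:
        raise ValueError("no converged total energy found")
    return float(matches[-1])

def energy_difference(text_a, text_b):
    """Absolute difference (Ry) between the final energies of two outputs."""
    return abs(final_total_energy(text_a) - final_total_energy(text_b))

# Made-up sample lines standing in for the "-np 2" and "-np 8" output files.
sample_np2 = "!    total energy              =    -100.12345678 Ry\n"
sample_np8 = "!    total energy              =    -100.12345612 Ry\n"

diff = energy_difference(sample_np2, sample_np8)
print(f"|E(np=2) - E(np=8)| = {diff:.2e} Ry")
```

In practice the two strings would be read from the actual output files. A difference well below the scf convergence threshold suggests the same state was reached, while a much larger one points to a different final state.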

However, I also tried compiling using the following command to increase 
precision:

./configure FC=ifort F77=ifort MPIF90=mpif90 CC=icc CXX=icpc F90FLAGS="-O0 -double-size 128" FCFLAGS="-O0 -double-size 128" FFLAGS="-O0 -double-size 128" CFLAGS="-O0"

and during "make pw" I get the following error message:

mpif90 -O0 -double-size 128 -fpp -allow nofpp_comments -D__DFTI -D__MPI -D__SCALAPACK  -I/home/mahmoud/QE71O0-double_128/external/devxlib/src -I. -I/home/mahmoud/QE71O0-double_128/include -I/home/mahmoud/QE71O0-double_128/FoX/finclude  -I/opt/intel/2017.8/compilers_and_libraries_2017.8.262/linux/mkl/include  -c cdiaghg.f90
cdiaghg.f90(540): error #6285: There is no matching specific subroutine for this generic subroutine call.   [SQR_SETMAT]
     CALL sqr_setmat( 'U', n, ( 0.D0, 0.D0 ), ss, size(ss,1), idesc )
----------^
cdiaghg.f90(559): error #6285: There is no matching specific subroutine for this generic subroutine call.   [SQR_MM_CANNON]
     CALL sqr_mm_cannon( 'N', 'N', n, ( 1.D0, 0.D0 ), ss, nx, hh, nx, ( 0.D0, 0.D0 ), v, nx, idesc )
----------^
cdiaghg.f90(567): error #6285: There is no matching specific subroutine for this generic subroutine call.   [SQR_MM_CANNON]
     CALL sqr_mm_cannon( 'N', 'C', n, ( 1.D0, 0.D0 ), v, nx, ss, nx, ( 0.D0, 0.D0 ), hh, nx, idesc )
----------^
cdiaghg.f90(572): error #6285: There is no matching specific subroutine for this generic subroutine call.   [SQR_SETMAT]
     CALL sqr_setmat( 'H', n, ( 0.D0, 0.D0 ), hh, size(hh,1), idesc )
----------^
cdiaghg.f90(607): error #6285: There is no matching specific subroutine for this generic subroutine call.   [SQR_MM_CANNON]
     CALL sqr_mm_cannon( 'C', 'N', n, ( 1.D0, 0.D0 ), ss, nx, hh, nx, ( 0.D0, 0.D0 ), v, nx, idesc )
----------^
cdiaghg.f90(431): warning #6843: A dummy argument with an explicit INTENT(OUT) declaration is not given an explicit value.   [V]
SUBROUTINE laxlib_pcdiaghg( n, h, s, ldh, e, v, idesc )
---------------------------------------------^
compilation aborted for cdiaghg.f90 (code 1)
make[1]: *** [cdiaghg.o] Error 1
make[1]: Leaving directory `/home/mahmoud/QE71O0-double_128/LAXlib'

So this attempt did not succeed.

Thirdly, I downloaded the "q-e-mixed_precision" code from the QE development 
site on GitLab, hoping that it might be a cure. It is a QE-6.5 code developed 
by Carlo Cavazzoni. I compiled it as usual (without the "-O0" flags), but 
did not find any improvement over the default configuration.

Could you please advise me on how to compile PWscf with a higher precision 
than the default?

Thank you so much.

Best regards,

Mahmoud

In systems with a difficult self-consistency, it is possible that the
small numerical differences, coming from execution on different numbers
of processors, are sufficient to drive the system towards
non-convergence, or convergence towards a different final state (DFT+U
seems to be especially unstable in this respect).
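The origin of such differences can be sketched in a few lines: floating-point addition is not associative, so a sum split into a different number of partial sums (as happens, very roughly, in a reduction over a different number of MPI processes) can round differently. The example below is a minimal illustration with hand-picked values. The function names are assumptions, and it does not reproduce what pw.x or MPI actually does internally:

```python
import math

def naive_sum(values):
    """Plain left-to-right floating-point sum (no compensation)."""
    total = 0.0
    for v in values:
        total += v
    return total

def chunked_sum(values, n_chunks):
    """Sum `values` as n_chunks partial sums that are then added together,
    loosely mimicking a reduction over n_chunks processes."""
    size = -(-len(values) // n_chunks)  # ceiling division
    partials = [naive_sum(values[i:i + size]) for i in range(0, len(values), size)]
    return naive_sum(partials)

# Hand-picked values with strong cancellation; the exact sum is 4.
values = [1e16, 1.0, -1e16, 1.0] * 2

sum_2 = chunked_sum(values, 2)
sum_8 = chunked_sum(values, 8)
exact = math.fsum(values)

print("2 chunks:", sum_2)
print("8 chunks:", sum_8)
print("exact   :", exact)
```

The two groupings give different totals from the same data. In a real calculation the discrepancy is usually confined to the last digits, but a poorly conditioned self-consistency loop can amplify it.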

If you are using some exotic, little tested feature, it is conceivable
that some overlooked bug in parallelization exists. In this case, the
problem is easily reproducible and clearly bound to a specific feature,
though.

I do not see any other possibility.

Paolo

On 28/11/2022 17:05, Mahmoud Payami Shabestari via users wrote:
> Hi All,
> I find that the success of the scf cycle depends on the number of
> processes used in mpirun. For example, for a given scf input,
> "mpirun -np 2 pw.x < input.in"
> gives a converged result in a reasonable number of iterations, whereas
> "mpirun -np 8 pw.x < input.in"
> does not converge numerically.
> At first I thought it was a problem of numerical error accumulation,
> so I compiled QE with FFTW3 as prescribed in the manual.
> I even used "-lfftw3l" (long double) to cure this problem, but some
> other numerical problems appeared in a vc-relax job.
> I would appreciate it if anybody could help me understand the root of
> this dependency and how to control it.
> Bests,
> Mahmoud Payami
> NSTRI, AEOI, Tehran, Iran
> Email: mpayami at aeoi.org.ir
> Phone: +98 (0)21 82066504
> --------------------------------------------------------
>
> _______________________________________________
> The Quantum ESPRESSO community stands by the Ukrainian
> people and expresses its concerns about the devastating
> effects that the Russian military offensive has on their
> country and on the free and peaceful scientific, cultural,
> and economic cooperation amongst peoples
> _______________________________________________
> Quantum ESPRESSO is supported by MaX (www.max-centre.eu)
> users mailing list users at lists.quantum-espresso.org
> https://lists.quantum-espresso.org/mailman/listinfo/users

--
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 206, 33100 Udine Italy, +39-0432-558216