[Q-e-developers] error in restarting spin-polarized SCF with QE 5.1.1

Paolo Giannozzi paolo.giannozzi at uniud.it
Thu Nov 27 22:34:39 CET 2014


I can't reproduce the problem you mention (at least not
with a smaller cell size: 50 a.u. doesn't fit into my PC).
There are some "fluctuations" in the number of iterations
that suggest that the restarting algorithm may not be perfect,
but I don't get any error (at least in serial and parallel
execution on 2 processors).

P.

On Wed, 2014-11-26 at 15:37 -0600, Marco Govoni wrote:
> Hi, 
> 
> Same problem on a O2 molecule (nspin = 2). 
> 
> The problem shows up when nspin = 2 and the SCF is interrupted after >= 3 iterations. 
> For example, if max_seconds is set so that the code terminates (cleanly) before iteration 4 of the SCF loop is done, the restart tries to read kpoint 3 instead of SCF iteration 3. But this is a gamma_only simulation with spin, so only 2 kpoints are present, and the code crashes.
> If I reduce max_seconds and let the code terminate (cleanly) before iteration 3 is done, the restart tries to read kpoint 2, which exists in this case, so there is no crash. I am not sure that what is read is correct, though.
> 
> Alright here is the input, scaled down to a molecule for fast debugging. Please try to play with max_seconds and restart_mode, so that you start from_scratch and have a clean interruption after 3 completed SCF iterations and then try to restart from there. 
> 
> Thanks for your support.
> 
> Marco 
> 
> 
> !
> !  Input for O2 molecule triplet ground state.   At the experimental geometry 1.212 angstroms
> !
> !  Either can fix occupations and the spin state through: tot_magnetization = 2
> !     or
> !  can allow system to have flexibility to find local minimum spin state by commenting out tot_magnetization and then uncommenting the following:
> !   occupations = 'smearing',
> !   degauss     = 0.01D0,
> !   smearing    = 'gauss',
> !   starting_magnetization(1)=0.7,
> !   starting_magnetization(2)=0.7,
> !
> !
> &CONTROL
>    calculation  = 'scf',
>    verbosity    = 'high',
>    outdir       = './',
>    pseudo_dir   = './'
>    prefix       = 'O2-triplet-PBE-SCF-tm80',
>    max_seconds  = 60
>    restart_mode = 'restart'
> /
> &SYSTEM
>    nosym       = .TRUE.,
>    ibrav       = 1,
>    celldm(1)   = 50.d0,
>    nspin       = 2,
>    nat         = 2,
>    ntyp        = 2,
>    ecutwfc     = 80,
>    tot_magnetization = 2,
>    nbnd        = 10,
> !   occupations = 'smearing',
> !   degauss     = 0.01D0,
> !   smearing    = 'gauss',
> !   starting_magnetization(1)=0.7,
> !   starting_magnetization(2)=0.7,
> /
> &ELECTRONS
>    conv_thr    = 1.D-6,
>    mixing_beta = 0.5D0,
> /
> ATOMIC_SPECIES
>  O1   15.999   O.pbe-mt.UPF
>  O2   15.999   O.pbe-mt.UPF
> ATOMIC_POSITIONS { bohr }
>  O1        0.000000000   0.000000000   0.000000000
>  O2        2.400000000   0.000000000   0.000000000
> K_POINTS { gamma }
> 
> 
> 
> --
> ----------------------------
> Marco Govoni, Ph.D.
> ----------------------------
> Institute for Molecular Engineering 
> The University of Chicago
> 5747 South Ellis Avenue 
> Chicago, IL 60637 
> http://galligroup.uchicago.edu/People/mgovoni.php
> ----------------------------
> 
> On Nov 26, 2014, at 12:43 PM, Marco Govoni <mgovoni at uchicago.edu> wrote:
> 
> > Hi, 
> > 
> > I have problems in restarting the SCF simulation. 
> > 
> > I’m running a spin-polarized (nspin=2) SCF simulation (besides task and diag, I’m not activating other parallelization levels than R&G division). 
> > I set max_seconds and from_scratch, yielding a clean interruption of the (unconverged) SCF cycle after only a few iterations.
> > Then when I restart, the code crashes, giving the following message.
> > 
> >     Calculation restarted from scf iteration #     4
> > 
> >     total cpu time spent up to now is       14.3 secs
> > 
> >     per-process dynamical memory:   131.3 Mb
> > 
> >     Self-consistent Calculation
> > 
> >     iteration #  4     ecut=   120.00 Ry     beta=0.20
> >     Calculation restarted from kpoint #     3
> >     Davidson diagonalization with overlap
> >     ethr =  1.00E-02,  avg # of iterations = 11.0
> > Application 228717 exit codes: 134
> > Application 228717 exit signals: Killed
> > 
> > This is a gamma_only simulation, so there must be a typo in "kpoint # 3"; maybe it should be the SCF iteration.
> > In addition, the code exits and a trace of the errors gives
> > 
> > pw.x               0000000001C1CA89  Unknown               Unknown  Unknown
> > pw.x               0000000001C1B35E  Unknown               Unknown  Unknown
> > pw.x               0000000001BCF642  Unknown               Unknown  Unknown
> > pw.x               0000000001B4B998  Unknown               Unknown  Unknown
> > pw.x               0000000001B521B2  Unknown               Unknown  Unknown
> > pw.x               0000000000CCBED0  Unknown               Unknown  Unknown
> > pw.x               0000000000D787FB  Unknown               Unknown  Unknown
> > pw.x               0000000001C38131  Unknown               Unknown  Unknown
> > pw.x               0000000001A32B32  Unknown               Unknown  Unknown
> > pw.x               0000000001A25710  Unknown               Unknown  Unknown
> > pw.x               0000000001A258BD  Unknown               Unknown  Unknown
> > pw.x               00000000019E3D43  Unknown               Unknown  Unknown
> > pw.x               00000000007F4003  reduce_base_real_         223  mp_base.f90
> > pw.x               00000000007E1AFC  mp_mp_mp_sum_rt_         1382  mp.f90
> > pw.x               00000000005D17AC  sum_band_IP_sum_b         548  sum_band.f90
> > pw.x               00000000005C4982  sum_band_                 123  sum_band.f90
> > pw.x               00000000004786EF  electrons_scf_            478  electrons.f90
> > pw.x               0000000000475C0D  electrons_                133  electrons.f90
> > pw.x               00000000004011BC  run_pwscf_                 90  run_pwscf.f90
> > pw.x               0000000000401023  MAIN__                     30  pwscf.f90
> > pw.x               0000000000400F76  Unknown               Unknown  Unknown
> > pw.x               0000000001C31D81  Unknown               Unknown  Unknown
> > pw.x               0000000000400E41  Unknown               Unknown  Unknown
> > 
> > Let me know. 
> > 
> > Marco
> > 
> > 
> > 
> 
> 
> _______________________________________________
> Q-e-developers mailing list
> Q-e-developers at qe-forge.org
> http://qe-forge.org/mailman/listinfo/q-e-developers
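
The reproduction procedure described in the quoted message (start from_scratch with a short max_seconds, then rerun the same input with restart_mode = 'restart') can be scripted. The following is only a minimal sketch: the helper names set_namelist_value and make_restart_pair and the time limits 20 and 600 are hypothetical, and the max_seconds value that actually stops the run after exactly 3 completed SCF iterations depends on the machine and has to be found by trial.

```python
import re

# &CONTROL namelist copied from the input posted in this thread.
BASE_CONTROL = """&CONTROL
   calculation  = 'scf',
   verbosity    = 'high',
   outdir       = './',
   pseudo_dir   = './'
   prefix       = 'O2-triplet-PBE-SCF-tm80',
   max_seconds  = 60
   restart_mode = 'restart'
/
"""


def set_namelist_value(text, key, value):
    """Return text with the first uncommented `key = ...` set to value.

    Lines starting with '!' are comments and are not matched.
    Raises KeyError if the key is not present.
    """
    pattern = re.compile(
        rf"^(\s*{re.escape(key)}\s*=\s*)([^,\n]*)", re.MULTILINE
    )
    new_text, count = pattern.subn(lambda m: m.group(1) + value, text, count=1)
    if count == 0:
        raise KeyError(key)
    return new_text


def make_restart_pair(base_input, first_run_seconds, second_run_seconds):
    """Build the two inputs: an interrupted first run and its restart."""
    scratch = set_namelist_value(base_input, "restart_mode", "'from_scratch'")
    scratch = set_namelist_value(scratch, "max_seconds", str(first_run_seconds))
    restart = set_namelist_value(base_input, "restart_mode", "'restart'")
    restart = set_namelist_value(restart, "max_seconds", str(second_run_seconds))
    return scratch, restart


if __name__ == "__main__":
    # Hypothetical time limits; tune the first one until the run stops
    # cleanly after 3 completed SCF iterations.
    scratch, restart = make_restart_pair(BASE_CONTROL, 20, 600)
    print(scratch)
    print(restart)
```

In practice the full input (all namelists and cards) would be passed through the same helpers, and the two resulting files run one after the other with pw.x, keeping outdir and prefix unchanged so that the second run finds the data written by the first.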
