[Q-e-developers] error in restarting spin-polarized SCF with QE 5.1.1
Marco Govoni
mgovoni at uchicago.edu
Wed Nov 26 22:37:45 CET 2014
Hi,
Same problem on an O2 molecule (nspin = 2).
The problem shows up when nspin = 2 and the SCF is interrupted after >= 3 iterations.
For example, if max_seconds is set so that the code terminates (cleanly) before iteration 4 of the SCF loop is done, the restart tries to read kpoint 3 instead of SCF iteration 3. But this is a gamma_only simulation with spin, so only 2 kpoints are present, and the code crashes.
If I reduce max_seconds so that the code terminates (cleanly) before iteration 3 is done, the restart tries to read kpoint 2, which exists in this case, so there is no crash. I am not sure that what is read is correct, though.
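To make the symptom concrete, here is a toy model in Python of the two cases reported above. It is not taken from the pw.x source: the function names are hypothetical, and the rule "requested kpoint index = number of completed SCF iterations" is only an assumption that matches the two interruptions described (2 and 3 completed iterations).

```python
def restart_kpoint_index(completed_scf_iterations: int) -> int:
    """Kpoint index the restart appears to request.

    Assumption: it equals the number of completed SCF iterations,
    which is what the two reported cases suggest.
    """
    return completed_scf_iterations


def restart_outcome(completed_scf_iterations: int, nks: int) -> str:
    """Describe what the restart does for a run that stores `nks` kpoints."""
    k = restart_kpoint_index(completed_scf_iterations)
    if k > nks:
        return f"crash: kpoint {k} requested, only {nks} present"
    return f"no crash: kpoint {k} of {nks} read (correctness not verified)"


# gamma_only run with nspin = 2: only 2 kpoints are present, as stated above.
NKS_GAMMA_NSPIN2 = 2

for completed in (2, 3):
    print(completed, restart_outcome(completed, NKS_GAMMA_NSPIN2))
```

Under this assumption, any interruption after 3 or more completed iterations requests a kpoint that is not there, which matches the ">= 3 iterations" condition.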
Alright, here is the input, scaled down to a molecule for fast debugging. Please play with max_seconds and restart_mode so that you start from_scratch, get a clean interruption after 3 completed SCF iterations, and then try to restart from there.
Thanks for your support.
Marco
!
! Input for O2 molecule triplet ground state. At the experimental geometry 1.212 angstroms
!
! Either fix the occupations and the spin state through: tot_magnetization = 2
! or
! allow the system the flexibility to find a local-minimum spin state by commenting out tot_magnetization and then uncommenting the following:
! occupations = 'smearing',
! degauss = 0.01D0,
! smearing = 'gauss',
! starting_magnetization(1)=0.7,
! starting_magnetization(2)=0.7,
!
!
&CONTROL
calculation = 'scf',
verbosity = 'high',
outdir = './',
pseudo_dir = './'
prefix = 'O2-triplet-PBE-SCF-tm80',
max_seconds = 60
restart_mode = 'restart'
/
&SYSTEM
nosym = .TRUE.,
ibrav = 1,
celldm(1) = 50.d0,
nspin = 2,
nat = 2,
ntyp = 2,
ecutwfc = 80,
tot_magnetization = 2,
nbnd = 10,
! occupations = 'smearing',
! degauss = 0.01D0,
! smearing = 'gauss',
! starting_magnetization(1)=0.7,
! starting_magnetization(2)=0.7,
/
&ELECTRONS
conv_thr = 1.D-6,
mixing_beta = 0.5D0,
/
ATOMIC_SPECIES
O1 15.999 O.pbe-mt.UPF
O2 15.999 O.pbe-mt.UPF
ATOMIC_POSITIONS { bohr }
O1 0.000000000 0.000000000 0.000000000
O2 2.400000000 0.000000000 0.000000000
K_POINTS { gamma }
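
The two-step reproduction (a from_scratch run that is interrupted, then a restart) needs two variants of the input above that differ only in &CONTROL. A minimal Python sketch for generating them follows. The helper set_namelist_value is hypothetical and not part of Quantum ESPRESSO, the CONTROL string is only an excerpt of the input above, and the max_seconds value has to be tuned by hand on each machine so that exactly 3 SCF iterations complete.

```python
import re


def set_namelist_value(text: str, key: str, value: str) -> str:
    """Replace the value on the single line `key = ...`, keeping a trailing comma."""
    pattern = re.compile(
        rf"^([ \t]*{re.escape(key)}[ \t]*=[ \t]*)[^,\n]*(,?)[ \t]*$",
        re.MULTILINE,
    )
    new_text, count = pattern.subn(
        lambda m: f"{m.group(1)}{value}{m.group(2)}", text
    )
    if count != 1:
        raise ValueError(f"expected exactly one '{key}' line, found {count}")
    return new_text


# Excerpt of the &CONTROL namelist from the input above.
CONTROL = """&CONTROL
calculation = 'scf',
max_seconds = 60
restart_mode = 'restart'
/
"""

# Step 1: start from scratch; max_seconds must be tuned (assumption: 60 here)
# so that the run stops cleanly after 3 completed SCF iterations.
first_run = set_namelist_value(CONTROL, "restart_mode", "'from_scratch'")
first_run = set_namelist_value(first_run, "max_seconds", "60")

# Step 2: same input, but restarting from the interrupted run.
second_run = set_namelist_value(first_run, "restart_mode", "'restart'")

print(first_run)
print(second_run)
```

Each variant would then be written to its own file together with the rest of the input and passed to pw.x in turn, using the same outdir and prefix for both runs.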
--
----------------------------
Marco Govoni, Ph.D.
----------------------------
Institute for Molecular Engineering
The University of Chicago
5747 South Ellis Avenue
Chicago, IL 60637
http://galligroup.uchicago.edu/People/mgovoni.php
----------------------------
On Nov 26, 2014, at 12:43 PM, Marco Govoni <mgovoni at uchicago.edu> wrote:
> Hi,
>
> I have problems in restarting the SCF simulation.
>
> I’m running a spin-polarized (nspin=2) SCF simulation (besides task and diag, I’m not activating other parallelization levels than R&G division).
> I set max_seconds and from_scratch, yielding a clean interruption of the (unconverged) SCF cycle (only a few iterations are done).
> Then when I restart, the code crashes with the following message.
>
> Calculation restarted from scf iteration # 4
>
> total cpu time spent up to now is 14.3 secs
>
> per-process dynamical memory: 131.3 Mb
>
> Self-consistent Calculation
>
> iteration # 4 ecut= 120.00 Ry beta=0.20
> Calculation restarted from kpoint # 3
> Davidson diagonalization with overlap
> ethr = 1.00E-02, avg # of iterations = 11.0
> Application 228717 exit codes: 134
> Application 228717 exit signals: Killed
>
> This is a gamma_only simulation, so there must be a typo in “kpoint # 3”; maybe it should be the SCF iteration.
> Plus the code exits, and a trace of the errors gives
>
> pw.x 0000000001C1CA89 Unknown Unknown Unknown
> pw.x 0000000001C1B35E Unknown Unknown Unknown
> pw.x 0000000001BCF642 Unknown Unknown Unknown
> pw.x 0000000001B4B998 Unknown Unknown Unknown
> pw.x 0000000001B521B2 Unknown Unknown Unknown
> pw.x 0000000000CCBED0 Unknown Unknown Unknown
> pw.x 0000000000D787FB Unknown Unknown Unknown
> pw.x 0000000001C38131 Unknown Unknown Unknown
> pw.x 0000000001A32B32 Unknown Unknown Unknown
> pw.x 0000000001A25710 Unknown Unknown Unknown
> pw.x 0000000001A258BD Unknown Unknown Unknown
> pw.x 00000000019E3D43 Unknown Unknown Unknown
> pw.x 00000000007F4003 reduce_base_real_ 223 mp_base.f90
> pw.x 00000000007E1AFC mp_mp_mp_sum_rt_ 1382 mp.f90
> pw.x 00000000005D17AC sum_band_IP_sum_b 548 sum_band.f90
> pw.x 00000000005C4982 sum_band_ 123 sum_band.f90
> pw.x 00000000004786EF electrons_scf_ 478 electrons.f90
> pw.x 0000000000475C0D electrons_ 133 electrons.f90
> pw.x 00000000004011BC run_pwscf_ 90 run_pwscf.f90
> pw.x 0000000000401023 MAIN__ 30 pwscf.f90
> pw.x 0000000000400F76 Unknown Unknown Unknown
> pw.x 0000000001C31D81 Unknown Unknown Unknown
> pw.x 0000000000400E41 Unknown Unknown Unknown
>
> Let me know.
>
> Marco
>