[Pw_forum] WFC convergence in NEB calculation

Janos Kiss janos.kiss at theochem.ruhr-uni-bochum.de
Mon Oct 13 21:54:45 CEST 2008


J K wrote:

>> What else could one try to get the wfc somehow converged?

Paolo Giannozzi wrote:

>hard to say (especially without the output). Are you sure that spin
>polarization is not a source of trouble close to the transition state?

I have not tried this so far, because I would suspect that in a reaction
where a proton/hydride transfer is involved, most species should be more
like ions than radicals (of course, I might be plain wrong here).
I was also concerned about this aspect, but when I did a quick and rough
test calculation, the wfc convergence was not noticeably better, and one
SCF step took much longer. Therefore, I just killed it impatiently, IIRC.
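
Just to make clear what that quick test was: I only switched on spin
polarization in the &SYSTEM namelist, roughly like this (a sketch; the
starting_magnetization value below is a placeholder, not what I actually
used):

   &SYSTEM
      ! all other variables unchanged
      nspin = 2
      starting_magnetization(1) = 0.5  ! placeholder initial guess
   /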

Paolo Giannozzi wrote:

>One possible trick could be to go on with the calculation even if not
>converged. I am actually considering adding yet another option
>("sloppy_convergence"?) doing exactly this. It should be easy: in
>PW/electrons.f90, set "conv_elec" to .true. at the last iteration,
>after the call to "mix_rho". No warranty.

I did this right after your first response and recompiled the code with
the modified routine. I set the maximum number of SCF steps to 150,
expecting that either the wfc would converge within those 150 iterations,
or that I would get the forces with whatever wfc was available after
150 steps. Instead of that, I just got a crash and a core dump.
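
For reference, the modification was essentially the one-liner you
described, applied after the call to mix_rho inside the SCF loop of
PW/electrons.f90 (a sketch; the counter and limit variable names are
from memory and may differ in the actual source):

   ! force "convergence" on the very last SCF iteration and go on
   ! with whatever wfc/density is available at that point
   IF ( iter == niter ) conv_elec = .TRUE.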

For my case this workaround is not really useful anyway, because the wfc
is so bad (the total energy fluctuates already in the first decimal
place) that the forces would not make any physical sense.

J K wrote:

>> Another question: when I restart an NEB calculation, how can I
>> restart the wfc for those images which were already converged for a
>> given NEB iteration?
>> [...] even for those images where the wfc was converged, I still
>> need to spend again like 5-9 SCF cycles/image.

Paolo Giannozzi wrote:

>are you sure that the code is using the same set of coordinates that
>were used in the previous calculation? Maybe the restart doesn't work
>as expected.

I have looked into this, and I think you are completely right: the code
restarts the nuclear positions from the *.path file, which in case of a
crash contains the coordinates from the previous NEB iteration. I had
assumed that it is updated once convergence is achieved on any movable
image, and not only after a successful NEB optimization step.
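
For completeness, the restart itself is the plain one, nothing exotic;
the relevant part of my &CONTROL namelist looks roughly like this (a
sketch, assuming the standard restart flags of this code version):

   &CONTROL
      ! only the restart-related variables are shown
      calculation  = 'neb'
      restart_mode = 'restart'
   /

So the stale coordinates must come from the *.path file itself, not from
the way the restart is requested.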

I tried to use the cg diagonalization scheme for those images where the
wfc convergence was tricky, but it was not successful. I am really afraid
that I somehow managed to miscompile the code, because if I try to crank
up the density cutoff above 5 times the plane-wave cutoff, the code
crashes right after the wfc initialization, in the wfc diagonalization.
I think this is a really bad sign.
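
By "cranking up the density cutoff" I mean the usual ecutwfc/ecutrho
pair in the &SYSTEM namelist, roughly like this (the numbers below are
placeholders, not my actual values):

   &SYSTEM
      ! all other variables unchanged
      ecutwfc =  30.0   ! plane-wave (wfc) cutoff in Ry
      ecutrho = 180.0   ! density cutoff, here 6*ecutwfc, i.e. above 5x
   /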
In fact, I completely forgot to mention that I was able to produce a
relatively stable binary with mkl 8.1 only. With newer mkl versions, in a
geometry optimization, after several successful steps, when the nuclei
were close to the minimum geometry, the code very often crashed in
chdiag. This was reproducible for both a TiO2 and a Cu slab. With mkl 10
it was a pain to produce a binary at all (I had to modify some flags to
get it working), and that binary crashed right after the wfc
initialization for all my test calculations.
Then I saw in the forum that mkl 10 does not give any gain compared to
older mkl versions, so I gave up on it. Of course, to clarify this I
should attach all the config files, the machine architecture, the ifc
compiler and mkl versions, and so on.

Apparently I am not alone in having this issue on the new quad-core
Intel Xeon machines. For example, on a dual Opteron I produced a
rock-solid binary with Atlas and with mkl 9.0 as well. Now someone could
say that I am not really supposed to use mkl on an Opteron, but it
worked: it was only around 4% slower than the Atlas version, and the
numbers looked the same (within the numerical noise). Of course, I would
prefer to use the dual quad-core Xeon machines, because they are faster.
I would suspect that this is more an issue with the quad-core Xeon
architecture, or with our particular system environment/installation.

Yours Sincerely,
   Janos.

 ==================================================================
   Janos Kiss   e-mail: janos.kiss at theochem.ruhr-uni-bochum.de       
 Lehrstuhl fuer Theoretische Chemie  Phone: +49 (0)234/32-26485 
 NC 03/297                                  +49 (0)234 32 26754
 Ruhr-Universitaet Bochum            Fax:   +49 (0)234/32-14045
 D-44780 Bochum            http://www.theochem.ruhr-uni-bochum.de
 ==================================================================




