cao at qtp.ufl.edu
Thu Dec 9 21:08:15 CET 2004
Well, I see this behavior almost every time I do a vc-relax on an
Itanium2 cluster. I wondered whether it was due to MKL, but it is not.
However, pw.x runs fine on an IBM RS6000 cluster.
An input file that reliably reproduces the hang:
&control
   calculation  = 'vc-relax'
   title        = 'alpha-quartz'
   restart_mode = 'from_scratch'
   nstep = 500, iprint = 20
   tstress = .true., tprnfor = .true.
   dt = 10.0
   prefix     = 'sio2'
   pseudo_dir = '../../pp/'
   outdir     = './'
/
&system
   ibrav = 4, celldm(1) = 9.6, celldm(3) = 1.10,
   nat = 9, ntyp = 2
   ecutwfc = 80.0
   nosym = .true.
/
&electrons
   conv_thr = 1.0D-7
   mixing_mode = 'plain', mixing_beta = 0.2D0
   diagonalization = 'david'
/
&ions
   ion_dynamics = 'damp'
   minimization_scheme = 'damped-dyn', damp = 0.5
   reset_vel = .true.
/
&cell
   cell_dynamics = 'damp-pr'
   cell_factor = 1.5D0
/
ATOMIC_SPECIES
 Si  2.0  Si.pbe-rrkj.UPF
 O   2.0  O.pbe-rrkjus.UPF
ATOMIC_POSITIONS crystal
 Si  0.46880915  0.00000000  0.66666667
 Si  0.00000000  0.46880915  0.33333333
 Si -0.46880915 -0.46880915  0.00000000
 O   0.40622487  0.26840588  0.77904595
 O  -0.26840588  0.13576296  0.44752077
 O  -0.13576296 -0.40622487  0.11246441
 O   0.26840588  0.40622487 -0.77904595
 O  -0.40622487 -0.13576296 -0.11246441
 O   0.13576296 -0.26840588 -0.44752077
On Thu, 9 Dec 2004, Paolo Giannozzi wrote:
> On Thursday 09 December 2004 17:35, Chao Cao wrote:
> > Thanks for the reply. But does this part of the manual actually apply
> > here? The program did not actually crash: it kept running, but was
> > doing weird stuff without producing any output.
> I have occasionally seen this behavior in the past:
> - when one MPI process encounters an error while the others
> don't. This happened on the SP3 in Princeton with k-point
> parallelization: if the diagonalization crashed on a specific k-point,
> the code hung. It was a problem with the diagonalization libraries
> that was never clarified.
> - when some condition forces one MPI process to follow a
> different flow from that of the other processes. For instance,
> one process thinks it has converged while the others do not.
> This used to happen on the first T3D: processes doing exactly
> the same calculation produced slightly different answers. Of
> course that was a problem of the operating system, not of
> the code.
> The only way I know to track down such problems is to follow the
> code flow and insert stops until you understand where and
> why things go wrong. Unfortunately it is a time-consuming
> process, and the origin of the problem can be quite subtle,
> or not even directly related to bugs in the code. Anyway: if
> you have a test case that hangs reliably, please submit it.
> Paolo Giannozzi e-mail: giannozz at nest.sns.it
> Scuola Normale Superiore Phone: +39/050-509876, Fax:-563513
> Piazza dei Cavalieri 7 I-56126 Pisa, Italy