[Pw_forum] possible i/o bug in turbo_lanczos.x and turbo_davidson.x 5.3.0
Giuseppe Mattioli
giuseppe.mattioli at ism.cnr.it
Thu Feb 4 17:59:43 CET 2016
The run crashes silently on BlueGene with 5.2.1 (I have no time to compile 5.3.0 right now; I can try tomorrow if you think it is important). Here is the output:
Program turboTDDFT v.5.2.1 starts on 4Feb2016 at 17:56:55
This program is part of the open-source Quantum ESPRESSO suite
for quantum simulation of materials; please cite
"P. Giannozzi et al., J. Phys.:Condens. Matter 21 395502 (2009);
URL http://www.quantum-espresso.org",
in publications or presentations arising from this work. More details at
http://www.quantum-espresso.org/quote
Parallel version (MPI & OpenMP), running on 2048 processor cores
Number of MPI processes: 512
Threads/MPI process: 4
R & G space division: proc/nbgrp/npool/nimage = 512
Reading data from directory:
/gpfs/scratch/userexternal/gmattiol/test/tddft/run/tmp/././l0-5.3.0.save
Info: using nr1, nr2, nr3 values from input
Info: using nr1, nr2, nr3 values from input
IMPORTANT: XC functional enforced from input :
Exchange-correlation = SLA PW PBE PBE ( 1 4 3 4 0 0)
Any further DFT definition will be discarded
Please, verify this is what you really want
Parallelization info
--------------------
sticks:   dense  smooth      PW     G-vecs:    dense   smooth       PW
Min          78      38       8                12054     4220      492
Max          80      40      10                12104     4300      550
Sum       40733   20369    5097              6186431  2186841   273425
Tot       20367   10185    2549
negative rho (up, down): 9.597E-02 0.000E+00
Subspace diagonalization in iterative solution of the eigenvalue problem:
scalapack distributed-memory algorithm (size of sub-group: 16* 16 procs)
Warning: There are virtual states in the input file, trying to disregard in response calculation
Ultrasoft (Vanderbilt) Pseudopotentials
Normal read
Gamma point algorithm
2016-02-04 17:57:18.063 (WARN ) [0x40000ee8d50] :7014845:ibm.runjob.client.Job: terminated by signal 6
2016-02-04 17:57:18.065 (WARN ) [0x40000ee8d50] :7014845:ibm.runjob.client.Job: abnormal termination by signal 6 from rank 295
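
The only hint in this log is the last runjob line. A small sketch of how the signal and the offending rank can be pulled out of such a log, so that one knows which rank's stderr to look at first. The line format is copied from the two runjob lines above; the signal-name table assumes conventional Linux numbering (where 6 is SIGABRT, i.e. an abort() call rather than a segmentation fault), and the function name is only illustrative.

```python
import re

# Pattern for the runjob line reporting the aborting rank
# (format copied from the log above; other runjob versions may differ).
_ABORT = re.compile(r"abnormal termination by signal (\d+) from rank (\d+)")

# Conventional Linux numbering for a few common signals.
# This table is an assumption about the platform, not taken from the log.
_SIGNAL_NAMES = {6: "SIGABRT", 9: "SIGKILL", 11: "SIGSEGV", 15: "SIGTERM"}

# The two runjob lines from the output above.
LOG = """\
2016-02-04 17:57:18.063 (WARN ) [0x40000ee8d50] :7014845:ibm.runjob.client.Job: terminated by signal 6
2016-02-04 17:57:18.065 (WARN ) [0x40000ee8d50] :7014845:ibm.runjob.client.Job: abnormal termination by signal 6 from rank 295
"""


def find_abort(log_text):
    """Return signal number, signal name and rank of the first reported abort, or None."""
    match = _ABORT.search(log_text)
    if match is None:
        return None
    signum, rank = int(match.group(1)), int(match.group(2))
    return {
        "signal": signum,
        "name": _SIGNAL_NAMES.get(signum, "unknown"),
        "rank": rank,
    }


if __name__ == "__main__":
    print(find_abort(LOG))
```
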
On Thursday, February 04, 2016 03:46:10 PM Timrov Iurii wrote:
> Dear Giuseppe,
>
> As far as I understand, the code crashes when it tries to write the vectors "d0psi" to disk. The first thing to do, I think, is to check that you
> have enough disk space. If this is not the issue, then let's keep looking for the cause.
>
> You may want to look at lines 110-138 of the routine TDDFPT/src/lr_solve_e.f90, where the code writes the vectors to disk in parallel. Please make
> sure that "outdir" is the same in PWscf and in Lanczos/Davidson (and do not specify wfcdir). If this does not solve the problem, could you
> please also report the output of Lanczos/Davidson (preferably Lanczos)?
>
> HTH
>
> Best regards,
> Iurii Timrov
> Post-doctoral researcher
> THEOS - École Polytechnique Fédérale de Lausanne
> Lausanne, Switzerland
>
>
> ________________________________________
> From: pw_forum-bounces at pwscf.org <pw_forum-bounces at pwscf.org> on behalf of Giuseppe Mattioli <giuseppe.mattioli at ism.cnr.it>
> Sent: Thursday, February 4, 2016 11:34 AM
> To: pw_forum at pwscf.org
> Subject: [Pw_forum] possible i/o bug in turbo_lanczos.x and turbo_davidson.x 5.3.0
>
> Dear All
> I'm having problems when performing nontrivial runs of turbo_davidson.x and turbo_lanczos.x with versions 5.2.1 and 5.3.0 of QE.
> Let me say first that "trivial" runs (CH4 molecule with the same pseudopotentials and cutoffs but a smaller 30 a.u.^3 cubic cell) work fine with all the
> tested versions.
> However, the input files for a nontrivial case that leads to a crash should run on a decent PC in about 1 hr, so they provide a significant but not
> huge test. *Note* that if I run the same input files with version 5.1.1 (compiled in the very same environment) everything works fine (albeit more
> slowly)! The 5.3.0 (and 5.2.1) crashes have been reproduced on two different machines (Intel, 8 cores, 16 GB RAM; AMD, 32 cores, 64 GB RAM), so
> they should not be considered erratic.
>
> Here is the pw.x input. The pseudopotentials are quite old and can be found in the online library (or I can provide them on request).
>
> &control
> calculation = 'scf'
> restart_mode='from_scratch',
> prefix='l0-5.3.0',
> pseudo_dir = '/home/mattioligi/PP_UPF/',
> outdir='/home/mattioligi/cocat/test_tddft/5.2.1/l0/5.3.0/run/tmp/',
> nstep=300,
> max_seconds=80000,
> disk_io='low',
> tprnfor=.true.,
> /
> &system
> ibrav=1, celldm(1)=40.000000,
> nat=42, ntyp=4, nbnd=75,
> ecutwfc = 40.0,
> ecutrho = 320.0,
> nspin=1,
> /
> &electrons
> diagonalization='david',
> mixing_mode='plain'
> mixing_beta=0.1
> conv_thr=1.0d-8
> electron_maxstep=100
> /
> &ions
> /
> ATOMIC_SPECIES
> O 15.999 O_pbe.van.UPF
> N 14.007 N.pbe-van_bm.UPF
> C 12.011 C_pbe.van.UPF
> H 1.008 H_pbe.van.UPF
> ATOMIC_POSITIONS {angstrom}
> C 4.815369179 12.355337788 8.111406911
> C 5.639537337 12.072210478 7.018248617
> C 6.373883049 10.886794669 6.974735758
> H 5.707874252 12.778745273 6.179910928
> C 4.734413944 11.441350355 9.166316558
> H 4.235443595 13.287281698 8.140567718
> C 6.304598307 9.977077773 8.041477142
> H 7.012644682 10.659891408 6.111132336
> C 5.477180541 10.260422385 9.138835842
> H 4.092409998 11.653694694 10.031778418
> H 5.418528381 9.546881383 9.971310698
> N 7.058612774 8.759574945 8.006208499
> C 6.384981399 7.544139013 8.340645249
> C 6.997532612 6.588483316 9.168188787
> C 5.084708421 7.308024697 7.864810575
> C 6.325550737 5.410241765 9.493833204
> H 8.006262126 6.776794433 9.557919083
> C 4.414663626 6.134355690 8.210976959
> H 4.597637090 8.055249046 7.224770074
> C 5.030975670 5.176070562 9.020776666
> H 6.819890970 4.670618768 10.138154855
> H 3.397721512 5.964689741 7.832306200
> H 4.503298572 4.249946635 9.284425745
> C 8.412602212 8.773905175 7.652414992
> C 9.197305040 9.938168667 7.841458619
> C 9.043381168 7.634703664 7.098599788
> C 10.533008285 9.972397555 7.486007356
> H 8.740413757 10.828552107 8.290447985
> C 10.383506998 7.674400214 6.758021800
> H 8.466388332 6.717306584 6.931252215
> C 11.175184928 8.838234071 6.927523312
> H 11.098162573 10.894629696 7.663657304
> H 10.849606517 6.778483121 6.322529487
> C 12.554045113 8.768090174 6.529797787
> C 13.538745611 9.729179498 6.474718127
> H 12.882286114 7.769870632 6.203237321
> C 13.338246843 11.096686263 6.810664645
> N 13.160471613 12.223162736 7.083088078
> C 14.914360413 9.407055683 6.034105289
> O 15.832284936 10.221452163 5.956798921
> O 15.091537629 8.085358800 5.710801225
> H 16.043983143 8.016066678 5.436328923
> K_POINTS {gamma}
>
> And here are the turbo_lanczos.x and turbo_davidson.x input files
>
> lanczos
>
> &lr_input
> prefix="l0-5.3.0",
> outdir='/state/partition1/mattioligi/34339',
> wfcdir='/state/partition1/mattioligi/34339',
> restart_step=6,
> restart=.false.
> /
> &lr_control
> itermax=12,
> ipol=4,
> /
>
> davidson
>
> &lr_input
> prefix="l0-5.3.0",
> outdir='/state/partition1/mattioligi/34340',
> restart=.false.
> /
> &lr_dav
> num_eign=2
> num_init=4
> num_basis_max=10
> residue_conv_thr=1.0E-4
> start=0.1
> finish=1.5
> step=0.0002
> broadening=0.005
> reference=0.2
> p_nbnd_occ=5
> p_nbnd_virt=5
> poor_of_ram=.false.
> poor_of_ram2=.false.
> /
>
> In both cases and on both machines the CRASH report is something like
>
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> task # 1
> from davcio : error # 20
> error while writing from file "/state/partition1/mattioligi/34340/l0-5.3.0.d0psi.32"
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>
> I suppose that it is some kind of I/O error, but I would greatly appreciate your opinion... :-)
> Thank you in advance
> Giuseppe
>
> ********************************************************
> - Article 1 - Men are born and remain
> free and equal in rights. Social distinctions
> may be founded only upon the common good
> - Article 2 - The aim of every political association
> is the preservation of the natural and
> imprescriptible rights of man. These rights are liberty,
> property, security, and resistance to oppression.
> ********************************************************
>
> Giuseppe Mattioli
> CNR - ISTITUTO DI STRUTTURA DELLA MATERIA
> v. Salaria Km 29,300 - C.P. 10
> I 00015 - Monterotondo Stazione (RM), Italy
> Tel + 39 06 90672836 - Fax +39 06 90672316
> E-mail: <giuseppe.mattioli at ism.cnr.it>
> http://www.ism.cnr.it/english/staff/mattiolig
> ResearcherID: F-6308-2012
>
> _______________________________________________
> Pw_forum mailing list
> Pw_forum at pwscf.org
> http://pwscf.org/mailman/listinfo/pw_forum
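
The two checks suggested in the quoted reply (enough disk space, and the same "outdir" in PWscf and in Lanczos/Davidson with no wfcdir) can be scripted before submitting a job. What follows is only a minimal sketch: the reader below is a naive regular expression for quoted `key='value'` assignments, not a real Fortran namelist parser, and the function names are hypothetical. The input fragments are copied from the files quoted above. Note that differing paths are not necessarily wrong if the .save directory is copied to node-local scratch between the two steps, so the result is a list of things to double-check rather than a verdict.

```python
import re
import shutil

# Matches simple quoted assignments such as  outdir='/some/path'  or  prefix="name".
# Deliberately naive: it is not a full Fortran namelist parser.
_ASSIGN = re.compile(r"""(\w+)\s*=\s*(['"])(.*?)\2""")

# Fragments copied from the inputs quoted above.
PW_INPUT = """
&control
prefix='l0-5.3.0',
outdir='/home/mattioligi/cocat/test_tddft/5.2.1/l0/5.3.0/run/tmp/',
/
"""

LANCZOS_INPUT = """
&lr_input
prefix="l0-5.3.0",
outdir='/state/partition1/mattioligi/34339',
wfcdir='/state/partition1/mattioligi/34339',
/
"""


def read_string_settings(text):
    """Return {key: value} for every quoted string assignment (keys lower-cased)."""
    return {m.group(1).lower(): m.group(3) for m in _ASSIGN.finditer(text)}


def compare_inputs(pw_text, turbo_text):
    """List settings worth double-checking between the pw.x and turbo inputs."""
    pw = read_string_settings(pw_text)
    turbo = read_string_settings(turbo_text)
    problems = []
    for key in ("prefix", "outdir"):
        # Ignore a trailing slash so '/tmp/run/' and '/tmp/run' compare equal.
        a = pw.get(key, "").rstrip("/")
        b = turbo.get(key, "").rstrip("/")
        if a != b:
            problems.append(f"{key} differs: pw.x has {a!r}, turbo input has {b!r}")
    if "wfcdir" in turbo:
        problems.append("wfcdir is set in the turbo input")
    return problems


def free_gib(path):
    """Free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 2**30


if __name__ == "__main__":
    for line in compare_inputs(PW_INPUT, LANCZOS_INPUT):
        print("check:", line)
    print(f"free space in current directory: {free_gib('.'):.1f} GiB")
```

In a real job script one would pass the actual outdir to `free_gib` on the compute node, since node-local scratch such as /state/partition1 is usually not visible from the login node.
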