[Pw_forum] possible i/o bug in turbo_lanczos.x and turbo_davidson.x 5.3.0

Giuseppe Mattioli giuseppe.mattioli at ism.cnr.it
Thu Feb 4 17:48:06 CET 2016


Dear Iurii
Thanks for your help.

> First thing to do, I think, is to check that you
> have enough space on the disc.

Yes, there is enough space.
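
For anyone who wants to repeat the disk-space check from a script, here is a minimal Python sketch. It uses only the standard-library call `shutil.disk_usage`; the helper names `free_gib` and `has_room` are made up for illustration, and the check does not see per-user quotas, which a batch system may enforce separately.

```python
import shutil


def free_gib(path):
    """Free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 1024**3


def has_room(path, needed_bytes):
    """True if the filesystem holding `path` has at least `needed_bytes` free."""
    return shutil.disk_usage(path).free >= needed_bytes


if __name__ == "__main__":
    # "." stands in for the real outdir of the run.
    print(f"free space: {free_gib('.'):.1f} GiB")
```

Pointing `path` at the scratch directory actually used by the job (rather than the home directory) is what matters here, since the two often live on different filesystems.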

> You may want to look in the routine TDDFPT/src/lr_solve_e.f90 at lines 110-138 where the code writes vectors to the disc in parallel.

I've checked that these lines are very similar (substantially identical?) in 5.3.0 and 5.1.1, so it is very strange that the latter works when the
former does not.

> Please make
> sure that the "outdir" is the same in PWscf and in Lanczos/Davidson (and don't specify wfcdir).

Done. I've also tried placing "outdir" on a local and on a distributed filesystem: the result is always the same.
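
The consistency check suggested above (same "prefix" and "outdir" in the pw.x and turbo inputs, no "wfcdir") can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names `namelist_value` and `check_turbo_dirs` are hypothetical, the regular expression handles only simple quoted assignments (it does not skip `!` comments), and the paths in the example are placeholders, not the ones used in the real runs.

```python
import re


def namelist_value(text, key):
    """Return the quoted string assigned to `key` in namelist-style text, or None."""
    pattern = rf"(?<!\w){re.escape(key)}\s*=\s*(['\"])(.*?)\1"
    match = re.search(pattern, text, flags=re.IGNORECASE)
    return match.group(2) if match else None


def check_turbo_dirs(pw_text, turbo_text):
    """List mismatches between a pw.x input and a turbo_lanczos/davidson input."""
    problems = []
    for key in ("prefix", "outdir"):
        pw_val = namelist_value(pw_text, key)
        turbo_val = namelist_value(turbo_text, key)
        if pw_val is None or turbo_val is None:
            problems.append(f"{key} is not set in both inputs")
        elif pw_val.rstrip("/") != turbo_val.rstrip("/"):
            # A trailing slash is ignored when comparing directories.
            problems.append(f"{key} differs: {pw_val!r} vs {turbo_val!r}")
    if namelist_value(turbo_text, "wfcdir") is not None:
        problems.append("wfcdir is set in the turbo input")
    return problems


# Placeholder inputs: only the prefix is taken from this thread.
pw_example = "&control\n    prefix='l0-5.3.0',\n    outdir='/scratch/job/tmp/',\n/\n"
turbo_ok = '&lr_input\n    prefix="l0-5.3.0",\n    outdir=\'/scratch/job/tmp\',\n/\n'
turbo_bad = (
    '&lr_input\n    prefix="l0-5.3.0",\n'
    "    outdir='/scratch/other',\n    wfcdir='/scratch/other',\n/\n"
)

if __name__ == "__main__":
    for name, text in (("ok", turbo_ok), ("bad", turbo_bad)):
        print(name, check_turbo_dirs(pw_example, text))
```

A plain string comparison cannot tell whether two different paths point to the same directory (for example after a job script copies the .save directory to node-local scratch), so a reported difference is a prompt to look, not proof of a problem.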

> If this does not solve the problem, could you
> report please also the output of Lanczos/Davidson (better Lanczos)?

The Lanczos output is attached.

I'm also submitting the same test to the Fermi BlueGene machine at CINECA, to see whether the xlf compilers like my scripts more than the Intel compilers do...

Best Wishes
Giuseppe

On Thursday, February 04, 2016 03:46:10 PM Timrov Iurii wrote:
> Dear Giuseppe,
> 
> As far as I understand the code crashes when it tries to write the vectors "d0psi" to the disc. First thing to do, I think, is to check that you
> have enough space on the disc. If this is not the issue, then let's continue looking for a reason.
> 
> You may want to look in the routine TDDFPT/src/lr_solve_e.f90 at lines 110-138 where the code writes vectors to the disc in parallel. Please make
> sure that the "outdir" is the same in PWscf and in Lanczos/Davidson (and don't specify wfcdir). If this does not solve the problem, could you
> report please also the output of Lanczos/Davidson (better Lanczos)?
> 
> HTH
> 
> Best regards,
> Iurii Timrov
> Post-doctoral researcher
> THEOS - École Polytechnique Fédérale de Lausanne
> Lausanne, Switzerland
> 
> 
> ________________________________________
> From: pw_forum-bounces at pwscf.org <pw_forum-bounces at pwscf.org> on behalf of Giuseppe Mattioli <giuseppe.mattioli at ism.cnr.it>
> Sent: Thursday, February 4, 2016 11:34 AM
> To: pw_forum at pwscf.org
> Subject: [Pw_forum] possible i/o bug in turbo_lanczos.x and turbo_davidson.x    5.3.0
> 
> Dear All
> I'm having problems when performing nontrivial runs of turbo_davidson.x and turbo_lanczos.x with 5.2.1 and 5.3.0 versions of QE.
> Let me say first that "trivial" runs (CH4 molecule with same pseudopotentials and cutoffs but a smaller 30 a.u.^3 cubic cell) work fine with all the
> tested versions.
> However, the input files for a nontrivial case that leads to crash should run on a decent pc in about 1 hr, so they provide a significant but not
> huge test. *Note* that if I run the same input files with the 5.1.1 version (compiled against the very same environment) everything goes (more
> slowly but) fine! The 5.3.0 (and 5.2.1) crashes have been reproduced on two different machines (intel 8 cores 16GB RAM, amd 32 cores 64 GB RAM), so
> they should not be considered as erratic.
> 
> Here is the pw.x input. The PPs are quite old and can be found in the online library (or provided by me on request).
> 
>  &control
>     calculation = 'scf'
>     restart_mode='from_scratch',
>     prefix='l0-5.3.0',
>     pseudo_dir = '/home/mattioligi/PP_UPF/',
>     outdir='/home/mattioligi/cocat/test_tddft/5.2.1/l0/5.3.0/run/tmp/',
>     nstep=300,
>     max_seconds=80000,
>     disk_io='low',
>     tprnfor=.true.,
>  /
>  &system
>     ibrav=1, celldm(1)=40.000000,
>     nat=42, ntyp=4, nbnd=75,
>     ecutwfc = 40.0,
>     ecutrho = 320.0,
>     nspin=1,
>  /
>  &electrons
>     diagonalization='david',
>     mixing_mode='plain'
>     mixing_beta=0.1
>     conv_thr=1.0d-8
>     electron_maxstep=100
>  /
>  &ions
>  /
> ATOMIC_SPECIES
> O    15.999    O_pbe.van.UPF
> N    14.007    N.pbe-van_bm.UPF
> C    12.011    C_pbe.van.UPF
> H     1.008    H_pbe.van.UPF
> ATOMIC_POSITIONS {angstrom}
> C        4.815369179  12.355337788   8.111406911
> C        5.639537337  12.072210478   7.018248617
> C        6.373883049  10.886794669   6.974735758
> H        5.707874252  12.778745273   6.179910928
> C        4.734413944  11.441350355   9.166316558
> H        4.235443595  13.287281698   8.140567718
> C        6.304598307   9.977077773   8.041477142
> H        7.012644682  10.659891408   6.111132336
> C        5.477180541  10.260422385   9.138835842
> H        4.092409998  11.653694694  10.031778418
> H        5.418528381   9.546881383   9.971310698
> N        7.058612774   8.759574945   8.006208499
> C        6.384981399   7.544139013   8.340645249
> C        6.997532612   6.588483316   9.168188787
> C        5.084708421   7.308024697   7.864810575
> C        6.325550737   5.410241765   9.493833204
> H        8.006262126   6.776794433   9.557919083
> C        4.414663626   6.134355690   8.210976959
> H        4.597637090   8.055249046   7.224770074
> C        5.030975670   5.176070562   9.020776666
> H        6.819890970   4.670618768  10.138154855
> H        3.397721512   5.964689741   7.832306200
> H        4.503298572   4.249946635   9.284425745
> C        8.412602212   8.773905175   7.652414992
> C        9.197305040   9.938168667   7.841458619
> C        9.043381168   7.634703664   7.098599788
> C       10.533008285   9.972397555   7.486007356
> H        8.740413757  10.828552107   8.290447985
> C       10.383506998   7.674400214   6.758021800
> H        8.466388332   6.717306584   6.931252215
> C       11.175184928   8.838234071   6.927523312
> H       11.098162573  10.894629696   7.663657304
> H       10.849606517   6.778483121   6.322529487
> C       12.554045113   8.768090174   6.529797787
> C       13.538745611   9.729179498   6.474718127
> H       12.882286114   7.769870632   6.203237321
> C       13.338246843  11.096686263   6.810664645
> N       13.160471613  12.223162736   7.083088078
> C       14.914360413   9.407055683   6.034105289
> O       15.832284936  10.221452163   5.956798921
> O       15.091537629   8.085358800   5.710801225
> H       16.043983143   8.016066678   5.436328923
> K_POINTS {gamma}
> 
> And here are the turbo_lanczos.x and turbo_davidson.x input files:
> 
> lanczos
> 
> &lr_input
>     prefix="l0-5.3.0",
>     outdir='/state/partition1/mattioligi/34339',
>     wfcdir='/state/partition1/mattioligi/34339',
>     restart_step=6,
>     restart=.false.
> /
> &lr_control
>     itermax=12,
>     ipol=4,
> /
> 
> davidson
> 
> &lr_input
>     prefix="l0-5.3.0",
>     outdir='/state/partition1/mattioligi/34340',
>     restart=.false.
> /
> &lr_dav
>     num_eign=2
>     num_init=4
>     num_basis_max=10
>     residue_conv_thr=1.0E-4
>     start=0.1
>     finish=1.5
>     step=0.0002
>     broadening=0.005
>     reference=0.2
>     p_nbnd_occ=5
>     p_nbnd_virt=5
>     poor_of_ram=.false.
>     poor_of_ram2=.false.
> /
> 
> In both cases and on both machines the CRASH report is something like
> 
>  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>      task #         1
>      from davcio : error #        20
>      error while writing from file "/state/partition1/mattioligi/34340/l0-5.3.0.d0psi.32"
>  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> 
> I suppose that it is some kind of I/O error, but I would warmly welcome your opinion...:-)
> Thank you in advance
> Giuseppe
> 
> ********************************************************
> - Article 1 - Men are born and remain
> free and equal in rights. Social distinctions
> may be founded only upon the common good.
> - Article 2 - The aim of every political association
> is the preservation of the natural and
> imprescriptible rights of man. These rights are liberty,
> property, security, and resistance to oppression.
> ********************************************************
> 
>    Giuseppe Mattioli
>    CNR - ISTITUTO DI STRUTTURA DELLA MATERIA
>    v. Salaria Km 29,300 - C.P. 10
>    I 00015 - Monterotondo Stazione (RM), Italy
>    Tel + 39 06 90672836 - Fax +39 06 90672316
>    E-mail: <giuseppe.mattioli at ism.cnr.it>
>    http://www.ism.cnr.it/english/staff/mattiolig
>    ResearcherID: F-6308-2012

-------------- next part --------------

     Program turboTDDFT v.5.3.0 (svn rev. 11974) starts on  3Feb2016 at 11: 6:53 

     This program is part of the open-source Quantum ESPRESSO suite
     for quantum simulation of materials; please cite
         "P. Giannozzi et al., J. Phys.:Condens. Matter 21 395502 (2009);
          URL http://www.quantum-espresso.org", 
     in publications or presentations arising from this work. More details at
     http://www.quantum-espresso.org/quote

     Parallel version (MPI), running on     8 processors
     R & G space division:  proc/nbgrp/npool/nimage =       8

     Reading data from directory:
     /tmp/mattioli/107868.blade-07.sic.rm.cnr.it/l0-5.3.0.save

   Info: using nr1, nr2, nr3 values from input

   Info: using nr1, nr2, nr3 values from input

     IMPORTANT: XC functional enforced from input :
     Exchange-correlation      =  SLA  PW   PBE  PBE ( 1  4  3  4 0 0)
     Any further DFT definition will be discarded
     Please, verify this is what you really want

 
     Parallelization info
     --------------------
     sticks:   dense  smooth     PW     G-vecs:    dense   smooth      PW
     Min        5090    2546    636               773298   273310   34176
     Max        5092    2547    638               773308   273399   34182
     Sum       40733   20369   5097              6186431  2186841  273425
     Tot       20367   10185   2549
 

     negative rho (up, down):  9.597E-02 0.000E+00

     Subspace diagonalization in iterative solution of the eigenvalue problem:
     one sub-group per k-point group (pool) will be used
     scalapack distributed-memory algorithm (size of sub-group:  2*  2 procs)


     Warning: There are virtual states in the input file, trying to disregard in response calculation

     Ultrasoft (Vanderbilt) Pseudopotentials

     Normal read

     Gamma point algorithm

     LANCZOS LINEAR-RESPONSE SPECTRUM CALCULATION
      
     Number of Lanczos iterations =     12

     Starting Lanczos loop        1

     Lanczos iteration:      1   Pol:1
     lr_apply_liouvillian: not applying interaction
     alpha(00000001)=  0.000000
     beta (00000001)=  8.601483
     gamma(00000001)=  8.601483
     z1=       1  0.000000000000000E+00  0.000000000000000E+00
     z1=       2  0.000000000000000E+00  0.000000000000000E+00
     z1=       3  0.000000000000000E+00  0.000000000000000E+00

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     Error in routine davcio (22):
     error while writing from file "/tmp/mattioli/107868.blade-07.sic.rm.cnr.it/l0-5.3.0.restart_lanczos.11"
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

     stopping ...

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     Error in routine davcio (22):
     error while writing from file "/tmp/mattioli/107868.blade-07.sic.rm.cnr.it/l0-5.3.0.restart_lanczos.15"
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

     stopping ...

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     Error in routine davcio (22):

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     Error in routine davcio (22):
     error while writing from file "/tmp/mattioli/107868.blade-07.sic.rm.cnr.it/l0-5.3.0.restart_lanczos.13"
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

     stopping ...

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     Error in routine davcio (22):
     error while writing from file "/tmp/mattioli/107868.blade-07.sic.rm.cnr.it/l0-5.3.0.restart_lanczos.17"
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

     stopping ...
     Error in routine davcio (22):
     error while writing from file "/tmp/mattioli/107868.blade-07.sic.rm.cnr.it/l0-5.3.0.restart_lanczos.14"
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

     stopping ...
     error while writing from file "/tmp/mattioli/107868.blade-07.sic.rm.cnr.it/l0-5.3.0.restart_lanczos.18"
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

     stopping ...
     Error in routine davcio (22):
     error while writing from file "/tmp/mattioli/107868.blade-07.sic.rm.cnr.it/l0-5.3.0.restart_lanczos.12"
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

     stopping ...

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     Error in routine davcio (22):
     error while writing from file "/tmp/mattioli/107868.blade-07.sic.rm.cnr.it/l0-5.3.0.restart_lanczos.16"
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

     stopping ...
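
Because the MPI processes all write to standard output at the same time, the error banners above are interleaved and hard to read. A minimal Python sketch for pulling out the distinct files named in the `davcio` messages is given below; the helper names `failed_files` and `numeric_suffixes` are hypothetical, and the sample text uses a placeholder path rather than the real one.

```python
import re

# Matches the message text printed by davcio in the log above.
WRITE_ERROR = re.compile(r'error while writing from file "([^"]+)"')


def failed_files(log_text):
    """Sorted list of distinct file names mentioned in write-error messages."""
    return sorted(set(WRITE_ERROR.findall(log_text)))


def numeric_suffixes(paths):
    """Sorted numeric extensions (the part after the last dot) of `paths`."""
    suffixes = []
    for path in paths:
        tail = path.rsplit(".", 1)[-1]
        if tail.isdigit():
            suffixes.append(int(tail))
    return sorted(suffixes)


# Placeholder excerpt in the same shape as the interleaved log.
sample_log = '''
     Error in routine davcio (22):
     error while writing from file "/tmp/job/l0-5.3.0.restart_lanczos.11"
     stopping ...
     error while writing from file "/tmp/job/l0-5.3.0.restart_lanczos.15"
     Error in routine davcio (22):
     error while writing from file "/tmp/job/l0-5.3.0.restart_lanczos.11"
'''

if __name__ == "__main__":
    files = failed_files(sample_log)
    print(len(files), "distinct files, suffixes:", numeric_suffixes(files))
```

Applied to the log above, this kind of summary shows eight distinct restart_lanczos files with suffixes 11 to 18, the same number as the 8 processors reported at the top of the output, so every process appears to fail at the same write.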

