[Pw_forum] possible i/o bug in turbo_lanczos.x and turbo_davidson.x 5.3.0

Giuseppe Mattioli giuseppe.mattioli at ism.cnr.it
Thu Feb 4 17:59:43 CET 2016


Silent crash on BlueGene with 5.2.1 (I don't have time to compile 5.3.0 now; I may try tomorrow if you think it is important).


     Program turboTDDFT v.5.2.1 starts on  4Feb2016 at 17:56:55

     This program is part of the open-source Quantum ESPRESSO suite
     for quantum simulation of materials; please cite
         "P. Giannozzi et al., J. Phys.:Condens. Matter 21 395502 (2009);
          URL http://www.quantum-espresso.org",
     in publications or presentations arising from this work. More details at
     http://www.quantum-espresso.org/quote

     Parallel version (MPI & OpenMP), running on    2048 processor cores
     Number of MPI processes:               512
     Threads/MPI process:                     4
     R & G space division:  proc/nbgrp/npool/nimage =     512

     Reading data from directory:
     /gpfs/scratch/userexternal/gmattiol/test/tddft/run/tmp/././l0-5.3.0.save

   Info: using nr1, nr2, nr3 values from input

   Info: using nr1, nr2, nr3 values from input

     IMPORTANT: XC functional enforced from input :
     Exchange-correlation      =  SLA  PW   PBE  PBE ( 1  4  3  4 0 0)
     Any further DFT definition will be discarded
     Please, verify this is what you really want


     Parallelization info
     --------------------
     sticks:   dense  smooth     PW     G-vecs:    dense   smooth      PW
     Min          78      38      8                12054     4220     492
     Max          80      40     10                12104     4300     550
     Sum       40733   20369   5097              6186431  2186841  273425
     Tot       20367   10185   2549


     negative rho (up, down):  9.597E-02 0.000E+00

     Subspace diagonalization in iterative solution of the eigenvalue problem:
     scalapack distributed-memory algorithm (size of sub-group: 16* 16 procs)


     Warning: There are virtual states in the input file, trying to disregard in response calculation

     Ultrasoft (Vanderbilt) Pseudopotentials

     Normal read

     Gamma point algorithm
2016-02-04 17:57:18.063 (WARN ) [0x40000ee8d50] :7014845:ibm.runjob.client.Job: terminated by signal 6
2016-02-04 17:57:18.065 (WARN ) [0x40000ee8d50] :7014845:ibm.runjob.client.Job: abnormal termination by signal 6 from rank 295
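
The two runjob lines report that rank 295 was terminated by signal 6. On Linux and most POSIX systems signal 6 is SIGABRT, the signal raised by abort(), so this is consistent with the process aborting itself rather than being killed from outside. Because signal numbering is platform-dependent, a quick standard-library Python check (the helper name `describe_signal` is hypothetical) can confirm the mapping on the machine at hand:

```python
import signal


def describe_signal(signum):
    """Return the symbolic name of a signal number on the current platform."""
    try:
        return signal.Signals(signum).name
    except ValueError:
        # Not a signal number known to this platform.
        return f"unknown signal {signum}"


if __name__ == "__main__":
    # The runjob lines report "terminated by signal 6".
    print(describe_signal(6))  # prints SIGABRT on Linux
```
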
On Thursday, February 04, 2016 03:46:10 PM Timrov Iurii wrote:
> Dear Giuseppe,
> 
> As far as I understand, the code crashes when it tries to write the "d0psi" vectors to disk. The first thing to do, I think, is to check that you
> have enough disk space. If this is not the issue, then let's keep looking for the cause.
> 
> You may want to look at lines 110-138 of TDDFPT/src/lr_solve_e.f90, where the code writes the vectors to disk in parallel. Please make
> sure that "outdir" is the same in PWscf and in Lanczos/Davidson (and don't specify wfcdir). If this does not solve the problem, could you
> please also post the output of Lanczos/Davidson (preferably Lanczos)?
> 
> HTH
> 
> Best regards,
> Iurii Timrov
> Post-doctoral researcher
> THEOS - École Polytechnique Fédérale de Lausanne
> Lausanne, Switzerland
> 
> 
> ________________________________________
> From: pw_forum-bounces at pwscf.org <pw_forum-bounces at pwscf.org> on behalf of Giuseppe Mattioli <giuseppe.mattioli at ism.cnr.it>
> Sent: Thursday, February 4, 2016 11:34 AM
> To: pw_forum at pwscf.org
> Subject: [Pw_forum] possible i/o bug in turbo_lanczos.x and turbo_davidson.x    5.3.0
> 
> Dear all,
> I'm having problems when performing nontrivial runs of turbo_davidson.x and turbo_lanczos.x with versions 5.2.1 and 5.3.0 of QE.
> Let me say first that "trivial" runs (a CH4 molecule with the same pseudopotentials and cutoffs but a smaller 30 a.u.^3 cubic cell) work fine with all the
> tested versions.
> However, the input files for a nontrivial case that leads to the crash should run on a decent PC in about 1 hr, so they provide a significant but not
> huge test. *Note* that if I run the same input files with version 5.1.1 (compiled against the very same environment), everything runs fine, if more
> slowly! The 5.3.0 (and 5.2.1) crashes have been reproduced on two different machines (Intel, 8 cores, 16 GB RAM; AMD, 32 cores, 64 GB RAM), so
> they should not be dismissed as erratic.
> 
> Here is the pw.x input. The PPs are quite old and can be found in the online library (or I can provide them on request).
> 
>  &control
>     calculation = 'scf'
>     restart_mode='from_scratch',
>     prefix='l0-5.3.0',
>     pseudo_dir = '/home/mattioligi/PP_UPF/',
>     outdir='/home/mattioligi/cocat/test_tddft/5.2.1/l0/5.3.0/run/tmp/',
>     nstep=300,
>     max_seconds=80000,
>     disk_io='low',
>     tprnfor=.true.,
>  /
>  &system
>     ibrav=1, celldm(1)=40.000000,
>     nat=42, ntyp=4, nbnd=75,
>     ecutwfc = 40.0,
>     ecutrho = 320.0,
>     nspin=1,
>  /
>  &electrons
>     diagonalization='david',
>     mixing_mode='plain'
>     mixing_beta=0.1
>     conv_thr=1.0d-8
>     electron_maxstep=100
>  /
>  &ions
>  /
> ATOMIC_SPECIES
> O    15.999    O_pbe.van.UPF
> N    14.007    N.pbe-van_bm.UPF
> C    12.011    C_pbe.van.UPF
> H     1.008    H_pbe.van.UPF
> ATOMIC_POSITIONS {angstrom}
> C        4.815369179  12.355337788   8.111406911
> C        5.639537337  12.072210478   7.018248617
> C        6.373883049  10.886794669   6.974735758
> H        5.707874252  12.778745273   6.179910928
> C        4.734413944  11.441350355   9.166316558
> H        4.235443595  13.287281698   8.140567718
> C        6.304598307   9.977077773   8.041477142
> H        7.012644682  10.659891408   6.111132336
> C        5.477180541  10.260422385   9.138835842
> H        4.092409998  11.653694694  10.031778418
> H        5.418528381   9.546881383   9.971310698
> N        7.058612774   8.759574945   8.006208499
> C        6.384981399   7.544139013   8.340645249
> C        6.997532612   6.588483316   9.168188787
> C        5.084708421   7.308024697   7.864810575
> C        6.325550737   5.410241765   9.493833204
> H        8.006262126   6.776794433   9.557919083
> C        4.414663626   6.134355690   8.210976959
> H        4.597637090   8.055249046   7.224770074
> C        5.030975670   5.176070562   9.020776666
> H        6.819890970   4.670618768  10.138154855
> H        3.397721512   5.964689741   7.832306200
> H        4.503298572   4.249946635   9.284425745
> C        8.412602212   8.773905175   7.652414992
> C        9.197305040   9.938168667   7.841458619
> C        9.043381168   7.634703664   7.098599788
> C       10.533008285   9.972397555   7.486007356
> H        8.740413757  10.828552107   8.290447985
> C       10.383506998   7.674400214   6.758021800
> H        8.466388332   6.717306584   6.931252215
> C       11.175184928   8.838234071   6.927523312
> H       11.098162573  10.894629696   7.663657304
> H       10.849606517   6.778483121   6.322529487
> C       12.554045113   8.768090174   6.529797787
> C       13.538745611   9.729179498   6.474718127
> H       12.882286114   7.769870632   6.203237321
> C       13.338246843  11.096686263   6.810664645
> N       13.160471613  12.223162736   7.083088078
> C       14.914360413   9.407055683   6.034105289
> O       15.832284936  10.221452163   5.956798921
> O       15.091537629   8.085358800   5.710801225
> H       16.043983143   8.016066678   5.436328923
> K_POINTS {gamma}
> 
> And here are the turbo_lanczos.x and turbo_davidson.x input files:
> 
> lanczos
> 
> &lr_input
>     prefix="l0-5.3.0",
>     outdir='/state/partition1/mattioligi/34339',
>     wfcdir='/state/partition1/mattioligi/34339',
>     restart_step=6,
>     restart=.false.
> /
> &lr_control
>     itermax=12,
>     ipol=4,
> /
> 
> davidson
> 
> &lr_input
>     prefix="l0-5.3.0",
>     outdir='/state/partition1/mattioligi/34340',
>     restart=.false.
> /
> &lr_dav
>     num_eign=2
>     num_init=4
>     num_basis_max=10
>     residue_conv_thr=1.0E-4
>     start=0.1
>     finish=1.5
>     step=0.0002
>     broadening=0.005
>     reference=0.2
>     p_nbnd_occ=5
>     p_nbnd_virt=5
>     poor_of_ram=.false.
>     poor_of_ram2=.false.
> /
> 
> In both cases and on both machines the CRASH report is something like
> 
>  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>      task #         1
>      from davcio : error #        20
>      error while writing from file "/state/partition1/mattioligi/34340/l0-5.3.0.d0psi.32"
>  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> 
> I suppose it is some kind of I/O error, but I would warmly welcome your opinion... :-)
> Thank you in advance
> Giuseppe
> 
> ********************************************************
> - Article 1 - Men are born and remain
> free and equal in rights. Social distinctions
> may be founded only upon the general good.
> - Article 2 - The aim of every political association
> is the preservation of the natural and
> imprescriptible rights of man. These rights are liberty,
> property, security, and resistance to oppression.
> ********************************************************
> 
>    Giuseppe Mattioli
>    CNR - ISTITUTO DI STRUTTURA DELLA MATERIA
>    v. Salaria Km 29,300 - C.P. 10
>    I 00015 - Monterotondo Stazione (RM), Italy
>    Tel + 39 06 90672836 - Fax +39 06 90672316
>    E-mail: <giuseppe.mattioli at ism.cnr.it>
>    http://www.ism.cnr.it/english/staff/mattiolig
>    ResearcherID: F-6308-2012
> 
> _______________________________________________
> Pw_forum mailing list
> Pw_forum at pwscf.org
> http://pwscf.org/mailman/listinfo/pw_forum
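
Iurii's first suggestion, checking that the disk holding outdir has enough free space, can be combined with the CRASH report above: the directory of the file named in the davcio message can be tested for existence, writability, and free space. What follows is a minimal Python sketch using only the standard library; the helper names `failing_file` and `check_output_dir` are hypothetical, and the regular expression assumes the message wording shown in this thread.

```python
import os
import re
import shutil
import tempfile

# Matches the wording of the davcio message quoted in this thread; other QE
# versions or routines may phrase it differently (assumption).
CRASH_PATH = re.compile(r'error while (?:writing|reading) (?:from|to) file "([^"]+)"')


def failing_file(crash_text):
    """Return the file path quoted in a CRASH message, or None if absent."""
    match = CRASH_PATH.search(crash_text)
    return match.group(1) if match else None


def check_output_dir(path):
    """Check that the directory containing `path` exists and accepts a test
    file, and report its free space in GiB."""
    directory = os.path.dirname(path) or "."
    report = {"dir": directory, "exists": os.path.isdir(directory)}
    if not report["exists"]:
        return report
    try:
        # An actual write is more reliable than os.access() on network filesystems.
        with tempfile.NamedTemporaryFile(dir=directory):
            report["writable"] = True
    except OSError:
        report["writable"] = False
    report["free_gib"] = shutil.disk_usage(directory).free / 2**30
    return report


if __name__ == "__main__":
    crash = '''
     task #         1
     from davcio : error #        20
     error while writing from file "/state/partition1/mattioligi/34340/l0-5.3.0.d0psi.32"
'''
    path = failing_file(crash)
    print(path)
    print(check_output_dir(path))
```

If outdir is on node-local storage, run the check on every compute node involved; the sketch does not estimate how much space the d0psi files actually need.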

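Iurii's second suggestion, using the same outdir in PWscf and in Lanczos/Davidson and not specifying wfcdir, can also be checked before submitting a job. The Python sketch below is a hypothetical helper, not part of Quantum ESPRESSO: it only reads simple quoted values such as `outdir='...'` and is not a full Fortran namelist parser.

```python
import os
import re

# A quoted Fortran string value: 'text' or "text" (embedded quotes not handled).
QUOTED = r"""['"]([^'"]*)['"]"""


def namelist_value(text, key):
    """Return the quoted string assigned to `key` in a namelist fragment, or None.

    Minimal sketch: ignores comments and repeated keys, matches case-insensitively.
    """
    pattern = r"\b" + re.escape(key) + r"\s*=\s*" + QUOTED
    match = re.search(pattern, text, re.IGNORECASE)
    return match.group(1) if match else None


def check_outdirs(pw_input, turbo_input):
    """Return warnings about outdir/wfcdir consistency between pw.x and turbo_*.x inputs."""
    warnings = []
    pw_out = namelist_value(pw_input, "outdir")
    lr_out = namelist_value(turbo_input, "outdir")
    if pw_out is None or lr_out is None:
        warnings.append("outdir not set explicitly in one of the inputs")
    elif os.path.normpath(pw_out) != os.path.normpath(lr_out):
        # normpath makes '/a/b/' and '/a/b' compare equal.
        warnings.append(f"outdir differs: {pw_out!r} vs {lr_out!r}")
    if namelist_value(turbo_input, "wfcdir") is not None:
        warnings.append("wfcdir is set in the turbo input")
    return warnings


if __name__ == "__main__":
    pw = """ &control
    prefix='l0-5.3.0',
    outdir='/home/mattioligi/cocat/test_tddft/5.2.1/l0/5.3.0/run/tmp/',
 /"""
    lanczos = """&lr_input
    prefix="l0-5.3.0",
    outdir='/state/partition1/mattioligi/34339',
    wfcdir='/state/partition1/mattioligi/34339',
/"""
    for warning in check_outdirs(pw, lanczos):
        print(warning)
```

With the pw.x and turbo_lanczos.x inputs as posted, both warnings fire. The outdir difference may simply reflect paths that were edited for posting or a job script that moves the .save directory, so treat the output as a prompt to double-check, not a diagnosis.
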