<div dir="ltr"><div><div><div><div><div>Dear All,<br></div><br></div>I have found that calculations using band group parallelism that ran correctly with QE 5.2.0 fail in version 5.3.0 (an example input file is given below). In particular, when I run a PBE0 calculation with
either nbgrp or ndiag set to 1, everything runs correctly; however, when I run a calculation with both nbgrp and ndiag set to values greater than 1, the
calculation immediately fails with the following error messages:<br><br>Rank 48 [Mon Jan 25 09:52:04 2016] [c0-0c0s14n2] Fatal error in PMPI_Group_incl: Invalid rank, error stack:<br>PMPI_Group_incl(173).............: MPI_Group_incl(group=0x88000002, n=36, ranks=0x53a3c80, new_group=0x7fffffff6794) failed<br>MPIR_Group_check_valid_ranks(259): Duplicate ranks in rank array at index 12, has value 0 which is also the value at index 0<br>Rank 93 [Mon Jan 25 09:52:04 2016] [c0-0c0s14n3] Fatal error in PMPI_Group_incl: Invalid rank, error stack:<br>PMPI_Group_incl(173).............: MPI_Group_incl(group=0x88000002, n=36, ranks=0x538fdf0, new_group=0x7fffffff6794) failed<br>MPIR_Group_check_valid_ranks(259): Duplicate ranks in rank array at index 12, has value 0 which is also the value at index 0<br>etc...<br><br><div> The error is apparently related to a change in Modules/mp_global.f90 on line 80. Here, the line previously read:<br><br>CALL mp_start_diag ( ndiag_, intra_BGRP_comm )<br><br></div><div>In QE 5.3.0, this has been changed to:<br></div><div><br>CALL mp_start_diag ( ndiag_, intra_POOL_comm )<br></div><div><br></div><div> The call using intra_BGRP_comm still exists in version 5.3.0 of the code, but is commented out, and the surrounding comments indicate that it should be possible to switch back to the old parallelization by commenting/uncommenting as desired. When I do this, I find that instead of the error messages described above, I get the following error messages:<br><br></div><div>Error in routine cdiaghg(193):<br></div><div> problems computing cholesky<br><br> Am I missing something, or are these errors the result of a bug?<br></div><br></div>Best Regards,<br><br></div>Dr. 
Taylor Barnes,<br></div>Lawrence Berkeley National Laboratory<br><div><div><div><div><br><br>=================<br></div><div>Run Command:<br>=================<br><br>srun -n 96 pw.x -nbgrp 4 -in input > input.out<br></div><div><br><br><br>=================<br></div><div>Input File:<br>=================<br><br>&control<br>prefix = 'water'<br>calculation = 'scf'<br>restart_mode = 'from_scratch'<br>wf_collect = .true.<br>disk_io = 'none'<br>tstress = .false.<br>tprnfor = .false.<br>outdir = './'<br>wfcdir = './'<br>pseudo_dir = '/global/homes/t/tabarnes/espresso/pseudo'<br>/<br>&system<br>ibrav = 1<br>celldm(1) = 15.249332837<br>nat = 48<br>ntyp = 2<br>ecutwfc = 130<br>input_dft = 'pbe0'<br>/<br>&electrons<br>diago_thr_init=5.0d-4<br>mixing_mode = 'plain'<br>mixing_beta = 0.7<br>mixing_ndim = 8<br>diagonalization = 'david'<br>diago_david_ndim = 4<br>diago_full_acc = .true.<br>electron_maxstep=3<br>scf_must_converge=.false.<br>/<br>ATOMIC_SPECIES<br>O 15.999 O.pbe-mt_fhi.UPF<br>H 1.008 H.pbe-mt_fhi.UPF<br>ATOMIC_POSITIONS alat<br> O 0.405369 0.567356 0.442192<br> H 0.471865 0.482160 0.381557<br> H 0.442867 0.572759 0.560178<br> O 0.584679 0.262476 0.215740<br> H 0.689058 0.204790 0.249459<br> H 0.503275 0.179176 0.173433<br> O 0.613936 0.468084 0.701359<br> H 0.720162 0.421081 0.658182<br> H 0.629377 0.503798 0.819016<br> O 0.692499 0.571474 0.008796<br> H 0.815865 0.562339 0.016182<br> H 0.640331 0.489132 0.085318<br> O 0.138542 0.767947 0.322270<br> H 0.052664 0.771819 0.411531<br> H 0.239736 0.710419 0.364788<br> O 0.127282 0.623278 0.765792<br> H 0.075781 0.693268 0.677441<br> H 0.243000 0.662182 0.787094<br> O 0.572799 0.844477 0.542529<br> H 0.556579 0.966998 0.533420<br> H 0.548297 0.791340 0.433292<br> O -0.007677 0.992860 0.095967<br> H 0.064148 1.011844 -0.003219<br> H 0.048026 0.913005 0.172625<br> O 0.035337 0.547318 0.085085<br> H 0.072732 0.625835 0.173379<br> H 0.089917 0.576762 -0.022194<br> O 0.666008 0.900155 0.183677<br> H 0.773299 0.937456 
0.134145<br> H 0.609289 0.822407 0.105606<br> O 0.443447 0.737755 0.836152<br> H 0.526041 0.665651 0.893906<br> H 0.483300 0.762549 0.721464<br> O 0.934493 0.378765 0.627850<br> H 1.012721 0.449242 0.693201<br> H 0.955703 0.394823 0.506816<br> O 0.006386 0.270244 0.269327<br> H 0.021231 0.364797 0.190612<br> H 0.021863 0.163251 0.208755<br> O 0.936337 0.855942 0.611999<br> H 0.956610 0.972475 0.648965<br> H 0.815045 0.839173 0.592915<br> O 0.228881 0.037509 0.849634<br> H 0.263938 0.065862 0.734213<br> H 0.282576 -0.068680 0.884220<br> O 0.346187 0.176679 0.553828<br> H 0.247521 0.218347 0.491489<br> H 0.402671 0.271609 0.610010<br>K_POINTS automatic<br>1 1 1 1 1 1<br><br></div><div><br><br></div></div></div></div></div>