<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Aptos;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:11.0pt;
font-family:"Aptos",sans-serif;
mso-ligatures:standardcontextual;}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
{mso-style-priority:34;
margin-top:0in;
margin-right:0in;
margin-bottom:0in;
margin-left:.5in;
font-size:11.0pt;
font-family:"Aptos",sans-serif;
mso-ligatures:standardcontextual;}
span.EmailStyle17
{mso-style-type:personal-compose;
font-family:"Aptos",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:11.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
/* List Definitions */
@list l0
{mso-list-id:1793357413;
mso-list-type:hybrid;
mso-list-template-ids:-1758032756 67698703 67698713 67698715 67698703 67698713 67698715 67698703 67698713 67698715;}
@list l0:level1
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level2
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level3
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
@list l0:level4
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level5
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level6
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
@list l0:level7
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level8
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level9
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
ol
{margin-bottom:0in;}
ul
{margin-bottom:0in;}
--></style>
</head>
<body lang="EN-US" link="#467886" vlink="#96607D" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal">Hi all,<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">A few isolated tests in the test-suite fail for me, and I also see a general OpenFabrics initialization error; I would like to understand why these happen and whether they are “OK”. With a serial build using gfortran 13.1.0, every test that is not skipped passes. The failures appear only when I switch to a parallel build with openmpi/4.1.6. Can anyone point me toward a robust parallel build? Thanks in advance!<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Some details on my configuration:<o:p></o:p></p>
<p class="MsoNormal">GCC/Gfortran 13.1.0<o:p></o:p></p>
<p class="MsoNormal">QE 7.4.1<o:p></o:p></p>
<p class="MsoNormal">Openmpi 4.1.6<o:p></o:p></p>
<p class="MsoNormal">Running make run-tests NPROCS=12<o:p></o:p></p>
<p class="MsoNormal">Red Hat Enterprise Linux 8<o:p></o:p></p>
<p class="MsoNormal">Using QE internal BLAS &amp; LAPACK<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><u>Many of the tests print warnings like the following, even when they pass</u>:<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">--------------------------------------------------------------------------<o:p></o:p></p>
<p class="MsoNormal">WARNING: There was an error initializing an OpenFabrics device.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"> Local host: pn5657<o:p></o:p></p>
<p class="MsoNormal"> Local device: mlx5_0<o:p></o:p></p>
<p class="MsoNormal">--------------------------------------------------------------------------<o:p></o:p></p>
<p class="MsoNormal">Note: The following floating-point exceptions are signalling: IEEE_UNDERFLOW_FLAG IEEE_DENORMAL<o:p></o:p></p>
<p class="MsoNormal">[pn5657:3197139] 11 more processes have sent help message help-mpi-btl-openib.txt / error in device init<o:p></o:p></p>
<p class="MsoNormal">[pn5657:3197139] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages<o:p></o:p></p>
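<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
<p class="MsoNormal">To follow the hint in the last line of that output, here is a minimal Python sketch of the environment I could set before rerunning the tests. It relies on Open MPI reading MCA parameters from OMPI_MCA_* environment variables. The helper name mpirun_env is made up for illustration, and excluding the openib BTL is only my assumption about what might avoid the device-initialization message; I have not verified it on this cluster.<o:p></o:p></p>

```python
import os


def mpirun_env(base=None, exclude_openib=False):
    """Return a copy of the environment with Open MPI MCA overrides added.

    mpirun_env is a hypothetical helper name, not part of QE or Open MPI.
    """
    env = dict(os.environ if base is None else base)
    # Suggested by the warning itself: show every help message
    # instead of the aggregated "11 more processes ..." summary.
    env["OMPI_MCA_orte_base_help_aggregate"] = "0"
    if exclude_openib:
        # Assumption: leaving out the openib BTL avoids the mlx5_0
        # initialization attempt when all ranks run on one node.
        env["OMPI_MCA_btl"] = "^openib"
    return env


if __name__ == "__main__":
    # Print the overrides as shell export lines for copy and paste.
    for key, value in sorted(mpirun_env(base={}, exclude_openib=True).items()):
        print(f"export {key}='{value}'")
```

<p class="MsoNormal">The same dictionary could be passed as env= to subprocess.run when launching make run-tests NPROCS=12 from a script, so the settings apply only to that run.<o:p></o:p></p>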
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><u>Here are the tests that are failing</u>:<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<ol style="margin-top:0in" start="1" type="1">
<li class="MsoListParagraph" style="margin-left:0in;mso-list:l0 level1 lfo1">pw_plugins - plugin-pw2casino_1.in (arg(s): 1): **FAILED**.<o:p></o:p></li></ol>
<p class="MsoListParagraph">Different sets of data extracted from benchmark and test.<o:p></o:p></p>
<p class="MsoListParagraph"> Data only in benchmark: p1.<o:p></o:p></p>
<p class="MsoListParagraph"><o:p> </o:p></p>
<p class="MsoListParagraph">%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%<o:p></o:p></p>
<p class="MsoListParagraph"> Error in routine pw2casino (1):<o:p></o:p></p>
<p class="MsoListParagraph"> pool/band/image parallelization not (yet) implemented<o:p></o:p></p>
<p class="MsoListParagraph">%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%<o:p></o:p></p>
<p class="MsoListParagraph"><o:p> </o:p></p>
<p class="MsoListParagraph"> stopping ...<o:p></o:p></p>
<p class="MsoListParagraph"><o:p> </o:p></p>
<ol style="margin-top:0in" start="2" type="1">
<li class="MsoListParagraph" style="margin-left:0in;mso-list:l0 level1 lfo1">pw_vdw - xdm.in: **FAILED**.<o:p></o:p></li></ol>
<p class="MsoListParagraph">ef1<o:p></o:p></p>
<p class="MsoListParagraph"> ERROR: absolute error 5.62e-01 greater than 8.00e-02. (Test: 10.7872. Benchmark: 10.2253.)<o:p></o:p></p>
<p class="MsoListParagraph"> ERROR: relative error 5.50e-02 greater than 2.00e-02. (Test: 10.7872. Benchmark: 10.2253.)<o:p></o:p></p>
<p class="MsoListParagraph"><o:p> </o:p></p>
<ol style="margin-top:0in" start="3" type="1">
<li class="MsoListParagraph" style="margin-left:0in;mso-list:l0 level1 lfo1">cp_al_edft - Al.uspp.in: **FAILED**.<o:p></o:p></li></ol>
<p class="MsoListParagraph">t1<o:p></o:p></p>
<p class="MsoListParagraph"> ERROR: absolute error 1.75e-02 greater than 6.00e-03. (Test: 159.46581. Benchmark: 159.44833.)<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<ol style="margin-top:0in" start="4" type="1">
<li class="MsoListParagraph" style="margin-left:0in;mso-list:l0 level1 lfo1">ph_1d - ch4.scf.in (arg(s): 1): **FAILED**.<o:p></o:p></li></ol>
<p class="MsoListParagraph">n1<o:p></o:p></p>
<p class="MsoListParagraph"> ERROR: absolute error 6.00e+00 greater than 5.00e+00. (Test: 32.0. Benchmark: 26.0.)<o:p></o:p></p>
<p class="MsoListParagraph"><o:p> </o:p></p>
<ol style="margin-top:0in" start="5" type="1">
<li class="MsoListParagraph" style="margin-left:0in;mso-list:l0 level1 lfo1">/hpc/data/idunn/qe/7.4.1/test-suite/..//test-suite/run-hp.sh 2 Fe.scf.in test.out.070425-2.inp=Fe.scf.in.args=2 test.err.070425-2.inp=Fe.scf.in.args=2<o:p></o:p></li></ol>
<p class="MsoListParagraph">Running PW ...<o:p></o:p></p>
<p class="MsoListParagraph">mpirun -np 12 /hpc/data/idunn/qe/7.4.1/test-suite/..//bin/pw.x &lt; Fe.scf.in &gt; test.out.070425-2.inp=Fe.scf.in.args=2 2&gt; test.err.070425-2.inp=Fe.scf.in.args=2<o:p></o:p></p>
<p class="MsoListParagraph">hp_metal_paw_magn - Fe.scf.in (arg(s): 2): **FAILED**.<o:p></o:p></p>
<p class="MsoListParagraph">n1<o:p></o:p></p>
<p class="MsoListParagraph"> ERROR: absolute error 6.00e+00 greater than 5.00e+00. (Test: 31.0. Benchmark: 25.0.)<o:p></o:p></p>
<p class="MsoListParagraph"><o:p> </o:p></p>
<ol style="margin-top:0in" start="6" type="1">
<li class="MsoListParagraph" style="margin-left:0in;mso-list:l0 level1 lfo1">/hpc/data/idunn/qe/7.4.1/test-suite/..//test-suite/run-hp.sh 4 bn.hp.in test.out.070425-2.inp=bn.hp.in.args=4 test.err.070425-2.inp=bn.hp.in.args=4<o:p></o:p></li></ol>
<p class="MsoListParagraph">Running HP ...<o:p></o:p></p>
<p class="MsoListParagraph">mpirun -np 12 /hpc/data/idunn/qe/7.4.1/test-suite/..//bin/hp.x &lt; bn.hp.in &gt; test.out.070425-2.inp=bn.hp.in.args=4 2&gt; test.err.070425-2.inp=bn.hp.in.args=4<o:p></o:p></p>
<p class="MsoListParagraph">hp_soc_UV_paw_magn - bn.hp.in (arg(s): 4): **FAILED**.<o:p></o:p></p>
<p class="MsoListParagraph">v2<o:p></o:p></p>
<p class="MsoListParagraph"> ERROR: absolute error 1.37e-02 greater than 1.50e-03. (Test: -0.1254. Benchmark: -0.1117.)<o:p></o:p></p>
<p class="MsoListParagraph"> ERROR: relative error 1.23e-01 greater than 1.80e-04. (Test: -0.1254. Benchmark: -0.1117.)<o:p></o:p></p>
<p class="MsoListParagraph">v1<o:p></o:p></p>
<p class="MsoListParagraph"> ERROR: absolute error 1.72e+00 greater than 1.50e-03. (Test: 6.4294. Benchmark: 4.7069.)<o:p></o:p></p>
<p class="MsoListParagraph"> ERROR: relative error 3.66e-01 greater than 1.20e-04. (Test: 6.4294. Benchmark: 4.7069.)<o:p></o:p></p>
<p class="MsoListParagraph">u<o:p></o:p></p>
<p class="MsoListParagraph"> ERROR: absolute error 1.72e+00 greater than 1.50e-03. (Test: 6.4294. Benchmark: 4.7069.)<o:p></o:p></p>
<p class="MsoListParagraph"> ERROR: relative error 3.66e-01 greater than 1.20e-04. (Test: 6.4294. Benchmark: 4.7069.)<o:p></o:p></p>
<p class="MsoListParagraph"><o:p> </o:p></p>
<ol style="margin-top:0in" start="7" type="1">
<li class="MsoListParagraph" style="margin-left:0in;mso-list:l0 level1 lfo1">All the KCW tests that need the kcw.x executable seem to fail with error messages like:<o:p></o:p></li></ol>
<p class="MsoListParagraph"><o:p> </o:p></p>
<p class="MsoListParagraph">mpirun was unable to launch the specified application as it could not access<o:p></o:p></p>
<p class="MsoListParagraph">or execute an executable:<o:p></o:p></p>
<p class="MsoListParagraph"><o:p> </o:p></p>
<p class="MsoListParagraph">Executable: /hpc/data/sm-euv_rs/idunn/qe/7.4.1/test-suite/..//bin/kcw.x<o:p></o:p></p>
<p class="MsoListParagraph">Node: pn5657<o:p></o:p></p>
<p class="MsoListParagraph"><o:p> </o:p></p>
<p class="MsoListParagraph">while attempting to start process rank 0.<o:p></o:p></p>
<p class="MsoListParagraph"><o:p> </o:p></p>
<p class="MsoListParagraph">I’m not sure why kcw.x isn’t in the bin folder.<o:p></o:p></p>
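<p class="MsoListParagraph"><o:p>&nbsp;</o:p></p>
<p class="MsoListParagraph">To rule out a permission problem as opposed to a missing file, a small Python check along these lines could be run on the node. It is only a sketch: missing_executables is a made-up helper name, the directory is copied from the mpirun message above, and the list of executables is just the three that appear in this mail.<o:p></o:p></p>

```python
import os


def missing_executables(bin_dir, names):
    """Return the names that are absent from bin_dir or not executable."""
    missing = []
    for name in names:
        path = os.path.join(bin_dir, name)
        # A usable binary must exist as a regular file and carry the x bit.
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Directory taken verbatim from the mpirun error message.
    bin_dir = "/hpc/data/sm-euv_rs/idunn/qe/7.4.1/test-suite/..//bin"
    print(missing_executables(bin_dir, ["pw.x", "hp.x", "kcw.x"]))
```

<p class="MsoListParagraph">If only kcw.x is reported, the executable was most likely never built, rather than built and placed somewhere mpirun cannot reach.<o:p></o:p></p>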
<p class="MsoListParagraph"><o:p> </o:p></p>
<p class="MsoListParagraph"><o:p> </o:p></p>
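<p class="MsoNormal">For the numerical failures in items 2 to 6, I rechecked the reported errors with the short Python sketch below. It assumes the absolute error is |test - benchmark| and the relative error is that difference divided by |benchmark|; this is my reading of the output, not something I took from the test-suite source. The function name check is made up, and the values and tolerances are copied from the messages above.<o:p></o:p></p>

```python
def check(test, benchmark, abs_tol=None, rel_tol=None):
    """Return (abs_err, rel_err, failed) for one extracted value.

    Assumes rel_err is measured against the benchmark value.
    """
    abs_err = abs(test - benchmark)
    rel_err = abs_err / abs(benchmark)
    failed = False
    if abs_tol is not None and abs_err > abs_tol:
        failed = True
    if rel_tol is not None and rel_err > rel_tol:
        failed = True
    return abs_err, rel_err, failed


if __name__ == "__main__":
    # (label, test value, benchmark value, absolute tol, relative tol)
    reported = [
        ("pw_vdw xdm.in ef1", 10.7872, 10.2253, 8.00e-02, 2.00e-02),
        ("cp_al_edft Al.uspp.in t1", 159.46581, 159.44833, 6.00e-03, None),
        ("ph_1d ch4.scf.in n1", 32.0, 26.0, 5.00e+00, None),
        ("hp_metal_paw_magn Fe.scf.in n1", 31.0, 25.0, 5.00e+00, None),
        ("hp_soc_UV_paw_magn bn.hp.in v2", -0.1254, -0.1117, 1.50e-03, 1.80e-04),
        ("hp_soc_UV_paw_magn bn.hp.in v1", 6.4294, 4.7069, 1.50e-03, 1.20e-04),
    ]
    for label, test, bench, abs_tol, rel_tol in reported:
        abs_err, rel_err, failed = check(test, bench, abs_tol, rel_tol)
        print(f"{label}: abs {abs_err:.2e}, rel {rel_err:.2e}, failed={failed}")
```

<p class="MsoNormal">Under that assumption the recomputed errors agree with the ones printed by the test-suite, so the differences look real rather than a parsing artifact. The n1 failures exceed their tolerance by a single count (6 against an allowed 5), while the hp_soc_UV_paw_magn values are off by far more than the tolerance.<o:p></o:p></p>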
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><span style="mso-ligatures:none">Best regards,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-ligatures:none">Ian Dunn (he/him)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-ligatures:none">ASML Wilton MDEV Analysis Architect<o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</body>
</html>