[QE-users] Test-suite failures with parallel QE

Ian Dunn ian.dunn at asml.com
Tue May 6 19:37:01 CEST 2025


Hi Paolo,

I would like to optimize my QE installation by leveraging the Intel compilers and MKL. When I configure with the options below and compile with the Intel compiler, I again get errors when running make run-tests. As a baseline, I should note that the parallel build compiled with GCC passes the test suite with no errors when run on a single processor.

./configure --enable-parallel --prefix=/hpc/data/idunn/qe/7.4.1-intel --with-scalapack=intel --with-blas=intel --with-lapack=intel LAPACK_LIBS="-lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -ldl"
make -j24 all
make install

The errors are:

pw_vdw - beef-spin.in: **FAILED**.
e1
    ERROR: absolute error 6.30e-02 greater than 8.00e-04. (Test: -43.532453.  Benchmark: -43.595437.)
    ERROR: relative error 1.44e-03 greater than 1.00e-04. (Test: -43.532453.  Benchmark: -43.595437.)

pw_vdw - beef.in: **FAILED**.
e1
    ERROR: absolute error 4.39e-02 greater than 8.00e-04. (Test: -16.043669.  Benchmark: -15.999767.)
    ERROR: relative error 2.74e-03 greater than 1.00e-04. (Test: -16.043669.  Benchmark: -15.999767.)
p1
    ERROR: absolute error 2.38e+01 greater than 2.00e+00. (Test: -29.09.  Benchmark: -52.85.)
eh1
    ERROR: absolute error 3.63e-02 greater than 1.00e-02. (Test: 5.6694.  Benchmark: 5.7057.)

pw_vdw - rVV10.in: **FAILED**.
e1
    ERROR: absolute error 8.28e-02 greater than 8.00e-04. (Test: -44.159775.  Benchmark: -44.242558.)
    ERROR: relative error 1.87e-03 greater than 1.00e-04. (Test: -44.159775.  Benchmark: -44.242558.)
p1
    ERROR: absolute error 1.41e+01 greater than 2.00e+00. (Test: -344.08.  Benchmark: -358.22.)
ef1
    ERROR: absolute error 1.42e-01 greater than 8.00e-02. (Test: 9.1176.  Benchmark: 8.9754.)

pw_vdw - vdW-DF3-opt1.in: **FAILED**.
e1
    ERROR: absolute error 6.23e-03 greater than 8.00e-04. (Test: -45.612504.  Benchmark: -45.618729.)
    ERROR: relative error 1.36e-04 greater than 1.00e-04. (Test: -45.612504.  Benchmark: -45.618729.)
eh1
    ERROR: absolute error 1.31e-01 greater than 1.00e-02. (Test: 6.6343.  Benchmark: 6.503.)
    ERROR: relative error 2.02e-02 greater than 1.00e-02. (Test: 6.6343.  Benchmark: 6.503.)
geom
    ERROR: absolute error 2.74e-02 greater than 5.00e-03. (Test: 2.36309.  Benchmark: 2.33567.)
geom
    ERROR: absolute error 3.09e-02 greater than 5.00e-03. (Test: 2.629017447.  Benchmark: 2.659883535.)
vol
    ERROR: absolute error 2.67e+00 greater than 1.50e-01. (Test: 227.61796.  Benchmark: 230.29032.)

pw_vdw - vdW-DF3-opt2.in: **FAILED**.
e1
    ERROR: absolute error 1.43e-02 greater than 8.00e-04. (Test: -45.61151.  Benchmark: -45.625762.)
    ERROR: relative error 3.12e-04 greater than 1.00e-04. (Test: -45.61151.  Benchmark: -45.625762.)
p1
    ERROR: absolute error 7.25e+00 greater than 2.00e+00. (Test: 9.48.  Benchmark: 16.73.)
eh1
    ERROR: absolute error 2.00e-02 greater than 1.00e-02. (Test: 6.2088.  Benchmark: 6.2288.)

pw_vdw - xdm.in: **FAILED**.
e1
    ERROR: absolute error 2.54e-02 greater than 8.00e-04. (Test: -70.766254.  Benchmark: -70.791677.)
    ERROR: relative error 3.59e-04 greater than 1.00e-04. (Test: -70.766254.  Benchmark: -70.791677.)
p1
    ERROR: absolute error 1.77e+01 greater than 2.00e+00. (Test: -2712.68.  Benchmark: -2730.42.)
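
For reading the numbers above: each ERROR line reports how far a test value deviates from its benchmark. The figures appear consistent with absolute error = |Test - Benchmark| and relative error = absolute error / |Benchmark|. For example, for eh1 in vdW-DF3-opt1, 0.1313 / 6.503 gives the reported 2.02e-02, whereas dividing by the test value would give 1.98e-02. The following is a minimal Python sketch, not part of QE or its test harness, that re-derives the reported error from one of these lines and checks it against the quoted tolerance. The regular expression and the function names (deviations, check_line) are illustrative assumptions, and the sketch does not try to reproduce the harness's full pass/fail logic.

```python
import re

# Matches one ERROR line of the test-suite report, e.g.
# "ERROR: absolute error 6.30e-02 greater than 8.00e-04. (Test: -43.532453.  Benchmark: -43.595437.)"
LINE_RE = re.compile(
    r"ERROR: (absolute|relative) error (\S+) greater than (\S+)\.\s+"
    r"\(Test: (\S+)\.\s+Benchmark: (\S+)\.\)"
)


def deviations(test: float, benchmark: float) -> tuple[float, float]:
    """Absolute error, and relative error measured against the benchmark (assumed convention)."""
    abs_err = abs(test - benchmark)
    return abs_err, abs_err / abs(benchmark)


def check_line(line: str) -> bool:
    """Recompute the error on one ERROR line and compare it with the report.

    Returns True when the recomputed value, printed with three significant
    digits as in the report, matches the reported one and exceeds the
    quoted tolerance.
    """
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError(f"not an ERROR line: {line!r}")
    kind, reported, tolerance, test, benchmark = m.groups()
    abs_err, rel_err = deviations(float(test), float(benchmark))
    value = abs_err if kind == "absolute" else rel_err
    return f"{value:.2e}" == reported and value > float(tolerance)


if __name__ == "__main__":
    # A few lines copied verbatim from the report above.
    report = [
        "ERROR: absolute error 6.30e-02 greater than 8.00e-04. (Test: -43.532453.  Benchmark: -43.595437.)",
        "ERROR: relative error 2.02e-02 greater than 1.00e-02. (Test: 6.6343.  Benchmark: 6.503.)",
        "ERROR: absolute error 1.77e+01 greater than 2.00e+00. (Test: -2712.68.  Benchmark: -2730.42.)",
    ]
    for line in report:
        print(check_line(line), line)
```

Run as a script, it should print True for each of the three sample lines, confirming that the reported errors follow directly from the Test and Benchmark values shown.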

Best regards,
Ian Dunn (he/him)
ASML Wilton MDEV Analysis Architect

-----Original Message-----
From: Paolo Giannozzi <paolo.giannozzi at uniud.it>
Sent: Monday, April 7, 2025 11:51 AM
To: Ian Dunn <ian.dunn at asml.com>
Cc: Quantum ESPRESSO users Forum <users at lists.quantum-espresso.org>
Subject: Re: [QE-users] Test-suite failures with parallel QE

On 07/04/2025 17:14, Ian Dunn via users wrote:

> Running make run-tests NPROCS=12

12 processors, for tests that are rather small, are quite a lot.

> WARNING: There was an error initializing an OpenFabrics device.

I don't think this is in any way QE-specific.

> Note: The following floating-point exceptions are signalling:
> IEEE_UNDERFLOW_FLAG IEEE_DENORMAL

This is irrelevant.

> _Here are the tests that are failing_:

Many of them signal a different number of iterations ("n1"). This may happen and does not necessarily point to a problem. Some other failures are of unclear origin. The most puzzling one is pw_vdw/xdm.in; all the others affect rather exotic cases and not the main functionalities.

The errors in the KCW tests mean that the KCW binary wasn't compiled ("make kcw") or that its compilation failed.

Paolo
--
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche, Univ. Udine, via delle Scienze 206, 33100 Udine Italy, +39-0432-558216
