[QE-users] [SUSPECT ATTACHMENT REMOVED] Cuda error on marconi100

Pietro Bonfa pietro.bonfa at unipr.it
Tue Aug 4 14:31:51 CEST 2020


Dear Mina,

the problems that you describe have different origins.

The first one is clearly related to the GPU implementation. If possible,
please open an issue on GitLab (https://gitlab.com/QEF/q-e-gpu/-/issues)
and attach QE's input and output files, so that we can investigate
further.

The second problem is instead related to I/O, and it is hard to tell
whether it comes from the code or from a failure of the parallel
filesystem. For what it's worth, I have also experienced random I/O
problems on Marconi100.
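
Since the two failures are mixed across the parallel jobs, it may help to
sort the job outputs by error type before opening the issue. The following
is only a minimal sketch, not part of QE or ASE: it assumes each job writes
its output to a text file matching "*.out" in one directory (both the
pattern and the layout are assumptions), and the names SIGNATURES, classify
and scan are hypothetical. The search strings are taken from the two error
messages quoted below.

```python
from pathlib import Path

# Substrings copied from the two error messages reported in this thread.
# The labels "gpu" and "io" are arbitrary names chosen for this sketch.
SIGNATURES = {
    "gpu": "device_free_callback",  # CUDA assertion in cudahook.cc
    "io": "FIO-F-204",              # Fortran runtime I/O error on CLOSE
}


def classify(text):
    """Return the sorted labels of every known error signature found in text."""
    return sorted(label for label, needle in SIGNATURES.items() if needle in text)


def scan(log_dir, pattern="*.out"):
    """Map each matching file name in log_dir to the error labels it contains."""
    results = {}
    for path in sorted(Path(log_dir).glob(pattern)):
        # errors="replace" keeps the scan going if a log has stray bytes.
        results[path.name] = classify(path.read_text(errors="replace"))
    return results
```

Calling scan() on the directory that holds the job outputs returns a
dictionary from file name to labels; a job with an empty list either
finished or failed with a message that this sketch does not know about.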

Best regards,
Pietro




On 8/4/20 11:42 AM, Mina Taleblou wrote:
> Dear all,
>
> I am running a genetic algorithm from ASE (Atomic Simulation
> Environment) on Marconi100, using Quantum ESPRESSO as the calculator.
> The code (main.py) and the calculator file (local_calc.py) are attached.
> 'main.py' submits 10 jobs in parallel, and jobs are randomly stopped
> with this error:
> pw.x: cudahook.cc:649: CUresult device_free_callback(CUdeviceptr):
> Assertion `cacheNode != __null' failed.
>
> Also, other errors occur randomly as well, like:
> FIO-F-204/CLOSE/unit=4/illegal use of a read-only file.
>
> I would appreciate your help.
>
> Mina Taleblou
> Department of Nanotechnology
> University of Trieste
