[QE-users] [SUSPECT ATTACHMENT REMOVED] Cuda error on marconi100
Pietro Bonfa
pietro.bonfa at unipr.it
Tue Aug 4 14:31:51 CEST 2020
Dear Mina,
the problems that you describe have different origins.
The first one is clearly related to the GPU implementation. If possible,
please share QE's input and output files in an issue on GitLab
(https://gitlab.com/QEF/q-e-gpu/-/issues) so that it can be
investigated further.
The second problem is instead related to I/O, and it is hard to tell
whether it is caused by the code or by a failure of the parallel
filesystem. For what it's worth, I have also experienced random I/O
problems on Marconi100.
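
If the I/O failures are indeed transient, one possible workaround is to
rerun a failed job in a fresh directory, so that a retry never reuses
files left behind by the failed attempt. The following is only a minimal
sketch under that assumption: run_with_retries and its parameters are
hypothetical names, not part of QE or ASE, and the command to run
(for example a pw.x invocation) is left to the caller.

```python
import subprocess
import tempfile
import time
from pathlib import Path


def run_with_retries(command, base_dir, max_attempts=3, wait_seconds=0.0):
    """Run `command` in a fresh directory, retrying on a non-zero exit code.

    Hypothetical helper for illustration only. Returns the directory of
    the successful attempt and the attempt number.
    """
    base_dir = Path(base_dir)
    base_dir.mkdir(parents=True, exist_ok=True)

    for attempt in range(1, max_attempts + 1):
        # Each attempt gets its own directory, so no files are shared
        # between attempts or between jobs running in parallel.
        workdir = Path(tempfile.mkdtemp(prefix=f"attempt{attempt}_", dir=base_dir))

        # Keep stdout and stderr of every attempt for later inspection.
        with open(workdir / "stdout.log", "w") as out, \
                open(workdir / "stderr.log", "w") as err:
            result = subprocess.run(command, cwd=workdir, stdout=out, stderr=err)

        if result.returncode == 0:
            return workdir, attempt

        # Give the filesystem some time before trying again.
        time.sleep(wait_seconds)

    raise RuntimeError(f"command failed after {max_attempts} attempts: {command}")
```

The logs of failed attempts stay in their own directories, which also
makes them easy to attach to a bug report. Note that a retry only hides
random failures; an error that is reproducible on every attempt still
needs to be investigated.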
Best regards,
Pietro
On 8/4/20 11:42 AM, Mina Taleblou wrote:
> Dear all,
>
> I am running a genetic algorithm from ASE (Atomic Simulation
> Environment) on Marconi100, using Quantum ESPRESSO as the calculator.
> The code (main.py) and the calculator file (local_calc.py) are attached.
> 'main.py' submits 10 jobs in parallel, and some of the jobs stop at
> random with this error:
> pw.x: cudahook.cc:649: CUresult device_free_callback(CUdeviceptr):
> Assertion `cacheNode != __null' failed.
>
> Other errors also occur at random, for example:
> FIO-F-204/CLOSE/unit=4/illegal use of a read-only file.
>
> I would appreciate your help.
>
> Mina Taleblou
> Department of Nanotechnology
> University of Trieste
> --
> *Mina Taleblou*
>