[QE-developers] QE GPU test results
Andrea Ferretti
andrea.ferretti at nano.cnr.it
Wed Jun 21 12:03:12 CEST 2023
Dear Malgorzata,

when using GPUs, reducing tight MPI communication among tasks is
typically very effective. As a rule of thumb, I would try to maximize the
number of k-point pools in the QE parallelism (as far as memory permits).
Are you already doing this?
Ideally, one would run as

mpirun -n $ntasks pw.x -npool $ntasks < filein > fileout

(keeping in mind that the number of pools cannot exceed the number of
k-points). One can then fine-tune further, but that is a second step and
would require more information about the system.
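As a concrete sketch of the advice above, assuming Athena uses SLURM and one MPI task per GPU (the resource names and counts below are illustrative, not taken from the original message):

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4   # one MPI task per GPU (assumption)
#SBATCH --cpus-per-task=4
#SBATCH --gres=gpu:4

# One k-point pool per MPI task, so each pool maps onto one GPU
# and tight inter-task MPI communication is minimized.
mpirun -n ${SLURM_NTASKS} pw.x -npool ${SLURM_NTASKS} < filein > fileout
```

Whether four pools is actually optimal depends on the number of k-points in the nscf mesh and on GPU memory; if the run goes out of memory, reduce the pool count.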
best
Andrea
>
> Dear QE Team,
>
> we tested v.7.2 on Athena computer with GPU
> https://www.cyfronet.pl/en/19073,artykul,athena.html
>
> The simple nscf case with a large k-mesh, which runs in 37 min on 96 CPU cores on Ares,
> https://www.cyfronet.pl/en/computers/18827,artykul,ares_supercomputer.html
>
> gives the following results on Athena:
>
> 1-1-1-1: PWSCF : 1h 6m CPU 1h13m WALL
>
> 1-2-1-2: PWSCF : 1h 3m CPU 1h 8m WALL
>
> 1-4-1-4: PWSCF : 1h 1m CPU 1h 6m WALL
>
> 1-8-1-8: PWSCF : 1h12m CPU 1h17m WALL
>
> where a configuration written N-T-C-G means:
> N nodes, T MPI tasks (processes), C CPUs per task, G GPU cards
>
> Is it possible to do better?
>
> With best regards,
> Malgorzata
>
> dr hab. Malgorzata Wierzbowska, prof. IHPP PAS
>
> Institute of High Pressure Physics (Unipress)
>
> Polish Academy of Sciences
>
> Sokolowska 29/37, 01-142 Warsaw, Poland
>
> email: malwi45 at gmail.com
>
--
Andrea Ferretti, PhD
CNR Senior Researcher
Istituto Nanoscienze, S3 Center
via Campi 213/A, 41125, Modena, Italy
Tel: +39 059 2055322; Skype: andrea_ferretti
URL: http://www.nano.cnr.it