[QE-developers] QE GPU test results
Andrea Ferretti
andrea.ferretti at nano.cnr.it
Wed Jun 21 12:03:12 CEST 2023
Dear Malgorzata,

when using GPUs, reducing tight MPI communication among tasks is
typically very effective. As a rule of thumb, I would try to maximize the
number of k-point pools in the QE parallelism (as far as memory permits).
Are you already doing this?
Ideally, one would run as

mpirun -n $ntasks pw.x -npool $ntasks < filein > fileout

(keeping in mind that the number of pools cannot exceed the number of
k-points). One can then fine-tune further, but that is a second step and
would require more information about the system.
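As a concrete sketch of the advice above, assuming Athena uses SLURM and one MPI task per GPU (the resource names and counts below are illustrative, not taken from the original message):

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4   # one MPI task per GPU (assumption)
#SBATCH --cpus-per-task=4
#SBATCH --gres=gpu:4

# One k-point pool per MPI task, so each pool maps onto one GPU
# and tight inter-task MPI communication is minimized.
mpirun -n ${SLURM_NTASKS} pw.x -npool ${SLURM_NTASKS} < filein > fileout
```

Whether four pools is actually optimal depends on the number of k-points in the nscf mesh and on GPU memory; if the run goes out of memory, reduce the pool count.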
best
Andrea
>
> Dear QE Team,
>
> we tested v.7.2 on Athena computer with GPU
> https://www.cyfronet.pl/en/19073,artykul,athena.html
>
> The simple nscf case with a large k-mesh, which runs in 37 min on 96 CPU cores on Ares,
> https://www.cyfronet.pl/en/computers/18827,artykul,ares_supercomputer.html
>
> gives the following results on Athena:
>
> 1-1-1-1: PWSCF : 1h 6m CPU 1h13m WALL
>
> 1-2-1-2: PWSCF : 1h 3m CPU 1h 8m WALL
>
> 1-4-1-4: PWSCF : 1h 1m CPU 1h 6m WALL
>
> 1-8-1-8: PWSCF : 1h12m CPU 1h17m WALL
>
> where a configuration written N-T-C-G means:
> N nodes, T MPI tasks (processes), C CPUs per task, G GPU cards
>
> Is it possible to do better?
>
> With best regards,
> Malgorzata
>
> dr hab. Malgorzata Wierzbowska, prof. IHPP PAS
>
> Institute of High Pressure Physics (Unipress)
>
> Polish Academy of Sciences
>
> Sokolowska 29/37, 01-142 Warsaw, Poland
>
> email: malwi45 at gmail.com
>
--
Andrea Ferretti, PhD
CNR Senior Researcher
Istituto Nanoscienze, S3 Center
via Campi 213/A, 41125, Modena, Italy
Tel: +39 059 2055322; Skype: andrea_ferretti
URL: http://www.nano.cnr.it