[Pw_forum] GIPAW acceleration

Yasser Fowad AlWahedi yaalwahedi at pi.ac.ae
Sun Jul 16 10:26:13 CEST 2017

Dear Davide,

Thanks for your support, and my apologies for the late reply. PW and GIPAW are compiled with GNU compilers and the Intel MKL libraries.

I am running DFT calculations of Ni2P clusters with various surfaces on two computational rigs:

1) The university cluster: each node consists of two 8-core/8-thread Xeon CPUs clocked at 2.2 GHz with 64 GB of RAM. I use only one node per simulation. For storage it uses a mechanical hard drive. (Referred to below as C1.)

2) My home PC: equipped with an i7-5930K processor (6 cores/12 threads) clocked at 3.9 GHz, with 128 GB of RAM. For storage I use a Samsung 850 EVO SSD. (Referred to below as C2.)

The table below summarizes the cases performed or still running, with the time of completion (or the expected time of completion, assuming linear extrapolation).

# of atoms   npool   Cores   # k-points per pool   Computer   Time (hrs)
30           2       16      17                    C1         28.9
38           1       16      25                    C1         31.3
49           1       16      34                    C1         124.9*
50           2       16      17                    C1         474.6*
52           1       10      34                    C2         295.2*

* estimated time of completion
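The linear extrapolation used for the starred entries can be sketched as follows. This is only an illustration of the estimate, not output from the actual runs; the example numbers are made up:

```python
# Minimal sketch of linear extrapolation of run time: assume the
# remaining work proceeds at the same average rate as the work done
# so far (e.g. SCF iterations or k-points processed).

def estimate_total_hours(hours_elapsed: float,
                         steps_done: int,
                         steps_total: int) -> float:
    """Linearly extrapolate total wall time from current progress."""
    return hours_elapsed * steps_total / steps_done

# Hypothetical example: 60 h elapsed after 8 of 20 steps completed.
print(estimate_total_hours(60.0, 8, 20))  # -> 150.0
```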

I understand that the cases are different, and as such they will require different amounts of time to finish.

But I noticed that the 50- and 52-atom cases, which are quite similar (the same total number of k-points and a similar number of atoms) but were run on the two different systems, have substantially different expected times of completion. My guess is that this is due to the SSD used for writing out the data: even though C2 uses fewer computational threads and handles more atoms, it is expected to finish much sooner.

I also noticed an interesting relation: GIPAW runs succeed only if the number of cores (np) <= the number of k-points per pool (i.e. number of k-points / npool). I checked this in the 38-atom case, which kept failing whenever I chose more processors than k-points per pool, although the SCF run always finished successfully. I observed the same in other cases. Is this a general rule?
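The empirical rule above can be expressed as a quick sanity check before submitting a job. To be clear, this is only my observation from the runs in the table, not documented QE-GIPAW behavior, and the helper below is hypothetical:

```python
# Hypothetical pre-submission check based on the empirical rule
# observed above: a GIPAW run succeeds only when the number of
# cores does not exceed the number of k-points per pool.
# NOTE: this is an observed pattern, not documented QE-GIPAW behavior.

def gipaw_run_ok(n_kpoints: int, npool: int, cores: int) -> bool:
    """Return True if (cores, npool) satisfies cores <= k-points per pool."""
    kpoints_per_pool = n_kpoints // npool  # pools split the k-point list
    return cores <= kpoints_per_pool

# 38-atom case: 25 k-points, npool=1.
print(gipaw_run_ok(25, 1, 16))  # -> True  (the run that succeeded)
print(gipaw_run_ok(25, 1, 32))  # -> False (more cores than k-points: failed)
```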

Below is the timing output of the 38-atom case:

gipaw_setup  :      0.46s CPU      0.50s WALL (       1 calls)

     Linear response
     greenf       :  20177.91s CPU  20207.68s WALL (     600 calls)
     cgsolve      :  20057.24s CPU  20086.82s WALL (     600 calls)
     ch_psi       :  19536.93s CPU  19563.75s WALL (   44231 calls)
     h_psiq       :  13685.97s CPU  13707.40s WALL (   44231 calls)

     Apply operators
     h_psi        :  44527.30s CPU  46802.35s WALL ( 5434310 calls)
     apply_vel    :    262.98s CPU    263.30s WALL (     525 calls)

     Induced current
     j_para       :    559.19s CPU    560.39s WALL (     675 calls)
     biot_savart  :      0.05s CPU      0.06s WALL (       1 calls)

     Other routines

     General routines
     calbec       :  39849.22s CPU  37474.79s WALL (10917262 calls)
     fft          :      0.12s CPU      0.15s WALL (      42 calls)
     ffts         :      0.01s CPU      0.01s WALL (      10 calls)
     fftw         :   8220.39s CPU   9116.72s WALL (27084278 calls)
     davcio       :      0.02s CPU      1.88s WALL (     400 calls)

     Parallel routines
     fft_scatter  :   3533.10s CPU   3242.29s WALL (27084330 calls)


     GIPAW        : 112557.79s CPU 112726.12s WALL (       1 calls)


-----Original Message-----
From: pw_forum-bounces at pwscf.org [mailto:pw_forum-bounces at pwscf.org] On Behalf Of Davide Ceresoli
Sent: Thursday, July 13, 2017 8:30 PM
To: PWSCF Forum <pw_forum at pwscf.org>
Subject: Re: [Pw_forum] GIPAW acceleration

Dear Yasser,
     how many atoms? How many k-points? I/O can always be the reason, but in my experience, if the system is very large, the time is dominated by computation, not I/O.
You should get some speedup if diagonalization='cg' in GIPAW.

Anyway, if I have time, I will introduce a "disk_io" variable in the input file, to try to keep more data in memory instead of on disk.

Best regards,

On 07/13/2017 10:02 AM, Yasser Fowad AlWahedi wrote:
> Dear GIPAW users,
> For NMR shift calculations, I am suffering from the extreme slowness
> of GIPAW NMR shift calculations. I have noticed that GIPAW writes out
> results frequently for restart purposes. In our clusters we have
> mechanical hard drives that store this data. Could that be a reason for its slowness?
> Yasser Al Wahedi
> Assistant Professor
> Khalifa University of Science and Technology

   Davide Ceresoli
   CNR Institute of Molecular Science and Technology (CNR-ISTM)
   c/o University of Milan, via Golgi 19, 20133 Milan, Italy
   Email: davide.ceresoli at istm.cnr.it
   Phone: +39-02-50314276, +39-347-1001570 (mobile)
   Skype: dceresoli