[Pw_forum] GIPAW acceleration

Yasser Fowad AlWahedi yaalwahedi at pi.ac.ae
Sun Jul 16 12:06:48 CEST 2017


Thanks Davide,

I am running with the smearing option, since the system is metallic.

I also noticed an interesting relation: GIPAW runs succeed if the number of cores (np) <= the number of k-points per pool (k-points/npool). I checked this in the 38-atom case, which kept failing whenever I chose more processors than k-points per pool, although the SCF run finished successfully every time. I observed the same in other cases. Is this a general rule?
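In case it helps, the empirical rule above can be checked before launching. This is only a sketch: the numbers are taken from the 38-atom case (16 cores, npool=1, 25 k-points per pool), and the script just prints the command it would use rather than running it.

```shell
# Empirical rule observed above: GIPAW succeeds when np <= k-points per pool.
NP=16; NPOOL=1; NKPT=25
KPP=$(( NKPT / NPOOL ))   # k-points handled by each pool
if [ "$NP" -le "$KPP" ]; then
    echo "OK to run: mpirun -np $NP gipaw.x -npool $NPOOL"
else
    echo "np=$NP exceeds k-points per pool ($KPP): GIPAW run may fail"
fi
```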

I will send you the files privately. 

Yasser

-----Original Message-----
From: Davide Ceresoli [mailto:davide.ceresoli at cnr.it] 
Sent: Sunday, July 16, 2017 1:49 PM
To: Yasser Fowad AlWahedi <yaalwahedi at pi.ac.ae>; PWSCF Forum <pw_forum at pwscf.org>
Subject: Re: [Pw_forum] GIPAW acceleration

Dear Yasser,
     no problem! First of all, it seems to me that I/O is not a problem:
cputime ~= walltime, and the davcio routines consume only 1.88 s.

I compared calculations of similar size and I've got:
     wollastonite: 30 atoms, 36 k-points: 10h40m
     coesite:      48 atoms, 32 k-points: 19h20m
on a rather old (2008) Xeon E5520 2.27 GHz, 8 cores.

My timings are more favorable than your C1 results. However, if your system is a slab, the empty space carries a non-negligible extra cost.
You can try to minimize it as much as possible. NMR interactions are short-ranged, contrary to electrostatic interactions.

Is your system metallic? Even if it has a small band gap, I suggest using occupations='smearing'. This will speed up the linear response in GIPAW, as well as convergence with respect to k-points.
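For reference, a minimal sketch of the relevant &SYSTEM fragment for pw.x (the smearing type and degauss value here are assumptions; converge degauss for your own system):

```fortran
&SYSTEM
  ! ... other SYSTEM variables ...
  occupations = 'smearing'
  smearing    = 'mv'      ! Marzari-Vanderbilt cold smearing, a common choice
  degauss     = 0.01      ! Ry; assumed value, check convergence
/
```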

Finally, the clock difference between the i7 (3.5 GHz) and the Xeon (2.2 GHz) can explain the difference in timing. The clock ratio is ~1.6, similar to the walltime ratio.
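As a quick sanity check of the ratios quoted above (a sketch; the walltimes are the estimated hours of the 50-atom C1 and 52-atom C2 cases from the table below):

```python
# Compare the CPU clock ratio with the walltime ratio quoted above.
clock_ratio = 3.5 / 2.2          # i7 (GHz) / Xeon (GHz)
walltime_ratio = 474.6 / 295.2   # C1 50-atom case / C2 52-atom case (hours)
print(round(clock_ratio, 2), round(walltime_ratio, 2))  # → 1.59 1.61
```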

In any case, if you send me the input and output files privately, I can look at them in detail.

Best wishes,
     Davide

On 07/16/2017 10:26 AM, Yasser Fowad AlWahedi wrote:
> Dear Davide,
>
> Thanks for your support, and my apologies for the late reply. PW and GIPAW are compiled with GNU compilers and the Intel MKL libraries.
>
> I am running DFT calculations of Ni2P clusters with various surfaces on two computational rigs:
>
> 1) The university cluster: each node consists of dual 8-core/8-thread
> Xeon CPUs clocked at 2.2 GHz with 64 GB of RAM. I only use one node per
> simulation. For storage it uses a mechanical hard drive. (Later
> called C1)
>
> 2) My home PC: equipped with an i7-5930K processor (6 cores/12 threads) clocked at 3.9 GHz with 128 GB of RAM (later called C2). For storage I use a Samsung 850 EVO SSD.
>
> The table below summarizes the cases performed or running, with the time to finish or the expected time to finish assuming linear extrapolation.
>
>
> # of atoms   npool   Cores   # k-points per pool   Computer   Time (hrs)
> 30           2       16      17                    C1          28.9
> 38           1       16      25                    C1          31.3
> 49           1       16      34                    C1         124.9*
> 50           2       16      17                    C1         474.6*
> 52           1       10      34                    C2         295.2*
>
> * estimated time of finish
>
> I understand that the cases are different and as such will require more or less time to finish.
>
> But I noticed that the 50- and 52-atom cases, which are quite similar (same number of k-points per pool and a similar number of atoms) but run on two different systems, show substantially different times to finish. My guess is that this is due to the SSD being used to write out the data, considering that C2 uses fewer computational threads and handles more atoms, yet is expected to finish faster.
>
> I also noticed an interesting relation: GIPAW runs succeed if the number of cores (np) <= the number of k-points per pool (k-points/npool). I checked this in the 38-atom case, which kept failing whenever I chose more processors than k-points per pool, although the SCF run finished successfully every time. I observed the same in other cases. Is this a general rule?
>
> Below is the timing output of the 38 atoms case:
>
> gipaw_setup  :      0.46s CPU      0.50s WALL (       1 calls)
>
>      Linear response
>      greenf       :  20177.91s CPU  20207.68s WALL (     600 calls)
>      cgsolve      :  20057.24s CPU  20086.82s WALL (     600 calls)
>      ch_psi       :  19536.93s CPU  19563.75s WALL (   44231 calls)
>      h_psiq       :  13685.97s CPU  13707.40s WALL (   44231 calls)
>
>      Apply operators
>      h_psi        :  44527.30s CPU  46802.35s WALL ( 5434310 calls)
>      apply_vel    :    262.98s CPU    263.30s WALL (     525 calls)
>
>      Induced current
>      j_para       :    559.19s CPU    560.39s WALL (     675 calls)
>      biot_savart  :      0.05s CPU      0.06s WALL (       1 calls)
>
>      Other routines
>
>      General routines
>      calbec       :  39849.22s CPU  37474.79s WALL (10917262 calls)
>      fft          :      0.12s CPU      0.15s WALL (      42 calls)
>      ffts         :      0.01s CPU      0.01s WALL (      10 calls)
>      fftw         :   8220.39s CPU   9116.72s WALL (27084278 calls)
>      davcio       :      0.02s CPU      1.88s WALL (     400 calls)
>
>      Parallel routines
>      fft_scatter  :   3533.10s CPU   3242.29s WALL (27084330 calls)
>
>      Plugins
>
>      GIPAW        : 112557.79s CPU 112726.12s WALL (       1 calls)
>
> Yasser
>
>
>
>
> -----Original Message-----
> From: pw_forum-bounces at pwscf.org [mailto:pw_forum-bounces at pwscf.org] 
> On Behalf Of Davide Ceresoli
> Sent: Thursday, July 13, 2017 8:30 PM
> To: PWSCF Forum <pw_forum at pwscf.org>
> Subject: Re: [Pw_forum] GIPAW acceleration
>
> Dear Yasser,
>      how many atoms? How many k-points? I/O can always be the reason, but in my experience, if the system is very large, the time is dominated by computation, not I/O.
> You should get some speedup with diagonalization='cg' in GIPAW.
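For reference, the corresponding gipaw.x input fragment would look something like this (a sketch; the job type shown and the omitted variables are assumptions):

```fortran
&inputgipaw
  ! other variables (prefix, tmp_dir, ...) omitted
  job             = 'nmr'
  diagonalization = 'cg'   ! conjugate-gradient solver, as suggested above
/
```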
>
> Anyway, if I have time, I will introduce a "disk_io" variable in the input file, to try to keep more data in memory instead of on disk.
>
> Best regards,
>      Davide
>
>
> On 07/13/2017 10:02 AM, Yasser Fowad AlWahedi wrote:
>> Dear GIPAW users,
>>
>>
>>
>> For NMR shift calculations, I am suffering from the extreme slowness
>> of GIPAW. I have noticed that GIPAW writes out
>> results frequently for restart purposes. In our clusters we
>> have mechanical hard drives which store this data. Could that be a reason for its slowness?
>>
>>
>>
>> Yasser Al Wahedi
>>
>> Assistant Professor
>>
>> Khalifa University of Science and Technology
>>
>

-- 
+--------------------------------------------------------------+
   Davide Ceresoli
   CNR Institute of Molecular Science and Technology (CNR-ISTM)
   c/o University of Milan, via Golgi 19, 20133 Milan, Italy
   Email: davide.ceresoli at istm.cnr.it
   Phone: +39-02-50314276, +39-347-1001570 (mobile)
   Skype: dceresoli
+--------------------------------------------------------------+