[Pw_forum] Effective parallel implementation?

Bang C. Huynh cbh31 at cam.ac.uk
Wed Sep 2 21:55:03 CEST 2015


 

Dear Pascal & Paolo, 

Thank you for your replies. Although it's true that I'm using USPPs,
uranium is involved, and the pseudopotential file (generated from the
PSlibrary for the ld1.x atomic code) suggests a minimum value of 93 Ry
for Ecutwfc; hence the high cutoff in my input. I'm a bit hesitant to
lower it, as I'm not sure how much accuracy would be compromised.
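
Before deciding either way, I'll probably script a quick cutoff scan.
A minimal sketch, assuming a hypothetical template input skut_scf.tmpl
with calculation = 'scf' and the placeholder ECUT in place of the
ecutwfc value (ecutrho would need the same treatment, e.g. a second
placeholder kept at roughly 7.6x ecutwfc as in my current input), and
pw.x on the PATH:

  # Run one scf calculation per trial cutoff and extract the total energy.
  for ec in 60 70 80 90 100; do
      sed "s/ECUT/$ec/" skut_scf.tmpl > scan_$ec.in
      mpirun -np 40 pw.x -inp scan_$ec.in > scan_$ec.out
      grep '!' scan_$ec.out | tail -n 1   # '!' marks the converged total energy
  done

The cutoff at which the total energy stops changing would then be a
safe choice.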

I'll play around with the k-grid to see if I can use a coarser one,
perhaps 2x2x1. Currently, with 4x4x2, there are 18 non-equivalent
k-points in total, which I agree is a bit much.
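
If the coarser grid works out, the change is just to the k-point card:

  K_POINTS automatic
    2 2 1 0 0 0

and the number of inequivalent points it generates is reported in the
output header, so it can be checked cheaply with, e.g.:

  grep 'number of k points' skutU0.125_vc.vc2.out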

If you are interested, here's the (incomplete) output file:
https://dl.dropboxusercontent.com/u/21657676/skutU0.125_vc.vc2.out 
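
For anyone skimming it: since pw.x prints a cumulative timing line at
every scf step, the per-iteration cost can be read off directly with

  grep 'total cpu time' skutU0.125_vc.vc2.out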

Regards, 
---

BANG C. HUYNH
Peterhouse
University of Cambridge
CB2 1RD
The United Kingdom

On 02-09-2015 12:00, pw_forum-request at pwscf.org wrote: 

> Message: 6
> Date: Tue, 1 Sep 2015 22:03:49 +0200
> From: Pascal Boulet <pascal.boulet at univ-amu.fr>
> Subject: Re: [Pw_forum] Effective parallel implementation?
> To: PWSCF Forum <pw_forum at pwscf.org>
> Message-ID: <4D46F061-4B8E-46E4-988C-E3739DCBC1B0 at univ-amu.fr>
> Content-Type: text/plain; charset="windows-1252"
> 
> Hello Bang,
> 
> For comparison: in my case I have been able to run a vc-relax job on 92
> atoms on 48 cores in 12 hours. I am using USPPs, a gamma-point
> calculation and no symmetry; the system is a slab. I am using a basic
> command line: mpirun -np 40 pw.x < input > output.
> 
> Of course, I do not know whether our computers are comparable, but it
> seems that your performance could be improved, probably through
> compilation optimisation. I am using a national supercomputer facility
> and QE was installed by a system manager, so I cannot give compilation
> details.
> 
> I have noted a few things in your input that I (personally) would change:
> - energy convergence (etot_conv_thr): 1d-7
> - force convergence (forc_conv_thr): 1d-4, or possibly 1d-5 if you plan
>   to compute phonons (in any case it should not be smaller than the
>   energy convergence)
> - conv_thr: 1d-8 (or smaller, say 1d-9 or 1d-10, in the case of phonon
>   calculations only)
> (These thresholds are collected into an input fragment after this
> message.)
> It seems that you are using USPPs, so I think you can reduce Ecutwfc to
> somewhere between 30 and 50 Ry, but you have to test this; Ecutrho
> should then be between 8 and 12 times Ecutwfc.
> Check also whether you can reduce the number of k-points: your cell
> seems to be rather large.
> 
> HTH
> Pascal
> 
> On 1 Sep 2015, at 21:18, Bang C. Huynh <cbh31 at cam.ac.uk> wrote:
> Dear all,
> 
> I am currently attempting to perform several structural relaxation
> calculations on supercells that contain at least ~70 atoms (and even
> more, say hundreds). My calculations are being done on a single node
> with 40 cores, 2.4 GHz, Intel Xeon E5-2676v3 and 160 GiB memory
> (m4.10xlarge Amazon EC2).
> 
> I am just wondering whether my parallel set-up is 'reasonable', in the
> sense that the resources are fully utilised and not in some way
> underutilised or poorly distributed. I'm pretty new to this, so I'm not
> sure what to expect... Should I be happy with the current performance,
> can it be better, or should I consider deploying more resources, and
> would that be worth it? Currently one scf iteration takes around 5-7
> minutes, and scf convergence is achieved after around 50 scf
> iterations; I'm not sure how long it's going to take for the vc-relax
> iterations to converge...
> 
> The input file is shown below. I use this command to initiate the job:
> 
> mpirun -np 40 pw.x -npool 2 -ndiag 36 < skutU0.125_vc.vc2 > skutU0.125_vc.vc2.out
> 
> Thank you for your help.
> 
> Regards,
> --
> Bang C. Huynh
> Peterhouse
> University of Cambridge
> CB2 1RD
> The United Kingdom
> 
> ========input=======
> &CONTROL
>     title = skutterudite-U-doped ,
>     calculation = 'vc-relax' ,
>     outdir = './' ,
>     wfcdir = './' ,
>     pseudo_dir = '../pseudo/' ,
>     prefix = 'skutU0.125_vc' ,
>     etot_conv_thr = 1.0D-5 ,
>     forc_conv_thr = 1.95D-6 ,
>     nstep = 250 ,
>     dt = 150 ,
> /
> &SYSTEM
>     ibrav = 6,
>     celldm(1) = 17.06
>     celldm(3) = 2,
>     nat = 66,
>     ntyp = 3,
>     ecutwfc = 93 ,
>     ecutrho = 707 ,
>     occupations = 'smearing' ,
>     starting_spin_angle = .false. ,
>     degauss = 0.02 ,
>     smearing = 'methfessel-paxton' ,
> /
> &ELECTRONS
>     conv_thr = 1D-10 ,
> /
> &IONS
> /
> &CELL
>     cell_dynamics = 'damp-w' ,
>     cell_dofree = 'all' ,
> /
> ATOMIC_SPECIES
>     Co   58.93000  Co.pz-nd-rrkjus.UPF
>     Sb  121.76000  Sb.pz-bhs.UPF
>     U   238.02891  U.pz-spfn-rrkjus_psl.1.0.0.UPF
> ATOMIC_POSITIONS crystal
>     Co 0.250000000 0.250000000 0.125000000 0 0 0
>     Co 0.250000000 0.250000000 0.625000000 0 0 0
>     Co 0.750000000 0.750000000 0.375000000 0 0 0
>     Co 0.750000000 0.750000000 0.875000000 0 0 0
>     Co 0.750000000 0.750000000 0.125000000 0 0 0
>     Co 0.750000000 0.750000000 0.625000000 0 0 0
>     Co 0.250000000 0.250000000 0.375000000 0 0 0
>     Co 0.250000000 0.250000000 0.875000000 0 0 0
>     Co 0.750000000 0.250000000 0.375000000 0 0 0
>     Co 0.750000000 0.250000000 0.875000000 0 0 0
>     Co 0.250000000 0.750000000 0.125000000 0 0 0
>     Co 0.250000000 0.750000000 0.625000000 0 0 0
>     Co 0.250000000 0.750000000 0.375000000 0 0 0
>     Co 0.250000000 0.750000000 0.875000000 0 0 0
>     Co 0.750000000 0.250000000 0.125000000 0 0 0
>     Co 0.750000000 0.250000000 0.625000000 0 0 0
>     Sb 0.000000000 0.337592989 0.078553498
>     Sb 0.000000000 0.337592989 0.578553498
>     Sb 0.000000000 0.662407041 0.421446502
>     Sb 0.000000000 0.662407041 0.921446502
>     Sb 0.000000000 0.662407041 0.078553498
>     Sb 0.000000000 0.662407041 0.578553498
>     Sb 0.000000000 0.337592989 0.421446502
>     Sb 0.000000000 0.337592989 0.921446502
>     Sb 0.157106996 0.000000000 0.168796495
>     Sb 0.157106996 0.000000000 0.668796480
>     Sb 0.842893004 0.000000000 0.331203520
>     Sb 0.842893004 0.000000000 0.831203520
>     Sb 0.157106996 0.000000000 0.331203520
>     Sb 0.157106996 0.000000000 0.831203520
>     Sb 0.842893004 0.000000000 0.168796495
>     Sb 0.842893004 0.000000000 0.668796480
>     Sb 0.337592989 0.157106996 0.000000000
>     Sb 0.337592989 0.157106996 0.500000000
>     Sb 0.662407041 0.842893004 0.000000000
>     Sb 0.662407041 0.842893004 0.500000000
>     Sb 0.662407041 0.157106996 0.000000000
>     Sb 0.662407041 0.157106996 0.500000000
>     Sb 0.337592989 0.842893004 0.000000000
>     Sb 0.337592989 0.842893004 0.500000000
>     Sb 0.500000000 0.837592959 0.328553498
>     Sb 0.500000000 0.837592959 0.828553498
>     Sb 0.500000000 0.162407011 0.171446502
>     Sb 0.500000000 0.162407011 0.671446502
>     Sb 0.500000000 0.162407011 0.328553498
>     Sb 0.500000000 0.162407011 0.828553498
>     Sb 0.500000000 0.837592959 0.171446502
>     Sb 0.500000000 0.837592959 0.671446502
>     Sb 0.657106996 0.500000000 0.418796480
>     Sb 0.657106996 0.500000000 0.918796480
>     Sb 0.342893004 0.500000000 0.081203505
>     Sb 0.342893004 0.500000000 0.581203520
>     Sb 0.657106996 0.500000000 0.081203505
>     Sb 0.657106996 0.500000000 0.581203520
>     Sb 0.342893004 0.500000000 0.418796480
>     Sb 0.342893004 0.500000000 0.918796480
>     Sb 0.837592959 0.657106996 0.250000000
>     Sb 0.837592959 0.657106996 0.750000000
>     Sb 0.162407011 0.342893004 0.250000000
>     Sb 0.162407011 0.342893004 0.750000000
>     Sb 0.162407011 0.657106996 0.250000000
>     Sb 0.162407011 0.657106996 0.750000000
>     Sb 0.837592959 0.342893004 0.250000000
>     Sb 0.837592959 0.342893004 0.750000000
>     U  0.000000000 0.000000000 0.000000000
>     U  0.000000000 0.000000000 0.500000000
> K_POINTS automatic
>     4 4 2 0 0 0

--
Pascal Boulet - MCF HDR, Resp. L1 MPCI - DEPARTEMENT CHIMIE
Aix-Marseille Université - ST JEROME - Avenue Escadrille Normandie
Niemen - 13013 Marseille
Tél: +33 (0)4 13 55 18 10 - Fax: +33 (0)4 13 55 18 50
Site: http://allos.up.univ-mrs.fr/pascal - Email:
pascal.boulet at univ-amu.fr
To protect the environment, please print this email only if necessary.
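
For reference, the thresholds suggested above map onto the standard
pw.x input as follows; this is only a sketch with Pascal's values
dropped in, and whether they suit this system still has to be tested:

  &CONTROL
    etot_conv_thr = 1.0D-7 ,
    forc_conv_thr = 1.0D-4 ,   ! or 1.0D-5 if phonons are planned
    ...
  /
  &ELECTRONS
    conv_thr = 1.0D-8 ,        ! 1.0D-9 or 1.0D-10 for phonon calculations
  /

(etot_conv_thr and forc_conv_thr belong in &CONTROL and conv_thr in
&ELECTRONS, exactly as in the input quoted above.)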


------------------------------

Message: 7
Date: Tue, 1 Sep 2015 22:41:53 +0200
From: Paolo Giannozzi <p.giannozzi at gmail.com>
Subject: Re: [Pw_forum] Effective parallel implementation?
To: PWSCF Forum <pw_forum at pwscf.org>
Message-ID:
 <CAPMgbCuVHov+w+7sG=iYNZ7Lff441P+uRr5nm5nQeR6Mm7g_Pg at mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

On Tue, Sep 1, 2015 at 9:18 PM, Bang C. Huynh <cbh31 at cam.ac.uk> wrote:

> The input file is shown below

Performance is better estimated from the output than from the input.
What is particularly useful is the final printout with timings; it is
sufficient to produce it for a single scf step (one cheap way to do so
is sketched after this message).

Paolo
-- 
Paolo Giannozzi, Dept. Chemistry&Physics&Environment,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222
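
One cheap way to obtain that printout is to cap the scf cycle at a
single iteration, so that the job stops early and writes its timing
summary; a sketch, using the standard electron_maxstep variable of the
&ELECTRONS namelist:

  &ELECTRONS
    electron_maxstep = 1 ,
    conv_thr = 1D-10 ,
  /

The per-routine timings then appear at the very end of the output, e.g.
via

  tail -n 60 skutU0.125_vc.vc2.out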

 
