[Pw_forum] Effective parallel implementation?

Paolo Giannozzi p.giannozzi at gmail.com
Wed Sep 2 22:35:10 CEST 2015


In my opinion 93 Ry is a lot. You may use a lower cutoff to optimize the
structure and refine it later (you will likely find that very little
changes). Also note that the convergence thresholds etot_conv_thr = 1.0D-5
and forc_conv_thr = 1.95D-6 are very strict (too strict, in my opinion: the
code will go on forever, performing tiny steps very close to convergence).
The scf convergence threshold conv_thr = 1D-10 is also too strict in my
opinion: the code reduces it automatically as the relaxation approaches
convergence.
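
For example (indicative values only, roughly the code defaults rather than
a tested recommendation for this system):

  &CONTROL
    etot_conv_thr = 1.0D-4 ,
    forc_conv_thr = 1.0D-3 ,
  /
  &ELECTRONS
    conv_thr = 1.0D-6 ,
  /

conv_thr can then be tightened for a final scf on the relaxed structure if
higher-accuracy forces are needed.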

For parallelism: run tests with a single scf iteration, experiment with
different parallelization levels, and do not assume that using all
available processors is always the best choice. Mixed MPI-OpenMP
parallelization might work better than an all-MPI one.
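
A minimal sketch of such tests (assuming a copy of the input with
calculation = 'scf' and a small electron_maxstep, so that each run stops
after a few iterations; file names are placeholders):

  # pure MPI: vary the number of pools (-npool) and the diagonalization group (-ndiag)
  export OMP_NUM_THREADS=1
  mpirun -np 40 pw.x -npool 2 -ndiag 16 < test_scf.in > out.np40_nk2_nd16
  mpirun -np 20 pw.x -npool 2 -ndiag 9  < test_scf.in > out.np20_nk2_nd9
  # mixed MPI-OpenMP (needs an OpenMP-enabled build): fewer ranks, more threads each
  export OMP_NUM_THREADS=4
  mpirun -np 10 pw.x -npool 2 < test_scf.in > out.np10_nk2_omp4

Comparing the wall times reported at the end of each run then tells you
which combination to use for the full vc-relax.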

Paolo

On Wed, Sep 2, 2015 at 9:55 PM, Bang C. Huynh <cbh31 at cam.ac.uk> wrote:

> Dear Pascal & Paolo,
>
> Thank you for your replies. Although it's true that I'm using USPP,
> uranium is involved and the pseudopotential file (generated from the
> PSlibrary for the ld1.x atomic code) suggests a minimum value of 93 Ry for
> ecutwfc, hence the high value in my input. I'm a bit hesitant to lower it,
> as I'm not sure how much accuracy would be compromised.
>
> I'll play around with the k-point grid to see if I can use a coarser one,
> perhaps 2x2x1. Currently, with 4x4x2, there are 18 inequivalent k-points in
> total, which I agree is a bit much.
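> That would just mean changing the K_POINTS card in my input to:
>
>   K_POINTS automatic
>   2 2 1 0 0 0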
>
> If you are interested, here's the output file (incomplete):
> https://dl.dropboxusercontent.com/u/21657676/skutU0.125_vc.vc2.out
>
>
>
> Regards,
> ---
>
>
> Bang C. Huynh
> Peterhouse
> University of Cambridge
> CB2 1RD
> The United Kingdom
>
> On 02-09-2015 12:00, pw_forum-request at pwscf.org wrote:
>
>
>
> Date: Tue, 1 Sep 2015 22:03:49 +0200
> From: Pascal Boulet <pascal.boulet at univ-amu.fr>
> Subject: Re: [Pw_forum] Effective parallel implementation?
> To: PWSCF Forum <pw_forum at pwscf.org>
>
> Hello Bang,
>
> For comparison, in my case I have been able to run a vc-relax job on 92 atoms on 48 cores in 12 hours. I am using USPPs, a gamma-point calculation, and no symmetry; the system is a slab. I am using a basic command line: mpirun -np 40 pw.x < input > output.
>
> Of course, I do not know whether our computers are comparable, but it seems that your performance could be improved, probably through compilation optimization. I am using a national supercomputer facility where QE was installed by a system manager, so I cannot give compilation details.
>
> I have noted a few things in your input that I (personally) would change:
> Energy convergence = 1d-7
> Force convergence = 1d-4, or possibly 1d-5 if you plan to compute phonons (in any case it should not be smaller than the energy convergence)
> conv_thr = 1d-8 (or smaller, 1d-9 or 1d-10, but only for phonon calculations)
> It seems that you are using USPP, so I think you can reduce Ecutwfc to between 30 and 50 Ry, but you have to test this (see the sketch below), and Ecutrho should be between 8 and 12 times Ecutwfc.
> Check also whether you can reduce the number of k-points: your cell seems to be rather large.
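>
> A minimal sketch of such a cutoff test (placeholder file names; each run
> is a single-point scf at fixed geometry, keeping Ecutrho at 8x Ecutwfc):
>
>   for ec in 40 50 60 70 80 93; do
>     rho=$((8*ec))
>     sed -e "s/ecutwfc = 93/ecutwfc = $ec/" \
>         -e "s/ecutrho = 707/ecutrho = $rho/" \
>         -e "s/'vc-relax'/'scf'/" input.in > scf_ec$ec.in
>     mpirun -np 40 pw.x < scf_ec$ec.in > scf_ec$ec.out
>   done
>
> The total energies and forces from successive cutoffs then show where the
> results are converged.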
>
> HTH
> Pascal
>
>
> On 1 Sep 2015, at 21:18, Bang C. Huynh <cbh31 at cam.ac.uk> wrote:
>
> Dear all,
>
> I am currently attempting to perform several structural relaxation
> calculations on supercells that contain at least ~70 atoms (and possibly
> hundreds). My calculations run on a single node with 40 cores (2.4 GHz
> Intel Xeon E5-2676v3) and 160 GiB of memory (an m4.10xlarge Amazon EC2
> instance). I am wondering whether my parallelization setup is 'reasonable',
> in the sense that the resources are fully utilised rather than
> underutilised or poorly distributed. I'm pretty new to this, so I'm not
> sure what to expect. Should I be happy with the current performance, could
> it be better, or should I consider deploying more resources, and would that
> be worth it?
>
> Currently one scf iteration takes around 5-7 minutes, scf convergence is
> achieved after around 50 iterations, and I'm not sure how long the vc-relax
> steps will take to converge. The input file is shown below. I use this
> command to initiate the job:
>
> mpirun -np 40 pw.x -npool 2 -ndiag 36 < skutU0.125_vc.vc2 > skutU0.125_vc.vc2.out
>
> Thank you for your help.
>
> Regards,
> --
> Bang C. Huynh
> Peterhouse, University of Cambridge, CB2 1RD, The United Kingdom
>
> ======== input ========
>
> &CONTROL
>   title = skutterudite-U-doped , calculation = 'vc-relax' , outdir = './' ,
>   wfcdir = './' , pseudo_dir = '../pseudo/' , prefix = 'skutU0.125_vc' ,
>   etot_conv_thr = 1.0D-5 , forc_conv_thr = 1.95D-6 , nstep = 250 , dt = 150 ,
> /
> &SYSTEM
>   ibrav = 6, celldm(1) = 17.06, celldm(3) = 2, nat = 66, ntyp = 3,
>   ecutwfc = 93 , ecutrho = 707 , occupations = 'smearing' ,
>   starting_spin_angle = .false. , degauss = 0.02 , smearing = 'methfessel-paxton' ,
> /
> &ELECTRONS
>   conv_thr = 1D-10 ,
> /
> &IONS
> /
> &CELL
>   cell_dynamics = 'damp-w' , cell_dofree = 'all' ,
> /
> ATOMIC_SPECIES
>   Co 58.93000 Co.pz-nd-rrkjus.UPF
>   Sb 121.76000 Sb.pz-bhs.UPF
>   U 238.02891 U.pz-spfn-rrkjus_psl.1.0.0.UPF
> ATOMIC_POSITIONS crystal
> Co 0.250000000
> 0.250000000 0.125000000 0 0 0 Co 0.250000000 0.250000000 0.625000000 0 0 0
> Co 0.750000000 0.750000000 0.375000000 0 0 0 Co 0.750000000 0.750000000
> 0.875000000 0 0 0 Co 0.750000000 0.750000000 0.125000000 0 0 0 Co
> 0.750000000 0.750000000 0.625000000 0 0 0 Co 0.250000000 0.250000000
> 0.375000000 0 0 0 Co 0.250000000 0.250000000 0.875000000 0 0 0 Co
> 0.750000000 0.250000000 0.375000000 0 0 0 Co 0.750000000 0.250000000
> 0.875000000 0 0 0 Co 0.250000000 0.750000000 0.125000000 0 0 0 Co
> 0.250000000 0.750000000 0.625000000 0 0 0 Co 0.250000000 0.750000000
> 0.375000000 0 0 0 Co 0.250000000 0.750000000 0.875000000 0 0 0 Co
> 0.750000000 0.250000000 0.125000000 0 0 0 Co 0.750000000 0.250000000
> 0.625000000 0 0 0 Sb 0.000000000 0.337592989 0.078553498 Sb 0.000000000
> 0.337592989 0.578553498 Sb 0.000000000 0.662407041 0.421446502 Sb
> 0.000000000 0.662407041 0.921446502 Sb 0.000000000 0.662407041 0.078553498
> Sb 0.000000000 0.662407041 0.578553498 Sb 0.000000000 0.337592989
> 0.421446502 Sb 0.000000000 0.337592989 0.921446502 Sb 0.157106996
> 0.000000000 0.168796495 Sb 0.157106996 0.000000000 0.668796480 Sb
> 0.842893004 0.000000000 0.331203520 Sb 0.842893004 0.000000000 0.831203520
> Sb 0.157106996 0.000000000 0.331203520 Sb 0.157106996 0.000000000
> 0.831203520 Sb 0.842893004 0.000000000 0.168796495 Sb 0.842893004
> 0.000000000 0.668796480 Sb 0.337592989 0.157106996 0.000000000 Sb
> 0.337592989 0.157106996 0.500000000 Sb 0.662407041 0.842893004 0.000000000
> Sb 0.662407041 0.842893004 0.500000000 Sb 0.662407041 0.157106996
> 0.000000000 Sb 0.662407041 0.157106996 0.500000000 Sb 0.337592989
> 0.842893004 0.000000000 Sb 0.337592989 0.842893004 0.500000000 Sb
> 0.500000000 0.837592959 0.328553498 Sb 0.500000000 0.837592959 0.828553498
> Sb 0.500000000 0.162407011 0.171446502 Sb 0.500000000 0.162407011
> 0.671446502 Sb 0.500000000 0.162407011 0.328553498 Sb 0.500000000
> 0.162407011 0.828553498 Sb 0.500000000 0.837592959 0.171446502 Sb
> 0.500000000 0.837592959 0.671446502 Sb 0.657106996 0.500000000 0.418796480
> Sb 0.657106996 0.500000000 0.918796480 Sb 0.342893004 0.500000000
> 0.081203505 Sb 0.342893004 0.500000000 0.581203520 Sb 0.657106996
> 0.500000000 0.081203505 Sb 0.657106996 0.500000000 0.581203520 Sb
> 0.342893004 0.500000000 0.418796480 Sb 0.342893004 0.500000000 0.918796480
> Sb 0.837592959 0.657106996 0.250000000 Sb 0.837592959 0.657106996
> 0.750000000 Sb 0.162407011 0.342893004 0.250000000 Sb 0.162407011
> 0.342893004 0.750000000 Sb 0.162407011 0.657106996 0.250000000 Sb
> 0.162407011 0.657106996 0.750000000 Sb 0.837592959 0.342893004 0.250000000
> Sb 0.837592959 0.342893004 0.750000000
> U 0.000000000 0.000000000 0.000000000
> U 0.000000000 0.000000000 0.500000000
> K_POINTS automatic
>   4 4 2 0 0 0
>
>
> --
> Pascal Boulet - MCF HDR, Resp. L1 MPCI - DEPARTEMENT CHIMIE
> Aix-Marseille Université - ST JEROME - Avenue Escadrille Normandie Niemen - 13013 Marseille
> Tel: +33 (0)4 13 55 18 10 - Fax: +33 (0)4 13 55 18 50
> Site: http://allos.up.univ-mrs.fr/pascal - Email: pascal.boulet at univ-amu.fr
> To protect the environment, please print this email only if necessary.
>
>
> Date: Tue, 1 Sep 2015 22:41:53 +0200
> From: Paolo Giannozzi <p.giannozzi at gmail.com>
> Subject: Re: [Pw_forum] Effective parallel implementation?
> To: PWSCF Forum <pw_forum at pwscf.org>
>
> On Tue, Sep 1, 2015 at 9:18 PM, Bang C. Huynh <cbh31 at cam.ac.uk> wrote:
>
> The input file is shown below
>
> Performance is better estimated from the output rather than from the input.
> What is useful in particular is the final printout with timings; it is
> sufficient to do this for a single scf step.
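>
> For instance, something like
>
>   grep "total cpu time" skutU0.125_vc.vc2.out
>   tail -n 60 skutU0.125_vc.vc2.out
>
> shows the cumulative time per iteration and, once a run has finished, the
> final per-routine timing report.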
>
> Paolo
> --
> Paolo Giannozzi, Dept. Chemistry&Physics&Environment,
> Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
> Phone +39-0432-558216, fax +39-0432-558222
>
>
>



-- 
Paolo Giannozzi, Dept. Chemistry&Physics&Environment,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222

