[Pw_forum] Geometry optimization on QE530-GPU with memory allocation error?

Paolo Giannozzi p.giannozzi at gmail.com
Tue Feb 16 07:51:07 CET 2016


You do not need to update the atomic coordinates: when you restart from a
previous run (after a clean stop), the code reads and uses the latest set
of coordinates.
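
As a rough sketch of the two runs, assuming two otherwise identical input
files: only the relevant &CONTROL lines are shown, prefix and outdir are
copied from the input quoted below, and the max_seconds value (about 22
hours) is only an illustrative choice, not a recommendation.

```fortran
! --- input file for the first run ---
! start from scratch and stop cleanly once max_seconds is reached
&CONTROL
    calculation  = 'relax'
    restart_mode = 'from_scratch'
    max_seconds  = 80000
    prefix       = 'TiNSurf200+Biotin'
    outdir       = '/home/zgdeng/Rolly/TiNSurf200'
/

! --- input file for each following run ---
! only restart_mode changes; prefix and outdir must stay the same
&CONTROL
    calculation  = 'relax'
    restart_mode = 'restart'
    max_seconds  = 80000
    prefix       = 'TiNSurf200+Biotin'
    outdir       = '/home/zgdeng/Rolly/TiNSurf200'
/
```

The rest of the input, including the ATOMIC_POSITIONS card, can stay as it
was in the first run.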

Paolo

On Tue, Feb 16, 2016 at 6:39 AM, Rolly Ng <rollyng at gmail.com> wrote:

> Dear Filippo,
>
>
>
> Thanks for the quick tip.
>
>
>
> I would like to know the correct way to stop and restart a geometry
> optimization.
>
>
>
> 1)      Initially, add max_seconds = 500000 to the &CONTROL section
>
> 2)      Add restart_mode = 'from_scratch' to the &CONTROL section
>
> 3)      Run pw-gpu.x and wait for the run to stop after 500000 seconds
>
> 4)      Change restart_mode to 'restart' in the &CONTROL section
>
> 5)      Rerun pw-gpu.x and wait for the run to stop after 500000 seconds
>
>
>
> What I am not sure about is which atomic coordinates to use when restarting
> the calculation. Since I am doing a geometry optimization, the positions of
> the atoms do change. Do I need to manually update the input with the latest
> coordinates reached at 500000 seconds? And how can I do that?
>
>
>
> Thanks,
>
> Rolly
>
>
>
> PhD, Research Fellow,
>
> Department of Physics and Materials Science,
>
> City University of Hong Kong
>
> Tel: +852 3442 4000
>
> Fax:+852 3442 0538
>
>
>
> *From:* pw_forum-bounces at pwscf.org [mailto:pw_forum-bounces at pwscf.org] *On
> Behalf Of *Filippo Spiga
> *Sent:* Tuesday, February 16, 2016 12:20 PM
> *To:* PWSCF Forum
> *Subject:* Re: [Pw_forum] Geometry optimization on QE530-GPU with memory
> allocation error?
>
>
>
> Dear Rolly,
>
>
>
> sorry to hear about your problem. I imagine the frustration of losing so
> much time and being unable to recover because of an error that happened in
> the middle of an SCF step. It is hard to guess what went wrong at that point,
> especially after the calculation ran continuously on multiple GPUs for
> almost 7 days without stopping.
>
>
>
> Just a consideration, valid with or without GPU: unless it is unavoidable,
> _never_ run continuously for so long. It is a bad idea for multiple
> reasons. Always safely checkpoint/restart your calculation more often.
>
>
>
> Cheers
>
>
>
> --
>
> Filippo SPIGA
>
> * Sent from my iPhone, sorry for typos *
>
>
> On 16 Feb 2016, at 04:01, Rolly Ng <rollyng at gmail.com> wrote:
>
> Dear Filippo and QE-GPU users,
>
>
>
> I am running a geometry optimization and the system contains 128 atoms. It
> runs fine until the elapsed time reaches 590,000 seconds; then it stops with
> the error below and the job fails to complete :( I have had this error 3
> times, for 3 different cases.
>
>
>
> “Error in memory allocation, program will be terminated (2) !!! Bye…”
>
>
>
> I can confirm the error only appears after running for more than 560,000
> seconds, so all the previous effort is wasted :( if I cannot restart the
> optimization :(
>
>
>
> I have not seen this problem with QE520-GPU, or maybe my previous runs did
> not last so long.
>
>
>
> Could you please check my input file? Thank you!
>
>
>
> &CONTROL
>     calculation = 'relax' ,
>     outdir = '/home/zgdeng/Rolly/TiNSurf200',
>     pseudo_dir = '/home/zgdeng/SSSP_acc_PBE' ,
>     prefix = 'TiNSurf200+Biotin',
>     verbosity = 'low' ,
>     etot_conv_thr = 1.0D-3 ,
>     forc_conv_thr = 1.0D-2 ,
>     nstep = 100 ,
>     tstress = .false. ,
>     tprnfor = .false. ,
> /
> &SYSTEM
>     ibrav = 14,
>     celldm(1) = 22.9288029598d0, celldm(2)=1.2990423130d0, celldm(3)=5.2512156527d0,
>     celldm(4) = 0.0000000000d0, celldm(5)=0.0000000000d0, celldm(6)=0.0000000000d0,
>     nat = 128,
>     ntyp = 6,
>     ecutwfc = 30d0 ,
>     ecutrho = 240d0 ,
>     nosym = .true. ,
>     nbnd = 600,
>     input_dft = 'PBE' ,
>     occupations = 'smearing' ,
>     degauss = 0.015d0 ,
>     smearing = 'gaussian' ,
> /
> &ELECTRONS
>     electron_maxstep = 1000,
>     conv_thr = 1d-06 ,
>     mixing_mode = 'local-TF' ,
>     mixing_beta = 0.300d0 ,
>     diagonalization = 'david' ,
> /
> &IONS
>     ion_dynamics = 'bfgs' ,
>     upscale = 100.D0 ,
>     bfgs_ndim = 3 ,
> /
> ATOMIC_SPECIES
>     C 12.010700d0 C_pbe_v1.2.uspp.F.UPF
>     H 1.007940d0 H.pbe-rrkjus_psl.0.1.UPF
>     N 14.006700d0 N.pbe.theos.UPF
>     O 15.999400d0 O.pbe-n-kjpaw_psl.0.1.UPF
>     S 32.065000d0 S_pbe_v1.2.uspp.F.UPF
>     Ti 47.867000d0 ti_pbe_v1.4.uspp.F.UPF
>
> ATOMIC_POSITIONS {alat}
>     Ti   0.0000000000d0   0.0000000000d0   0.1021361444d0   0   0   0
>     Ti   0.1250000000d0   0.2165113823d0   0.1021361444d0   0   0   0
>     Ti   0.0000000000d0   0.1443365914d0   0.3062508969d0   1   1   1
>     Ti   0.1250000000d0   0.3608479737d0   0.3062508969d0   1   1   1
>     N    0.0000000000d0   0.1443365914d0   0.0001050243d0   0   0   0
>     N    0.1250000000d0   0.3608479737d0   0.0001050243d0   0   0   0
>     N    0.1250000000d0   0.0721747909d0   0.2042197767d0   1   1   1
>     N    0.0000000000d0   0.2886731828d0   0.2042197767d0   1   1   1
>     Ti   0.2500000000d0   0.0000000000d0   0.1021361444d0   0   0   0
>     Ti   0.3750000000d0   0.2165113823d0   0.1021361444d0   0   0   0
>     Ti   0.2500000000d0   0.1443365914d0   0.3062508969d0   1   1   1
>     Ti   0.3750000000d0   0.3608479737d0   0.3062508969d0   1   1   1
>     N    0.2500000000d0   0.1443365914d0   0.0001050243d0   0   0   0
>     N    0.3750000000d0   0.3608479737d0   0.0001050243d0   0   0   0
>     N    0.3750000000d0   0.0721747909d0   0.2042197767d0   1   1   1
>     N    0.2500000000d0   0.2886731828d0   0.2042197767d0   1   1   1
>     Ti   0.5000000000d0   0.0000000000d0   0.1021361444d0   0   0   0
>     Ti   0.6250000000d0   0.2165113823d0   0.1021361444d0   0   0   0
>     Ti   0.5000000000d0   0.1443365914d0   0.3062508969d0   1   1   1
>     Ti   0.6250000000d0   0.3608479737d0   0.3062508969d0   1   1   1
>     N    0.5000000000d0   0.1443365914d0   0.0001050243d0   0   0   0
>     N    0.6250000000d0   0.3608479737d0   0.0001050243d0   0   0   0
>     N    0.6250000000d0   0.0721747909d0   0.2042197767d0   1   1   1
>     N    0.5000000000d0   0.2886731828d0   0.2042197767d0   1   1   1
>     Ti   0.7500000000d0   0.0000000000d0   0.1021361444d0   0   0   0
>     Ti   0.8750000000d0   0.2165113823d0   0.1021361444d0   0   0   0
>     Ti   0.7500000000d0   0.1443365914d0   0.3062508969d0   1   1   1
>     Ti   0.8750000000d0   0.3608479737d0   0.3062508969d0   1   1   1
>     N    0.7500000000d0   0.1443365914d0   0.0001050243d0   0   0   0
>     N    0.8750000000d0   0.3608479737d0   0.0001050243d0   0   0   0
>     N    0.8750000000d0   0.0721747909d0   0.2042197767d0   1   1   1
>     N    0.7500000000d0   0.2886731828d0   0.2042197767d0   1   1   1
>     Ti   0.0000000000d0   0.4330097742d0   0.1021361444d0   0   0   0
>     Ti   0.1250000000d0   0.6495211565d0   0.1021361444d0   0   0   0
>     Ti   0.0000000000d0   0.5773463656d0   0.3062508969d0   1   1   1
>     Ti   0.1250000000d0   0.7938577479d0   0.3062508969d0   1   1   1
>     N    0.0000000000d0   0.5773463656d0   0.0001050243d0   0   0   0
>     N    0.1250000000d0   0.7938577479d0   0.0001050243d0   0   0   0
>     N    0.1250000000d0   0.5051845651d0   0.2042197767d0   1   1   1
>     N    0.0000000000d0   0.7216959474d0   0.2042197767d0   1   1   1
>     Ti   0.2500000000d0   0.4330097742d0   0.1021361444d0   0   0   0
>     Ti   0.3750000000d0   0.6495211565d0   0.1021361444d0   0   0   0
>     Ti   0.2500000000d0   0.5773463656d0   0.3062508969d0   1   1   1
>     Ti   0.3750000000d0   0.7938577479d0   0.3062508969d0   1   1   1
>     N    0.2500000000d0   0.5773463656d0   0.0001050243d0   0   0   0
>     N    0.3750000000d0   0.7938577479d0   0.0001050243d0   0   0   0
>     N    0.3750000000d0   0.5051845651d0   0.2042197767d0   1   1   1
>     N    0.2500000000d0   0.7216959474d0   0.2042197767d0   1   1   1
>     Ti   0.5000000000d0   0.4330097742d0   0.1021361444d0   0   0   0
>     Ti   0.6250000000d0   0.6495211565d0   0.1021361444d0   0   0   0
>     Ti   0.5000000000d0   0.5773463656d0   0.3062508969d0   1   1   1
>     Ti   0.6250000000d0   0.7938577479d0   0.3062508969d0   1   1   1
>     N    0.5000000000d0   0.5773463656d0   0.0001050243d0   0   0   0
>     N    0.6250000000d0   0.7938577479d0   0.0001050243d0   0   0   0
>     N    0.6250000000d0   0.5051845651d0   0.2042197767d0   1   1   1
>     N    0.5000000000d0   0.7216959474d0   0.2042197767d0   1   1   1
>     Ti   0.7500000000d0   0.4330097742d0   0.1021361444d0   0   0   0
>     Ti   0.8750000000d0   0.6495211565d0   0.1021361444d0   0   0   0
>     Ti   0.7500000000d0   0.5773463656d0   0.3062508969d0   1   1   1
>     Ti   0.8750000000d0   0.7938577479d0   0.3062508969d0   1   1   1
>     N    0.7500000000d0   0.5773463656d0   0.0001050243d0   0   0   0
>     N    0.8750000000d0   0.7938577479d0   0.0001050243d0   0   0   0
>     N    0.8750000000d0   0.5051845651d0   0.2042197767d0   1   1   1
>     N    0.7500000000d0   0.7216959474d0   0.2042197767d0   1   1   1
>     Ti   0.0000000000d0   0.8660325388d0   0.1021361444d0   0   0   0
>     Ti   0.1250000000d0   1.0825309307d0   0.1021361444d0   0   0   0
>     Ti   0.0000000000d0   1.0103691302d0   0.3062508969d0   1   1   1
>     Ti   0.1250000000d0   1.2268675220d0   0.3062508969d0   1   1   1
>     N    0.0000000000d0   1.0103691302d0   0.0001050243d0   0   0   0
>     N    0.1250000000d0   1.2268675220d0   0.0001050243d0   0   0   0
>     N    0.1250000000d0   0.9381943393d0   0.2042197767d0   1   1   1
>     N    0.0000000000d0   1.1547057216d0   0.2042197767d0   1   1   1
>     Ti   0.2500000000d0   0.8660325388d0   0.1021361444d0   0   0   0
>     Ti   0.3750000000d0   1.0825309307d0   0.1021361444d0   0   0   0
>     Ti   0.2500000000d0   1.0103691302d0   0.3062508969d0   1   1   1
>     Ti   0.3750000000d0   1.2268675220d0   0.3062508969d0   1   1   1
>     N    0.2500000000d0   1.0103691302d0   0.0001050243d0   0   0   0
>     N    0.3750000000d0   1.2268675220d0   0.0001050243d0   0   0   0
>     N    0.3750000000d0   0.9381943393d0   0.2042197767d0   1   1   1
>     N    0.2500000000d0   1.1547057216d0   0.2042197767d0   1   1   1
>     Ti   0.5000000000d0   0.8660325388d0   0.1021361444d0   0   0   0
>     Ti   0.6250000000d0   1.0825309307d0   0.1021361444d0   0   0   0
>     Ti   0.5000000000d0   1.0103691302d0   0.3062508969d0   1   1   1
>     Ti   0.6250000000d0   1.2268675220d0   0.3062508969d0   1   1   1
>     N    0.5000000000d0   1.0103691302d0   0.0001050243d0   0   0   0
>     N    0.6250000000d0   1.2268675220d0   0.0001050243d0   0   0   0
>     N    0.6250000000d0   0.9381943393d0   0.2042197767d0   1   1   1
>     N    0.5000000000d0   1.1547057216d0   0.2042197767d0   1   1   1
>     Ti   0.7500000000d0   0.8660325388d0   0.1021361444d0   0   0   0
>     Ti   0.8750000000d0   1.0825309307d0   0.1021361444d0   0   0   0
>     Ti   0.7500000000d0   1.0103691302d0   0.3062508969d0   1   1   1
>     Ti   0.8750000000d0   1.2268675220d0   0.3062508969d0   1   1   1
>     N    0.7500000000d0   1.0103691302d0   0.0001050243d0   0   0   0
>     N    0.8750000000d0   1.2268675220d0   0.0001050243d0   0   0   0
>     N    0.8750000000d0   0.9381943393d0   0.2042197767d0   1   1   1
>     N    0.7500000000d0   1.1547057216d0   0.2042197767d0   1   1   1
>     N    0.4062600000d0   0.9896104340d0   0.6937906120d0   1   1   1
>     C    0.4092000000d0   0.9020160108d0   0.6045199459d0   1   1   1
>     C    0.4577300000d0   0.7953906178d0   0.6618107087d0   1   1   1
>     N    0.4939900000d0   0.8337513373d0   0.7754470154d0   1   1   1
>     C    0.4605200000d0   0.9497168446d0   0.7956116835d0   1   1   1
>     C    0.5499000000d0   0.7467544736d0   0.5886612747d0   1   1   1
>     S    0.5127800000d0   0.7970274111d0   0.4537050324d0   1   1   1
>     C    0.4869600000d0   0.9325824765d0   0.5090003332d0   1   1   1
>     C    0.5593700000d0   0.6202537332d0   0.5940700268d0   1   1   1
>     C    0.5857900000d0   0.5794118428d0   0.7112246480d0   1   1   1
>     C    0.5913300000d0   0.4526253131d0   0.7064460418d0   1   1   1
>     C    0.6159700000d0   0.4036254371d0   0.8208700308d0   1   1   1
>     C    0.6181100000d0   0.2770987158d0   0.8104726238d0   1   1   1
>     O    0.6709500000d0   0.2080416264d0   0.8994807291d0   1   1   1
>     O    0.5738500000d0   0.2226038907d0   0.7076538214d0   1   1   1
>     O    0.4792600000d0   1.0152795101d0   0.8997958021d0   1   1   1
>     H    0.3676800000d0   1.0720216783d0   0.6843909360d0   1   1   1
>     H    0.3244700000d0   0.8813742285d0   0.5695993618d0   1   1   1
>     H    0.3864400000d0   0.7347123514d0   0.6695825079d0   1   1   1
>     H    0.5416000000d0   0.7826340223d0   0.8344706794d0   1   1   1
>     H    0.6311400000d0   0.7881549521d0   0.6112940141d0   1   1   1
>     H    0.4487000000d0   0.9936374652d0   0.4486113532d0   1   1   1
>     H    0.5677800000d0   0.9656950650d0   0.5436058444d0   1   1   1
>     H    0.6272000000d0   0.5918826491d0   0.5355189723d0   1   1   1
>     H    0.4775600000d0   0.5827503816d0   0.5669737540d0   1   1   1
>     H    0.5177200000d0   0.6062890283d0   0.7701432876d0   1   1   1
>     H    0.6681700000d0   0.6144340236d0   0.7398437733d0   1   1   1
>     H    0.6588100000d0   0.4267094190d0   0.6464771590d0   1   1   1
>     H    0.5087600000d0   0.4194737533d0   0.6762515517d0   1   1   1
>     H    0.5487800000d0   0.4294374078d0   0.8812590108d0   1   1   1
>     H    0.6993600000d0   0.4344257303d0   0.8513270816d0   1   1   1
>     H    0.5063400000d0   0.2734743877d0   0.6728382616d0   1   1   1
> K_POINTS {automatic}
>     4 4 1 0 0 0
>
>
>
> <QE530-GPU memory error.png>
>
> _______________________________________________
> Pw_forum mailing list
> Pw_forum at pwscf.org
> http://pwscf.org/mailman/listinfo/pw_forum
>



-- 
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222