[Pw_forum] Geometry optimization on QE530-GPU with memory allocation error?
Rolly Ng
rollyng at gmail.com
Tue Feb 16 06:39:38 CET 2016
Dear Filippo,
Thanks for the quick tip.
I would like to know the correct way to stop and restart a geometry optimization.
1) Initially, add max_seconds = 500000 to the &CONTROL section
2) Add restart_mode = 'from_scratch' to the &CONTROL section
3) Run pw-gpu.x and wait for the run to stop after 500000 seconds
4) Change it to restart_mode = 'restart' in the &CONTROL section
5) Rerun pw-gpu.x and wait for the run to stop after 500000 seconds
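Steps 2 and 4 amount to a one-line edit of the input file between runs, which is easy to script. Below is a minimal Python sketch. It assumes restart_mode is already present in &CONTROL and written with quotes, as pw.x namelist input expects. The function name make_restart_input is hypothetical and not part of Quantum ESPRESSO.

```python
import re

# Matches restart_mode = 'from_scratch' (single or double quotes, any case,
# any spacing around "="), keeping the "restart_mode =" part in group 1.
_FROM_SCRATCH = re.compile(
    r"(restart_mode\s*=\s*)['\"]from_scratch['\"]", re.IGNORECASE
)


def make_restart_input(text: str) -> str:
    """Return a copy of a pw.x input with restart_mode switched to 'restart'.

    Raises ValueError unless exactly one restart_mode = 'from_scratch'
    assignment is found, so a wrong file is not silently passed through.
    """
    new_text, count = _FROM_SCRATCH.subn(r"\1'restart'", text)
    if count != 1:
        raise ValueError(
            f"expected one restart_mode = 'from_scratch', found {count}"
        )
    return new_text


# Example on an inline &CONTROL fragment (values taken from the steps above).
first_run = """&CONTROL
  calculation = 'relax'
  max_seconds = 500000
  restart_mode = 'from_scratch'
/
"""
print(make_restart_input(first_run))
```

A targeted regular expression is used instead of a full namelist parser so every other line of the input, including the long ATOMIC_POSITIONS card, is left byte-for-byte unchanged. In practice you would read the original input file and write the result to a second file (both file names are up to you).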
What I am not sure about is the atomic coordinates when restarting the calculation. Since I am doing a geometry optimization, the atomic positions do change. Do I need to update them manually with the latest coordinates at 500000 seconds? And if so, how can I do that?
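On the coordinates question: the max_seconds/restart_mode mechanism is meant to let pw.x resume the job from the data it saved in outdir, so the coordinates should not normally need manual editing. If you would rather start a fresh run from the latest geometry, one option is to copy the last ATOMIC_POSITIONS block that a relax run prints after each ionic step. Below is a minimal Python sketch. It assumes each block starts with a line beginning `ATOMIC_POSITIONS` and ends at the first blank line or at `End final coordinates`. Check this against your own output, since the layout may differ between versions. The function name is hypothetical.

```python
from typing import List


def last_atomic_positions(output_text: str) -> List[str]:
    """Return the lines of the last ATOMIC_POSITIONS block in a pw.x output.

    Assumes a block starts at a line beginning with 'ATOMIC_POSITIONS' and
    ends at the first blank line or at an 'End final coordinates' line.
    """
    lines = output_text.splitlines()
    start = None
    # Remember the index of the most recent block header.
    for i, line in enumerate(lines):
        if line.strip().startswith("ATOMIC_POSITIONS"):
            start = i
    if start is None:
        raise ValueError("no ATOMIC_POSITIONS block found")
    block = [lines[start].strip()]
    # Collect atom lines until the block terminator.
    for line in lines[start + 1:]:
        stripped = line.strip()
        if not stripped or stripped.startswith("End final coordinates"):
            break
        block.append(stripped)
    return block
```

For example, `print("\n".join(last_atomic_positions(open("relax.out").read())))` prints a card that can be pasted in place of the original ATOMIC_POSITIONS (the file name relax.out is only illustrative).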
Thanks,
Rolly
PhD, Research Fellow,
Department of Physics and Materials Science,
City University of Hong Kong
Tel: +852 3442 4000
Fax:+852 3442 0538
From: pw_forum-bounces at pwscf.org [mailto:pw_forum-bounces at pwscf.org] On Behalf Of Filippo Spiga
Sent: Tuesday, February 16, 2016 12:20 PM
To: PWSCF Forum
Subject: Re: [Pw_forum] Geometry optimization on QE530-GPU with memory allocation error?
Dear Rolly,
sorry to hear about your problem. I can imagine the frustration of losing so much time and being unable to recover because of an error that happened in the middle of an SCF step. It is hard to guess what went wrong at that point, especially after the calculation ran continuously on multiple GPUs for almost 7 days without stopping.
Just a consideration, valid with or without GPU: unless it is really unavoidable, _never_ run continuously for so long. It is a bad idea for multiple reasons. Always checkpoint and restart your calculation safely, and more often.
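Checkpointing more often mostly means choosing a max_seconds well below the wall-time limit of each run. pw.x does not stop at the exact second: it still has to finish its current work and write restart data. The small Python sketch below subtracts a safety margin from the queue limit and counts how many such chunks would cover a given amount of compute time. The 24-hour queue and the one-hour margin are assumptions. Size the margin from how long one SCF step takes in your own output. Both function names are hypothetical.

```python
def safe_max_seconds(walltime_hours: int, margin_seconds: int = 3600) -> int:
    """max_seconds for one chunk: the queue limit minus a safety margin.

    The margin is an assumption: it should cover the time pw.x needs to
    finish its current step and write restart data after max_seconds.
    """
    limit = walltime_hours * 3600
    if margin_seconds < 0 or margin_seconds >= limit:
        raise ValueError("margin must be non-negative and below the wall time")
    return limit - margin_seconds


def chunks_needed(total_seconds: int, max_seconds: int) -> int:
    """Number of max_seconds-long runs covering total_seconds of work,
    ignoring restart overhead (integer ceiling division)."""
    if max_seconds <= 0:
        raise ValueError("max_seconds must be positive")
    return (total_seconds + max_seconds - 1) // max_seconds


chunk = safe_max_seconds(24)  # 24 h queue, 1 h margin -> 82800 s
print(chunk, chunks_needed(590_000, chunk))
```

With these assumed values, the roughly 590,000 s that the failed run had accumulated would correspond to 8 restartable chunks, so a crash would cost at most about one day of work instead of the whole run.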
Cheers
--
Filippo SPIGA
* Sent from my iPhone, sorry for typos *
On 16 Feb 2016, at 04:01, Rolly Ng <rollyng at gmail.com> wrote:
Dear Filippo and QE-GPU users,
I am running a geometry optimization of a system containing 128 atoms. It runs fine until the elapsed time reaches about 590,000 seconds, then it stops with the error below and the job fails to complete :( I have hit this error 3 times in 3 different cases.
“Error in memory allocation, program will be terminated (2) !!! Bye…”
I can confirm the error only appears after running for more than 560,000 seconds, so all the previous effort is wasted :( if I cannot restart the optimization :(
I have not seen this problem with QE520-GPU, but maybe my previous runs did not last that long.
Could you please check my input file? Thank you!
&CONTROL
calculation = 'relax' ,
outdir = '/home/zgdeng/Rolly/TiNSurf200',
pseudo_dir = '/home/zgdeng/SSSP_acc_PBE' ,
prefix = 'TiNSurf200+Biotin',
verbosity = 'low' ,
etot_conv_thr = 1.0D-3 ,
forc_conv_thr = 1.0D-2 ,
nstep = 100 ,
tstress = .false. ,
tprnfor = .false. ,
/
&SYSTEM
ibrav = 14,
celldm(1) = 22.9288029598d0, celldm(2)=1.2990423130d0, celldm(3)=5.2512156527d0,
celldm(4) = 0.0000000000d0, celldm(5)=0.0000000000d0, celldm(6)=0.0000000000d0,
nat = 128,
ntyp = 6,
ecutwfc = 30d0 ,
ecutrho = 240d0 ,
nosym = .true. ,
nbnd = 600,
input_dft = 'PBE' ,
occupations = 'smearing' ,
degauss = 0.015d0 ,
smearing = 'gaussian' ,
/
&ELECTRONS
electron_maxstep = 1000,
conv_thr = 1d-06 ,
mixing_mode = 'local-TF' ,
mixing_beta = 0.300d0 ,
diagonalization = 'david' ,
/
&IONS
ion_dynamics = 'bfgs' ,
upscale = 100.D0 ,
bfgs_ndim = 3 ,
/
ATOMIC_SPECIES
C 12.010700d0 C_pbe_v1.2.uspp.F.UPF
H 1.007940d0 H.pbe-rrkjus_psl.0.1.UPF
N 14.006700d0 N.pbe.theos.UPF
O 15.999400d0 O.pbe-n-kjpaw_psl.0.1.UPF
S 32.065000d0 S_pbe_v1.2.uspp.F.UPF
Ti 47.867000d0 ti_pbe_v1.4.uspp.F.UPF
ATOMIC_POSITIONS {alat}
Ti 0.0000000000d0 0.0000000000d0 0.1021361444d0 0 0 0
Ti 0.1250000000d0 0.2165113823d0 0.1021361444d0 0 0 0
Ti 0.0000000000d0 0.1443365914d0 0.3062508969d0 1 1 1
Ti 0.1250000000d0 0.3608479737d0 0.3062508969d0 1 1 1
N 0.0000000000d0 0.1443365914d0 0.0001050243d0 0 0 0
N 0.1250000000d0 0.3608479737d0 0.0001050243d0 0 0 0
N 0.1250000000d0 0.0721747909d0 0.2042197767d0 1 1 1
N 0.0000000000d0 0.2886731828d0 0.2042197767d0 1 1 1
Ti 0.2500000000d0 0.0000000000d0 0.1021361444d0 0 0 0
Ti 0.3750000000d0 0.2165113823d0 0.1021361444d0 0 0 0
Ti 0.2500000000d0 0.1443365914d0 0.3062508969d0 1 1 1
Ti 0.3750000000d0 0.3608479737d0 0.3062508969d0 1 1 1
N 0.2500000000d0 0.1443365914d0 0.0001050243d0 0 0 0
N 0.3750000000d0 0.3608479737d0 0.0001050243d0 0 0 0
N 0.3750000000d0 0.0721747909d0 0.2042197767d0 1 1 1
N 0.2500000000d0 0.2886731828d0 0.2042197767d0 1 1 1
Ti 0.5000000000d0 0.0000000000d0 0.1021361444d0 0 0 0
Ti 0.6250000000d0 0.2165113823d0 0.1021361444d0 0 0 0
Ti 0.5000000000d0 0.1443365914d0 0.3062508969d0 1 1 1
Ti 0.6250000000d0 0.3608479737d0 0.3062508969d0 1 1 1
N 0.5000000000d0 0.1443365914d0 0.0001050243d0 0 0 0
N 0.6250000000d0 0.3608479737d0 0.0001050243d0 0 0 0
N 0.6250000000d0 0.0721747909d0 0.2042197767d0 1 1 1
N 0.5000000000d0 0.2886731828d0 0.2042197767d0 1 1 1
Ti 0.7500000000d0 0.0000000000d0 0.1021361444d0 0 0 0
Ti 0.8750000000d0 0.2165113823d0 0.1021361444d0 0 0 0
Ti 0.7500000000d0 0.1443365914d0 0.3062508969d0 1 1 1
Ti 0.8750000000d0 0.3608479737d0 0.3062508969d0 1 1 1
N 0.7500000000d0 0.1443365914d0 0.0001050243d0 0 0 0
N 0.8750000000d0 0.3608479737d0 0.0001050243d0 0 0 0
N 0.8750000000d0 0.0721747909d0 0.2042197767d0 1 1 1
N 0.7500000000d0 0.2886731828d0 0.2042197767d0 1 1 1
Ti 0.0000000000d0 0.4330097742d0 0.1021361444d0 0 0 0
Ti 0.1250000000d0 0.6495211565d0 0.1021361444d0 0 0 0
Ti 0.0000000000d0 0.5773463656d0 0.3062508969d0 1 1 1
Ti 0.1250000000d0 0.7938577479d0 0.3062508969d0 1 1 1
N 0.0000000000d0 0.5773463656d0 0.0001050243d0 0 0 0
N 0.1250000000d0 0.7938577479d0 0.0001050243d0 0 0 0
N 0.1250000000d0 0.5051845651d0 0.2042197767d0 1 1 1
N 0.0000000000d0 0.7216959474d0 0.2042197767d0 1 1 1
Ti 0.2500000000d0 0.4330097742d0 0.1021361444d0 0 0 0
Ti 0.3750000000d0 0.6495211565d0 0.1021361444d0 0 0 0
Ti 0.2500000000d0 0.5773463656d0 0.3062508969d0 1 1 1
Ti 0.3750000000d0 0.7938577479d0 0.3062508969d0 1 1 1
N 0.2500000000d0 0.5773463656d0 0.0001050243d0 0 0 0
N 0.3750000000d0 0.7938577479d0 0.0001050243d0 0 0 0
N 0.3750000000d0 0.5051845651d0 0.2042197767d0 1 1 1
N 0.2500000000d0 0.7216959474d0 0.2042197767d0 1 1 1
Ti 0.5000000000d0 0.4330097742d0 0.1021361444d0 0 0 0
Ti 0.6250000000d0 0.6495211565d0 0.1021361444d0 0 0 0
Ti 0.5000000000d0 0.5773463656d0 0.3062508969d0 1 1 1
Ti 0.6250000000d0 0.7938577479d0 0.3062508969d0 1 1 1
N 0.5000000000d0 0.5773463656d0 0.0001050243d0 0 0 0
N 0.6250000000d0 0.7938577479d0 0.0001050243d0 0 0 0
N 0.6250000000d0 0.5051845651d0 0.2042197767d0 1 1 1
N 0.5000000000d0 0.7216959474d0 0.2042197767d0 1 1 1
Ti 0.7500000000d0 0.4330097742d0 0.1021361444d0 0 0 0
Ti 0.8750000000d0 0.6495211565d0 0.1021361444d0 0 0 0
Ti 0.7500000000d0 0.5773463656d0 0.3062508969d0 1 1 1
Ti 0.8750000000d0 0.7938577479d0 0.3062508969d0 1 1 1
N 0.7500000000d0 0.5773463656d0 0.0001050243d0 0 0 0
N 0.8750000000d0 0.7938577479d0 0.0001050243d0 0 0 0
N 0.8750000000d0 0.5051845651d0 0.2042197767d0 1 1 1
N 0.7500000000d0 0.7216959474d0 0.2042197767d0 1 1 1
Ti 0.0000000000d0 0.8660325388d0 0.1021361444d0 0 0 0
Ti 0.1250000000d0 1.0825309307d0 0.1021361444d0 0 0 0
Ti 0.0000000000d0 1.0103691302d0 0.3062508969d0 1 1 1
Ti 0.1250000000d0 1.2268675220d0 0.3062508969d0 1 1 1
N 0.0000000000d0 1.0103691302d0 0.0001050243d0 0 0 0
N 0.1250000000d0 1.2268675220d0 0.0001050243d0 0 0 0
N 0.1250000000d0 0.9381943393d0 0.2042197767d0 1 1 1
N 0.0000000000d0 1.1547057216d0 0.2042197767d0 1 1 1
Ti 0.2500000000d0 0.8660325388d0 0.1021361444d0 0 0 0
Ti 0.3750000000d0 1.0825309307d0 0.1021361444d0 0 0 0
Ti 0.2500000000d0 1.0103691302d0 0.3062508969d0 1 1 1
Ti 0.3750000000d0 1.2268675220d0 0.3062508969d0 1 1 1
N 0.2500000000d0 1.0103691302d0 0.0001050243d0 0 0 0
N 0.3750000000d0 1.2268675220d0 0.0001050243d0 0 0 0
N 0.3750000000d0 0.9381943393d0 0.2042197767d0 1 1 1
N 0.2500000000d0 1.1547057216d0 0.2042197767d0 1 1 1
Ti 0.5000000000d0 0.8660325388d0 0.1021361444d0 0 0 0
Ti 0.6250000000d0 1.0825309307d0 0.1021361444d0 0 0 0
Ti 0.5000000000d0 1.0103691302d0 0.3062508969d0 1 1 1
Ti 0.6250000000d0 1.2268675220d0 0.3062508969d0 1 1 1
N 0.5000000000d0 1.0103691302d0 0.0001050243d0 0 0 0
N 0.6250000000d0 1.2268675220d0 0.0001050243d0 0 0 0
N 0.6250000000d0 0.9381943393d0 0.2042197767d0 1 1 1
N 0.5000000000d0 1.1547057216d0 0.2042197767d0 1 1 1
Ti 0.7500000000d0 0.8660325388d0 0.1021361444d0 0 0 0
Ti 0.8750000000d0 1.0825309307d0 0.1021361444d0 0 0 0
Ti 0.7500000000d0 1.0103691302d0 0.3062508969d0 1 1 1
Ti 0.8750000000d0 1.2268675220d0 0.3062508969d0 1 1 1
N 0.7500000000d0 1.0103691302d0 0.0001050243d0 0 0 0
N 0.8750000000d0 1.2268675220d0 0.0001050243d0 0 0 0
N 0.8750000000d0 0.9381943393d0 0.2042197767d0 1 1 1
N 0.7500000000d0 1.1547057216d0 0.2042197767d0 1 1 1
N 0.4062600000d0 0.9896104340d0 0.6937906120d0 1 1 1
C 0.4092000000d0 0.9020160108d0 0.6045199459d0 1 1 1
C 0.4577300000d0 0.7953906178d0 0.6618107087d0 1 1 1
N 0.4939900000d0 0.8337513373d0 0.7754470154d0 1 1 1
C 0.4605200000d0 0.9497168446d0 0.7956116835d0 1 1 1
C 0.5499000000d0 0.7467544736d0 0.5886612747d0 1 1 1
S 0.5127800000d0 0.7970274111d0 0.4537050324d0 1 1 1
C 0.4869600000d0 0.9325824765d0 0.5090003332d0 1 1 1
C 0.5593700000d0 0.6202537332d0 0.5940700268d0 1 1 1
C 0.5857900000d0 0.5794118428d0 0.7112246480d0 1 1 1
C 0.5913300000d0 0.4526253131d0 0.7064460418d0 1 1 1
C 0.6159700000d0 0.4036254371d0 0.8208700308d0 1 1 1
C 0.6181100000d0 0.2770987158d0 0.8104726238d0 1 1 1
O 0.6709500000d0 0.2080416264d0 0.8994807291d0 1 1 1
O 0.5738500000d0 0.2226038907d0 0.7076538214d0 1 1 1
O 0.4792600000d0 1.0152795101d0 0.8997958021d0 1 1 1
H 0.3676800000d0 1.0720216783d0 0.6843909360d0 1 1 1
H 0.3244700000d0 0.8813742285d0 0.5695993618d0 1 1 1
H 0.3864400000d0 0.7347123514d0 0.6695825079d0 1 1 1
H 0.5416000000d0 0.7826340223d0 0.8344706794d0 1 1 1
H 0.6311400000d0 0.7881549521d0 0.6112940141d0 1 1 1
H 0.4487000000d0 0.9936374652d0 0.4486113532d0 1 1 1
H 0.5677800000d0 0.9656950650d0 0.5436058444d0 1 1 1
H 0.6272000000d0 0.5918826491d0 0.5355189723d0 1 1 1
H 0.4775600000d0 0.5827503816d0 0.5669737540d0 1 1 1
H 0.5177200000d0 0.6062890283d0 0.7701432876d0 1 1 1
H 0.6681700000d0 0.6144340236d0 0.7398437733d0 1 1 1
H 0.6588100000d0 0.4267094190d0 0.6464771590d0 1 1 1
H 0.5087600000d0 0.4194737533d0 0.6762515517d0 1 1 1
H 0.5487800000d0 0.4294374078d0 0.8812590108d0 1 1 1
H 0.6993600000d0 0.4344257303d0 0.8513270816d0 1 1 1
H 0.5063400000d0 0.2734743877d0 0.6728382616d0 1 1 1
K_POINTS {automatic}
4 4 1 0 0 0
<QE530-GPU memory error.png>
_______________________________________________
Pw_forum mailing list
Pw_forum at pwscf.org
http://pwscf.org/mailman/listinfo/pw_forum