[Pw_forum] Geometry optimization on QE530-GPU with memory allocation error?

Rolly Ng rollyng at gmail.com
Tue Feb 16 08:01:18 CET 2016


Dear Paolo,

 

Thank you for the clarification, I will give it a trial.

 

Regards,

Rolly

 

PhD, Research Fellow,

Department of Physics and Materials Science,

City University of Hong Kong

Tel: +852 3442 4000

Fax:+852 3442 0538

 

From: pw_forum-bounces at pwscf.org [mailto:pw_forum-bounces at pwscf.org] On Behalf Of Paolo Giannozzi
Sent: Tuesday, February 16, 2016 2:51 PM
To: PWSCF Forum
Subject: Re: [Pw_forum] Geometry optimization on QE530-GPU with memory allocation error?

 

You do not need to update the atomic coordinates: the code will read and use the latest set of coordinates if you restart from a previous run (after a clean stop).

Paolo

 

On Tue, Feb 16, 2016 at 6:39 AM, Rolly Ng <rollyng at gmail.com> wrote:

Dear Filippo,

 

Thanks for the quick tip.

 

I would like to know the correct way to stop and restart a geometry optimization.

 

1)      Initially, add max_seconds = 500000 to the &CONTROL section

2)      Add restart_mode = 'from_scratch' to the &CONTROL section

3)      Run pw-gpu.x and wait for the run to stop after 500000 seconds

4)      Change restart_mode to 'restart' in the &CONTROL section

5)      Rerun pw-gpu.x and wait for the run to stop after 500000 seconds
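
The steps above can be sketched as a small Python helper that edits the &CONTROL namelist of an input file between runs. This is only an illustration: the function name set_control_option is hypothetical, it assumes one assignment per line inside &CONTROL, and it is not part of Quantum ESPRESSO.

```python
import re

def set_control_option(text, name, value):
    """Set `name = value` inside the &CONTROL namelist of a pw.x input string.

    Replaces an existing assignment, or adds a new line just before the
    closing '/' of &CONTROL. Assumes one assignment per line (hypothetical
    helper, not part of Quantum ESPRESSO).
    """
    pattern = re.compile(r"^\s*" + re.escape(name) + r"\s*=", re.IGNORECASE)
    out = []
    in_control = False
    done = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.upper().startswith("&CONTROL"):
            in_control = True
            out.append(line)
        elif in_control and stripped == "/":
            # End of &CONTROL: add the option if it was not already present.
            if not done:
                out.append(f"    {name} = {value} ,")
                done = True
            in_control = False
            out.append(line)
        elif in_control and pattern.match(line):
            # Overwrite the existing assignment.
            out.append(f"    {name} = {value} ,")
            done = True
        else:
            out.append(line)
    return "\n".join(out) + "\n"

# First run (steps 1 and 2), then the restarted run (step 4).
template = "&CONTROL\n    calculation = 'relax' ,\n/\n"
first_run = set_control_option(template, "max_seconds", "500000")
first_run = set_control_option(first_run, "restart_mode", "'from_scratch'")
restart_run = set_control_option(first_run, "restart_mode", "'restart'")
```

The atomic positions in the input file are left untouched by this helper; only the two &CONTROL options change between the first run and the restart.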

 

What I am not sure about is the atomic coordinates used when restarting the calculation. Since I am doing a geometry optimization, the positions of the atoms change. Do I need to update the input manually with the latest coordinates reached at 500000 seconds? And how can I do that?
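
For inspecting the latest geometry by hand, a minimal Python sketch is given here. It assumes that the pw.x output of a 'relax' run prints the coordinates of each step under a line starting with ATOMIC_POSITIONS, followed by one line per atom; the function name last_atomic_positions is hypothetical.

```python
def last_atomic_positions(output_text, nat):
    """Return the last complete ATOMIC_POSITIONS block (header + nat lines)
    found in pw.x output text, or None if no complete block is present."""
    lines = output_text.splitlines()
    last = None
    for i, line in enumerate(lines):
        if line.strip().startswith("ATOMIC_POSITIONS"):
            block = lines[i:i + nat + 1]
            # Ignore a block cut short, e.g. by a crash while writing output.
            if len(block) == nat + 1:
                last = block
    return last

# Usage, with a hypothetical output file name:
# with open("relax.out") as f:
#     block = last_atomic_positions(f.read(), nat=128)
```

The returned lines could then be compared with, or pasted over, the ATOMIC_POSITIONS card of the input file.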

 

Thanks,

Rolly

 

PhD, Research Fellow,

Department of Physics and Materials Science,

City University of Hong Kong

Tel: +852 3442 4000

Fax:+852 3442 0538

 

From: pw_forum-bounces at pwscf.org [mailto:pw_forum-bounces at pwscf.org] On Behalf Of Filippo Spiga
Sent: Tuesday, February 16, 2016 12:20 PM
To: PWSCF Forum
Subject: Re: [Pw_forum] Geometry optimization on QE530-GPU with memory allocation error?

 

Dear Rolly,

 

sorry to hear about your problem. I can imagine the frustration of losing so much time and being unable to recover because of an error that happened in the middle of an SCF step. It is hard to guess what went wrong at that point, especially after the calculation had run continuously on multiple GPUs for almost 7 days without stopping.

 

Just a consideration, valid with or without GPU: unless it is unavoidable, _never_ run continuously for so long. It is a bad idea for multiple reasons. Always checkpoint and restart your calculation safely and more often.
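
As a rough illustration of what "more often" could mean for the run reported in this thread, the Python sketch below converts the 590,000 seconds into days and counts how many runs would be needed if each one stopped cleanly after a fixed time. The one-day segment length is an assumption chosen for the example, not a recommendation from the thread.

```python
import math

def plan_segments(total_seconds, segment_seconds):
    """Number of runs needed if each run stops cleanly after segment_seconds."""
    return math.ceil(total_seconds / segment_seconds)

SECONDS_PER_DAY = 86_400

# 590,000 s is the elapsed time at which the reported job failed.
days = 590_000 / SECONDS_PER_DAY
# Assumed segment length: one day per run.
segments = plan_segments(590_000, SECONDS_PER_DAY)
```

With shorter segments, a failure costs at most one segment of work instead of the whole run.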

 

Cheers

 

--

Filippo SPIGA

* Sent from my iPhone, sorry for typos *


On 16 Feb 2016, at 04:01, Rolly Ng <rollyng at gmail.com> wrote:

Dear Filippo and QE-GPU users,

 

I am running a geometry optimization on a system containing 128 atoms. It runs fine until the elapsed time reaches 590,000 seconds, then it stops with the error below and the job fails to complete :( I have hit this error 3 times, in 3 different cases.

 

“Error in memory allocation, program will be terminated (2) !!! Bye…”

 

I can confirm that the error only appears after running for more than 560,000 seconds, so all the previous effort is wasted :( if I cannot restart the optimization :(

 

I have not seen this problem with QE520-GPU, or maybe my previous runs did not last so long.

 

Could you please check my input file? Thank you!

 

&CONTROL

                calculation = 'relax' ,

                outdir = '/home/zgdeng/Rolly/TiNSurf200',

                pseudo_dir = '/home/zgdeng/SSSP_acc_PBE' ,                                                  

                prefix = 'TiNSurf200+Biotin',

                verbosity = 'low' ,

               etot_conv_thr = 1.0D-3 ,

               forc_conv_thr = 1.0D-2 ,

                nstep = 100 ,

                tstress = .false. ,

                tprnfor = .false. ,

/

&SYSTEM

                ibrav = 14,

                celldm(1) = 22.9288029598d0, celldm(2)=1.2990423130d0, celldm(3)=5.2512156527d0,

                celldm(4) = 0.0000000000d0, celldm(5)=0.0000000000d0, celldm(6)=0.0000000000d0,

                nat = 128,

                ntyp = 6,

                ecutwfc = 30d0 ,

                ecutrho = 240d0 ,

                nosym = .true. ,

                nbnd = 600,

               input_dft = 'PBE' ,

                occupations = 'smearing' ,

                degauss = 0.015d0 ,

               smearing = 'gaussian' ,

/

&ELECTRONS

                electron_maxstep = 1000,

                conv_thr = 1d-06 ,

                mixing_mode = 'local-TF' ,

                mixing_beta = 0.300d0 ,

                diagonalization = 'david' ,

/

&IONS

               ion_dynamics = 'bfgs' ,

               upscale = 100.D0 ,

               bfgs_ndim = 3 ,

/

ATOMIC_SPECIES

                C 12.010700d0 C_pbe_v1.2.uspp.F.UPF

                H 1.007940d0 H.pbe-rrkjus_psl.0.1.UPF

                N 14.006700d0 N.pbe.theos.UPF

                O 15.999400d0 O.pbe-n-kjpaw_psl.0.1.UPF

                S 32.065000d0 S_pbe_v1.2.uspp.F.UPF

                Ti 47.867000d0 ti_pbe_v1.4.uspp.F.UPF

ATOMIC_POSITIONS {alat}

                Ti   0.0000000000d0   0.0000000000d0   0.1021361444d0   0   0   0

                Ti   0.1250000000d0   0.2165113823d0   0.1021361444d0   0   0   0

                Ti   0.0000000000d0   0.1443365914d0   0.3062508969d0   1   1   1

                Ti   0.1250000000d0   0.3608479737d0   0.3062508969d0   1   1   1

                N    0.0000000000d0   0.1443365914d0   0.0001050243d0   0   0   0

                N    0.1250000000d0   0.3608479737d0   0.0001050243d0   0   0   0

                N    0.1250000000d0   0.0721747909d0   0.2042197767d0   1   1   1

                N    0.0000000000d0   0.2886731828d0   0.2042197767d0   1   1   1

                Ti   0.2500000000d0   0.0000000000d0   0.1021361444d0   0   0   0

                Ti   0.3750000000d0   0.2165113823d0   0.1021361444d0   0   0   0

                Ti   0.2500000000d0   0.1443365914d0   0.3062508969d0   1   1   1

                Ti   0.3750000000d0   0.3608479737d0   0.3062508969d0   1   1   1

                N    0.2500000000d0   0.1443365914d0   0.0001050243d0   0   0   0

                N    0.3750000000d0   0.3608479737d0   0.0001050243d0   0   0   0

                N    0.3750000000d0   0.0721747909d0   0.2042197767d0   1   1   1

                N    0.2500000000d0   0.2886731828d0   0.2042197767d0   1   1   1

                Ti   0.5000000000d0   0.0000000000d0   0.1021361444d0   0   0   0

                Ti   0.6250000000d0   0.2165113823d0   0.1021361444d0   0   0   0

                Ti   0.5000000000d0   0.1443365914d0   0.3062508969d0   1   1   1

                Ti   0.6250000000d0   0.3608479737d0   0.3062508969d0   1   1   1

                N    0.5000000000d0   0.1443365914d0   0.0001050243d0   0   0   0

                N    0.6250000000d0   0.3608479737d0   0.0001050243d0   0   0   0

                N    0.6250000000d0   0.0721747909d0   0.2042197767d0   1   1   1

                N    0.5000000000d0   0.2886731828d0   0.2042197767d0   1   1   1

                Ti   0.7500000000d0   0.0000000000d0   0.1021361444d0   0   0   0

                Ti   0.8750000000d0   0.2165113823d0   0.1021361444d0   0   0   0

                Ti   0.7500000000d0   0.1443365914d0   0.3062508969d0   1   1   1

                Ti   0.8750000000d0   0.3608479737d0   0.3062508969d0   1   1   1

                N    0.7500000000d0   0.1443365914d0   0.0001050243d0   0   0   0

                N    0.8750000000d0   0.3608479737d0   0.0001050243d0   0   0   0

                N    0.8750000000d0   0.0721747909d0   0.2042197767d0   1   1   1

                N    0.7500000000d0   0.2886731828d0   0.2042197767d0   1   1   1

                Ti   0.0000000000d0   0.4330097742d0   0.1021361444d0   0   0   0

                Ti   0.1250000000d0   0.6495211565d0   0.1021361444d0   0   0   0

                Ti   0.0000000000d0   0.5773463656d0   0.3062508969d0   1   1   1

                Ti   0.1250000000d0   0.7938577479d0   0.3062508969d0   1   1   1

                N    0.0000000000d0   0.5773463656d0   0.0001050243d0   0   0   0

                N    0.1250000000d0   0.7938577479d0   0.0001050243d0   0   0   0

                N    0.1250000000d0   0.5051845651d0   0.2042197767d0   1   1   1

                N    0.0000000000d0   0.7216959474d0   0.2042197767d0   1   1   1

                Ti   0.2500000000d0   0.4330097742d0   0.1021361444d0   0   0   0

                Ti   0.3750000000d0   0.6495211565d0   0.1021361444d0   0   0   0

                Ti   0.2500000000d0   0.5773463656d0   0.3062508969d0   1   1   1

                Ti   0.3750000000d0   0.7938577479d0   0.3062508969d0   1   1   1

                N    0.2500000000d0   0.5773463656d0   0.0001050243d0   0   0   0

                N    0.3750000000d0   0.7938577479d0   0.0001050243d0   0   0   0

                N    0.3750000000d0   0.5051845651d0   0.2042197767d0   1   1   1

                N    0.2500000000d0   0.7216959474d0   0.2042197767d0   1   1   1

                Ti   0.5000000000d0   0.4330097742d0   0.1021361444d0   0   0   0

                Ti   0.6250000000d0   0.6495211565d0   0.1021361444d0   0   0   0

                Ti   0.5000000000d0   0.5773463656d0   0.3062508969d0   1   1   1

                Ti   0.6250000000d0   0.7938577479d0   0.3062508969d0   1   1   1

                N    0.5000000000d0   0.5773463656d0   0.0001050243d0   0   0   0

                N    0.6250000000d0   0.7938577479d0   0.0001050243d0   0   0   0

                N    0.6250000000d0   0.5051845651d0   0.2042197767d0   1   1   1

                N    0.5000000000d0   0.7216959474d0   0.2042197767d0   1   1   1

                Ti   0.7500000000d0   0.4330097742d0   0.1021361444d0   0   0   0

                Ti   0.8750000000d0   0.6495211565d0   0.1021361444d0   0   0   0

                Ti   0.7500000000d0   0.5773463656d0   0.3062508969d0   1   1   1

                Ti   0.8750000000d0   0.7938577479d0   0.3062508969d0   1   1   1

                N    0.7500000000d0   0.5773463656d0   0.0001050243d0   0   0   0

                N    0.8750000000d0   0.7938577479d0   0.0001050243d0   0   0   0

                N    0.8750000000d0   0.5051845651d0   0.2042197767d0   1   1   1

                N    0.7500000000d0   0.7216959474d0   0.2042197767d0   1   1   1

                Ti   0.0000000000d0   0.8660325388d0   0.1021361444d0   0   0   0

                Ti   0.1250000000d0   1.0825309307d0   0.1021361444d0   0   0   0

                Ti   0.0000000000d0   1.0103691302d0   0.3062508969d0   1   1   1

                Ti   0.1250000000d0   1.2268675220d0   0.3062508969d0   1   1   1

                N    0.0000000000d0   1.0103691302d0   0.0001050243d0   0   0   0

                N    0.1250000000d0   1.2268675220d0   0.0001050243d0   0   0   0

                N    0.1250000000d0   0.9381943393d0   0.2042197767d0   1   1   1

                N    0.0000000000d0   1.1547057216d0   0.2042197767d0   1   1   1

                Ti   0.2500000000d0   0.8660325388d0   0.1021361444d0   0   0   0

                Ti   0.3750000000d0   1.0825309307d0   0.1021361444d0   0   0   0

                Ti   0.2500000000d0   1.0103691302d0   0.3062508969d0   1   1   1

                Ti   0.3750000000d0   1.2268675220d0   0.3062508969d0   1   1   1

                N    0.2500000000d0   1.0103691302d0   0.0001050243d0   0   0   0

                N    0.3750000000d0   1.2268675220d0   0.0001050243d0   0   0   0

                N    0.3750000000d0   0.9381943393d0   0.2042197767d0   1   1   1

                N    0.2500000000d0   1.1547057216d0   0.2042197767d0   1   1   1

                Ti   0.5000000000d0   0.8660325388d0   0.1021361444d0   0   0   0

                Ti   0.6250000000d0   1.0825309307d0   0.1021361444d0   0   0   0

                Ti   0.5000000000d0   1.0103691302d0   0.3062508969d0   1   1   1

                Ti   0.6250000000d0   1.2268675220d0   0.3062508969d0   1   1   1

                N    0.5000000000d0   1.0103691302d0   0.0001050243d0   0   0   0

                N    0.6250000000d0   1.2268675220d0   0.0001050243d0   0   0   0

                N    0.6250000000d0   0.9381943393d0   0.2042197767d0   1   1   1

                N    0.5000000000d0   1.1547057216d0   0.2042197767d0   1   1   1

                Ti   0.7500000000d0   0.8660325388d0   0.1021361444d0   0   0   0

                Ti   0.8750000000d0   1.0825309307d0   0.1021361444d0   0   0   0

                Ti   0.7500000000d0   1.0103691302d0   0.3062508969d0   1   1   1

                Ti   0.8750000000d0   1.2268675220d0   0.3062508969d0   1   1   1

                N    0.7500000000d0   1.0103691302d0   0.0001050243d0   0   0   0

                N    0.8750000000d0   1.2268675220d0   0.0001050243d0   0   0   0

                N    0.8750000000d0   0.9381943393d0   0.2042197767d0   1   1   1

                N    0.7500000000d0   1.1547057216d0   0.2042197767d0   1   1   1

                N    0.4062600000d0   0.9896104340d0   0.6937906120d0   1   1   1

                C    0.4092000000d0   0.9020160108d0   0.6045199459d0   1   1   1

                C    0.4577300000d0   0.7953906178d0   0.6618107087d0   1   1   1

                N    0.4939900000d0   0.8337513373d0   0.7754470154d0   1   1   1

                C    0.4605200000d0   0.9497168446d0   0.7956116835d0   1   1   1

                C    0.5499000000d0   0.7467544736d0   0.5886612747d0   1   1   1

                S    0.5127800000d0   0.7970274111d0   0.4537050324d0   1   1   1

                C    0.4869600000d0   0.9325824765d0   0.5090003332d0   1   1   1

                C    0.5593700000d0   0.6202537332d0   0.5940700268d0   1   1   1

                C    0.5857900000d0   0.5794118428d0   0.7112246480d0   1   1   1

                C    0.5913300000d0   0.4526253131d0   0.7064460418d0   1   1   1

                C    0.6159700000d0   0.4036254371d0   0.8208700308d0   1   1   1

                C    0.6181100000d0   0.2770987158d0   0.8104726238d0   1   1   1

                O    0.6709500000d0   0.2080416264d0   0.8994807291d0   1   1   1

                O    0.5738500000d0   0.2226038907d0   0.7076538214d0   1   1   1

                O    0.4792600000d0   1.0152795101d0   0.8997958021d0   1   1   1

                H    0.3676800000d0   1.0720216783d0   0.6843909360d0   1   1   1

                H    0.3244700000d0   0.8813742285d0   0.5695993618d0   1   1   1

                H    0.3864400000d0   0.7347123514d0   0.6695825079d0   1   1   1

                H    0.5416000000d0   0.7826340223d0   0.8344706794d0   1   1   1

                H    0.6311400000d0   0.7881549521d0   0.6112940141d0   1   1   1

                H    0.4487000000d0   0.9936374652d0   0.4486113532d0   1   1   1

                H    0.5677800000d0   0.9656950650d0   0.5436058444d0   1   1   1

                H    0.6272000000d0   0.5918826491d0   0.5355189723d0   1   1   1

                H    0.4775600000d0   0.5827503816d0   0.5669737540d0   1   1   1

                H    0.5177200000d0   0.6062890283d0   0.7701432876d0   1   1   1

                H    0.6681700000d0   0.6144340236d0   0.7398437733d0   1   1   1

                H    0.6588100000d0   0.4267094190d0   0.6464771590d0   1   1   1

                H    0.5087600000d0   0.4194737533d0   0.6762515517d0   1   1   1

                H    0.5487800000d0   0.4294374078d0   0.8812590108d0   1   1   1

                H    0.6993600000d0   0.4344257303d0   0.8513270816d0   1   1   1

                H    0.5063400000d0   0.2734743877d0   0.6728382616d0   1   1   1

K_POINTS {automatic}

                4 4 1 0 0 0

 

<QE530-GPU memory error.png>

_______________________________________________
Pw_forum mailing list
Pw_forum at pwscf.org
http://pwscf.org/mailman/listinfo/pw_forum






-- 

Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222
