[QE-users] pw.x stuck.

Hud Wahab hudwahab at gmail.com
Thu May 9 23:13:12 CEST 2019


I am using a single node with 64 GB of memory and 32 CPUs per task.

I tried running the 32-atom calculation again with 1x1x1 k-points, and it got stuck as well.
But the 2-atom cell with 8x8x1 works.

With $ top, I don't see pw.x listed explicitly.
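
One way to make the `top` check less ambiguous is to filter the process table for the command name. The sketch below is only an illustration: it assumes a Linux-style `ps` that accepts `-eo pid,pcpu,comm`, and the helper names `find_processes` and `running_pw` are made up for this example.

```python
import subprocess


def find_processes(ps_output, name="pw.x"):
    """Return (pid, cpu_percent) for each row of `ps` output whose command is `name`."""
    matches = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)          # pid, %cpu, command
        if len(parts) == 3 and parts[2].strip() == name:
            matches.append((int(parts[0]), float(parts[1])))
    return matches


def running_pw():
    """List pw.x processes on this machine (assumes a Linux-style `ps`)."""
    out = subprocess.run(
        ["ps", "-eo", "pid,pcpu,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    return find_processes(out)
```

Calling `running_pw()` on the compute node returns an empty list when no pw.x process exists. A process sitting near 100% CPU suggests the run is busy rather than dead, although it does not prove that it is making progress.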

Should I try letting pw.x run longer than an hour?
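
For a rough feeling of how long "longer" might be, a back-of-the-envelope scaling estimate can help. The sketch below rests on loud assumptions that are not from this thread: that the cost per k-point grows roughly as the cube of the number of atoms at fixed cutoff, and that the total cost is linear in the number of grid k-points. It ignores symmetry reduction of the k-point grid, the amount of vacuum, and memory effects, so it is a rule of thumb and not a prediction of pw.x timings. The function name `relative_cost` is hypothetical.

```python
def relative_cost(n_atoms, k_grid, ref_atoms, ref_k_grid):
    """Crude cost ratio: cubic in atom count, linear in the number of grid k-points."""
    def n_k(grid):
        nk = 1
        for n in grid:
            nk *= n
        return nk

    return (n_atoms / ref_atoms) ** 3 * n_k(k_grid) / n_k(ref_k_grid)


# 32 atoms at 1x1x1 compared with the 2-atom cell at 8x8x1
print(relative_cost(32, (1, 1, 1), 2, (8, 8, 1)))  # prints 64.0
```

Under these assumptions the 32-atom run at a single k-point would still cost tens of times more than the 2-atom run that works, so an hour without visible output is not by itself evidence of a hang.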
On 5/9/2019 2:59:56 PM, Giuseppe Mattioli <giuseppe.mattioli at ism.cnr.it> wrote:

Hi
I don't know exactly why, because nothing seems clearly wrong in your
input file, but I see

> K_POINTS automatic
> 4 2 1 0 0 0

> total cpu time spent up to now is 1474.4 secs

Your calculation seems to be very slow, more than 20 minutes for the
first scf step, and the second step is often slower... And in your
previous calculation I saw

> K_POINTS automatic
> 8 8 1 0 0 0

that is going to be even slower. Are you sure pw.x hangs? If you type
$ top in a console (I suppose you use linux) do you see one active
pw.x process?
HTH
Giuseppe
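
The timing line quoted above can also be pulled out of the output file automatically, which makes it easier to see whether the reported CPU time keeps growing between checks. This is a minimal sketch that assumes the output contains lines of the form shown in this thread ("total cpu time spent up to now is ... secs"); the function names and the idea of re-reading the file periodically are illustrative and not part of Quantum ESPRESSO.

```python
import re

# Matches the timing line as it appears in the pw.x output quoted in this thread.
CPU_TIME = re.compile(r"total cpu time spent up to now is\s+([0-9.]+)\s+secs")


def cpu_time_reports(output_text):
    """Return every reported cumulative CPU time, in seconds, in file order."""
    return [float(m.group(1)) for m in CPU_TIME.finditer(output_text)]


def last_report(path):
    """Return the most recent CPU time found in the file at `path`, or None."""
    with open(path) as fh:
        times = cpu_time_reports(fh.read())
    return times[-1] if times else None
```

If `last_report` returns a larger value on a later call, the code is still writing timing lines. For the output in this thread the only value is 1474.4 s, about 24.6 minutes, which matches the "more than 20 minutes" remark.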

Quoting Hud Wahab :

> Hi
>
> OK, even if I set ibrav = 0, specify CELL_PARAMETERS, and use
> Cartesian coordinates for all atoms, like this:
>
> &control
> calculation = 'scf'
> prefix = 'graphene'
> pseudo_dir = '/gscratch/hwahab/DFT-code/psp/'
> outdir = './'
> restart_mode = 'from_scratch'
> etot_conv_thr = 1.d-6
> forc_conv_thr = 1.d-5
> /
>
> &system
> ibrav = 0
> nat = 48
> ntyp = 1
> ecutwfc = 80
> occupations = 'smearing'
> smearing = 'gaussian'
> degauss = 0.1
> vdw_corr='grimme-d2'
> /
>
> &electrons
> diagonalization = 'david'
> diago_thr_init = 1.d-4
> mixing_mode = 'local-TF'
> mixing_beta = 0.7
> conv_thr = 1.d-8
> /
>
> &ions
>
> /
>
> ATOMIC_SPECIES
> C 12.0107 C.pbe-n-kjpaw_psl.1.0.0.UPF
>
> ATOMIC_POSITIONS angstrom
> C -0.005594637 0.710607728 17.606217811
> C -0.007113039 2.128152101 17.584272182
> C 1.224800757 0.002792887 17.613037723
> C 1.220781849 2.833821539 17.553802163
> C 2.453370565 0.710361709 17.588200291
> C 2.452466724 2.129238290 17.551425829
> C 3.684434744 0.001530538 17.594476170
> C 3.684438920 2.839550201 17.533796214
> C 4.915544062 0.711333118 17.590036280
> C 4.915853909 2.129726547 17.558981625
> C 6.143486600 0.002227196 17.614492659
> C 6.144397239 2.838931198 17.567937591
> C -0.009386173 4.969207674 17.562655171
> C -0.007017413 6.394548846 17.533586100
> C 1.221221723 4.253471766 17.548052411
> C 1.198574699 7.115165393 17.592468497
> C 2.446729939 4.945634289 17.560357556
> C 2.429014083 6.377364466 17.843730988
> C 3.679896901 4.258356417 17.551643715
> C 3.704632453 7.115561848 17.867361404
> C 4.916336515 4.974358812 17.564482071
> C 4.935632583 6.385210475 17.597930230
> C 6.145763199 4.260558789 17.557531939
> C 6.139436097 7.104085900 17.546822753
> C -0.008057730 9.237466939 17.548881513
> C -0.005309676 10.658828114 17.540319093
> C 1.220546640 8.524488204 17.583337938
> C 1.225791434 11.368226350 17.566890173
> C 2.458810059 9.240431880 17.612862514
> C 2.454941023 10.659162890 17.592221246
> C 3.688725076 8.555854012 17.643635937
> C 3.683918063 11.368771580 17.603925777
> C 4.910904632 9.244786635 17.581661741
> C 4.913395819 10.664333371 17.573847419
> C 6.143923926 8.528771033 17.556446236
> C 6.142987294 11.370161820 17.555480374
> C -0.004309434 13.496944454 17.583281339
> C -0.004505934 14.915030012 17.613620426
> C 1.226582522 12.786684076 17.587667221
> C 1.224762481 15.623384029 17.618836753
> C 2.455170254 13.495545076 17.620810601
> C 2.455394258 14.913895553 17.626574694
> C 3.684071873 12.787455574 17.618041427
> C 3.684499812 15.622198165 17.627916140
> C 4.913552069 13.495405863 17.614905357
> C 4.913417489 14.913710743 17.621452935
> C 6.142676777 12.788081224 17.582175333
> C 6.143988489 15.623015254 17.616347215
>
>
> K_POINTS automatic
> 4 2 1 0 0 0
>
> CELL_PARAMETERS angstrom
> 7.378073983 0.000000000 0.0000000000
> 0.000000000 17.038932 0.0000000000
> 0.000000000 0.000000000 25.00
>
> It still gets stuck here in the output:
>
> Estimated max dynamical RAM per process > 9360.31MB
>
> Check: negative/imaginary core charge= -0.000003 0.000000
>
> Initial potential from superposition of free atoms
> Check: negative starting charge= -0.001323
>
> starting charge 191.99800, renormalised to 192.00000
>
> negative rho (up, down): 1.323E-03 0.000E+00
> Starting wfc are 192 randomized atomic wfcs
>
> total cpu time spent up to now is 1474.4 secs
>
> per-process dynamical memory: 1195.0 Mb
>
> Self-consistent Calculation
>
> iteration # 1 ecut= 80.00 Ry beta=0.70
> Davidson diagonalization with overlap
>
>
> This code has worked previously with qe/5.4.0, but gets stuck with
> 6.1 serial.
>
> -Hud
> On 5/9/2019 1:23:39 PM, Giuseppe Mattioli
> wrote:
>
> You are mixing two different ways to indicate cell parameters
>
> ///---
> EITHER:
>
> +--------------------------------------------------------------------
> Variable: celldm(i), i=1,6
>
> Type: REAL
> See: ibrav
> Description: Crystallographic constants - see the "ibrav" variable.
> Specify either these OR
> "A","B","C","cosAB","cosBC","cosAC" NOT both.
> Only needed values (depending on "ibrav") must
> be specified
> alat = "celldm"(1) is the lattice parameter "a"
> (in BOHR)
> If "ibrav"==0, only "celldm"(1) is used if present;
> cell vectors are read from card "CELL_PARAMETERS"
> +--------------------------------------------------------------------
>
> OR:
>
> +--------------------------------------------------------------------
> Variables: A, B, C, cosAB, cosAC, cosBC
>
> Type: REAL
> See: ibrav
> Description: Traditional crystallographic constants:
>
> a,b,c in ANGSTROM
> cosAB = cosine of the angle between axis a
> and b (gamma)
> cosAC = cosine of the angle between axis a
> and c (beta)
> cosBC = cosine of the angle between axis b
> and c (alpha)
>
> The axis are chosen according to the value of
> @ref ibrav.
> Specify either these OR @ref celldm but NOT both.
> Only needed values (depending on @ref ibrav)
> must be specified.
>
> The lattice parameter alat = A (in ANGSTROM ).
>
> If @ref ibrav == 0, only A is used if present, and
> cell vectors are read from card @ref CELL_PARAMETERS.
> +--------------------------------------------------------------------
>
> \\\---
>
> This might be the cause of the strange behavior, supposing that your
> machine has the ~4 GB of free RAM needed to perform the calculation, as
> indicated in the output. However, in the case of a regular hexagonal
> supercell you should not need to indicate the cosAB and cosAC values at all.
> HTH
> Giuseppe
>
>
> Quoting "H1 at GMAIL" :
>
>> Hi Giuseppe
>>
>> apologies. My input file:
>>
>> &control
>> calculation = 'scf'
>> prefix = 'graphene'
>> pseudo_dir = '/gscratch/hwahab/DFT-code/psp/'
>> outdir = './'
>> restart_mode = 'from_scratch'
>> etot_conv_thr = 1.d-6
>> forc_conv_thr = 1.d-5
>> /
>> &system
>> ibrav = 4
>> celldm(1) = 9.84
>> celldm(3) = 10
>> cosAB = -0.5
>> cosAC = 1
>> nat = 32
>> ntyp = 1
>> ecutwfc = 80
>> occupations = 'smearing'
>> smearing = 'gaussian'
>> degauss = 0.1
>> vdw_corr='grimme-d2'
>> /
>>
>> &electrons
>> diagonalization = 'david'
>> diago_thr_init = 1.d-4
>> mixing_mode = 'local-TF'
>> mixing_beta = 0.7
>> conv_thr = 1.d-10
>> /
>>
>> &ions
>> /
>>
>> ATOMIC_SPECIES
>> C 12.0107 C.pbe-mt_gipaw.UPF
>>
>> ATOMIC_POSITIONS crystal
>> C 0.16667 0.08333 0.00000
>> C 0.41667 0.08333 0.00000
>> C 0.66667 0.08333 0.00000
>> C 0.91667 0.08333 0.00000
>> C 0.08333 0.16667 0.00000
>> C 0.33333 0.16667 0.00000
>> C 0.58333 0.16667 0.00000
>> C 0.83333 0.16667 0.00000
>> C 0.16667 0.33333 0.00000
>> C 0.41667 0.33333 0.00000
>> C 0.66667 0.33333 0.00000
>> C 0.91667 0.33333 0.00000
>> C 0.08333 0.41667 0.00000
>> C 0.33333 0.41667 0.00000
>> C 0.58333 0.41667 0.00000
>> C 0.83333 0.41667 0.00000
>> C 0.16667 0.58333 0.00000
>> C 0.41667 0.58333 0.00000
>> C 0.66667 0.58333 0.00000
>> C 0.91667 0.58333 0.00000
>> C 0.08333 0.66667 0.00000
>> C 0.33333 0.66667 0.00000
>> C 0.58333 0.66667 0.00000
>> C 0.83333 0.66667 0.00000
>> C 0.16667 0.83333 0.00000
>> C 0.41667 0.83333 0.00000
>> C 0.66667 0.83333 0.00000
>> C 0.91667 0.83333 0.00000
>> C 0.08333 0.91667 0.00000
>> C 0.33333 0.91667 0.00000
>> C 0.58333 0.91667 0.00000
>> C 0.83333 0.91667 0.00000
>>
>>
>> K_POINTS automatic
>> 8 8 1 0 0 0
>>
>> And the end snippet of the output:
>>
>> Estimated max dynamical RAM per process > 3558.76MB
>>
>> Initial potential from superposition of free atoms
>>
>> starting charge 111.99996, renormalised to 128.00000
>>
>> negative rho (up, down): 5.479E-05 0.000E+00
>> Starting wfc are 256 randomized atomic wfcs
>>
>> There are no error messages; it just gets stuck there.
>>
>> Hope this makes sense.
>>
>> Hud Wahab
>> University of Wyoming
>> 1000 E University Ave
>> Laramie WY, 82072
>> Email: hwahab at uwyo.edu
>> On 5/9/2019 12:50:50 PM, Giuseppe Mattioli
>> wrote:
>>
>> Dear Hud (please always sign posts to this forum with your full name and
>> scientific affiliation; we appreciate it)
>> It is impossible to help you if you don't post the input of your
>> calculation and the relevant part of your output (where does the code
>> stop?). Is there any system error, such as a segfault, printed, e.g., in a
>> nohup.out file? It is important to look at this kind of information
>> first, in order to see whether the error is reproducible on different
>> machines/architectures or with different compilers/libraries.
>> HTH
>> Giuseppe
>>
>> Quoting "H1 at GMAIL" :
>>
>>> Hello
>>> I am running 6.1-serial. My pw.x scf runs fine for small systems (2
>>> atoms), but nothing happens when I scale up to a 4x4 supercell (32
>>> atoms). I let it run for more than an hour, and I don't expect the
>>> calculation to take that long.
>>>
>>> From the troubleshooting section of the User Guide, I see it might be a
>>> floating-point error causing endless NaNs. How should I handle such
>>> exceptions?
>>>
>>> As I can't provide the error output, I am not sure what details you
>>> need to troubleshoot, but let me know if something is missing.
>>>
>>> Cheers, Hud
>>> Dept. Chemical Engineering
>>> University of Wyoming
>>
>>
>>
>> GIUSEPPE MATTIOLI
>> CNR - ISTITUTO DI STRUTTURA DELLA MATERIA
>> Via Salaria Km 29,300 - C.P. 10
>> I-00015 - Monterotondo Scalo (RM)
>> Mob (*preferred*) +39 373 7305625
>> Tel + 39 06 90672342 - Fax +39 06 90672316
>> E-mail:
>>
>> _______________________________________________
>> Quantum Espresso is supported by MaX (www.max-centre.eu/quantum-espresso)
>> users mailing list users at lists.quantum-espresso.org
>> https://lists.quantum-espresso.org/mailman/listinfo/users


