[QE-users] D3Q code stopped due to davcio error
Lorenzo Paulatto
paulatz at gmail.com
Mon Dec 7 14:44:30 CET 2020
>
> The qq2rr code is serial, right? After running it in /scratch (due to a
> memory issue) it shows a Bus error!
>
> /var/spool/slurm/slurmd.spool/job3829394/slurm_script: line 38: 179584
> Done ls anh*
> 179585 Bus error |
> /gpfs/home/kghosh/kanka/qe-6.5/bin/d3_qq2rr.x 1 1 1
> Job finished
>
> Any specific reason for this error?
It is parallelized with OpenMP (not MPI), although I have not tested it
in a while. I do not know what causes a bus error; it is not something I
have seen since the nineties. Maybe out of memory? If you are running it
on a cluster, it may be better to submit it as a job even if it is serial.
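For reference, a minimal SLURM batch script for such a serial/OpenMP run could look like the sketch below; the resource numbers, module setup, and working directory are hypothetical and must be adapted to your cluster:

```shell
#!/bin/bash
#SBATCH --job-name=qq2rr
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8     # d3_qq2rr.x is OpenMP-parallel, not MPI
#SBATCH --mem=64G             # hypothetical; size it to the grid you transform
#SBATCH --time=24:00:00

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

# Feed the anharmonic D3 matrix files to d3_qq2rr.x, as in the failing script:
cd /path/to/your/d3/outputs   # hypothetical path
ls anh* | /gpfs/home/kghosh/kanka/qe-6.5/bin/d3_qq2rr.x 1 1 1
```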
>
> Regards,
> Kanka
>
>
> Kanka Ghosh
> Postdoctoral Researcher
> I2M-Bordeaux
> University of Bordeaux, CNRS UMR 5295
> Site: Ecole Nationale Supérieure des Arts et Métiers
> Bordeaux-Talence 33400
>
> ------------------------------------------------------------------------
> *From: *"Lorenzo Paulatto" <paulatz at gmail.com>
> *To: *"users" <users at lists.quantum-espresso.org>
> *Sent: *Friday, December 4, 2020 1:12:09 PM
> *Subject: *Re: [QE-users] D3Q code stopped due to davcio error
>
>
>
> Yes, it took a little more than 5 days to compute only the first
> q-point. Anyway, it seems that I should use a 1x1x1 grid instead of
> 2x2x2. But are you suggesting to do the single-mode calculation
> with the 1x1x1 grid, or the "mode=full" one using the 1x1x1 grid?
>
> Yes, but no need to do it: you have done it already. You can just call
> d3_qq2rr and specify "1 1 1" as the grid size:
>
> ls anh* | d3_qq2rr.x 1 1 1
>
> and it will automatically compute the force constants from the
> calculation at (0,0,0). This way you can immediately test how it works.
>
> If you want to try the 2x2x2 grid, I would use 10 pools and maybe try
> with *fewer* CPUs per pool: at the moment you are using 128 which
> requires a lot of communications. If the calculation fits in RAM, I
> would recommend keeping each pool on a single computing node.
>
> You may try to use some local scratch in order to avoid running out of
> disk space (ask the cluster managers what to use).
>
> Finally, if you manage to get everything running, you can run all the
> q-point triplets simultaneously as different batch jobs by setting
> "first" and "last". You can keep the same outdir and prefix: as long
> as the jobs work on different triplets, they will not interfere (this is
> true for d3q, but not in general for other linear-response codes).
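Concretely, such a split could look like the input fragment below. The namelist name here is hypothetical (check the d3q documentation), but mode, prefix, outdir, first, and last are the keywords discussed in this thread:

```
&d3qinput                 ! hypothetical namelist name - see the d3q docs
   mode   = 'full'
   prefix = 'mysystem'    ! same prefix for every batch job
   outdir = './tmp/'      ! same outdir for every batch job
   first  = 1             ! first q-point triplet handled by this job
   last   = 5             ! last triplet; the next job takes 6..10, etc.
/
```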
>
> hth
>
>
> Regards,
> Kanka
>
> Kanka Ghosh
> Postdoctoral Researcher
> I2M-Bordeaux
> University of Bordeaux, CNRS UMR 5295
> Site: Ecole Nationale Supérieure des Arts et Métiers
> Bordeaux-Talence 33400
>
> ------------------------------------------------------------------------
> *From: *"Lorenzo Paulatto" <paulatz at gmail.com>
> *To: *"users" <users at lists.quantum-espresso.org>
> *Sent: *Friday, December 4, 2020 9:09:50 AM
> *Subject: *Re: [QE-users] D3Q code stopped due to davcio error
>
> Thanks for pointing out the storage issue. Yes, I am running
> it at the French computing centre (University of Bordeaux's
> cluster system (curta, mcia)). Here I am attaching the d3q
> output file. Indeed, it was in the process of computing the
> second q-point triplet.
>
> I do not have access to the Bordeaux cluster, but I could ask for it if
> you need me to look at the code. That said, I see that computing
> the first q-point took about 5 days; it will take at least a
> month to do the second point! Because it has less symmetry, the
> code needs to compute 2x more k-points and 3x more perturbations.
>
>
> "Maybe for such a large system you can get some decent
> force-constants already from (0,0,0) alone"
>
>
> In that case, do you mean to use the "mode=gamma-only" tag?
>
> Not really: the triplet (0,0,0) is in itself the 1x1x1 grid, and
> you can treat it as such. Thanks to some Fourier interpolation
> trickery, you can use it to get the D3 matrices at any point.
> Also, the d3_qq2rr code is not particularly optimized and is not
> parallelized; I'm not sure you would manage to compute the Fourier
> transform of the 2x2x2 grid anyway.
>
> You have to keep in mind that the 3-body force constants become
> huge very quickly with the number of atoms and the size of the
> grid: each D3 matrix has (3*nat)^3 complex elements, and a grid n
> x n x n contains n^6 triplets.
>
> In your case, the 2x2x2 grid would use about 2.2GB of RAM, which
> is probably still feasible, but I would try the 1x1x1 first.
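As a quick sanity check of that figure, the estimate follows directly from the formula above (nat = 44 from this thread; 16 bytes per double-precision complex element is an assumption):

```python
# Memory estimate for the 3rd-order force constants:
# each D3 matrix has (3*nat)^3 complex elements (16 bytes each),
# and an n x n x n grid contains n^6 triplets.
nat = 44              # number of atoms (from this thread)
n = 2                 # 2x2x2 grid
bytes_per_elem = 16   # double-precision complex

total = (3 * nat) ** 3 * bytes_per_elem * n ** 6
print(f"{total / 2**30:.2f} GiB")  # ~2.19 GiB, consistent with "about 2.2GB"
```

For the 1x1x1 grid (n = 1) the same formula gives only ~35 MiB, which is why trying that first is essentially free.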
>
>
> cheers
>
>
>
> Regards,
>
> Kanka
>
>
>
>
>
> Kanka Ghosh
> Postdoctoral Researcher
> I2M-Bordeaux
> University of Bordeaux, CNRS UMR 5295
> Site: Ecole Nationale Supérieure des Arts et Métiers
> Bordeaux-Talence 33400
>
> ------------------------------------------------------------------------
> *From: *"Lorenzo Paulatto" <paulatz at gmail.com>
> *To: *"users" <users at lists.quantum-espresso.org>
> *Sent: *Thursday, December 3, 2020 11:23:58 PM
> *Subject: *Re: [QE-users] D3Q code stopped due to davcio error
>
> task # 71
>
> from davcio : error # 5011
> error while writing from file
> ".//D3_Q1.0_0_0_Q2.0_0_-1o2_Q3.0_0_1o2/scf.d1.dq1pq1.72"
>
>
> I guess it may have run out of space: d3q uses a ton of disk
> space and there is no easy way to avoid this. If you are
> running on any of the French computing centers, I can try to
> have a look directly.
>
> I do not think the change in the number of CPUs could cause this
> problem, but if you provide the full output I can check. Also,
> 44 atoms is a lot for the d3q code. It seems like you're
> running the second q-point triplet, which is of kind (0,q,-q);
> it takes much more time and disk space than the triplet
> (0,0,0). Maybe for such a large system you can get some decent
> force constants already from (0,0,0) alone.
>
> cheers
>
>
> Kanka Ghosh
> Postdoctoral Researcher
> I2M-Bordeaux
> University of Bordeaux, CNRS UMR 5295
> Site: Ecole Nationale Supérieure des Arts et Métiers
> Bordeaux-Talence 33400
>
> _______________________________________________
> Quantum ESPRESSO is supported by MaX (www.max-centre.eu)
> users mailing list users at lists.quantum-espresso.org
> https://lists.quantum-espresso.org/mailman/listinfo/users
>