[QE-users] D3Q code stopped due to davcio error

Lorenzo Paulatto paulatz at gmail.com
Mon Dec 7 14:44:30 CET 2020


>
> The qq2rr code is serial, right? After running it in /scratch (for the
> memory issue) it shows a bus error!
>
> /var/spool/slurm/slurmd.spool/job3829394/slurm_script: line 38: 179584 
> Done                    ls anh*
>      179585 Bus error               | 
> /gpfs/home/kghosh/kanka/qe-6.5/bin/d3_qq2rr.x 1 1 1
> Job finished
>
> Any specific reason for this error?

It is parallelized with OpenMP (not MPI), although I have not tested it 
in a while. I do not know what causes a bus error; it is not something I 
have seen since the nineties. Maybe out of memory? If you are running it 
on a cluster, it may be better to submit it as a batch job, even if it 
is serial.
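For example, a minimal Slurm script along these lines might do (the resource figures below are illustrative guesses, not values prescribed by d3q; only the binary path comes from this thread):

```shell
#!/bin/bash
# Hypothetical Slurm script for d3_qq2rr.x: serial + OpenMP, no MPI.
# All resource figures are illustrative; adjust them to your cluster.
#SBATCH --job-name=qq2rr
#SBATCH --ntasks=1              # one task: the code is not MPI-parallel
#SBATCH --cpus-per-task=8       # OpenMP threads
#SBATCH --mem=16G               # be generous; a bus error may mean OOM
#SBATCH --time=04:00:00

export OMP_NUM_THREADS="$SLURM_CPUS_PER_TASK"
cd "$SLURM_SUBMIT_DIR"          # or a scratch area with enough disk space

# Pipe the list of anharmonic dynamical-matrix files into qq2rr:
ls anh* | /gpfs/home/kghosh/kanka/qe-6.5/bin/d3_qq2rr.x 1 1 1
```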


>
> Regards,
> Kanka
>
>
> Kanka Ghosh
> Postdoctoral Researcher
> I2M-Bordeaux
> University of Bordeaux, CNRS UMR 5295
> Site: Ecole Nationale Supérieure des Arts et Métiers
> Bordeaux-Talence 33400
>
> ------------------------------------------------------------------------
> *From: *"Lorenzo Paulatto" <paulatz at gmail.com>
> *To: *"users" <users at lists.quantum-espresso.org>
> *Sent: *Friday, December 4, 2020 1:12:09 PM
> *Subject: *Re: [QE-users] D3Q code stopped due to davcio error
>
>
>
>     Yes, it took a little more than 5 days to compute only the first
>     q-point. Anyway, it seems that I should use a 1x1x1 grid instead
>     of 2x2x2. But are you suggesting to do the single-mode calculation
>     with the 1x1x1 grid, or the "mode=full" one using the 1x1x1 grid?
>
> Yes, but there is no need to do it: you have done it already. You can 
> just call d3_qq2rr and specify "1 1 1" as the grid size:
>
> ls anh* | d3_qq2rr.x 1 1 1
>
> and it will automatically compute the force constants from the 
> calculation at (0,0,0). This way you can immediately test how it works.
>
> If you want to try the 2x2x2 grid, I would use 10 pools and maybe try 
> with *fewer* CPUs per pool: at the moment you are using 128, which 
> requires a lot of communication. If the calculation fits in RAM, I 
> would recommend keeping each pool on a single compute node.
>
> You may try to use some local scratch in order to avoid running out of 
> disk space (ask the cluster managers what to use).
>
> Finally, if you manage to get everything running, you can run all the 
> q-point triplets simultaneously as different batch jobs by setting 
> "first" and "last". You can use the same outdir and prefix: as long 
> as the jobs work on different triplets, they will not interfere (this 
> is true for d3q, but not in general for other linear-response codes).
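As a sketch, the per-triplet inputs could be generated and submitted like this. Only the variable names "first" and "last" come from this thread; the namelist name and the rest of the input layout are placeholders to be checked against the d3q documentation:

```shell
# Hypothetical sketch: write one d3q input per q-point triplet, each job
# restricted to a single triplet via "first"/"last". The namelist name
# "&d3_input" and everything else here are placeholders; check the d3q docs.
ntriplets=4   # assumed total number of triplets for your grid
for i in $(seq 1 "$ntriplets"); do
    cat > "d3q.triplet${i}.in" <<EOF
&d3_input
  ! same outdir and prefix in every job is fine for d3q
  first = ${i}
  last  = ${i}
/
EOF
    # sbatch run_d3q.sh "d3q.triplet${i}.in"   # one batch job per triplet
done
```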
>
> hth
>
>
>     Regards,
>     Kanka
>
>     ------------------------------------------------------------------------
>     *From: *"Lorenzo Paulatto" <paulatz at gmail.com>
>     *To: *"users" <users at lists.quantum-espresso.org>
>     *Sent: *Friday, December 4, 2020 9:09:50 AM
>     *Subject: *Re: [QE-users] D3Q code stopped due to davcio error
>
>         Thanks for pointing out the storage issue. Yes, I am running
>         it at a French computing centre: the University of Bordeaux
>         cluster system (curta, MCIA). Here I am attaching the d3q
>         output file. Indeed, it was in the process of computing the
>         second q-point triplet.
>
>     I do not have access to the Bordeaux cluster, but I could ask for
>     it if you need me to look at the code. That said, I see that
>     computing the first q-point took about 5 days; it will take at
>     least a month to do the second point! Because it has less
>     symmetry, the code needs to compute 2x more k-points and 3x more
>     perturbations.
>
>
>         "Maybe for such a large system you can get some decent
>         force-constants already from (0,0,0) alone"
>
>
>         In that case, do you mean using the "mode=gamma-only" tag?
>
>     Not really: the triplet (0,0,0) is in itself the 1x1x1 grid, and
>     you can treat it as such. Thanks to some Fourier interpolation
>     trickery, you can use it to get the D3 matrices at any point.
>     Also, the d3_qq2rr code is not particularly optimized and is not
>     parallelized; I'm not sure you would manage to compute the
>     Fourier transform of the 2x2x2 grid anyway.
>
>     You have to keep in mind that the 3-body force constants become
>     huge very quickly with the number of atoms and the size of the
>     grid: each D3 matrix has (3*nat)^3 complex elements, and an
>     n x n x n grid contains n^6 triplets.
>
>     In your case, the 2x2x2 grid would use about 2.2 GB of RAM, which
>     is probably still feasible, but I would try the 1x1x1 grid first.
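As a sanity check, that figure can be reproduced with a quick back-of-envelope calculation (a sketch, not part of d3q; the numbers come from this thread: 44 atoms, a 2x2x2 grid, 16 bytes per double-precision complex number):

```shell
# Back-of-envelope check of the ~2.2 GB memory estimate quoted above.
nat=44   # atoms in the cell
n=2      # the q grid is n x n x n
awk -v nat="$nat" -v n="$n" 'BEGIN {
    elems    = (3*nat)^3       # complex elements in one D3 matrix
    bytes    = elems * 16      # 16 bytes per double-precision complex
    triplets = n^6             # an n x n x n grid holds n^6 triplets
    printf "%.1f GiB\n", bytes * triplets / 2^30
}'
# prints: 2.2 GiB
```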
>
>
>     cheers
>
>
>
>         Regards,
>
>         Kanka
>
>         ------------------------------------------------------------------------
>         *From: *"Lorenzo Paulatto" <paulatz at gmail.com>
>         *To: *"users" <users at lists.quantum-espresso.org>
>         *Sent: *Thursday, December 3, 2020 11:23:58 PM
>         *Subject: *Re: [QE-users] D3Q code stopped due to davcio error
>
>              task #        71
>
>                  from davcio : error #      5011
>                  error while writing from file
>             ".//D3_Q1.0_0_0_Q2.0_0_-1o2_Q3.0_0_1o2/scf.d1.dq1pq1.72"
>
>
>         I guess it may have run out of space; d3q uses a ton of disk
>         space and there is no easy way to avoid it. If you are
>         running on any of the French computing centers I can try to
>         have a look directly.
>
>         I do not think the change in the number of CPUs could cause
>         this problem, but if you provide the full output I can check.
>         Also, 44 atoms is a lot for the d3q code. It seems like you
>         are running the second q-point triplet, which is of kind
>         (0,q,-q); it takes much more time and disk space than the
>         triplet (0,0,0). Maybe for such a large system you can get
>         some decent force constants already from (0,0,0) alone.
>
>         cheers
>
>
>             _______________________________________________
>             Quantum ESPRESSO is supported by MaX (www.max-centre.eu)
>             users mailing list users at lists.quantum-espresso.org
>             https://lists.quantum-espresso.org/mailman/listinfo/users
>
>