[QE-users] D3Q code stopped due to davcio error

Kanka Ghosh kanka.ghosh at u-bordeaux.fr
Mon Dec 7 12:24:41 CET 2020


Dear Lorenzo, 

The qq2rr code is serial, right? After running it in /scratch (because of the memory issue), it fails with a bus error: 

/var/spool/slurm/slurmd.spool/job3829394/slurm_script: line 38: 179584 Done ls anh* 
179585 Bus error | /gpfs/home/kghosh/kanka/qe-6.5/bin/d3_qq2rr.x 1 1 1 
Job finished 

Any specific reason for this error? 

Regards, 
Kanka 


Kanka Ghosh 
Postdoctoral Researcher 
I2M-Bordeaux 
University of Bordeaux, CNRS UMR 5295 
Site: Ecole Nationale Supérieure des Arts et Métiers 
Bordeaux-Talence 33400 


From: "Lorenzo Paulatto" <paulatz at gmail.com> 
To: "users" <users at lists.quantum-espresso.org> 
Sent: Friday, December 4, 2020 1:12:09 PM 
Subject: Re: [QE-users] D3Q code stopped due to davcio error 






Yes, it took a little more than 5 days to compute only the first q-point. Anyway, it seems that I should use a 1x1x1 grid instead of 2x2x2. But are you suggesting doing the single-mode calculation with the 1x1x1 grid, or "mode=full" using the 1x1x1 grid? 




Yes, but no need to do it: you have done it already. You can just call d3_qq2rr and specify "1 1 1" as the grid size: 

ls anh* | d3_qq2rr.x 1 1 1 

and it will automatically compute the force constants from the calculation at (0,0,0). This way you can immediately test how it works. 


If you want to try the 2x2x2 grid, I would use 10 pools and maybe try with *fewer* CPUs per pool: at the moment you are using 128, which requires a lot of communication. If the calculation fits in RAM, I would recommend keeping each pool on a single computing node. 

You may try to use some local scratch in order to avoid running out of disk space (ask the cluster managers what to use). 


Finally, if you manage to get everything running, you can run all the q-point triplets simultaneously as different batch jobs by setting "first" and "last". You can have the same outdir and prefix: as long as the jobs work on different triplets, they will not interfere (this is true for d3q, but not in general for other linear response codes). 
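
As a rough illustration of splitting the triplets across batch jobs, the sketch below computes one ("first", "last") pair per job. The helper name, the total of 10 triplets, and the convention that the indices are 1-based and inclusive are assumptions made for this example; check the d3q documentation for the actual numbering before using the values.

```python
def triplet_ranges(n_triplets, n_jobs):
    """Split triplet indices 1..n_triplets into contiguous (first, last) pairs.

    Assumption: indices are 1-based and both ends are inclusive.
    """
    base, extra = divmod(n_triplets, n_jobs)
    ranges = []
    start = 1
    for job in range(n_jobs):
        # The first `extra` jobs take one additional triplet each.
        size = base + (1 if job < extra else 0)
        if size == 0:
            continue  # more jobs than triplets: skip empty jobs
        ranges.append((start, start + size - 1))
        start += size
    return ranges


# Hypothetical case: 10 triplets spread over 4 batch jobs.
for first, last in triplet_ranges(10, 4):
    print(f"first={first} last={last}")
```

Each pair would go into a separate job that shares the same outdir and prefix, as described above.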


hth 

BQ_BEGIN


Regards, 
Kanka 

Kanka Ghosh 
Postdoctoral Researcher 
I2M-Bordeaux 
University of Bordeaux, CNRS UMR 5295 
Site: Ecole Nationale Supérieure des Arts et Métiers 
Bordeaux-Talence 33400 


From: "Lorenzo Paulatto" <paulatz at gmail.com> 
To: "users" <users at lists.quantum-espresso.org> 
Sent: Friday, December 4, 2020 9:09:50 AM 
Subject: Re: [QE-users] D3Q code stopped due to davcio error 


BQ_BEGIN

Thanks for pointing out the storage issue. Yes, I am running it at the French computing centre (University of Bordeaux's cluster system (curta, mcia)). Here I am attaching the d3q output file. Indeed, it was in the process of computing the second q-point triplet. 

BQ_END


I do not have access to the Bordeaux cluster, but I could ask for it if you need me to look at the code. That said, I see that the first q-point took about 5 days to compute, so the second point will take at least a month! Because it has less symmetry, the code needs to compute 2x more k-points and 3x more perturbations. 

BQ_BEGIN




"Maybe for such a large system you can get some decent force-constants already from (0,0,0) alone" 





In that case, do you mean I should use the "mode=gamma-only" tag? 

BQ_END


Not really: the triplet (0,0,0) is in itself the 1x1x1 grid, and you can treat it as such. Thanks to some Fourier interpolation trickery, you can use it to get the D3 matrices at any point. Also, the d3_qq2rr code is not particularly optimized and is not parallelized, so I'm not sure you would manage to compute the Fourier transform of the 2x2x2 grid anyway. 

You have to keep in mind that the 3-body force constants become huge very quickly with the number of atoms and the size of the grid: each D3 matrix has (3*nat)^3 complex elements, and an n x n x n grid contains n^6 triplets. 

In your case, the 2x2x2 grid would use about 2.2GB of RAM, which is probably still feasible, but I would try the 1x1x1 first. 
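
The estimate above can be reproduced with a few lines of arithmetic. This is only a sketch: it uses nat = 44 from this thread and assumes each complex element is stored in double precision (16 bytes), which is not stated in the message. It counts the D3 matrices alone and ignores any workspace the code needs on top of them.

```python
def d3_grid_memory_bytes(nat, n, bytes_per_element=16):
    """Estimate the memory for all D3 matrices on an n x n x n grid.

    Assumption: double-precision complex storage (16 bytes per element).
    """
    elements_per_matrix = (3 * nat) ** 3  # (3*nat)^3 complex elements
    n_triplets = n ** 6                   # triplets on an n x n x n grid
    return elements_per_matrix * n_triplets * bytes_per_element


nat = 44  # number of atoms mentioned in this thread
for n in (1, 2):
    gib = d3_grid_memory_bytes(nat, n) / 2**30
    print(f"{n}x{n}x{n} grid: {gib:.2f} GiB")
```

Under these assumptions the 2x2x2 grid comes out at roughly 2.2 GiB, 64 times the 1x1x1 case.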




cheers 

BQ_BEGIN









Regards, 

Kanka 




Kanka Ghosh 
Postdoctoral Researcher 
I2M-Bordeaux 
University of Bordeaux, CNRS UMR 5295 
Site: Ecole Nationale Supérieure des Arts et Métiers 
Bordeaux-Talence 33400 


From: "Lorenzo Paulatto" <paulatz at gmail.com> 
To: "users" <users at lists.quantum-espresso.org> 
Sent: Thursday, December 3, 2020 11:23:58 PM 
Subject: Re: [QE-users] D3Q code stopped due to davcio error 

task # 71 
BQ_BEGIN

from davcio : error # 5011 
error while writing from file ".//D3_Q1.0_0_0_Q2.0_0_-1o2_Q3.0_0_1o2/scf.d1.dq1pq1.72" 


BQ_END





I guess it may have run out of space: d3q uses a ton of disk space and there is no easy way to avoid this. If you are running on any of the French computing centers I can try to have a look directly. 

I do not think the change in the number of CPUs could cause this problem, but if you provide the full output I can check. Also, 44 atoms is a lot for the d3q code. It seems like you're running the second q-point triplet, which is of the kind (0,q,-q); it takes much more time and disk space than the triplet (0,0,0). Maybe for such a large system you can get some decent force constants already from (0,0,0) alone. 
cheers 


BQ_BEGIN


Kanka Ghosh 
Postdoctoral Researcher 
I2M-Bordeaux 
University of Bordeaux, CNRS UMR 5295 
Site: Ecole Nationale Supérieure des Arts et Métiers 
Bordeaux-Talence 33400 

_______________________________________________
Quantum ESPRESSO is supported by MaX (www.max-centre.eu)
users mailing list users at lists.quantum-espresso.org
https://lists.quantum-espresso.org/mailman/listinfo/users

BQ_END


BQ_END


BQ_END
