<div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div><div>I am trying to run GPU-enabled QE (QE 6.8) on Ubuntu 18.04.5 LTS (GNU/Linux 4.15.0-135-generic x86_64).</div><div><br></div><div>System configuration:</div><div>Processor: Intel Xeon Gold 5120 CPU, 2.20 GHz (2 processors)</div><div>RAM: 96 GB</div><div>HDD: 6 TB</div><div>Graphics card: NVIDIA Quadro P5000 (16 GB)</div><div><br></div><div>I can successfully run small jobs (with dynamical RAM of about 1 GB). However, for larger systems (still less than 16 GB), the output stops abruptly during the first iteration (output attached below).</div><div><br></div><div>     Program PWSCF v.6.8 starts on  8Oct2021 at 10:33:9 </div><div><br></div><div>     This program is part of the open-source Quantum ESPRESSO suite</div><div>     for quantum simulation of materials; please cite</div><div>         "P. Giannozzi et al., J. Phys.:Condens. Matter 21 395502 (2009);</div><div>         "P. Giannozzi et al., J. Phys.:Condens. Matter 29 465901 (2017);</div><div>         "P. Giannozzi et al., J. Chem. Phys. 152 154105 (2020);</div><div>          URL <a href="http://www.quantum-espresso.org">http://www.quantum-espresso.org</a>", </div><div>     in publications or presentations arising from this work. 
More details at</div><div>     <a href="http://www.quantum-espresso.org/quote">http://www.quantum-espresso.org/quote</a></div><div><br></div><div>     Parallel version (MPI &amp; OpenMP), running on     784 processor cores</div><div>     Number of MPI processes:                28</div><div>     Threads/MPI process:                    28</div><div><br></div><div>     MPI processes distributed on     1 nodes</div><div>     R &amp; G space division:  proc/nbgrp/npool/nimage =      28</div><div>     43440 MiB available memory on the printing compute node when the environment starts</div><div> </div><div>     Reading input from 001.in</div><div>Warning: card &amp;CELL ignored</div><div>Warning: card / ignored</div><div><br></div><div>     Current dimensions of program PWSCF are:</div><div>     Max number of different atomic species (ntypx) = 10</div><div>     Max number of k-points (npk) =  40000</div><div>     Max angular momentum in pseudopotentials (lmaxx) =  4</div><div>     file Ti.pbe-spn-rrkjus_psl.1.0.0.upf: wavefunction(s)  3S 3D renormalized</div><div><br></div><div>     gamma-point specific algorithms are used</div><div>     Found symmetry operation: I + ( -0.0000 -0.5000  0.0000)</div><div>     This is a supercell, fractional translations are disabled</div><div><br></div><div>     Subspace diagonalization in iterative solution of the eigenvalue problem:</div><div>     a serial algorithm will be used</div><div><br></div><div> </div><div>     Parallelization info</div><div>     --------------------</div><div>     sticks:   dense  smooth     PW     G-vecs:    dense   smooth      PW</div><div>     Min         637     232     57                81572    18102    2258</div><div>     Max         640     234     60                81588    18118    2266</div><div>     Sum       17865    6549   1633              2284245   507201   63345</div><div> </div><div>     Using Slab Decomposition</div><div> </div><div><br></div><div><br></div><div>     
bravais-lattice index     =           14</div><div>     lattice parameter (alat)  =      21.0379  a.u.</div><div>     unit-cell volume          =    9204.2807 (a.u.)^3</div><div>     number of atoms/cell      =           36</div><div>     number of atomic types    =            2</div><div>     number of electrons       =       288.00</div><div>     number of Kohn-Sham states=          173</div><div>     kinetic-energy cutoff     =      55.0000  Ry</div><div>     charge density cutoff     =     600.0000  Ry</div><div>     scf convergence threshold =      1.0E-06</div><div>     mixing beta               =       0.4000</div><div>     number of iterations used =            8  local-TF  mixing</div><div>     energy convergence thresh.=      1.0E-04</div><div>     force convergence thresh. =      1.0E-03</div><div>     Exchange-correlation= PBE</div><div>                           (   1   4   3   4   0   0   0)</div><div>     nstep                     =          500</div><div><br></div><div><br></div><div>     GPU acceleration is ACTIVE.</div><div><br></div><div>     Message from routine print_cuda_info:</div><div>     High GPU oversubscription detected. Are you sure this is what you want?</div><div><br></div><div>     GPU used by master process:</div><div><br></div><div>        Device Number: 0</div><div>        Device name: Quadro P5000</div><div>        Compute capability : 61</div><div>        Ratio of single to double precision performance  : 32</div><div>        Memory Clock Rate (KHz): 4513000</div><div>        Memory Bus Width (bits): 256</div><div>        Peak Memory Bandwidth (GB/s): 288.83</div><div><br></div><div>     celldm(1)=  21.037943  celldm(2)=   1.000000  celldm(3)=   2.419041</div><div>     celldm(4)=  -0.766650  celldm(5)=  -0.766650  celldm(6)=   0.533303</div><div><br></div><div>     crystal axes: (cart. coord. 
in units of alat)</div><div>               a(1) = (   1.000000   0.000000   0.000000 )  </div><div>               a(2) = (   0.533303   0.845924   0.000000 )  </div><div>               a(3) = (  -1.854558  -1.023161   1.168553 )  </div><div><br></div><div>     reciprocal axes: (cart. coord. in units 2 pi/alat)</div><div>               b(1) = (  1.000000 -0.630438  1.035056 )  </div><div>               b(2) = ( -0.000000  1.182139  1.035056 )  </div><div>               b(3) = (  0.000000  0.000000  0.855759 )  </div><div><br></div><div><br></div><div>     PseudoPot. # 1 for Ti read from file:</div><div>     ../Ti.pbe-spn-rrkjus_psl.1.0.0.upf</div><div>     MD5 check sum: e281089c08e14b8efcf92e44a67ada65</div><div>     Pseudo is Ultrasoft + core correction, Zval = 12.0</div><div>     Generated using &quot;atomic&quot; code by A. Dal Corso  v.6.2.2</div><div>     Using radial grid of 1177 points,  6 beta functions with: </div><div>                l(1) =   0</div><div>                l(2) =   0</div><div>                l(3) =   1</div><div>                l(4) =   1</div><div>                l(5) =   2</div><div>                l(6) =   2</div><div>     Q(r) pseudized with 0 coefficients </div><div><br></div><div><br></div><div>     PseudoPot. # 2 for O  read from file:</div><div>     ../O.pbe-n-rrkjus_psl.1.0.0.upf</div><div>     MD5 check sum: 91400c9766925bcf19f520983a725ff0</div><div>     Pseudo is Ultrasoft + core correction, Zval =  6.0</div><div>     Generated using &quot;atomic&quot; code by A. 
Dal Corso  v.6.3MaX</div><div>     Using radial grid of 1095 points,  4 beta functions with: </div><div>                l(1) =   0</div><div>                l(2) =   0</div><div>                l(3) =   1</div><div>                l(4) =   1</div><div>     Q(r) pseudized with 0 coefficients </div><div><br></div><div><br></div><div>     atomic species   valence    mass     pseudopotential</div><div>        Ti            12.00    47.86700     Ti( 1.00)</div><div>        O              6.00    15.99940     O ( 1.00)</div><div><br></div><div>     Starting magnetic structure </div><div>     atomic species   magnetization</div><div>        Ti           0.200</div><div>        O            0.000</div><div><br></div><div>     No symmetry found</div><div><br></div><div><br></div><div>                                    s                        frac. trans.</div><div><br></div><div>      isym =  1     identity                                     </div><div><br></div><div> cryst.   s( 1) = (     1          0          0      )</div><div>                  (     0          1          0      )</div><div>                  (     0          0          1      )</div><div><br></div><div> cart.    s( 1) = (  1.0000000  0.0000000  0.0000000 )</div><div>                  (  0.0000000  1.0000000  0.0000000 )</div><div>                  (  0.0000000  0.0000000  1.0000000 )</div><div><br></div><div><br></div><div>     point group C_1 (1)    </div><div>     there are  1 classes</div><div>     the character table:</div><div><br></div><div>       E    </div><div>A      1.00</div><div><br></div><div>     the symmetry operations in each class and the name of the first element:</div><div><br></div><div>     E        1</div><div>          identity                                               </div><div><br></div><div>   Cartesian axes</div><div><br></div><div>     site n.     
atom                  positions (alat units)</div><div>         1           O   tau(   1) = (  -0.8353365  -0.5987815   0.7050395  )</div><div>         2           Ti  tau(   2) = (  -0.6772809  -0.5115821   0.7050395  )</div><div>         3           O   tau(   3) = (  -0.5192254  -0.4243827   0.7050395  )</div><div>         4           Ti  tau(   4) = (  -0.9272815  -0.5115821   0.5842738  )</div><div>         5           O   tau(   5) = (  -0.7692260  -0.4243827   0.5842738  )</div><div>         6           O   tau(   6) = (  -0.3186838  -0.1758181   0.5842738  )</div><div>         7           O   tau(   7) = (  -0.4520098  -0.3872999   0.4635080  )</div><div>         8           Ti  tau(   8) = (  -0.2939543  -0.3001004   0.4635080  )</div><div>         9           O   tau(   9) = (  -0.1358987  -0.2129011   0.4635080  )</div><div>        10           O   tau(  10) = (  -0.5686844  -0.1758181   0.7050395  )</div><div>        11           Ti  tau(  11) = (  -0.4106289  -0.0886188   0.7050395  )</div><div>        12           O   tau(  12) = (  -0.2525734  -0.0014194   0.7050395  )</div><div>        13           Ti  tau(  13) = (  -0.6606296  -0.0886188   0.5842738  )</div><div>        14           O   tau(  14) = (  -0.5025740  -0.0014194   0.5842738  )</div><div>        15           O   tau(  15) = (  -0.0520318   0.2471452   0.5842738  )</div><div>        16           O   tau(  16) = (  -0.1853578   0.0356635   0.4635080  )</div><div>        17           Ti  tau(  17) = (  -0.0273023   0.1228629   0.4635080  )</div><div>        18           O   tau(  18) = (   0.1307533   0.2100623   0.4635080  )</div><div>        19           O   tau(  19) = (  -0.3353351  -0.5987815   0.7050395  )</div><div>        20           Ti  tau(  20) = (  -0.1772797  -0.5115821   0.7050395  )</div><div>        21           O   tau(  21) = (  -0.0192241  -0.4243827   0.7050395  )</div><div>        22           Ti  tau(  22) = (  -0.4272803  -0.5115821   0.5842738  )</div><div>        
23           O   tau(  23) = (  -0.2692247  -0.4243827   0.5842738  )</div><div>        24           O   tau(  24) = (   0.1813175  -0.1758181   0.5842738  )</div><div>        25           O   tau(  25) = (   0.0479915  -0.3872999   0.4635080  )</div><div>        26           Ti  tau(  26) = (   0.2060470  -0.3001004   0.4635080  )</div><div>        27           O   tau(  27) = (   0.3641026  -0.2129011   0.4635080  )</div><div>        28           O   tau(  28) = (  -0.0686832  -0.1758181   0.7050395  )</div><div>        29           Ti  tau(  29) = (   0.0893724  -0.0886188   0.7050395  )</div><div>        30           O   tau(  30) = (   0.2474280  -0.0014194   0.7050395  )</div><div>        31           Ti  tau(  31) = (  -0.1606282  -0.0886188   0.5842738  )</div><div>        32           O   tau(  32) = (  -0.0025728  -0.0014194   0.5842738  )</div><div>        33           O   tau(  33) = (   0.4479695   0.2471452   0.5842738  )</div><div>        34           O   tau(  34) = (   0.3146435   0.0356635   0.4635080  )</div><div>        35           Ti  tau(  35) = (   0.4726991   0.1228629   0.4635080  )</div><div>        36           O   tau(  36) = (   0.6307546   0.2100623   0.4635080  )</div><div><br></div><div>   Crystallographic axes</div><div><br></div><div>     site n.     atom                  positions (cryst. 
coord.)</div><div>         1           O   tau(   1) = (  0.2719137  0.0219125  0.6033439  )</div><div>         2           Ti  tau(   2) = (  0.3749954  0.1249943  0.6033439  )</div><div>         3           O   tau(   3) = (  0.4780771  0.2280761  0.6033439  )</div><div>         4           Ti  tau(   4) = ( -0.0000046 -0.0000050  0.4999975  )</div><div>         5           O   tau(   5) = (  0.1030772  0.1030768  0.4999975  )</div><div>         6           O   tau(   6) = (  0.3969147  0.3969146  0.4999975  )</div><div>         7           O   tau(   7) = (  0.2719156  0.0219145  0.3966511  )</div><div>         8           Ti  tau(   8) = (  0.3749973  0.1249964  0.3966511  )</div><div>         9           O   tau(   9) = (  0.4780790  0.2280781  0.3966511  )</div><div>        10           O   tau(  10) = (  0.2719134  0.5219140  0.6033439  )</div><div>        11           Ti  tau(  11) = (  0.3749952  0.6249957  0.6033439  )</div><div>        12           O   tau(  12) = (  0.4780769  0.7280775  0.6033439  )</div><div>        13           Ti  tau(  13) = ( -0.0000048  0.4999964  0.4999975  )</div><div>        14           O   tau(  14) = (  0.1030769  0.6030781  0.4999975  )</div><div>        15           O   tau(  15) = (  0.3969145  0.8969160  0.4999975  )</div><div>        16           O   tau(  16) = (  0.2719153  0.5219160  0.3966511  )</div><div>        17           Ti  tau(  17) = (  0.3749970  0.6249978  0.3966511  )</div><div>        18           O   tau(  18) = (  0.4780787  0.7280796  0.3966511  )</div><div>        19           O   tau(  19) = (  0.7719150  0.0219125  0.6033439  )</div><div>        20           Ti  tau(  20) = (  0.8749966  0.1249943  0.6033439  )</div><div>        21           O   tau(  21) = (  0.9780784  0.2280761  0.6033439  )</div><div>        22           Ti  tau(  22) = (  0.4999967 -0.0000050  0.4999975  )</div><div>        23           O   tau(  23) = (  0.6030784  0.1030768  0.4999975  )</div><div>        24           O   
tau(  24) = (  0.8969160  0.3969146  0.4999975  )</div><div>        25           O   tau(  25) = (  0.7719169  0.0219145  0.3966511  )</div><div>        26           Ti  tau(  26) = (  0.8749985  0.1249964  0.3966511  )</div><div>        27           O   tau(  27) = (  0.9780803  0.2280781  0.3966511  )</div><div>        28           O   tau(  28) = (  0.7719147  0.5219140  0.6033439  )</div><div>        29           Ti  tau(  29) = (  0.8749965  0.6249957  0.6033439  )</div><div>        30           O   tau(  30) = (  0.9780782  0.7280775  0.6033439  )</div><div>        31           Ti  tau(  31) = (  0.4999965  0.4999964  0.4999975  )</div><div>        32           O   tau(  32) = (  0.6030782  0.6030781  0.4999975  )</div><div>        33           O   tau(  33) = (  0.8969158  0.8969160  0.4999975  )</div><div>        34           O   tau(  34) = (  0.7719166  0.5219160  0.3966511  )</div><div>        35           Ti  tau(  35) = (  0.8749983  0.6249978  0.3966511  )</div><div>        36           O   tau(  36) = (  0.9780801  0.7280796  0.3966511  )</div><div><br></div><div>     number of k points=     1  Gaussian smearing, width (Ry)=  0.0100</div><div>                       cart. coord. in units 2pi/alat</div><div>        k(    1) = (   0.0000000   0.0000000   0.0000000), wk =   1.0000000</div><div><br></div><div>                       cryst. coord.</div><div>        k(    1) = (   0.0000000   0.0000000   0.0000000), wk =   1.0000000</div><div><br></div><div>     Dense  grid:  1142123 G-vectors     FFT dimensions: ( 180, 180, 400)</div><div><br></div><div>     Smooth grid:   253601 G-vectors     FFT dimensions: ( 100, 100, 243)</div><div><br></div><div>     Dynamical RAM for                 wfc:       2.99 MB</div><div><br></div><div>     Dynamical RAM for     wfc (w. buffer):       2.99 MB</div><div><br></div><div>     Dynamical RAM for           str. 
fact:       1.24 MB</div><div><br></div><div>     Dynamical RAM for           local pot:       0.00 MB</div><div><br></div><div>     Dynamical RAM for          nlocal pot:       7.05 MB</div><div><br></div><div>     Dynamical RAM for                qrad:       3.93 MB</div><div><br></div><div>     Dynamical RAM for          rho,v,vnew:      25.98 MB</div><div><br></div><div>     Dynamical RAM for               rhoin:       8.66 MB</div><div><br></div><div>     Dynamical RAM for           G-vectors:       2.40 MB</div><div><br></div><div>     Dynamical RAM for          h,s,v(r/c):       2.74 MB</div><div><br></div><div>     Dynamical RAM for          &lt;psi|beta&gt;:       0.54 MB</div><div><br></div><div>     Dynamical RAM for                 psi:       5.98 MB</div><div><br></div><div>     Dynamical RAM for                hpsi:       5.98 MB</div><div><br></div><div>     Dynamical RAM for                spsi:       5.98 MB</div><div><br></div><div>     Dynamical RAM for      wfcinit/wfcrot:       8.53 MB</div><div><br></div><div>     Dynamical RAM for           addusdens:     131.34 MB</div><div><br></div><div>     Dynamical RAM for          addusforce:     160.16 MB</div><div><br></div><div>     Estimated static dynamical RAM per process >      76.37 MB</div><div><br></div><div>     Estimated max dynamical RAM per process >     236.53 MB</div><div><br></div><div>     Estimated total dynamical RAM >       6.47 GB</div><div><br></div><div>     Check: negative core charge=   -0.000001</div><div>     Generating pointlists ...</div><div>     new r_m :   0.0722 (alat units)  1.5191 (a.u.) for type    1</div><div>     new r_m :   0.0722 (alat units)  1.5191 (a.u.) 
for type    2</div><div><br></div><div>     Initial potential from superposition of free atoms</div><div><br></div><div>     starting charge  287.98222, renormalised to  288.00000</div><div><br></div><div>     negative rho (up, down):  9.119E-05 6.477E-05</div><div>     Starting wfcs are  216 randomized atomic wfcs</div><div><br></div><div>     total cpu time spent up to now is       14.0 secs</div><div><br></div><div>     Self-consistent Calculation</div><div>[tb_dev] Currently allocated     2.23E+01 Mbytes, locked:    0 /   9</div><div>[tb_pin] Currently allocated     0.00E+00 Mbytes, locked:    0 /   0</div><div><br></div><div>     iteration #  1     ecut=    55.00 Ry     beta= 0.40</div><div>     Davidson diagonalization with overlap</div><div><br></div><div>---- Real-time Memory Report at c_bands before calling an iterative solver</div><div>           980 MiB given to the printing process from OS</div><div>             0 MiB allocation reported by mallinfo(arena+hblkhd)</div><div>         32000 MiB available memory on the node where the printing process lives</div><div>     GPU memory used/free/total (MiB): 11117 / 5152 / 16270</div><div>------------------</div><div>     ethr =  1.00E-02,  avg # of iterations =  1.5</div><div>The CRASH file generated says</div><div><br></div><div> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%</div><div>     task #        24</div><div>     from  addusdens_gpu  : error #         1</div><div>      cannot allocate aux2_d </div><div> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%</div><div><br></div><div><br></div><div> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%</div><div>     task #        14</div><div>     from  addusdens_gpu  : error #         1</div><div>      cannot allocate aux2_d </div><div> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%</div><div><br></div><div><br></div><div> 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%</div><div>     task #         5</div><div>     from  addusdens_gpu  : error #         1</div><div>      cannot allocate aux2_d </div><div> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%</div><div><br></div><div><br></div><div> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%</div><div>     task #         7</div><div>     from  addusdens_gpu  : error #         1</div><div>      cannot allocate aux2_d </div><div> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%</div><div><br></div><div><br></div><div> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%</div><div>     task #        15</div><div>     from  addusdens_gpu  : error #         1</div><div>      cannot allocate aux2_d </div><div> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%</div><div><br></div><div><br></div><div> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%</div><div>     task #        17</div><div>     from  addusdens_gpu  : error #         1</div><div>      cannot allocate aux2_d </div><div> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%</div><div><br></div><div><br></div><div> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%</div><div>     task #        10</div><div>     from  addusdens_gpu  : error #         1</div><div>      cannot allocate aux2_d </div><div> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%</div><div><br></div><div><br></div><div> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%</div><div>     task #         9</div><div>     from  addusdens_gpu  : error #         1</div><div>      cannot allocate aux2_d </div><div> 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%</div><div><br></div><div><br></div><div> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%</div><div>     task #        12</div><div>     from  addusdens_gpu  : error #         1</div><div>      cannot allocate aux2_d </div><div> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%</div><div><br></div><div><br></div><div> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%</div><div>     task #         4</div><div>     from  addusdens_gpu  : error #         1</div><div>      cannot allocate aux2_d </div><div> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%</div><div><br></div><div><br></div><div> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%</div><div>     task #        13</div><div>     from  addusdens_gpu  : error #         1</div><div>      cannot allocate aux2_d </div><div> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%</div><div><br></div><div><br></div><div> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%</div><div>     task #        19</div><div>     from  addusdens_gpu  : error #         1</div><div>      cannot allocate aux2_d </div><div> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%</div><div><br></div><div>Running pw.x with -ndiag 1 and -ntg 1 gave similar output, with the following additional lines:</div><div><br></div><div>     negative rho (up, down):  9.119E-05 6.477E-05</div><div>     Starting wfcs are  216 randomized atomic wfcs</div><div><br></div><div>     total cpu time spent up to now is       11.9 secs</div><div><br></div><div>     Self-consistent Calculation</div><div>[tb_dev] Currently allocated     3.21E+01 Mbytes, locked:    0 /   9</div><div>[tb_pin] Currently allocated     0.00E+00 Mbytes, locked:    0 /   
0</div><div><br></div><div>     iteration #  1     ecut=    55.00 Ry     beta= 0.40</div><div>     Davidson diagonalization with overlap</div><div><br></div><div>---- Real-time Memory Report at c_bands before calling an iterative solver</div><div>          1036 MiB given to the printing process from OS</div><div>             0 MiB allocation reported by mallinfo(arena+hblkhd)</div><div>         36041 MiB available memory on the node where the printing process lives</div><div>     GPU memory used/free/total (MiB): 8915 / 7354 / 16270</div><div>------------------</div><div>     ethr =  1.00E-02,  avg # of iterations =  1.5</div><div>0: ALLOCATE: 156244752 bytes requested; status = 2(out of memory)</div><div>0: ALLOCATE: 156239280 bytes requested; status = 2(out of memory)</div><div>0: ALLOCATE: 156239280 bytes requested; status = 2(out of memory)</div><div>0: ALLOCATE: 156244752 bytes requested; status = 2(out of memory)</div><div>0: ALLOCATE: 156239280 bytes requested; status = 2(out of memory)</div><div>0: ALLOCATE: 156239280 bytes requested; status = 2(out of memory)</div><div>0: ALLOCATE: 156244752 bytes requested; status = 2(out of memory)</div><div>0: ALLOCATE: 156244752 bytes requested; status = 2(out of memory)</div><div>0: ALLOCATE: 156244752 bytes requested; status = 2(out of memory)</div><div>0: ALLOCATE: 156244752 bytes requested; status = 2(out of memory)</div><div>0: ALLOCATE: 156239280 bytes requested; status = 2(out of memory)</div><div>0: ALLOCATE: 156239280 bytes requested; status = 2(out of memory)</div><div>0: ALLOCATE: 156244752 bytes requested; status = 2(out of memory)</div><div>0: ALLOCATE: 156239280 bytes requested; status = 2(out of memory)</div><div>0: ALLOCATE: 156244752 bytes requested; status = 2(out of memory)</div><div>0: ALLOCATE: 156239280 bytes requested; status = 2(out of memory)</div><div>--------------------------------------------------------------------------</div><div>Primary job  terminated normally, but 1 process 
returned</div><div>a non-zero exit code. Per user-direction, the job has been aborted.</div><div>--------------------------------------------------------------------------</div><div>--------------------------------------------------------------------------</div><div>mpirun detected that one or more processes exited with non-zero status, thus causing</div><div>the job to be terminated. The first process to do so was:</div><div><br></div><div>  Process name: [[58344,1],12]</div><div>  Exit code:    127</div><div>--------------------------------------------------------------------------</div><div>I suspect that I am not following the recommendation in this document to "fill the CPUs with OpenMP threads" and to run 1 MPI process per GPU: the run above used 28 MPI processes with 28 threads each on a single GPU.</div><div><br></div><div>Could someone please suggest how to fix this? Sorry for the long post. I am totally new to this field. Any help would be appreciated. Thanks in advance.</div></div>-- <br><div dir="ltr"><div dir="ltr">Sent by <b>ANSON THOMAS</b></div><div><b>M.Sc. Chemistry, IIT Roorkee, India</b></div><div dir="ltr"><b><br></b></div><div dir="ltr"><b><br></b></div></div></div></div></div></div>