[Pw_forum] Memory distribution problem

Sun Mar 2 15:58:27 CET 2014

On Sat, 2014-03-01 at 21:13 -0600, Peng Chen wrote:

> And I still couldn't find where this large array comes from.  

The code prints the size of the largest arrays at the beginning
of the calculation.

P.
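
A minimal sketch of how that report could be pulled out of a pw.x output file and sorted by size. The line layout assumed here ("<name>  <size> Mb  (<dimensions>)") and the sample text are assumptions made for illustration, not taken from this thread; adjust the pattern to whatever your version actually prints.

```python
import re

# Assumed layout of one report line: "<name>   <size> Mb   (<dims>)".
# This pattern is an assumption about the output format.
REPORT_LINE = re.compile(r"^\s*(?P<name>.+?)\s+(?P<size>\d+(?:\.\d+)?)\s+Mb\b")


def largest_arrays(text):
    """Return (name, size_in_Mb) pairs found in text, largest first."""
    found = []
    for line in text.splitlines():
        match = REPORT_LINE.match(line)
        if match:
            found.append((match.group("name"), float(match.group("size"))))
    return sorted(found, key=lambda item: item[1], reverse=True)


# Made-up sample, for illustration only (not output from the job discussed here).
sample = """
     Largest allocated arrays     est. size (Mb)     dimensions
        Each V/rho on FFT grid          1.25 Mb     (  81920)
        Kohn-Sham Wavefunctions        12.50 Mb     (   2048,  400)
        NL pseudopotentials             8.00 Mb     (   2048,  256)
"""

for name, size in largest_arrays(sample):
    print(f"{size:10.2f} Mb  {name}")
```

In practice one would read the real output file into `text` instead of using the sample string. The reported sizes are per process, so they are the numbers to compare against a per-core limit.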

> 
> On Sat, Mar 1, 2014 at 3:49 PM, Paolo Giannozzi
> <paolo.giannozzi at uniud.it> wrote:
>         On Sat, 2014-03-01 at 14:27 -0600, Peng Chen wrote:
>         >
>         > And the system is not that large (32 atoms, 400 bands, 8*8*8
>         > k-points); it runs on 128 cores. I think you are probably right
>         > that QE is trying to allocate a large array somehow.
>         
>         
>         ... and ?
>         
>         > On Fri, Feb 28, 2014 at 10:35 AM, Paolo Giannozzi
>         > <paolo.giannozzi at uniud.it> wrote:
>         >         On Fri, 2014-02-28 at 09:12 -0600, Peng Chen wrote:
>         >
>         >         > I think it is memory, because the error message is like:
>         >         > : 02/27/2014 14:06:20|  main|zeta27|W|job 221982 exceeds job hard
>         >         > limit "h_vmem" of queue (2871259136.00000 > limit:2147483648.00000) -
>         >         > sending SIGKILL
>         >
>         >
>         >         There are a few hints on how to reduce memory usage to the
>         >         strict minimum here:
>         >         http://www.quantum-espresso.org/wp-content/uploads/Doc/pw_user_guide/node19.html#SECTION000600100000000000000
>         >         If the FFT grid is large, reduce mixing_ndim from its default
>         >         value (8) to 4 or so. If the number of bands is large, distribute
>         >         nbnd*nbnd matrices using "-ndiag". If you have many k-points,
>         >         save to disk with disk_io='medium'. The message you get,
>         >         "2871259136 > limit:2147483648", makes me think that you crash
>         >         when trying to allocate an array whose size is at least
>         >         2871259136-2147483648 = a lot (about 690 MiB). It shouldn't
>         >         be difficult to figure out where such a large array comes from.
>         >
>         >         Paolo
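
The arithmetic in the message above can be checked with a short sketch. The two byte counts come from the quoted h_vmem message and the band count from elsewhere in the thread; the size of one nbnd*nbnd matrix assumes double-precision complex entries (16 bytes each), which is an assumption made for this illustration.

```python
MIB = 1024 ** 2  # bytes per MiB

used = 2871259136    # bytes requested, from the h_vmem message
limit = 2147483648   # bytes allowed, from the h_vmem message

excess = used - limit
print(f"limit : {limit / MIB:.1f} MiB")
print(f"used  : {used / MIB:.1f} MiB")
print(f"excess: {excess} bytes ({excess / MIB:.1f} MiB)")

# Size of one nbnd*nbnd matrix for the 400 bands mentioned in the thread,
# assuming 16 bytes per element (double-precision complex).
nbnd = 400
bytes_per_element = 16
nbnd_matrix = nbnd * nbnd * bytes_per_element
print(f"one nbnd*nbnd matrix: {nbnd_matrix} bytes ({nbnd_matrix / MIB:.2f} MiB)")
```

Under these assumptions a single nbnd*nbnd matrix is a few MiB, far smaller than the overshoot, which suggests looking at the other arrays first.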
>         >
>         >
>         >         >
>         >         > I normally use h_stack=128M, and it works fine.
>         >         >
>         >         >
>         >         >
>         >         >
>         >         >
>         >         >
>         >         > On Fri, Feb 28, 2014 at 7:30 AM, Paolo Giannozzi
>         >         > <paolo.giannozzi at uniud.it> wrote:
>         >         >         On Thu, 2014-02-27 at 17:30 -0600, Peng Chen wrote:
>         >         >         > P.S. Most of the jobs failed at the beginning of the
>         >         >         > scf calculation, and the scf output file is empty.
>         >         >
>         >         >
>         >         >         Are you sure the problem is the size of the RAM and
>         >         >         not the size of the stack?
>         >         >
>         >         >         P.
>         >         >
>         >         >
>         >         >         >
>         >         >         >
>         >         >         > On Thu, Feb 27, 2014 at 5:09 PM, Peng Chen
>         >         >         > <pchen229 at illinois.edu> wrote:
>         >         >         >         Dear QE users,
>         >         >         >
>         >         >         >
>         >         >         >         Our workstation was recently updated, and there is
>         >         >         >         now a hard limit on memory (2G per core). Some QE
>         >         >         >         jobs fail frequently (though not always) because
>         >         >         >         one of the MPI processes exceeds the RAM limit and
>         >         >         >         is killed. I am wondering if there is a way to
>         >         >         >         distribute memory usage more evenly across cores.
>         >         >         >
>         >         >         >
>         >         >
>         >         >         >
>         >         >         > _______________________________________________
>         >         >         > Pw_forum mailing list
>         >         >         > Pw_forum at pwscf.org
>         >         >         > http://pwscf.org/mailman/listinfo/pw_forum
>         >         >
>         >         >
>         >         >         --
>         >         >          Paolo Giannozzi, Dept. Chemistry&Physics&Environment,
>         >         >          Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
>         >         >          Phone +39-0432-558216, fax +39-0432-558222

-- 
Paolo Giannozzi, Dept. Chemistry&Physics&Environment, 
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222