[Pw_forum] Memory distribution problem

Peng Chen pchen229 at illinois.edu
Sat Mar 1 21:27:18 CET 2014


Dear Professor Giannozzi,

I tried decreasing the values of npool and mixing_ndim, but I still get
errors like:
02/28/2014 21:29:09|  main||W|job 233902 exceeds job hard limit "h_vmem" of
queue  (6573314048.00000 > limit:6442450944.00000) - sending SIGKILL

And the system is not that large (32 atoms, 400 bands, an 8*8*8 k-point
mesh), run on 128 cores. I think you are probably right that QE is trying
to allocate a large array somewhere.
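
For reference, here is a sketch of how such a run could be launched; the
pool and diagonalization-group sizes below are only illustrative, not the
exact values of the failing job (-nk and -nd are the usual pw.x shorthands
for -npool and -ndiag):

  # 128 MPI tasks, only 2 k-point pools, so the wavefunction arrays stay
  # distributed over 64 tasks per pool; a 64-task linear-algebra group
  # also spreads the nbnd x nbnd subspace matrices
  mpirun -np 128 pw.x -nk 2 -nd 64 -input scf.in > scf.out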


On Fri, Feb 28, 2014 at 10:35 AM, Paolo Giannozzi
<paolo.giannozzi at uniud.it> wrote:

> On Fri, 2014-02-28 at 09:12 -0600, Peng Chen wrote:
>
> > I think it is memory, because the error message is like:
> > : 02/27/2014 14:06:20|  main|zeta27|W|job 221982 exceeds job hard
> > limit "h_vmem" of queue (2871259136.00000 > limit:2147483648.00000) -
> > sending SIGKILL
>
> there are a few hints on how to reduce memory usage to the strict
> minimum here:
>
> http://www.quantum-espresso.org/wp-content/uploads/Doc/pw_user_guide/node19.html#SECTION000600100000000000000
> If the FFT grid is large, reduce mixing_ndim from its default value (8)
> to 4 or so. If the number of bands is large, distribute nbnd*nbnd
> matrices using "-ndiag". If you have many k-points, save to disk with
> disk_io='medium'. The message you get, "2871259136 > limit:2147483648",
> makes me think that you crash when trying to allocate an array whose
> size is at least 2871259136-2147483648 = 723775488 bytes, i.e. a lot. It
> shouldn't be difficult to figure out where such a large array comes from.
>
> Paolo
>
>
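For the record, the input-side settings suggested above would look roughly
like the fragment below (a sketch only, not copied from the actual input
file):

  &CONTROL
    disk_io = 'medium'   ! keep more wavefunction data on disk, less in RAM
  /
  &ELECTRONS
    mixing_ndim = 4      ! halve the charge-mixing history (default is 8)
  /
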
> >
> > I normally use h_stack=128M, and it works fine.
> >
> >
> >
> >
> >
> >
> > On Fri, Feb 28, 2014 at 7:30 AM, Paolo Giannozzi
> > <paolo.giannozzi at uniud.it> wrote:
> >         On Thu, 2014-02-27 at 17:30 -0600, Peng Chen wrote:
> >         > P.S. Most of the jobs failed at the beginning of the scf
> >         > calculation, and the length of the output scf file is zero.
> >
> >
> >         are you sure the problem is the size of the RAM and not the
> >         size of the stack?
> >
> >         P.
> >
> >
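On the stack question: one thing to check in the job script is something
along these lines (a generic bash sketch; the launch line is only a
placeholder):

  # raise the per-process stack limit before starting pw.x, since large
  # automatic (stack-allocated) arrays can otherwise exceed the soft limit
  ulimit -s unlimited
  mpirun -np 128 pw.x -input scf.in > scf.out
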
> >         >
> >         >
> >         > On Thu, Feb 27, 2014 at 5:09 PM, Peng Chen
> >         > <pchen229 at illinois.edu> wrote:
> >         >         Dear QE users,
> >         >
> >         >
> >         >         Recently, our workstation was updated and there is
> >         >         now a hard limit on memory (2 GB per core). Some QE
> >         >         jobs keep failing (though not every time) because
> >         >         one of the MPI processes exceeds the RAM limit and
> >         >         is killed. I am wondering if there is a way to
> >         >         distribute memory usage more evenly across the cores.
> >         >
> >         >
> >
> >
> >
> >         --
> >          Paolo Giannozzi, Dept. Chemistry&Physics&Environment,
> >          Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
> >          Phone +39-0432-558216, fax +39-0432-558222
> >
> >
> >
> >
>
> --
>  Paolo Giannozzi, Dept. Chemistry&Physics&Environment,
>  Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
>  Phone +39-0432-558216, fax +39-0432-558222
>
> _______________________________________________
> Pw_forum mailing list
> Pw_forum at pwscf.org
> http://pwscf.org/mailman/listinfo/pw_forum
>