[Pw_forum] Off-topic: memory usage issues
Axel Kohlmeyer
akohlmey at cmm.chem.upenn.edu
Thu Apr 26 15:37:37 CEST 2007
On Thu, 26 Apr 2007, Marcos Verissimo Alves wrote:
MVA> Hi all,
hi marcos,
MVA> This is not a question specifically aimed at espresso issues, but since we
MVA> have quite a few knowledgeable people in the list, I'm posting it.
actually, i think this is not so off-topic as you say, since this
gives an opportunity to clear up some confusions.
MVA> I have compiled pw.x in a dual-core Turion with 2GB RAM under linux
MVA> (OpenSuse 10.2), installed and compiled mpich 2-1.0.5 with ifort 9.1.041,
MVA> and then compiled pw.x (from Espresso 3.0) with mpich, mkl 8.0 and fftw,
MVA> and I have been running it successfully since then in parallel. However,
MVA> looking at the memory usage using top, when running pw.x, I see that,
MVA> although the memory from top is at most 250 MB per processor (from the
MVA> part where the individual processes' cpu and memory usage is displayed),
first of all you have to realize that there are multiple entries for
memory usage, each with a different meaning. none of them gives you the
'real' memory usage, since that is not a well-defined quantity. under
linux you have multiple fields, and which ones you see depends on the
configuration of top. if you type 'f' you can add or remove fields and
then save the config with 'W' to ~/.toprc. in particular there is:
VIRT: the total address space (= the maximum amount of memory that is
reserved for the process, even if it is not used or in swap)
RES: the resident set size (= the amount of _physical_ memory in use,
i.e. the actively used memory minus whatever sits in swap).
SHR: shared memory contribution (= how much sharing is going on, usually
due to shared libraries)
SWAP: the amount of memory in swap (= for technical reason this can also
include 'mapped' memory, e.g. the memory for a video card, or network
card)
CODE: the part of the executable that is loaded into memory
DATA: the amount of memory allocated in the 'data segment'
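to see where these numbers come from, you can read the kernel's
per-process counters directly from /proc (just a sketch; the Vm* names
are the fields in /proc/PID/status and only map roughly onto top's
columns):

```shell
# print_vm: show the kernel's per-process memory counters for a pid
# (default: the current shell). roughly: VmSize ~ top's VIRT,
# VmRSS ~ RES, VmData ~ DATA, VmExe ~ CODE, VmLib ~ shared lib code.
print_vm() {
  grep -E '^Vm(Size|RSS|Data|Exe|Lib)' /proc/${1:-$$}/status
}
print_vm $$
```

comparing these against top for the same pid makes it obvious which
column is which.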
so what to watch out for?
for applications with dynamical memory management, the memory usage
is somewhere between VIRT and RES (which are usually both displayed),
VIRT will always be larger than the real amount of memory used, but
RES will always stay within the limits of your machine, as it does
not show the amount of swap space used by this process. only if %CPU
is low and the process is mostly in state 'D' (waiting on 'D'isk,
as opposed to 'R'unning or 'S'leeping) is it probably swapping
a lot. however, performance suffers significantly if there is too much
swap usage (which is usually not displayed). so RES is actually not
a good measure. DATA is probably better, but it does not include CODE
or SHR. and SHR is undefined since you cannot know how much each process
contributes to the shared memory pool (well, technically you can, but
that would slow down the kernel a lot).
bottom line: don't get fooled by VIRT or RES, but if VIRT fits into
your physical memory, you are on the safe side.
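that rule of thumb is easy to script (a sketch; 'fits_in_ram' is just
my name for it, and both numbers come straight from /proc, in kB):

```shell
# rough safety check: does the process's total address space
# (VmSize ~ top's VIRT) fit into physical RAM (MemTotal)?
# pid defaults to the current shell. both values are in kB.
fits_in_ram() {
  pid=${1:-$$}
  vsz=$(awk '/^VmSize:/ {print $2}' /proc/$pid/status)
  ram=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  if [ "$vsz" -le "$ram" ]; then
    echo "pid $pid: VmSize ${vsz}kB fits into ${ram}kB of RAM"
  else
    echo "pid $pid: VmSize ${vsz}kB exceeds ${ram}kB of RAM"
  fi
}
fits_in_ram $$
```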
MVA> almost the whole of physical memory is taken (from the "Mem: total, used,
MVA> free, buffers" line).
to quote linus torvalds: unused memory is wasted memory,
so linux uses 'free' memory for file system caching and
i/o buffering. if an application needs more memory, that
is then claimed from the 'cached' and/or 'buffered' pool
of memory, which has a delicately tuned strategy of ageing
buffered pages so that the least used ones are re-used first.
if you are lazy (like me), you can get the 'corrected'
numbers from 'free -t', e.g.:
[akohlmey at vitriol ~]$ free -t
             total       used       free     shared    buffers     cached
Mem:      16399936   15969172     430764          0     358728   14729608
-/+ buffers/cache:     880836   15519100
Swap:      4096524     346008    3750516
Total:    20496460   16315180    4181280
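the '-/+ buffers/cache' row is plain arithmetic on the 'Mem:' row:
used minus buffers minus cached is what applications really occupy.
with the numbers above:

```shell
# recompute the '-/+ buffers/cache' used column from the Mem: row:
# really used = used - buffers - cached (all in kB)
used=15969172; buffers=358728; cached=14729608
echo "really used: $((used - buffers - cached)) kB"   # -> 880836 kB
```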
[...]
MVA> The same kind of thing happens when I run another DFT code (siesta), so I
MVA> would guess this could be an MPICH-related issue or, worse, a kernel- or
MVA> compiler-related issue. Does anyone know if there are any memory-usage
MVA> issues for MPICH, for the newer versions of the linux kernel or for the
no, but for local (shared memory) communication, MPICH needs to
allocate additional buffers, and some of them are actually SYSV
shared memory segments and should thus show up as used and shared.
one thing to note here is that when you run a machine for a
very long time, you may actually run out of shared memory segments
and lose memory to crashed jobs that did not get a chance to
clean up their shared memory segments first. with some smart, but not
overly portable programming this can be avoided, but for a number of
reasons this is not always done. (this also affects other programs
that use shared memory like audio/video players).
you can see the sysv ipc status with 'ipcs', and if you have too many
shared memory segments with 'nattch' 0, then you should clean up.
e.g. with:
[akohlmey at vitriol ~]$ cat ~/bin/clearipc.sh
#!/bin/sh
# remove all sysv ipc objects (shared memory segments, semaphores,
# message queues) owned by the current user.
for s in `ipcs -m | grep $USER | cut -d ' ' -f 2`
do
  ipcrm -m $s
done
for s in `ipcs -s | grep $USER | cut -d ' ' -f 2`
do
  ipcrm -s $s
done
for s in `ipcs -q | grep $USER | cut -d ' ' -f 2`
do
  ipcrm -q $s
done
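if you only want to see which segments are stale before removing
anything, you can filter on the attach count (a sketch; the assumption
that 'nattch' is field 6 of 'ipcs -m' output matches linux, but check
the header line of your own ipcs before trusting it):

```shell
# filter 'ipcs -m' output for segments with zero attach count,
# i.e. segments nobody is using any more. assumes the linux layout
# key shmid owner perms bytes nattch status, so field 6 = nattch
# (an assumption -- verify against your own 'ipcs -m' header).
stale_shm() {
  awk -v user="$1" '$3 == user && $6 == 0 {print $2}'
}
# usage: ipcs -m | stale_shm $USER    # ids safe to pass to 'ipcrm -m'
```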
MVA> ifort 9.1.041/mkl 8.0? Has anybody seen something similar when running
MVA> pw.x on dual-processor machines?
hope that explains some of the oddities.
cheers,
axel.
MVA>
MVA> Thanks,
MVA>
MVA> Marcos
--
=======================================================================
Axel Kohlmeyer akohlmey at cmm.chem.upenn.edu http://www.cmm.upenn.edu
Center for Molecular Modeling -- University of Pennsylvania
Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
tel: 1-215-898-1582, fax: 1-215-573-6233, office-tel: 1-215-898-5425
=======================================================================
If you make something idiot-proof, the universe creates a better idiot.