[Pw_forum] Job crashes on multiple nodes
Gabriele Sclauzero
sclauzer at sissa.it
Mon May 31 10:12:25 CEST 2010
Dear Wolfgang,
On 31 May 2010, at 09:53, Wolfgang Gehricht wrote:
> Dear group!
>
> I am experiencing the following problem with pw.x. When I run a job
> on a single core with 8 processors, the calculation does not exceed
> the available RAM, i.e. it works. When I run the same job on two
> cores, the calculation crashes (mpierun: Forwarding signal 12 to
> job) [relevant log-parts see below]. However, I can compute this job
> on two cores with a smaller k-point sampling, hence I suspect that
> it has to do somehow with the memory demands/distribution. I am
> using the "Davidson iterative diagonalization" as a minimizer, with
> only the thresholds set explicitly (convergence, 1st iterative
> diagonalization).
> Can you please point me in the right direction?
> With thanks
> Yours Wolfgang
> ---
> Parallel version (MPI)
>
> Number of processors in use: 16
> R & G space division: proc/pool = 16
It's not a good idea to run 16 processes on 8 cores... even worse on 2
cores! The data is distributed among the processes, so the more
processes in the R&G pool, the smaller the amount of memory needed per
process. However, if you run all the processes on the same machine you
are not distributing the data *physically*: they all compete for the
same RAM, and the oversubscribed processes will also tend to step on
each other's toes. As a rule, launch no more MPI processes than you
have physical cores (see the sketch below).
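For instance (just a sketch: file names are placeholders and the exact
mpirun syntax depends on your MPI installation), on a single 8-core node
I would launch at most

   mpirun -np 8 pw.x -in scf.in > scf.out

and 16 processes only if they can be spread over two such nodes, e.g.
with Open MPI

   mpirun -np 16 -hostfile my_hosts pw.x -in scf.in > scf.out

where my_hosts is a file listing the two nodes.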
About the k-point sampling (assuming that you don't use
parallelization over k-points, i.e. npool=1): the largest data arrays
are kept in main memory for only one k-point at a time (unless you use
options like disk_io='none'), so increasing the number of k-points
will increase the computation time almost linearly but will leave the
memory consumption almost unchanged (see the example below).
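In case it is useful, this is roughly how those switches are passed
(values here are only placeholders, and option spellings may differ
slightly between QE versions, so please check the documentation of
yours): the number of k-point pools goes on the pw.x command line,

   mpirun -np 16 pw.x -npool 2 -in scf.in > scf.out

while disk_io is a variable of the &CONTROL namelist in the input file,
e.g.

   &CONTROL
      calculation = 'scf'
      ! disk_io = 'none'  ! as noted above, 'none' keeps the wavefunctions
      !                   ! of all k-points in RAM
      ...
   /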
If the problem is with the parallel Davidson diagonalization, you may
try to disable it with ndiag=1 (see the command line below).
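That is also a command-line option of pw.x; again only a sketch with
placeholder file names:

   mpirun -np 16 pw.x -ndiag 1 -in scf.in > scf.out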
All that said, from the memory usage estimate reported below, it looks
like you're not running such a big system... nothing that couldn't be
run on a laptop with 2 GB of memory (a quick sum of those estimates is
sketched below).
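Just adding up the per-process figures quoted at the bottom of your
output (these are estimates at 16-way R&G distribution, and the real
footprint will be somewhat larger because of non-distributed arrays,
MPI buffers and the code itself):

    4.63 + 9.12 + 1.48 + 0.14 + 0.01  ~  15 Mb   (largest allocated arrays)
   18.54 + 4.00 + 0.49 + 5.93         ~  29 Mb   (largest temporary arrays)
                                      => ~ 44 Mb per process,
                                         i.e. ~ 0.7 GB over 16 processes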
HTH
GS
> ...
> Subspace diagonalization in iterative solution of the eigenvalue
> problem:
> a parallel distributed memory algorithm will be used,
> eigenstates matrixes will be distributed block like on
> ortho sub-group = 4* 4 procs
> ...
> Planes per process (thick) : nr3 = 90 npp = 6 ncplane = 8100
>
>    Proc/  planes  cols      G    planes  cols      G    columns     G
>    Pool        (dense grid)           (smooth grid)      (wavefct grid)
>      1       6   339   18817      6   339   18817      92    2668
>      2       6   339   18817      6   339   18817      92    2668
>      3       6   339   18817      6   339   18817      92    2668
>      4       6   338   18812      6   338   18812      92    2668
>      5       6   338   18812      6   338   18812      93    2669
>      6       6   338   18812      6   338   18812      93    2669
>      7       6   338   18812      6   338   18812      93    2669
>      8       6   338   18812      6   338   18812      93    2669
>      9       6   338   18812      6   338   18812      93    2669
>     10       6   338   18812      6   338   18812      93    2669
>     11       5   339   18817      5   339   18817      93    2669
>     12       5   339   18815      5   339   18815      93    2669
>     13       5   339   18815      5   339   18815      93    2669
>     14       5   339   18815      5   339   18815      92    2666
>     15       5   339   18815      5   339   18815      92    2666
>     16       5   339   18815      5   339   18815      92    2666
>    tot      90  5417  301027     90  5417  301027    1481   42691
> ...
> G cutoff = 1729.8995 ( 301027 G-vectors) FFT grid: ( 90, 90, 90)
>
> Largest allocated arrays est. size (Mb) dimensions
> Kohn-Sham Wavefunctions 4.63 Mb ( 2373, 128)
> NL pseudopotentials 9.12 Mb ( 2373, 252)
> Each V/rho on FFT grid 1.48 Mb ( 48600, 2)
> Each G-vector array 0.14 Mb ( 18817)
> G-vector shells 0.01 Mb ( 1370)
> Largest temporary arrays est. size (Mb) dimensions
> Auxiliary wavefunctions 18.54 Mb ( 2373, 512)
> Each subspace H/S matrix 4.00 Mb ( 512, 512)
> Each <psi_i|beta_j> matrix 0.49 Mb ( 252, 128)
> Arrays for rho mixing 5.93 Mb ( 48600, 8)
>
> Initial potential from superposition of free atoms
> 16 total processes killed (some possibly by mpirun during cleanup)
§ Gabriele Sclauzero, EPFL SB ITP CSEA
PH H2 462, Station 3, CH-1015 Lausanne