[Pw_forum] Problem using SGE

Axel Kohlmeyer akohlmey at gmail.com
Tue Feb 21 15:59:57 CET 2012


On Tue, Feb 21, 2012 at 3:55 AM, Mahmoud Payami <mpayami at aeoi.org.ir> wrote:
> Dear QE users,
>
> I am using sge for running a parallel job.
> My "file.qsub" contains the following lines:
> --
> #!/bin/bash
> #
> #$ -cwd
> #$ -j y
> #$ -S /bin/bash
> /opt/openmpi/bin/mpirun /opt/qe/bin/pw.x -npool 2 -ndiag 16 < /home/mahmoud/file.in
> --
> Then I use the orte parallel env and use the command:
> qsub -V -pe orte 32 file.qsub
> Everything is OK until the first Davidson diagonalization, during which the
> load on some nodes exceeds the number of processors (that is, the node has
> 8 cores in total, but at crash time the load is more than 16),
> and then those nodes hang.

are you sure you're not simply running out of memory
and driving the machines to swap until they give up?
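
One way to check this is to look at memory and swap on a compute node while
the job runs, and to see how many slots SGE actually granted on each host.
The following is only a sketch under some assumptions: the nodes run Linux
(so /proc/meminfo exists), and the script is run inside the job so that SGE
has set PE_HOSTFILE. The function names meminfo_kb and slots_per_host are
made up for this illustration.

```shell
#!/bin/bash
# Sketch: report memory/swap on this node and the slots SGE granted per host.
# Assumes Linux (/proc/meminfo). Helper names are hypothetical.

# Print the value (in kB) of one /proc/meminfo field, e.g. MemTotal.
# The optional second argument is an alternative file, used only for testing.
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# Sum the slots per host from an SGE PE hostfile.
# Each line has the form: hostname slots queue processor-range
slots_per_host() {
    awk '{ slots[$1] += $2 } END { for (h in slots) print h, slots[h] }' "$1" | sort
}

if [ -r /proc/meminfo ]; then
    echo "MemTotal:  $(meminfo_kb MemTotal) kB"
    echo "MemFree:   $(meminfo_kb MemFree) kB"
    echo "SwapTotal: $(meminfo_kb SwapTotal) kB"
    echo "SwapFree:  $(meminfo_kb SwapFree) kB"
fi

# PE_HOSTFILE is set by SGE only inside a parallel-environment job.
if [ -n "${PE_HOSTFILE:-}" ] && [ -r "${PE_HOSTFILE}" ]; then
    echo "Slots granted per host:"
    slots_per_host "$PE_HOSTFILE"
fi
```

If SwapFree keeps shrinking during the diagonalization step, the nodes are
most likely swapping. If a host is listed with more slots than it has cores,
the parallel environment is oversubscribing that node.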

axel.


> Any comment is highly appreciated.
>
> Best regards,
>                       Mahmoud Payami
>
> --------------------------------
> Mahmoud Payami
> Physics Group, AEOI,
> Tehran-Iran
>
> Email: mpayami at aeoi.org.ir
> Phone: +98 (0) 21 82064393
> ----------------------------------------------
>



-- 
Dr. Axel Kohlmeyer
akohlmey at gmail.com  http://goo.gl/1wk0

College of Science and Technology
Temple University, Philadelphia PA, USA.


