[Pw_forum] error in running pw.x command

Ari P Seitsonen Ari.P.Seitsonen at iki.fi
Mon Jul 20 09:06:25 CEST 2015


Dear Mohaddeseh et co,

   Just a note: I used to have such problems when I had compiled with an 
old version of MKL-ScaLAPACK, indeed around 11.1, and ran with more 
than four cores. I think I managed to run after disabling ScaLAPACK. Of 
course, this might be entirely unrelated to your problem.

     Greetings from Lappeenranta,

        apsi

-=*=-=*=-=*=-=*=-=*=-=*=-=*=-=*=-=*=-=*=-=*=-=*=-=*=-=*=-=*=-=*=-=*=-=*=-=*=-
   Ari Paavo Seitsonen / Ari.P.Seitsonen at iki.fi / http://www.iki.fi/~apsi/
   Ecole Normale Supérieure (ENS), Département de Chimie, Paris
   Mobile (F) : +33 789 37 24 25    (CH) : +41 79 71 90 935


On Mon, 20 Jul 2015, Paolo Giannozzi wrote:

> This is not a QE problem: the Fortran code knows nothing about nodes and cores. It is the software setup for parallel execution on your machine that has a problem.
> 
> Paolo
> 
> On Thu, Jul 16, 2015 at 2:25 PM, mohaddeseh abbasnejad <m.abbasnejad at gmail.com> wrote:
>
> Dear all,
> 
> I have recently installed PWscf (version 5.1) on our cluster (4 nodes, 32 cores).
> ifort and MKL version 11.1 have been installed.
> When I run pw.x on each node individually, both of the following commands work properly:
> 1- /opt/exp_soft/espresso-5.1/bin/pw.x -in scf.in
> 2- mpirun -n 4 /opt/exp_soft/espresso-5.1/bin/pw.x -in scf.in
> However, when I use the following command (again on each node separately),
> 3- mpirun -n 8 /opt/exp_soft/espresso-5.1/bin/pw.x -in scf.in
> it gives me this error:
> 
> [cluster:14752] *** Process received signal ***
> [cluster:14752] Signal: Segmentation fault (11)
> [cluster:14752] Signal code:  (128)
> [cluster:14752] Failing at address: (nil)
> [cluster:14752] [ 0] /lib64/libpthread.so.0() [0x3a78c0f710]
> [cluster:14752] [ 1] /opt/intel/Compiler/11.1/064/mkl/lib/em64t/libmkl_mc3.so(mkl_blas_zdotc+0x79) [0x2b5e8e37d4f9]
> [cluster:14752] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 4 with PID 14752 on node cluster.khayam.local exited on signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
> 
> The same error occurs when I use all the nodes together in parallel (with the following command):
> 4- mpirun -n 32 -hostfile testhost /opt/exp_soft/espresso-5.1/bin/pw.x -in scf.in
> The error:
> 
> [cluster:14838] *** Process received signal ***
> [cluster:14838] Signal: Segmentation fault (11)
> [cluster:14838] Signal code:  (128)
> [cluster:14838] Failing at address: (nil)
> [cluster:14838] [ 0] /lib64/libpthread.so.0() [0x3a78c0f710]
> [cluster:14838] [ 1] /opt/intel/Compiler/11.1/064/mkl/lib/em64t/libmkl_mc3.so(mkl_blas_zdotc+0x79) [0x2b04082cf4f9]
> [cluster:14838] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 24 with PID 14838 on node cluster.khayam.local exited on signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
> 
> Any help will be appreciated.
> 
> Regards,
> Mohaddeseh
> 
> ---------------------------------------------------------
> 
> Mohaddeseh Abbasnejad,
> Room No. 323, Department of Physics,
> University of Tehran, North Karegar Ave.,
> Tehran, P.O. Box: 14395-547- IRAN
> Tel. No.: +98 21 6111 8634  & Fax No.: +98 21 8800 4781
> Cellphone: +98 917 731 7514
> E-Mail:     m.abbasnejad at gmail.com
> Website:  http://physics.ut.ac.ir
> 
> ---------------------------------------------------------
> 
> _______________________________________________
> Pw_forum mailing list
> Pw_forum at pwscf.org
> http://pwscf.org/mailman/listinfo/pw_forum
> 
> --
> Paolo Giannozzi, Dept. Chemistry&Physics&Environment,
> Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
> Phone +39-0432-558216, fax +39-0432-558222
> 
>

