Dear Torstein,

could you send the file input.inp, just to try to reproduce the error
on another machine?

Best,

Layla

2012/3/19 Torstein Fjermestad <torstein.fjermestad@kjemi.uio.no>:
Dear Prof. Giannozzi,

Thanks for the suggestion.
The two tests I referred to were both run with image parallelization
(16 processors and 8 images).
The tests were run with the same input file and submit script. The
command line was as follows:

mpirun -np 16 -npernode 8 neb.x -nimage 8 -inp input.inp > output.out
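
For reference, the job was launched through a SLURM submit script of
roughly this form (the wall time and module names here are illustrative
placeholders, not necessarily the exact values I used):

#!/bin/bash
#SBATCH --job-name=neb_11          # job name as it appears in the log
#SBATCH --nodes=2                  # 2 nodes x 8 cores = 16 MPI tasks
#SBATCH --ntasks-per-node=8
#SBATCH --time=24:00:00            # placeholder wall time

# load the MPI and Quantum ESPRESSO environment (module names are guesses)
module load openmpi
module load espresso

mpirun -np 16 -npernode 8 neb.x -nimage 8 -inp input.inp > output.out
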
In this case the job is submitted and is labelled as "running". It
stays like this until the end of the requested time, but it produces no
output. At the end of the file slurm-<jobID>.out the following message
is printed:

slurmd[compute-14-6]: *** JOB 9146164 CANCELLED AT 2012-03-15T23:20:09
DUE TO TIME LIMIT ***
mpirun: killing job...

Job 9146164 ("neb_11") completed on compute-14-[6-7] at Thu Mar 15
23:20:09 CET 2012
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 523 on node
compute-14-6.local exited on signal 0 (Unknown signal 0).
--------------------------------------------------------------------------
[compute-14-6.local:00516] [[31454,0],0]-[[31454,0],1]
mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
mpirun: clean termination accomplished

When removing the image parallelization by either setting -nimage 1 or
removing the option altogether (but still running on 16 processors), the
job only runs for a few seconds. At the end of the file
slurm-<jobID>.out the following message is printed:

from test_input_xml: Empty input file .. stopping
--------------------------------------------------------------------------
mpirun has exited due to process rank 11 with PID 32678 on
node compute-14-13 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
Job 9163874 ("neb_13") completed on compute-14-[12-13] at Sat Mar 17
19:55:23 CET 2012

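For completeness, the command line for this second test was of the same
form, with only the image option changed (or dropped entirely), e.g.:

mpirun -np 16 -npernode 8 neb.x -nimage 1 -inp input.inp > output.out
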
I found the line "from test_input_xml: Empty input file .. stopping"
particularly interesting. The program stops because it thinks an input
file is empty.

Although I did not get much closer to having a running program, I
thought this change in behavior was interesting. Maybe it can give
you (or someone else) a hint about what is going on.

Of course, this erroneous behavior may have other causes, such as a
machine-related issue, the OpenMPI environment, the installation
procedure, etc. However, before contacting the sysadmin, I would like
to rule out (to the extent possible) any issues related to Quantum
ESPRESSO itself.

Thanks in advance.

Yours sincerely,
Torstein Fjermestad
University of Oslo,
Norway

On Thu, 15 Mar 2012 22:26:22 +0100, Paolo Giannozzi
<giannozz@democritos.it> wrote:
> On Mar 15, 2012, at 20:48 , Torstein Fjermestad wrote:
>
>> pw.x now works without problem, but neb.x only works when one node
>> (with 8 processors) is requested. I have run two tests requesting
>> two nodes (16 processors) and in both cases I see the same erroneous
>> behavior:
>
> with "image" parallelization in both cases? can you run neb.x with 1
> image and 16 processors?
>
> P.
> ---
> Paolo Giannozzi, Dept of Chemistry&Physics&Environment,
> Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
> Phone +39-0432-558216, fax +39-0432-558222