[Pw_forum] wfc files: heavy I/O, handling for restarts

S. K. S. sks.jnc at gmail.com
Tue Sep 6 16:22:22 CEST 2011


Dear Prof. Paolo,

Thanks a lot for your reply.

> which QE code and which files are you referring to?

That is the phonon code (ph.x); the files I mentioned before are listed in
detail below.

With earlier versions up to 4.1.3, the phonon code ran fine and I got the
following files in the tmp directory of a local disk (items marked with
asterisks are directories; they were bold and underlined in the original
message). The nodes used by the code were node186, node036, and node139.

node186:/tmpscratch/sksct8/tmp$ ls
*pbmno.save*   _phpbmno.com1    _phpbmno.dwf2   _phpbmno.igk3      _phpbmno.prd3
pbmno.wfc1     _phpbmno.com2    _phpbmno.dwf3   _phpbmno.mixd1     _phpbmno.recover
pbmno.wfc2     _phpbmno.com3    _phpbmno.ebar1  _phpbmno.mixd2     _phpbmno.recover2
pbmno.wfc3     _phpbmno.dvkb31  _phpbmno.ebar2  _phpbmno.mixd3     _phpbmno.recover3
_phpbmno.bar1  _phpbmno.dvkb32  _phpbmno.ebar3  *_phpbmno.phsave*  *_phpbmno.save*
_phpbmno.bar2  _phpbmno.dvkb33  _phpbmno.igk    _phpbmno.prd1
_phpbmno.bar3  _phpbmno.dwf1    _phpbmno.igk2   _phpbmno.prd2

node036:/tmpscratch/sksct8/tmp$ ls
pbmno.wfc4     _phpbmno.dwf4  _phpbmno.mixd4  _phpbmno.recover4
pbmno.wfc5     _phpbmno.dwf5  _phpbmno.mixd5  _phpbmno.recover5
_phpbmno.bar4  _phpbmno.igk4  _phpbmno.prd4   _phpbmno.wfc4
_phpbmno.bar5  _phpbmno.igk5  _phpbmno.prd5   _phpbmno.wfc5

node139:/tmpscratch/sksct8/tmp$ ls
pbmno.wfc6     _phpbmno.bar8  _phpbmno.igk7  _phpbmno.recover6  _phpbmno.wfc8
pbmno.wfc7     _phpbmno.dwf6  _phpbmno.igk8  _phpbmno.recover7
pbmno.wfc8     _phpbmno.dwf7  _phpbmno.prd6  _phpbmno.recover8
_phpbmno.bar6  _phpbmno.dwf8  _phpbmno.prd7  _phpbmno.wfc6
_phpbmno.bar7  _phpbmno.igk6  _phpbmno.prd8  _phpbmno.wfc7

However, with the new version 4.3.1, for exactly the same input files and
job scripts, I only get these files and nothing else:

node045:/tmpscratch/sksct84/tmp$ ls
*pbmno.save*  pbmno.wfc1  *_ph0*

node111:/tmpscratch/sksct84/tmp$ ls
pbmno.wfc2  pbmno.wfc3

node092:/tmpscratch/sksct84/tmp$ ls
pbmno.wfc4  pbmno.wfc5

node080:/tmpscratch/sksct84/tmp$ ls
pbmno.wfc6  pbmno.wfc7

node072:/tmpscratch/sksct84/tmp$ ls
pbmno.wfc8

Note that the nodes used this time are node045, node111, node092,
node080, and node072.
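For reference, the per-node checks above can be summarized with a small
script; this is only a sketch, with the file-name patterns taken from the
listings (the helper name `count_scratch` is mine):

```shell
# Summarize Quantum ESPRESSO scratch files in one directory.
# Patterns follow the listings above: pw.x wavefunctions match *.wfc*,
# ph.x recover files match _ph*recover*.
count_scratch() {
  dir=$1
  wfc=$(find "$dir" -maxdepth 1 -name '*.wfc*' | wc -l)
  rec=$(find "$dir" -maxdepth 1 -name '_ph*recover*' | wc -l)
  echo "$dir: $wfc wavefunction file(s), $rec recover file(s)"
}

# Example: run it on each node's local scratch (node names from the
# listings above; ssh access to the nodes is assumed):
# for n in node186 node036 node139; do
#   ssh "$n" "$(declare -f count_scratch); count_scratch /tmpscratch/sksct8/tmp"
# done
```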

So it is clear from the above example that in the new version 4.3.1,
_phpbmno.save and _phpbmno.phsave go inside the directory "_ph0".
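If that is the case, the recover files for a restart should still be
findable by searching one level deeper; a minimal sketch (the scratch path
and the "_ph0" name are taken from the listings above, and the helper name
`find_recover` is mine):

```shell
# Find ph.x recover files whether they sit at the top of the scratch
# directory (versions up to 4.1.3) or inside _ph0/ (version 4.3.1).
find_recover() {
  find "$1" -name '*recover*'
}

# Example: find_recover /tmpscratch/sksct84/tmp
```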

And the same phonon calculation, which ran fine with the earlier version,
now stops abruptly, without much information or any error message, like
this:


    Electric field:
     Dielectric constant
     Born effective charges in two ways


     Atomic displacements:
     There are   5 irreducible representations

     Representation     1      3 modes -T_1u G_15  G_4- To be done

     Representation     2      3 modes -T_1u G_15  G_4- To be done

     Representation     3      3 modes -T_1u G_15  G_4- To be done

     Representation     4      3 modes -T_1u G_15  G_4- To be done

     Representation     5      3 modes -T_2u G_25  G_5- To be done



simply with this error:
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 2 in communicator MPI_COMM_WORLD
with errorcode 0.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 2 with PID 8791 on
node node111.cvos.cluster exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[node045:11580] 6 more processes have sent help message help-mpi-api.txt /
mpi-abort
[node045:11580] Set MCA parameter "orte_base_help_aggregate" to 0 to see all
help / error messages


I hope this email explains the problem better.

Thanks and Regards,
Saha SK