[Pw_forum] Question of a system administrator about pw.x usage

Fri Sep 21 19:10:48 CEST 2012

Hello all,

I'm a system administrator at the High Performance Computing Center of
Universidade Federal do ABC, Brazil (http://hpc.ufabc.edu.br/).  I'm
not familiar with the internals of scientific research or with the tools
you use, but we have run into problems with the way one of the users we
support runs pw.x.

She has a simple job submission file that runs pw.x with one input file
and one output file.  After some hours of execution, the system load
increases dramatically.  We can see that it is I/O related, but we could
not discover why it only starts late in the run, nor how to fix it.

In this cluster, we use NFS for both the "distributed scratch" and the
home folders (we know we should use a modern parallel file system, but
that is not possible at the moment).  Each node also has a large local
scratch partition.
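
To make the layout concrete, here is a small sketch of how a job wrapper
could choose between the two scratch locations, preferring the local
disk.  It is only an illustration under assumptions: the function name
pick_scratch, the job id, and the directories passed in are placeholders
and not our real mount points, and I do not know whether pw.x should
simply be pointed at the resulting directory.

```python
import os
import tempfile


def pick_scratch(local_root, nfs_root, job_id):
    """Return a per-job scratch directory, preferring node-local disk.

    local_root and nfs_root are placeholder mount points (assumptions,
    not real paths); job_id is any string unique to the job.
    """
    for root in (local_root, nfs_root):
        # Use the first root that exists and is writable on this node.
        if os.path.isdir(root) and os.access(root, os.W_OK):
            path = os.path.join(root, job_id)
            os.makedirs(path, exist_ok=True)
            return path
    raise RuntimeError("no writable scratch directory found")


if __name__ == "__main__":
    # Demo: temporary directories stand in for the real mounts.
    with tempfile.TemporaryDirectory() as local, \
            tempfile.TemporaryDirectory() as nfs:
        print(pick_scratch(local, nfs, "job-0001"))
```

With something like this, temporary files would stay on the local
partition and only fall back to NFS when the local disk is unavailable.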

Some questions:

1. Why does the I/O only happen late in the execution of pw.x?

2. The documentation (here: http://www.quantum-espresso.org/wp-content/uploads/Doc/user_guide/node18.html#SECTION00043100000000000000)
   is not clear about "distributed" versus "collected" work.  Although
   it has some tips, I am still unsure which configuration to suggest
   to our user.  What can be "parallelized"?  What may remain in one
   place?

3. Is there any flag or configuration we can pass to pw.x to see what it
   is doing?  Any debug flag?

4. Are the variables you place in the input file documented anywhere?

Thank you very much.

-- 
Silas Silva