[Pw_forum] Question of a system administrator about pw.x usage
silas.silva at ufabc.edu.br
Fri Sep 21 19:10:48 CEST 2012
I'm a system administrator at the High Performance Computing Center of
Universidade Federal do ABC - Brazil (http://hpc.ufabc.edu.br/). I'm
not familiar with the internals of scientific research or the tools you
use, but we have run into problems with the pw.x usage of one of
the users we support.
She has a simple job submission file that runs pw.x with an input and an
output file. After hours of execution, the system load increases
dramatically. We can see that it is I/O related, but we could not find out
why it happens only later in the execution of the program, nor how to fix it.
In this cluster, we use NFS for both "distributed scratch" and home
folders (we know we should use a modern parallel file system, but that is
not possible at the moment), but each node has a big local scratch.
1. Why does the I/O happen only later in the execution of pw.x?
2. The documentation (here: http://www.quantum-espresso.org/wp-content/uploads/Doc/user_guide/node18.html#SECTION00043100000000000000)
is not clear about "distributed" versus "collected" work. Although
it has some tips, I am still unsure which configuration to suggest
to our user. What can be "parallelized"? What may remain
in one place?
3. Is there any flag or configuration we can pass to pw.x to see what it
is doing? Any debug flag?
4. Are the variables you place in the input file documented anywhere?
Thank you very much.