[Pw_forum] QE in parallel mode
Paolo Giannozzi
paolo.giannozzi at uniud.it
Fri Dec 5 18:06:55 CET 2014
You should first of all verify whether the code is actually
running in parallel, that is, whether the number of processors printed
at the beginning of the output is what you expect (e.g., N in
"mpirun -np N ..."). This kind of error typically occurs when more
than one process tries to access the same scratch directory, for
instance when mpirun launches N independent serial copies of pw.x.
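The first check can be scripted. The Python sketch below is only an illustration under stated assumptions: the header strings it searches for ("Program PWSCF", "Parallel version (MPI), running on N processors", "Serial version") are assumed from typical pw.x output and their exact wording differs between QE versions, and `diagnose` is a hypothetical helper, not part of QE. It relies on the fact that in a genuine MPI run only one process writes the output header, so seeing the program banner more than once in a single output means several independent copies of pw.x were started, all using the same outdir and prefix.

```python
import re

# Assumed pw.x header lines (exact wording varies between QE versions):
#   "Program PWSCF v.5.0.2 starts on ..."                 printed once per independent run
#   "Parallel version (MPI), running on     4 processors"  MPI-enabled build
#   "Serial version"                                       build without MPI
BANNER_RE = re.compile(r"Program PWSCF\b")
MPI_RE = re.compile(r"Parallel version \(MPI\).*?running on\s+(\d+)\s+processors")
SERIAL_RE = re.compile(r"Serial version")


def diagnose(output: str, expected_np: int) -> str:
    """Hypothetical helper: classify the stdout of 'mpirun -np expected_np pw.x ...'."""
    # In a genuine MPI run only one process writes the header, so more than
    # one banner means several independent copies of pw.x were started; they
    # all use the same outdir and prefix and can clash on the same files.
    banners = len(BANNER_RE.findall(output))
    if banners > 1:
        return f"not parallel: {banners} independent copies share one outdir"

    mpi = MPI_RE.search(output)
    if mpi:
        found = int(mpi.group(1))
        if found == expected_np:
            return f"ok: one MPI run on {found} processes"
        return f"mismatch: MPI run on {found} processes, expected {expected_np}"

    if SERIAL_RE.search(output):
        return "serial build: only one process"
    return "unknown: header not recognized"


if __name__ == "__main__":
    good = ("     Program PWSCF v.5.0.2 starts on  5Dec2014 at 18: 6:55\n"
            "     Parallel version (MPI), running on     4 processors\n")
    bad = "\n".join(["     Program PWSCF v.5.0.2 starts on  5Dec2014 at 18: 6:55",
                     "     Serial version"] * 3)
    print(diagnose(good, 4))  # ok: one MPI run on 4 processes
    print(diagnose(bad, 3))   # not parallel: 3 independent copies share one outdir
```

On a real run, read the pw.x output file into a string and pass it together with the N given to mpirun; adjust the patterns if your QE version prints a different header.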
Paolo
On Fri, 2014-12-05 at 19:10 +0330, Masoud Shahrokhi wrote:
> Dear QE users,
> I am trying to run QE in parallel mode. My operating system is
> CentOS 5.6. I first installed Open MPI 1.6.5 and then
> compiled QE 4.3.2 with ifort 11.1. When I run QE in serial mode
> there is no problem, but when I run it in parallel mode I get
> this error:
> forrtl: No such file or directory
> forrtl: severe (28): CLOSE error, unit 10, file "Unknown"
> Image PC Routine Line Source
> pw.x 00000000008EDC3D Unknown Unknown Unknown
> pw.x 00000000008EC745 Unknown Unknown Unknown
> pw.x 00000000008844F9 Unknown Unknown Unknown
> pw.x 000000000081A11D Unknown Unknown Unknown
> pw.x 000000000081996A Unknown Unknown Unknown
> pw.x 0000000000811361 Unknown Unknown Unknown
> pw.x 000000000051E229 buffers_mp_close_ 212 buffers.f90
> pw.x 0000000000536839 close_files_ 34 close_files.f90
> pw.x 00000000004BB87D stop_run_ 50 stop_run.f90
> pw.x 000000000040589F MAIN__ 214 pwscf.f90
> pw.x 000000000040543C Unknown Unknown Unknown
> libc.so.6 000000327E41D994 Unknown Unknown Unknown
> pw.x 0000000000405349 Unknown Unknown Unknown
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --------------------------------------------------------------------------
>
>
> My input file is:
> &CONTROL
> calculation = 'scf',
> restart_mode='from_scratch',
> prefix='pir',
> outdir= '/root/run-espresso/SiH4',
> pseudo_dir= '/root/run-espresso/SiH4',
> wf_collect = .true.
> /
> &SYSTEM
> ibrav = 0,
> celldm(1) = 35.0,
> nat = 5,
> ntyp = 2,
> ecutwfc = 25,
> nspin = 1
> occupations='fixed'
> nosym = .true. ,
> nbnd = 40,
>
> /
> &ELECTRONS
> electron_maxstep = 50,
> mixing_mode = 'plain',
> mixing_beta = 0.3,
> conv_thr = 1.d-6
> /
> ATOMIC_SPECIES
> Si 28.08 Si.pz-vbc.UPF
> H 1.008 H.pz-vbc.UPF
> ATOMIC_POSITIONS (angstrom)
> Si 0.000000000 0.000000000 0.0000000000
> H 0.000000000 1.489000000 0.0000000000
> H 0.000000000 -0.496363243 -1.403832088
> H -1.21575425 -0.496363243 0.701916043
> H 1.21575425 -0.496363243 0.701916043
> CELL_PARAMETERS cubic
> 1.000000000 0.000000000 0.000000000
> 0.000000000 1.000000000 0.000000000
> 0.000000000 0.000000000 1.000000000
> K_POINTS crystal
> 1
> 0.00000 0.00000 0.00000 1.0000000
>
>
> I also repeated this calculation with QE 5.0.2, but the error persists.
> What should I do?
> Please let me know!
> Thanks in advance
> Masoud Shahrokhi, PhD
> Razi University, Iran
> _______________________________________________
> Pw_forum mailing list
> Pw_forum at pwscf.org
> http://pwscf.org/mailman/listinfo/pw_forum
--
Paolo Giannozzi, Dept. Chemistry&Physics&Environment,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222