[QE-users] I-PI interface with MPI
Aldo Ugolotti
a.ugolotti at campus.unimib.it
Wed Jul 17 14:58:13 CEST 2019
Dear QE users,
I am trying to run a relaxation through the ASE Python module, interfaced
with QE via socket I/O (namely the i-PI protocol), but I am having trouble
running it in parallel on a cluster with a master/nodes configuration. It
works serially, and also in parallel on a single machine (the master);
however, when I ask mpirun to start the processes on a compute node, I get
an error. The run starts fine, but once the SCF cycle finishes there seems
to be a communication problem through the socket, related to the MPI
processes, and I wonder whether this may be connected to how I address the
socket in the command string.
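To narrow this down, I thought of a quick check to run on the compute node
while the ASE script is waiting on the master. As far as I understand, a
UNIX-domain socket is just a file on the master's filesystem (ASE/i-PI
should create it as /tmp/ipi_<name>, if I read the convention correctly),
so I would expect a rough sketch like the following to fail when run on
the node:

import socket

# rough check: try to reach the UNIX socket opened by SocketIOCalculator;
# /tmp/ipi_ase_espresso_test is my guess for the file name, following the
# i-PI convention of prefixing the socket name with 'ipi_'
s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
try:
    s.connect('/tmp/ipi_ase_espresso_test')
    print('socket reachable')
except OSError as exc:
    print('cannot reach socket:', exc)
finally:
    s.close()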
In any case, as a (non-)working example, this is the full ASE script I am
running:
from ase import Atoms
from ase.calculators.espresso import Espresso
from ase.optimize import BFGS
from ase.calculators.socketio import SocketIOCalculator
import sys

slab = Atoms('Si2', positions=[[0, 0, 0], [2.5, 2, 0.5]],
             cell=[3, 4, 10], pbc=[1, 1, 0])

input_qe = {'pseudo_dir': './',
            'system': {
                'ecutwfc': 20,
                'ecutrho': 105,
                'degauss': 0.01,
                'occupations': 'smearing',
                'smearing': 'm-v'}}
pseudopotentials = {'Si': 'Si.pbe-n-kjpaw_psl.0.1.UPF'}

unixsocket = 'ase_espresso_test'
# pw.x is told to connect back to the UNIX socket opened by SocketIOCalculator
command = ('mpirun -np 2 -host 113,113 pw.x --ipi {unixsocket}:UNIX'
           ' -in espresso.pwi > espresso.pwo').format(unixsocket=unixsocket)

calc_qe = Espresso(command=command, pseudopotentials=pseudopotentials,
                   tprnfor=True, kpts=(1, 1, 1), koffset=(0, 0, 0),
                   input_data=input_qe)
opt = BFGS(slab, trajectory='opt.traj', logfile='opt.log')

print(slab.get_positions())
# the socket calculator wraps the Espresso calculator and keeps pw.x alive
# between the BFGS steps
with SocketIOCalculator(calc_qe, log=sys.stdout, unixsocket=unixsocket) as calc:
    slab.calc = calc
    opt.run(fmax=0.05)
print(slab.get_positions())
This is the output printed at the end of espresso.pwo:
convergence has been achieved in 6 iterations
@ DRIVER MODE: Connecting to ase_espresso_test using UNIX socket
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status,
thus causing the job to be terminated. The first process to do so was:
Process name: [[54411,1],0]
Exit code: 255
--------------------------------------------------------------------------
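If the UNIX socket is indeed unreachable from the node, I suppose I would
have to address the socket over INET instead, keeping the rest of the
script above unchanged. Something like the following untested sketch is
what I have in mind (I am assuming pw.x accepts host:port after --ipi and
that SocketIOCalculator takes a port keyword; 'master-node' and 31415 are
placeholders for my actual master hostname and a free port):

port = 31415
# the compute node must be able to reach the master over TCP on this port
command = ('mpirun -np 2 -host 113,113 pw.x --ipi master-node:{port}'
           ' -in espresso.pwi > espresso.pwo').format(port=port)

calc_qe = Espresso(command=command, pseudopotentials=pseudopotentials,
                   tprnfor=True, kpts=(1, 1, 1), koffset=(0, 0, 0),
                   input_data=input_qe)

with SocketIOCalculator(calc_qe, log=sys.stdout, port=port) as calc:
    slab.calc = calc
    opt.run(fmax=0.05)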
Has anyone experienced similar issues?
Best regards,
Aldo
--
Aldo Ugolotti
PhD student
Materials Science Dept. U5,
Università degli Studi di Milano-Bicocca
via Cozzi 55,
20125 Milano (MI)
Italy
e-mail: a.ugolotti at campus.unimib.it