[QE-users] I-PI interface with MPI

Aldo Ugolotti a.ugolotti at campus.unimib.it
Wed Jul 17 14:58:13 CEST 2019


Dear QE users,

I am trying to run a relaxation through the ASE Python module, interfaced 
with QE via socket I/O (namely, the i-PI protocol), but I have trouble 
running it in parallel on a cluster with a master/nodes configuration. 
It works both serially and in parallel on a single machine (the master); 
however, when I ask mpirun to start the processes on a compute node, I 
get an error. The run starts fine but fails once the SCF cycle finishes: 
there seems to be a communication problem through the socket involving 
the MPI processes, and I wonder whether it is related to how I address 
the socket in the command string.
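
For reference, this is how I understand the --ipi option of pw.x 
addresses the socket (the hostname and port below are just placeholders, 
not values from my run):

# UNIX-domain socket: as far as I understand, pw.x looks for the file
# /tmp/ipi_<name>, which exists only on the machine where ASE is listening
command_unix = 'pw.x --ipi {name}:UNIX -in espresso.pwi'.format(
    name='ase_espresso_test')

# INET socket: pw.x connects to <host>:<port> over TCP, which should
# also work from a different node
command_inet = 'pw.x --ipi {host}:{port} -in espresso.pwi'.format(
    host='master', port=31415)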

As a (non)working example, this is the ASE script:

from ase import Atoms
from ase.calculators.espresso import Espresso
from ase.optimize import BFGS
from ase.calculators.socketio import SocketIOCalculator
import sys

# two-atom Si slab, periodic in x and y only
slab = Atoms('Si2', positions=[[0, 0, 0], [2.5, 2, 0.5]],
             cell=[3, 4, 10], pbc=[1, 1, 0])

input_qe = {'pseudo_dir': './',
            'system': {
                'ecutwfc': 20,
                'ecutrho': 105,
                'degauss': 0.01,
                'occupations': 'smearing',
                'smearing': 'm-v'}}
pseudopotentials = {'Si': 'Si.pbe-n-kjpaw_psl.0.1.UPF'}

unixsocket = 'ase_espresso_test'
command = ('mpirun -np 2 -host 113,113 pw.x --ipi {unixsocket}:UNIX'
           ' -in espresso.pwi > espresso.pwo'.format(unixsocket=unixsocket))

calc_qe = Espresso(command=command, pseudopotentials=pseudopotentials,
                   tprnfor=True, kpts=(1, 1, 1), koffset=(0, 0, 0),
                   input_data=input_qe)
opt = BFGS(slab, trajectory='opt.traj', logfile='opt.log')
print(slab.get_positions())

# the socket calculator wraps the Espresso calculator and listens on
# /tmp/ipi_<unixsocket>; the slab must use the wrapper, not calc_qe itself
with SocketIOCalculator(calc_qe, log=sys.stdout,
                        unixsocket=unixsocket) as calc:
    slab.calc = calc
    opt.run(fmax=0.05)
    print(slab.get_positions())

This is the output printed at the end of espresso.pwo:

      convergence has been achieved in   6 iterations
   @ DRIVER MODE: Connecting to ase_espresso_test using UNIX socket
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, 
thus causing
the job to be terminated. The first process to do so was:

   Process name: [[54411,1],0]
   Exit code:    255
--------------------------------------------------------------------------
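
Since a UNIX-domain socket is just a file on the local filesystem, my 
guess is that pw.x processes started on a compute node cannot reach 
/tmp/ipi_ase_espresso_test on the master. A variant I would try next 
switches to an INET socket; this is only a sketch (hostname and port are 
placeholders), with the slab/input_qe/pseudopotentials/opt definitions 
unchanged from the script above:

port = 31415        # placeholder: any free TCP port on the master
host = 'master'     # placeholder: hostname of the machine running ASE

command = ('mpirun -np 2 -host 113,113 pw.x --ipi {host}:{port}'
           ' -in espresso.pwi > espresso.pwo'.format(host=host, port=port))

calc_qe = Espresso(command=command, pseudopotentials=pseudopotentials,
                   tprnfor=True, kpts=(1, 1, 1), koffset=(0, 0, 0),
                   input_data=input_qe)

# listen on the INET port instead of a UNIX socket
with SocketIOCalculator(calc_qe, log=sys.stdout, port=port) as calc:
    slab.calc = calc
    opt.run(fmax=0.05)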

Has anyone experienced similar issues?

Best regards,

Aldo

-- 
Aldo Ugolotti

PhD student
Materials Science Dept. U5,
Università degli Studi di Milano-Bicocca
via Cozzi 55,
20125 Milano (MI)
Italy
e-mail: a.ugolotti at campus.unimib.it


