[Pw_forum] Failed to explicit offload QE on Intel Xeon Phi 7120P (KNC)

Rolly Ng rollyng at gmail.com
Fri Mar 10 18:56:41 CET 2017


Dear QE users,

I am following the guide on explicit offload of QE to the Xeon Phi KNC 
(7120P) here,

https://software.intel.com/en-us/articles/explicit-offload-for-quantum-espresso

I tried to follow the steps above, but I failed to run pw.x (QE v5.3.0) 
on two Xeon Phi 7120P cards using the mpirun.sh script. The error reads:

  allocating buffers        2048  2048        1024
  on device            0
  threshold    20000000000.0000
  allocating buffers        2048        2048        1024
  on device            0
  threshold    20000000000.0000
offload error: cannot create buffer on device 0 (error code 14)
offload error: cannot create buffer on device 0 (error code 14)

This is how I run the script,

[qeuser@node09 ~]$ ~/mpirun/mpirun.sh -p 1 -w ~/libxphi/xphilibwrapper.sh \
    -x ~/QE530-KNC-OL/espresso-5.3.0/bin/pw.x -i ~/rolly/AUSURF112/ausurf.in

I have already copied (scp) all the lib and bin files to each Xeon Phi 
7120P, and I have also compiled the libxphi library. The libxphi 
directory contains:

[qeuser@node09 libxphi]$ ls
build-library.sh  libmkl_proxy.so  LICENSE      README.md    xphilibmod.mod     xphilib.o          xphilib_proxy.o
clean.sh          libxphi.so       mkl_proxy.c  xphilib.f90  xphilibmod.modmic  xphilib_proxy.f90  xphilibwrapper.sh

I suppose this is okay.

Interestingly, however, I can run a single instance on mic0, but it is 
very slow. This is how I ran it:

[qeuser@node09 ~]$ export LD_LIBRARY_PATH=/home/qeuser/libxphi/:$LD_LIBRARY_PATH

[qeuser@node09 ~]$ LD_PRELOAD="/home/qeuser/libxphi/libxphi.so" \
    /home/qeuser/QE530-KNC-OL/espresso-5.3.0/bin/pw.x < /home/qeuser/rolly/AUSURF112/ausurf.in

This run also produced error messages:

  allocating buffers        2048  2048        1024
  on device            0
  threshold    20000000000.0000
ERROR: ld.so: object '/home/qeuser/libxphi/libxphi.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
ERROR: ld.so: object '/home/qeuser/libxphi/libxphi.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
  buffer allocation   4.02019500732422      s

On the host I can see that one copy of pw.x is running, and on mic0 I 
can see that offload_main and coi_daemon are running as micuser. But 
the run is very slow.

So, is the error "offload error: cannot create buffer on device 0 
(error code 14)" related to the mpirun.sh script, and to libxphi.so not 
being preloaded even though it is present?
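
The preload part of that question can be narrowed down by checking, as 
the same user that launches pw.x, whether the file named in LD_PRELOAD 
exists and is readable. The following is only a minimal sketch for a 
POSIX shell; check_preload is a hypothetical helper name and is not part 
of libxphi, mpirun.sh, or the Intel guide.

```shell
# Hypothetical helper (not shipped with libxphi): report whether a
# library path intended for LD_PRELOAD can be opened by the current user.
check_preload() {
    lib="$1"
    if [ ! -e "$lib" ]; then
        # Path does not exist in this environment.
        echo "missing: $lib"
        return 1
    elif [ ! -r "$lib" ]; then
        # Path exists but the current user has no read permission.
        echo "unreadable: $lib"
        return 2
    fi
    echo "ok: $lib"
}
```

For example, `check_preload /home/qeuser/libxphi/libxphi.so` could be 
run on the host, and again after logging in to the card, to see whether 
the same path is visible in both places. The check only covers existence 
and permissions; it does not tell whether the library was built for the 
right architecture.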

I am running CentOS 7.1 + Intel MPSS 3.8.1 + Intel Parallel Studio XE 
(psxe) 2017 Update 1. I have already made a symbolic link to psxevars.sh 
in /etc/profile.d, and I can run pw.x with mpirun on the host, but I 
cannot offload to mic0 and mic1.

Are these compatibility issues that arise because libxphi and mpirun.sh 
were written two years ago? How can they be fixed?

Thank you,

Rolly

-- 

PhD. Research Fellow,
Dept. of Physics & Materials Science,
City University of Hong Kong
Tel: +852 3442 4000
Fax: +852 3442 0538
