[Pw_forum] Memory problems with a job using PWSCF from qe-gpu

Fri Feb 9 01:01:55 CET 2018

Dear folks,

I've managed to compile the GPU-enabled version of PWSCF from the 
sources provided by Filippo Spiga using the Portland compilers and MKL 
libraries. The node runs CentOS 6.6 and has 16 GB of RAM. I ran some 
small tests and the results were the same as those obtained with a 
non-GPU version of qe-6.1 compiled with the GNU compilers.

When I test the executable with a more realistic job (a Cu surface 
made of 75 atoms with 1 to 6 carbon atoms on it), an "out of memory" 
error occurs and the job terminates. I should add that the same job 
ran successfully on another, similar node (the only difference being 
that it has no GPU card). When I use "mpirun -np 1" before invoking 
pw.x, I get

...
      Estimated max dynamical RAM per process >   10128.95MB
      Generating pointlists ...
      new r_m :   0.0689 (alat units)  1.6647 (a.u.) for type    1
      new r_m :   0.0689 (alat units)  1.6647 (a.u.) for type    2

0: ALLOCATE: 2525186688 bytes requested; status = 2(out of memory)
   /opt/pgi/linux86-64/17.4/lib/libpgf90_rpm1.so(__fort_abortx+0x17) [0x2b646f7f2af7]
   /opt/pgi/linux86-64/17.4/lib/libpgf90.so(__fort_abort+0x5e) [0x2b646f41897e]
   /opt/pgi/linux86-64/17.4/lib/libcudafor.so(+0x5ac38) [0x2b6456f6cc38]
   /opt/pgi/linux86-64/17.4/lib/libcudafor.so(pgf90_dev_mod_alloc04+0xc9) [0x2b6456f6d70e]
   /usr/local/fspiga-qe-gpu-7e1de44/bin/pw.x() [0x5e16d7]
   /usr/local/fspiga-qe-gpu-7e1de44/bin/pw.x() [0x497953]
   /usr/local/fspiga-qe-gpu-7e1de44/bin/pw.x() [0x52b05d]
   /usr/local/fspiga-qe-gpu-7e1de44/bin/pw.x() [0x40d82c]
   /usr/local/fspiga-qe-gpu-7e1de44/bin/pw.x() [0x40d704]
   /lib64/libc.so.6(__libc_start_main+0xfd) [0x36f741ed5d]
   /usr/local/fspiga-qe-gpu-7e1de44/bin/pw.x() [0x40a1c9]
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 23370 on node n13 exited 
on signal 6 (Aborted).
--------------------------------------------------------------------------

When I run the same job with "mpirun -np 8", I get

...
     Estimated total allocated dynamical RAM >   11048.12MB
      Generating pointlists ...
      new r_m :   0.0689 (alat units)  1.6647 (a.u.) for type    1
      new r_m :   0.0689 (alat units)  1.6647 (a.u.) for type    2
0: ALLOCATE: 315564672 bytes requested; status = 2(out of memory)
[a lot of error messages]

I cannot identify the source of the error, but I guess that it has 
to do with the GPU card. Running the deviceQuery program that comes 
with CUDA, I get (among a lot of other information)

Device 0: "TITAN X (Pascal)"
   CUDA Driver Version / Runtime Version          8.0 / 8.0
   CUDA Capability Major/Minor version number:    6.1
   Total amount of global memory:                 12189 MBytes (12781158400 bytes)
   (28) Multiprocessors, (128) CUDA Cores/MP:     3584 CUDA Cores
   GPU Max Clock rate:                            1531 MHz (1.53 GHz)
   Memory Clock rate:                             5005 Mhz
   Memory Bus Width:                              384-bit
   L2 Cache Size:                                 3145728 bytes
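
To put the numbers above in the same units, here is a minimal Python 
sketch. The variable names are mine and the byte counts are copied 
from the messages above. It only converts units and compares the two 
failed requests; it says nothing about how much memory was already in 
use on the card when the allocation failed.

```python
# Byte counts copied from the ALLOCATE messages and from deviceQuery.
# Variable and function names are illustrative, not part of QE or CUDA.
MIB = 1024 ** 2  # bytes per MiB; deviceQuery's "MBytes" figure uses this unit


def to_mib(n_bytes: int) -> float:
    """Convert a byte count to MiB."""
    return n_bytes / MIB


failed_np1 = 2525186688    # request that failed with "mpirun -np 1"
failed_np8 = 315564672     # request that failed with "mpirun -np 8"
gpu_total = 12781158400    # total global memory reported for the TITAN X

print(f"failed request, -np 1: {to_mib(failed_np1):9.2f} MiB")
print(f"failed request, -np 8: {to_mib(failed_np8):9.2f} MiB")
print(f"GPU global memory:     {to_mib(gpu_total):9.2f} MiB")
print(f"ratio np1 / np8:       {failed_np1 / failed_np8:9.3f}")
```

The request that fails with -np 1 is about 2408 MiB, roughly eight 
times the 301 MiB request that fails with -np 8, and both are far 
below the 12189 MiB total of the card. Since the trace passes through 
libcudafor.so, my guess is that a device allocation fails on top of 
what is already on the card, but that is only an assumption.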

Any help is welcome. I can provide the input file and the 
corresponding pseudopotentials on request.

Thanks in advance

Reinaldo Pis Diez
Center of Inorganic Chemistry
Natl Univ of La Plata
Argentina