[Pw_forum] Memory problems with a job using PWSCF from qe-gpu
Reinaldo Pis Diez
reinaldo.pisdiez at gmail.com
Fri Feb 9 01:01:55 CET 2018
Dear folks,
I've managed to compile the gpu-enabled version of PWSCF from the
sources provided by Filippo Spiga using Portland compilers and MKL
libraries. The node runs CentOS 6.6 and has 16 GB of RAM. I ran some
small tests and the results were the same as those obtained with a
non-gpu version of qe-6.1 compiled with the GNU compilers.
When I try to test the executable with a more realistic job (a Cu
surface made of 75 atoms with 1 to 6 carbon atoms on it), an "out of
memory" error occurs and the job terminates. I should add that that
job ran successfully on another, similar node (except for the fact
that it doesn't have a gpu card). When I use "mpirun -np 1" before
invoking pw.x, I get
...
Estimated max dynamical RAM per process > 10128.95MB
Generating pointlists ...
new r_m : 0.0689 (alat units) 1.6647 (a.u.) for type 1
new r_m : 0.0689 (alat units) 1.6647 (a.u.) for type 2
0: ALLOCATE: 2525186688 bytes requested; status = 2(out of memory)
/opt/pgi/linux86-64/17.4/lib/libpgf90_rpm1.so(__fort_abortx+0x17) [0x2b646f7f2af7]
/opt/pgi/linux86-64/17.4/lib/libpgf90.so(__fort_abort+0x5e) [0x2b646f41897e]
/opt/pgi/linux86-64/17.4/lib/libcudafor.so(+0x5ac38) [0x2b6456f6cc38]
/opt/pgi/linux86-64/17.4/lib/libcudafor.so(pgf90_dev_mod_alloc04+0xc9) [0x2b6456f6d70e]
/usr/local/fspiga-qe-gpu-7e1de44/bin/pw.x() [0x5e16d7]
/usr/local/fspiga-qe-gpu-7e1de44/bin/pw.x() [0x497953]
/usr/local/fspiga-qe-gpu-7e1de44/bin/pw.x() [0x52b05d]
/usr/local/fspiga-qe-gpu-7e1de44/bin/pw.x() [0x40d82c]
/usr/local/fspiga-qe-gpu-7e1de44/bin/pw.x() [0x40d704]
/lib64/libc.so.6(__libc_start_main+0xfd) [0x36f741ed5d]
/usr/local/fspiga-qe-gpu-7e1de44/bin/pw.x() [0x40a1c9]
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 23370 on node n13 exited
on signal 6 (Aborted).
--------------------------------------------------------------------------
When I run the same job with "mpirun -np 8", I get
...
Estimated total allocated dynamical RAM > 11048.12MB
Generating pointlists ...
new r_m : 0.0689 (alat units) 1.6647 (a.u.) for type 1
new r_m : 0.0689 (alat units) 1.6647 (a.u.) for type 2
0: ALLOCATE: 315564672 bytes requested; status = 2(out of memory)
[a lot of error messages]
I cannot identify the source of the error, but I guess that it has
to do with the gpu card. Running the deviceQuery program that comes
with CUDA, I get (among a lot of other information)
Device 0: "TITAN X (Pascal)"
CUDA Driver Version / Runtime Version 8.0 / 8.0
CUDA Capability Major/Minor version number: 6.1
Total amount of global memory: 12189 MBytes (12781158400 bytes)
(28) Multiprocessors, (128) CUDA Cores/MP: 3584 CUDA Cores
GPU Max Clock rate: 1531 MHz (1.53 GHz)
Memory Clock rate: 5005 Mhz
Memory Bus Width: 384-bit
L2 Cache Size: 3145728 bytes
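
To put the figures quoted above in the same units, here is a minimal
Python sketch. It only does unit conversion on numbers copied from the
output in this message. Two things are assumptions, not facts from the
logs: that the "MB" printed by pw.x means 2**20 bytes, and that the
pw.x memory estimate is relevant to the gpu card at all. The variable
and function names are illustrative only.

```python
# Unit conversion on the numbers quoted in this message.
# ASSUMPTION: "MB" in the pw.x estimate means 2**20 bytes (MiB).
# ASSUMPTION: the pw.x estimate may or may not describe gpu memory;
# the last comparison below is only meaningful if it does.

MIB = 1024 * 1024

requested_np1 = 2525186688    # bytes, failed ALLOCATE with "mpirun -np 1"
requested_np8 = 315564672     # bytes, failed ALLOCATE with "mpirun -np 8"
gpu_total = 12781158400       # bytes, deviceQuery "global memory"
estimate_np1_mb = 10128.95    # MB, pw.x estimate per process, -np 1


def to_mib(n_bytes):
    """Convert a byte count to MiB (2**20 bytes)."""
    return n_bytes / MIB


gpu_total_mib = to_mib(gpu_total)
requested_np1_mib = to_mib(requested_np1)
requested_np8_mib = to_mib(requested_np8)

print(f"gpu global memory      : {gpu_total_mib:10.2f} MiB")
print(f"failed request (-np 1) : {requested_np1_mib:10.2f} MiB")
print(f"failed request (-np 8) : {requested_np8_mib:10.2f} MiB")

# Hypothetical comparison: estimate plus the failed request versus the
# card size. This is not a diagnosis, only a size check.
combined_np1_mib = estimate_np1_mb + requested_np1_mib
print(f"estimate + request     : {combined_np1_mib:10.2f} MiB")
print(f"exceeds gpu memory     : {combined_np1_mib > gpu_total_mib}")
```

The first line printed reproduces the 12189 MBytes reported by
deviceQuery, so that tool counts in units of 2**20 bytes. The failing
call in the trace goes through pgf90_dev_mod_alloc04 in libcudafor.so,
which is at least consistent with the request being made on the gpu
card, but the conversion above does not prove it.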
Any help is welcome. I can provide the proper input file with the
corresponding pseudopotentials if requested.
Thanks in advance
Reinaldo Pis Diez
Center of Inorganic Chemistry
Natl Univ of La Plata
Argentina