[Pw_forum] [qe-gpu]

Filippo Spiga spiga.filippo at gmail.com
Sat Jun 13 12:51:25 CEST 2015


Dear Anubhav,

Run in parallel with 2 MPI processes, and make sure CUDA_VISIBLE_DEVICES is set such that:

MPI rank 0 -> GPU id 1 (K20)
MPI rank 1 -> GPU id 2 (K20)
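
One way to get this mapping is sketched below. It is only an illustration under a few assumptions: the launcher is called `mpirun`, the input file name `pw.in` is hypothetical, and the CUDA installation honours `CUDA_DEVICE_ORDER`, which is set here so that CUDA device ids follow the same PCI bus order as the nvidia-smi listing.

```shell
#!/bin/sh
# Sketch only: device ids follow the nvidia-smi listing in the quoted message
# (0 = Tesla C2050, 1 and 2 = Tesla K20c).

# Assumption: ask CUDA to enumerate devices in PCI bus order so that its ids
# match the ids printed by nvidia-smi.
export CUDA_DEVICE_ORDER=PCI_BUS_ID

# Hide the C2050 and expose only the two K20c cards to pw-gpu.x.
export CUDA_VISIBLE_DEVICES=1,2

# Hypothetical input file name; replace it with your own.
INPUT_FILE=${INPUT_FILE:-pw.in}

# Two MPI ranks, one per visible GPU.
LAUNCH_CMD="mpirun -np 2 ./pw-gpu.x -input $INPUT_FILE"
echo "$LAUNCH_CMD"

# Run only if the launcher and the executable are actually present.
if command -v mpirun >/dev/null 2>&1 && [ -x ./pw-gpu.x ]; then
    $LAUNCH_CMD
fi
```

Inside the processes the two visible cards are renumbered 0 and 1, so the C2050 can no longer be selected by mistake.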

Those K20 GPUs are actively cooled cards. How many sockets does this server (or workstation?) have?

F
 
> On Jun 13, 2015, at 11:08 AM, Anubhav Kumar <kanubhav at iitk.ac.in> wrote:
> 
> Dear QE users
> 
> I have configured qe-gpu 14.10.0 with espresso-5.1.2. Parallel compilation
> was successful, but when I run ./pw-gpu.x it gives the following output:
> 
> ***WARNING: unbalanced configuration (1 MPI per node, 3 GPUs per node)
> 
>     *******************************************************************
> 
>       GPU-accelerated Quantum ESPRESSO (svn rev. unknown)
>       (parallel: Y , MAGMA : N )
> 
>     *******************************************************************
> 
> 
>     Program PWSCF v.5.1.2 starts on 13Jun2015 at 15:23:59
> 
>     This program is part of the open-source Quantum ESPRESSO suite
>     for quantum simulation of materials; please cite
>         "P. Giannozzi et al., J. Phys.:Condens. Matter 21 395502 (2009);
>          URL http://www.quantum-espresso.org",
>     in publications or presentations arising from this work. More details at
>     http://www.quantum-espresso.org/quote
> 
>     Parallel version (MPI & OpenMP), running on      24 processor cores
>     Number of MPI processes:                 1
>     Threads/MPI process:                    24
>     Waiting for input...
> 
> 
> However, when I run the same command again, it gives
> 
> ***WARNING: unbalanced configuration (1 MPI per node, 3 GPUs per node)
> 
> Program received signal SIGSEGV: Segmentation fault - invalid memory
> reference.
> 
> Backtrace for this error:
> #0  0x7FB5001B57D7
> #1  0x7FB5001B5DDE
> #2  0x7FB4FF4C4D3F
> #3  0x7FB4F3391D40
> #4  0x7FB4F33666C3
> #5  0x7FB4F3364C80
> #6  0x7FB4F33759EF
> #7  0x7FB4F345CA1F
> #8  0x7FB4F345CD2F
> #9  0x7FB500B7DBCC
> #10  0x7FB500B7094F
> #11  0x7FB500B7CC56
> #12  0x7FB500B81410
> #13  0x7FB500B7507B
> #14  0x7FB500B6179D
> #15  0x7FB500B940A0
> #16  0x7FB5009BA047
> #17  0x8A4EA3 in phiGemmInit
> #18  0x76F55E in initcudaenv_
> #19  0x66AE90 in __mp_MOD_mp_start at mp.f90:184
> #20  0x66E192 in __mp_world_MOD_mp_world_start at mp_world.f90:58
> #21  0x66DCC0 in __mp_global_MOD_mp_startup at mp_global.f90:65
> #22  0x4082A0 in pwscf at pwscf.f90:23
> #23  0x7FB4FF4AFEC4
> Segmentation fault
> 
> Kindly help me out in solving the problem. My GPU details are
> 
> +------------------------------------------------------+
> | NVIDIA-SMI 346.46     Driver Version: 346.46         |
> |-------------------------------+----------------------+----------------------+
> | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
> | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
> |===============================+======================+======================|
> |   0  Tesla C2050         Off  | 0000:02:00.0      On |                    0 |
> | 30%   62C   P12    N/A /  N/A |     87MiB /  2687MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
> |   1  Tesla K20c          Off  | 0000:83:00.0     Off |                    0 |
> | 42%   55C    P0    46W / 225W |   4578MiB /  4799MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
> |   2  Tesla K20c          Off  | 0000:84:00.0     Off |                    0 |
> | 34%   46C    P8    17W / 225W |     14MiB /  4799MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
> 
> +-----------------------------------------------------------------------------+
> | Processes:                                                       GPU Memory |
> |  GPU       PID  Type  Process name                               Usage      |
> |=============================================================================|
> |    1     27680    C   ./pw-gpu.x                                    4563MiB |
> +-----------------------------------------------------------------------------+

--
Mr. Filippo SPIGA, M.Sc.
http://fspiga.github.io ~ skype: filippo.spiga

«Nobody will drive us out of Cantor's paradise.» ~ David Hilbert
