[Q-e-developers] First PWscf GPU-enabled beta-release

Ivan Girotto ivan.girotto at ichec.ie
Thu May 5 18:11:27 CEST 2011


Dear QE users & developers,

We are happy to announce that the first beta GPU-enabled release of 
Quantum ESPRESSO (QE) has been committed today in the official repository.

You can download the new version of the code using the following command:

$ svn checkout svn://scm.qe-forge.org/scmrepos/svn/q-e/branches/espresso-PRACE

The Irish Centre for High-End Computing (ICHEC, http://www.ichec.ie) 
has been mainly responsible for extending the QE suite to accelerate 
calculations on NVIDIA GPUs. The porting activity has been supported 
within the PRACE 1st Implementation Phase project. It is currently 
carried out through the Sub-task "Accelerator", led by ICHEC, within the 
Work-Package "Programming Techniques for High-Performance Applications" 
in collaboration with CINECA.

The porting activity is concerned mainly with the PWscf package. 
However, ICHEC and the Irish QE user community are interested in 
exploring any other initiatives put forward by QE users or developers 
who would like to port other codes of the QE suite to GPGPU 
architectures.

We have successfully accelerated the linear algebra part of the QE suite 
using a library called phiGEMM, some explicit computational kernels 
(newd, addusdense, vloc_psi) and the 3D FFT for the single CPU/GPU 
version. Both the accelerated linear algebra (matrix multiplication) and 
the accelerated FFT make use of CUDA libraries. The porting is mainly 
based on wrappers that permit the use of libraries for accelerators. The 
distributed 3D FFT version is still in progress, since this porting 
requires significant changes to the current structure of the code and to 
the data distribution. The parallel and distributed multi-GPU version 
therefore still uses the original 3D FFT implementations.

The phiGEMM library is distributed as an independent open-source 
external package together with the Quantum ESPRESSO suite. It aims to 
perform matrix multiplication ([SDZ]GEMM) taking advantage of the 
underlying BLAS kernel functions on both CPU and NVIDIA CUDA-based GPU, 
mixing and overlapping computation between the host (CPU) and the 
accelerator (GPU). Any code that makes intensive use of GEMM can gain a 
significant advantage by linking this library when running on a CPU/GPU 
hybrid system.
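
To make the idea of sharing one GEMM between host and accelerator more 
concrete, here is a minimal sketch in plain Python. It is not phiGEMM's 
code or API: the names matmul and split_gemm and the gpu_fraction 
parameter are hypothetical, a simple column split is assumed, and an 
ordinary triple loop stands in for both the CPU BLAS and the CUDA BLAS 
calls. The two shares are also computed one after the other here, 
whereas a real hybrid library would run them concurrently.

```python
def matmul(a, b):
    """Plain product of a (m x k) and b (k x n), both given as lists of rows."""
    m, k, n = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]


def split_gemm(a, b, gpu_fraction=0.75):
    """Compute a * b by splitting the columns of b into two shares.

    gpu_fraction is a hypothetical tuning knob: the fraction of the
    columns of b (and hence of the result) assigned to the "GPU" share.
    """
    n = len(b[0])
    n_gpu = max(0, min(n, int(round(n * gpu_fraction))))

    # Column blocks of b: the first n_gpu columns go to the accelerator,
    # the remaining columns stay on the host.
    b_gpu = [row[:n_gpu] for row in b]
    b_cpu = [row[n_gpu:] for row in b]

    # Stand-ins for the two BLAS calls. Each share needs all of a but
    # only its own columns of b, so the two products are independent.
    c_gpu = matmul(a, b_gpu)
    c_cpu = matmul(a, b_cpu)

    # Stitch the two column blocks of the result back together.
    return [g + c for g, c in zip(c_gpu, c_cpu)]
```

For example, split_gemm(a, b, 0.5) gives the same result as matmul(a, b) 
for any split fraction. The split only decides how much work each side 
receives, so the best value depends on the relative speed of the CPU and 
the GPU.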

Even if the 3D FFT is accelerated only for a single CPU process (not 
when using MPI), other parts are already enabled to take advantage of a 
distributed parallel hybrid system. All of this allows PWscf to 
potentially use distributed message-passing parallelization (MPI) plus 
multi-threading (OpenMP) plus accelerators (NVIDIA GPUs) all together 
and produce good performance enhancement using the latest version of 
NVIDIA GPUs (high performance double precision is needed). This porting 
activity is still in progress, especially the parallel 3D FFT component 
that represents a bottleneck for large calculations. We have been 
testing this beta release using some small/medium benchmarks from the 
official DEISA benchmark suite on several GPU hardware platforms (Tesla 
and Fermi architectures). Special thanks go to both E4 Computer 
Engineering and CEA for providing access to hybrid GPU systems with 
configurations different from those available at ICHEC.
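
As an illustration of combining MPI, OpenMP and GPUs on one node, the 
sketch below lays out the resources for each MPI process. It is only an 
assumption about one common convention (round-robin assignment of local 
ranks to GPUs, with the cores divided evenly among ranks) and is not 
taken from README.GPU or from the PWscf sources; the function name 
plan_node and its parameters are hypothetical.

```python
def plan_node(ranks_per_node, gpus_per_node, cores_per_node):
    """Return one dict per local MPI rank with its GPU id and thread count.

    Assumed convention (not from the QE sources): local rank r uses GPU
    r % gpus_per_node, and the cores are shared evenly among the ranks.
    """
    if ranks_per_node < 1 or gpus_per_node < 1:
        raise ValueError("need at least one rank and one GPU per node")
    if cores_per_node < ranks_per_node:
        raise ValueError("more MPI ranks than cores on the node")

    # Integer division: leftover cores are simply not used in this sketch.
    threads = cores_per_node // ranks_per_node
    return [
        {"local_rank": r, "gpu": r % gpus_per_node, "omp_threads": threads}
        for r in range(ranks_per_node)
    ]
```

With 4 ranks, 2 GPUs and 8 cores per node, each rank gets 2 OpenMP 
threads and the GPUs are shared two ranks apiece. Whether sharing a GPU 
among several ranks helps or hurts depends on the workload, so the 
layout is worth testing on each system.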

We look forward to receiving suggestions for improvement, feedback, or 
requests for collaboration from anyone interested in trying and 
validating the PWscf CUDA version on different platforms and with 
different scientific cases. For additional information please contact 
qe-gpu at ichec.ie or reply to this email. A dedicated forum, 
q-e-gpgpu at qe-forge.org <http://qe-forge.org/mail/?group_id=10>, will 
be available shortly. Please use that list for bug reports and any other 
issues related to the use of the PWscf GPU version.

Although compilation of the GPU implementation is fairly 
straightforward, we kindly suggest that users carefully read the 
included README.GPU file. The intrinsic characteristics of hybrid 
multi- and many-core systems require careful consideration to best 
exploit the available computing power.

Any and all suggestions are welcome and will be very much appreciated.

Ivan Girotto & Filippo Spiga

---

ICHEC GPU developer team

The Tower - 7th floor
Trinity Technology & Enterprise Campus
Grand Canal Quay - Dublin 2 - Ireland

+353-1-5241608 (ph) / +353-1-7645845 (fax)
