<div dir="ltr"><div>Hi Paolo,</div><div><br></div><div></div><div>In QMCPACK, the code I work on, we typedef float/double as RealType and use RealType wherever switching precision is desired (performance matters most for particle and wavefunction data). This avoids scattering ifdefs everywhere. Where high precision is needed, we define another type, EstimatorRealType, and stick to it. For QE, a BP (base precision) kind could be defined next to DP, with the ifdef confined to its definition.<br></div><div>For library calls, every BLAS call used in the code needs to be wrapped in a generic interface (e.g. GEMM) so that the right routine (S/C/D/Zgemm) is picked based on the argument types.<br></div><div>Then we simply build two binaries, one mixed precision and one full precision. This also calls for better out-of-source build capability. Where precision must be selected at runtime, we use C++ templates, which is hard to do in Fortran.</div><div><br></div><div>OpenMP offload (4.5, or the upcoming 5.0) may also offer a path to a single source for multiple architectures (CPU/GPU). Multi-core + SIMD and SMX + CUDA cores have some similarity. Unfortunately, offload support in current Fortran compilers is still at an early stage.</div><div>I will try it with another Fortran code, see how it goes, and then report back to the QE community.<br></div><div><br></div><div>Best,<br></div><div>Ye<br></div></div><div class="gmail_extra"><br clear="all"><div><div class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr">===================<br>

Ye Luo, Ph.D.<br>

Leadership Computing Facility<br>

Argonne National Laboratory</div></div></div>
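<div>A minimal C++ sketch of the typedef-plus-wrapper approach described in the message above. The macro MIXED_PRECISION, the stand-in kernels sgemm_ref/dgemm_ref and the helper trace are hypothetical names chosen for illustration, not QMCPACK or QE code; a real build would forward to the BLAS library instead of the naive loops.</div>

```cpp
// Precision is fixed at compile time. MIXED_PRECISION is a hypothetical
// macro; building twice (with and without it) gives the two binaries.
#ifdef MIXED_PRECISION
typedef float RealType;            // bulk particle / wavefunction data
#else
typedef double RealType;
#endif
typedef double EstimatorRealType;  // accumulated quantities stay in double

// Naive row-major n x n product C = A * B, shared by both stand-ins below.
template <typename T>
void naive_gemm(int n, const T* a, const T* b, T* c) {
  for (int i = 0; i < n; ++i) {
    for (int j = 0; j < n; ++j) {
      T sum = T(0);
      for (int k = 0; k < n; ++k) sum += a[i * n + k] * b[k * n + j];
      c[i * n + j] = sum;
    }
  }
}

// Stand-ins for the precision-specific BLAS routines (hypothetical names).
inline void sgemm_ref(int n, const float* a, const float* b, float* c) {
  naive_gemm(n, a, b, c);
}
inline void dgemm_ref(int n, const double* a, const double* b, double* c) {
  naive_gemm(n, a, b, c);
}

// Single wrapper name: overload resolution picks the routine from the
// argument types, so call sites never mention S or D.
inline void gemm(int n, const float* a, const float* b, float* c) {
  sgemm_ref(n, a, b, c);
}
inline void gemm(int n, const double* a, const double* b, double* c) {
  dgemm_ref(n, a, b, c);
}

// Reduction over RealType data, accumulated in EstimatorRealType.
inline EstimatorRealType trace(int n, const RealType* m) {
  EstimatorRealType sum = 0.0;
  for (int i = 0; i < n; ++i) sum += m[i * n + i];
  return sum;
}
```

<div>Call sites then only ever write gemm(n, a, b, c) on RealType arrays, so switching precision is a rebuild rather than a code change.</div>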

<br><div class="gmail_quote">2018-08-06 12:46 GMT-05:00 Paolo Giannozzi <span dir="ltr"><<a href="mailto:p.giannozzi@gmail.com" target="_blank">p.giannozzi@gmail.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>Hi Ye, we know the "divergence" problem you mention quite well, but much less about the solutions. If you have good ideas, or just ideas, feel free to propose them. For GPUs, we are trying to isolate the strongly GPU-specific parts. For mixed precision, we are at the stage of preliminary tests. <br><br>I perfectly agree about unit testing, at least for those parts of QE for which it is suitable. For instance: FFTXlib already has some unit tests; LAXlib could have some; for KS_Solvers, it's not going to be easy. It should also be easy to set up unit tests for exchange-correlation functionals and for symmetry-related stuff.<br><br></div><div>Paolo<br></div><div><br><div class="gmail_extra"><div class="gmail_quote"><div><div class="h5">On Sun, Jul 29, 2018 at 4:06 AM, Ye Luo <span dir="ltr"><<a href="mailto:xw111luoye@gmail.com" target="_blank">xw111luoye@gmail.com</a>></span> wrote:<br></div></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div class="h5"><div dir="ltr"><div>Hi developers,</div><div><br></div><div></div><div>Many features currently under development are getting QE ready for the future. This is very encouraging.</div><div>However, I feel the code is diverging and needs attention.</div><div>The upcoming GPU code is basically an internal fork with a large number of duplicated lines. The mixed precision attempt also duplicates functions.</div><div>QE already has gamma/k and complex/real forks inside the code. 
The more forks are added to the code, the more maintenance is needed and the more potential bugs appear.</div><div><br></div><div></div><div>Although Fortran has its language limitations, some design choices may help reduce the complexity. Having one binary is appealing, but allowing multiple binaries to be built, with precision fixed at compile time to minimize duplicated code, is more beneficial in the long term.</div><div>The CPU/GPU gap may have a chance of being closed by performance-portable solutions, although the software stack is not yet mature, especially on the Fortran side.<br></div><div><br></div><div>This is very challenging and manpower is often limited. Although reality is often a compromise, I believe the QE developers will figure out the best way.<br></div><div><br></div><div>One more thing: testing is very important. There are some tests already, but more are needed, not only integration tests but also unit tests.<br></div><div><br></div><div>Best,<br></div><div>Ye<br></div><div><div><div><div dir="ltr" class="m_7138553132878846888gmail-m_-4324771363652656777gmail_signature"><div dir="ltr">===================<br>

Ye Luo, Ph.D.<br>

Leadership Computing Facility<br>

Argonne National Laboratory</div></div></div></div></div></div>

<br></div></div>______________________________<wbr>_________________<br>

developers mailing list<br>

<a href="mailto:developers@lists.quantum-espresso.org" target="_blank">developers@lists.quantum-espre<wbr>sso.org</a><br>

<a href="https://lists.quantum-espresso.org/mailman/listinfo/developers" rel="noreferrer" target="_blank">https://lists.quantum-espresso<wbr>.org/mailman/listinfo/develope<wbr>rs</a><br>

<br></blockquote></div><span class="HOEnZb"><font color="#888888"><br><br clear="all"><br>-- <br><div class="m_7138553132878846888gmail_signature"><div dir="ltr"><div><div dir="ltr"><div>Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,<br>Univ. <a href="https://maps.google.com/?q=Udine,+via+delle+Scienze+208,+33100+Udine,+Italy&entry=gmail&source=g">Udine, via delle Scienze 208, 33100 Udine, Italy</a><br>Phone +39-0432-558216, fax +39-0432-558222<br><br></div></div></div></div></div>

</font></span></div></div></div>

</blockquote></div><br></div>