[Wannier] Fwd: MPI version large systems
H. Lee
hjunlee at gmail.com
Mon Mar 1 17:44:11 CET 2021
I forgot to forward my replies to the W90 mailing list.
---------- Forwarded message ---------
From: H. Lee <hjunlee at gmail.com>
Date: Mon, Mar 1, 2021 at 10:40 AM
Subject: Re: [Wannier] MPI version large systems
To: Jonathan Backman <jbackman at iis.ee.ethz.ch>
Dear Jonathan:
I think that you have an issue with reading the W90 inputs when using 27
cores, even though you have enough memory.
I regenerated all relevant W90 inputs (*.mmn, *.amn, and *.eig) using your
*.wout and reproduced your issue.
Of course, the regenerated inputs only match the original file sizes and
contain random numbers; however, that is sufficient for this test.
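In case it is useful, below is a minimal sketch of how such a dummy *.mmn
file can be generated; it assumes the plain-text Mmn layout (a comment
line, then num_bands/num_kpts/nntot, then one header line plus num_bands^2
complex overlaps per k-point/neighbour pair). The dimensions are kept small
for illustration; put in your real values to reproduce the actual file size.

#!/usr/bin/env python3
# Sketch: write a wannier90.mmn-style file filled with random numbers.
# Only the layout and the resulting file size matter for this I/O test.
import numpy as np

num_bands, num_kpts, nntot = 8, 27, 8   # illustrative; use your real values

rng = np.random.default_rng(0)
with open("wannier90.mmn", "w") as f:
    f.write("Dummy Mmn with random numbers for an I/O test\n")
    f.write(f"{num_bands:12d}{num_kpts:12d}{nntot:12d}\n")
    for ik in range(1, num_kpts + 1):
        for nn in range(1, nntot + 1):
            # current k-point, a neighbour index, and the G-vector shift;
            # the actual values are irrelevant for a pure read test
            f.write(f"{ik:5d}{nn:5d}{0:5d}{0:5d}{0:5d}\n")
            for re, im in rng.standard_normal((num_bands * num_bands, 2)):
                f.write(f"{re:18.12f}{im:18.12f}\n")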
I performed the test on a single node with two Intel Broadwell processors
running at 2.6 GHz (14 cores each, 28 cores per node), 128 GB of memory,
and a GPFS file system.
In this case, reading the inputs with 10 cores succeeded, as you can see
below:
(START OF OUTPUT)
+----------------------------------------------------------------------------+
|                  b_k Vectors (Ang^-1) and Weights (Ang^2)
|                  ----------------------------------------
|            No.       b_k(x)       b_k(y)       b_k(z)          w_b
|            ---       --------------------------------       --------
|             1       0.095739     0.000000     0.000000     54.549228
|             2       0.000000     0.095739     0.000000     54.549228
|             3       0.000000     0.000000     0.095739     54.549228
|             4       0.000000     0.000000    -0.095739     54.549228
|             5       0.000000    -0.095739     0.000000     54.549228
|             6      -0.095739     0.000000     0.000000     54.549228
+----------------------------------------------------------------------------+
|                           b_k Directions (Ang^-1)
|                           -----------------------
|            No.           x            y            z
|            ---       --------------------------------
|             1        0.095739     0.000000     0.000000
|             2        0.000000     0.095739     0.000000
|             3        0.000000     0.000000     0.095739
+----------------------------------------------------------------------------+
Time to get kmesh 0.022 (sec)
*============================================================================*
|                               MEMORY ESTIMATE
|        Maximum RAM allocated during each phase of the calculation
*============================================================================*
|   Disentanglement           34494.77 Mb
|   Wannierise:               30250.47 Mb
|   plot_wannier:             30250.47 Mb
*----------------------------------------------------------------------------*
Starting a new Wannier90 calculation ...
Reading overlaps from wannier90.mmn : Created on
Reading projections from wannier90.amn : Created on
Time to read overlaps 1009.729 (sec)
*------------------------------- DISENTANGLE --------------------------------*
(END OF OUTPUT)
However, when I use more than 12 cores, I encounter the same issue, and I
confirmed that the memory footprint increases as the number of cores
increases.
At this point, I am not sure whether this is due to increased internal MPI
memory usage or to some kind of buffering for the file system; this is
specific to the system configuration.
The only thing I can tell you is that, with inputs of such a large file
size as yours, the memory footprint grows with the number of cores used.
So I would suggest reducing the number of cores, for example to 8 or 12.
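If you would like to watch how the footprint evolves with the number of
ranks on your node, a rough monitoring sketch like the one below (Linux
only; it assumes the executable is named wannier90.x) can be run in a
separate shell while W90 is reading the overlaps:

#!/usr/bin/env python3
# Sketch: sum the resident memory (VmRSS) of all running wannier90.x
# processes on this node by scanning /proc. Adjust the name if your
# binary is called something else.
import os

def rss_mb(pid):
    # Resident set size of a process in MB (0 if it cannot be read).
    try:
        with open(f"/proc/{pid}/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1]) / 1024.0  # kB -> MB
    except OSError:
        pass
    return 0.0

total = 0.0
for pid in filter(str.isdigit, os.listdir("/proc")):
    try:
        with open(f"/proc/{pid}/cmdline", "rb") as f:
            cmd = f.read().decode(errors="ignore")
    except OSError:
        continue
    if "wannier90.x" in cmd:
        mb = rss_mb(pid)
        total += mb
        print(f"pid {pid}: {mb:10.1f} MB")
print(f"total resident memory: {total:10.1f} MB")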
Sincerely,
Hyungjun Lee
UT Austin.
On Fri, Feb 26, 2021 at 5:31 PM Jonathan Backman <jbackman at iis.ee.ethz.ch>
wrote:
> Dear Hyungjun Lee,
>
> Here is the output of the serial run:
>
> ----- START OF OUTPUT -----
>
> Time to get kmesh 0.217 (sec)
>
> *============================================================================*
> |                               MEMORY ESTIMATE
> |        Maximum RAM allocated during each phase of the calculation
> *============================================================================*
> |   Disentanglement           34494.77 Mb
> |   Wannierise:               30250.47 Mb
> |   plot_wannier:             30250.47 Mb
> *----------------------------------------------------------------------------*
>
> Starting a new Wannier90 calculation ...
>
>
> Reading overlaps from wannier90.mmn    : File generated by VASP: unknown system
>
> Reading projections from wannier90.amn : Projections from Vasp, concatenated by Python.
>
> Time to read overlaps 2195.901 (sec)
>
> ----- END OF OUTPUT -----
>
> Best,
>
> Jonathan
>
>
> On 26/02/2021 21:35, H. Lee wrote:
>
> Dear Jonathan Backman:
>
> Thank you for providing your output.
>
> I think that even though your run stopped while printing the information
> on the b-vectors, you would encounter the same issue in the next steps.
> In a normal run, the W90 output looks like the following:
>
> ----- START OF EXAMPLE OUTPUT -----
> ...
>
> +----------------------------------------------------------------------------+
> | The b-vectors are chosen automatically
> | The following shells are used:   1,  2
> +----------------------------------------------------------------------------+
> |        Shell      # Nearest-Neighbours
> |        -----      --------------------
> |          1                 2
> |          2                 6
> +----------------------------------------------------------------------------+
> | Completeness relation is fully satisfied [Eq. (B1), PRB 56, 12847 (1997)]
> +----------------------------------------------------------------------------+
> |                  b_k Vectors (Ang^-1) and Weights (Ang^2)
> |                  ----------------------------------------
> |            No.       b_k(x)       b_k(y)       b_k(z)          w_b
> |            ---       --------------------------------       --------
> |             1       0.000000     0.000000     0.079153     71.124740
> |             2       0.000000     0.000000    -0.079153     71.124740
> |             3       0.113136    -0.000000     0.026384     26.042079
> |             4      -0.113136     0.000000    -0.026384     26.042079
> |             5      -0.056568     0.097979     0.026384     26.042079
> |             6       0.056568    -0.097979    -0.026384     26.042079
> |             7      -0.056568    -0.097979     0.026384     26.042079
> |             8       0.056568     0.097979    -0.026384     26.042079
> +----------------------------------------------------------------------------+
> |                           b_k Directions (Ang^-1)
> |                           -----------------------
> |            No.           x            y            z
> |            ---       --------------------------------
> |             1        0.000000     0.000000     0.079153
> |             2        0.113136    -0.000000     0.026384
> |             3       -0.056568     0.097979     0.026384
> |             4       -0.056568    -0.097979     0.026384
> +----------------------------------------------------------------------------+
>
> *Time to get kmesh ..... (sec)*
>
> *============================================================================*
> |                               MEMORY ESTIMATE
> |        Maximum RAM allocated during each phase of the calculation
> *============================================================================*
> |   Disentanglement            9404.64 Mb
> |   Wannierise:                4942.84 Mb
> |   plot_wannier:              4942.84 Mb
> *----------------------------------------------------------------------------*
>
> Starting a new Wannier90 calculation ...
>
> Reading overlaps from wannier90.mmn    : File generated by ...
>
> Reading projections from wannier90.amn : File generated by ...
>
> *Time to read overlaps ..... (sec)*
>
> *------------------------------- DISENTANGLE --------------------------------*
>
> +----------------------------------------------------------------------------+
>
> ...
> ----- END OF EXAMPLE OUTPUT -----
>
> You told me that your serial run proceeded to the disentanglement step.
> Could you let me know (1) the time to get the kmesh and (2) the time to
> read the overlaps (the lines marked with asterisks in the example output
> above) from the W90 output of your serial run?
>
> Sincerely,
>
> Hyungjun Lee
> UT Austin
>
> On Fri, Feb 26, 2021 at 12:23 PM Jonathan Backman <jbackman at iis.ee.ethz.ch>
> wrote:
>
>> Dear Hyungjun Lee,
>>
>> Thank you for the help.
>>
>> I have attached the W90 output file from my calculation. As you can see,
>> it stops printing output after picking the shells.
>>
>> When running the calculation with the serial version, the output also stops
>> at this point for a long time. Then, after a few days, it shows that a few
>> steps of the disentanglement have been done; this, however, never happens
>> in the parallel run.
>>
>> Best,
>>
>> Jonathan
>>
>>
>> On 26/02/2021 18:38, H. Lee wrote:
>>
>> Dear Jonathan Backman:
>>
>> Could you show me your Wannier90 (W90) output with high verbosity so that I
>> can identify the step at which W90 gets stuck? In particular, I would like
>> to know whether your run gets past the reading of the relevant input
>> matrices, for instance Mmn.
>>
>> I assume that you use disentanglement.
>> In this case, the largest array is the global complex-valued array
>> m_matrix_orig (declared with the SAVE and PUBLIC attributes); it is
>> allocated only on the ROOT node, with a size (in your case) of
>> 16 x 2688 x 2688 x 8 x 27 bytes, i.e. about 25 GB (I assume nntot is 8 in
>> your case).
>>
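>> For reference, a quick numerical check of that estimate (a small Python
>> sketch; nntot = 8 is an assumption, as above):
>>
>> # 16 bytes per double-precision complex element of m_matrix_orig,
>> # dimensions (num_bands, num_bands, nntot, num_kpts) from your case
>> num_bands, nntot, num_kpts = 2688, 8, 27
>> size_bytes = 16 * num_bands * num_bands * nntot * num_kpts
>> print(f"{size_bytes / 1e9:.1f} GB")   # prints 25.0 GB
>>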
>> One of the issues in the current implementation is that, even though this
>> matrix is no longer used outside of the overlap_read subroutine when
>> gamma_only is false, it is not deallocated immediately after Mmn is read
>> and scattered inside overlap_read; this leads to a very large (unbalanced)
>> memory footprint on the ROOT node.
>>
>> I understand that in your case, memory might not be a problem, but I
>> would like to confirm it by looking at your W90 output.
>>
>> Sincerely,
>>
>> Hyungjun Lee
>> UT Austin
>>
>> On Fri, Feb 26, 2021 at 4:22 AM Jonathan Backman <jbackman at iis.ee.ethz.ch>
>> wrote:
>>
>>> Dear All,
>>>
>>> I'm trying to run Wannier90 using MPI for a large system.
>>>
>>> 2688 Bloch states, 2048 Wannier functions, and 27 k-points (a 3x3x3 grid).
>>>
>>> AMN file size: 6 GB
>>>
>>> MMN file size: 42 GB
>>>
>>> My system does not run out of memory during the parallel run (512 GB
>>> available).
>>>
>>> When using one MPI process, the calculation progresses, but very slowly
>>> due to the size. However, when running with multiple MPI processes, the
>>> calculation runs but does not progress at all; I have tried waiting over
>>> two weeks. I have tried different numbers of MPI processes, but I would
>>> assume 27 would be best since I have 27 k-points.
>>>
>>> Does anyone have experience with the MPI version of the code for large
>>> systems? Are there any specific settings that should be used when running
>>> with MPI?
>>>
>>> Best regards,
>>>
>>> Jonathan Backman, ETH Zürich
>>>
>>>
>>>
>>> _______________________________________________
>>> Wannier mailing list
>>> Wannier at lists.quantum-espresso.org
>>> https://lists.quantum-espresso.org/mailman/listinfo/wannier
>>>
>>