[Wannier] Fwd: MPI version large systems

H. Lee hjunlee at gmail.com
Mon Mar 1 17:44:11 CET 2021


I forgot to forward my replies to the W90 mailing list.

---------- Forwarded message ---------
From: H. Lee <hjunlee at gmail.com>
Date: Mon, Mar 1, 2021 at 10:40 AM
Subject: Re: [Wannier] MPI version large systems
To: Jonathan Backman <jbackman at iis.ee.ethz.ch>


Dear Jonathan:

I think that you have an issue with reading the W90 inputs when using 27
cores, even if you have enough memory.

I regenerated all the relevant W90 inputs (*.mmn, *.amn, and *.eig) based on
your *.wout file and reproduced your issue.
Of course, the regenerated inputs only have the same file sizes and contain
random numbers; however, that is surely enough for this test.
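For reference, dummy inputs of this kind can be produced with a short script; below is a minimal sketch for the .mmn case. The layout follows the standard Wannier90 .mmn format (a comment line, a header with num_bands/num_kpts/nntot, then for each k-point/neighbour pair one integer line followed by num_bands^2 real/imaginary pairs). The function name and the zero G-vector shifts are my own simplifications: only the file size matters for this test, not the values.

```python
import random

def write_dummy_mmn(path, num_bands, num_kpts, nntot):
    """Write a wannier90.mmn-shaped file filled with random numbers.

    Only the shape (and hence the size) matches a real Mmn file; the
    neighbour lists and overlaps are meaningless (zero G-shifts,
    random entries), which is enough to test the reading step.
    """
    with open(path, "w") as f:
        f.write("Dummy Mmn file with random entries\n")
        f.write(f"{num_bands:12d}{num_kpts:12d}{nntot:12d}\n")
        for ik in range(1, num_kpts + 1):
            for ikb in range(1, nntot + 1):
                # k-point index, neighbour index, G-vector shift (dummy)
                f.write(f"{ik:5d}{ikb:5d}    0    0    0\n")
                for _ in range(num_bands * num_bands):
                    f.write(f"{random.uniform(-1, 1):18.12f}"
                            f"{random.uniform(-1, 1):18.12f}\n")
```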

I performed the test on a single node with two Intel Broadwell processors
running at 2.6 GHz, with 14 cores each (28 cores per node), 128 GB of memory,
and a GPFS file system.
In this case, I succeeded in reading the inputs using 10 cores, as you can
see below:

(START OF OUTPUT)
 +----------------------------------------------------------------------------+
 |                  b_k Vectors (Ang^-1) and Weights (Ang^2)                  |
 |                  ----------------------------------------                  |
 |            No.         b_k(x)      b_k(y)      b_k(z)        w_b           |
 |            ---        --------------------------------     --------        |
 |             1         0.095739    0.000000    0.000000    54.549228        |
 |             2         0.000000    0.095739    0.000000    54.549228        |
 |             3         0.000000    0.000000    0.095739    54.549228        |
 |             4         0.000000    0.000000   -0.095739    54.549228        |
 |             5         0.000000   -0.095739    0.000000    54.549228        |
 |             6        -0.095739    0.000000    0.000000    54.549228        |
 +----------------------------------------------------------------------------+
 |                           b_k Directions (Ang^-1)                          |
 |                           -----------------------                          |
 |            No.           x           y           z                         |
 |            ---        --------------------------------                     |
 |             1         0.095739    0.000000    0.000000                     |
 |             2         0.000000    0.095739    0.000000                     |
 |             3         0.000000    0.000000    0.095739                     |
 +----------------------------------------------------------------------------+

 Time to get kmesh              0.022 (sec)

 *============================================================================*
 |                              MEMORY ESTIMATE                               |
 |         Maximum RAM allocated during each phase of the calculation         |
 *============================================================================*
 |                        Disentanglement        34494.77 Mb                  |
 |                            Wannierise:        30250.47 Mb                  |
 |                          plot_wannier:        30250.47 Mb                  |
 *----------------------------------------------------------------------------*

 Starting a new Wannier90 calculation ...


 Reading overlaps from wannier90.mmn    :  Created on

 Reading projections from wannier90.amn :  Created on

 Time to read overlaps       1009.729 (sec)

 *------------------------------- DISENTANGLE --------------------------------*
(END OF OUTPUT)

But when I use more than 12 cores, I encounter the same issue, and I
confirmed that the memory footprint grows as the number of cores used
increases.
At this time, I am not sure whether this is due to increased internal MPI
memory usage or to some kind of file-system buffering; this is specific to
the system configuration.

The only thing I can tell you is that with your inputs, which have very large
file sizes, the memory footprint increases with the number of cores used.
So I would suggest that you reduce the number of cores, to 8 or 12, for
example.
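To put a number on the baseline memory pressure, the dominant allocation discussed further down in this thread (the m_matrix_orig array held on the ROOT rank) can be estimated with a few lines of Python. This is just the 16 x 2688 x 2688 x 8 x 27 product quoted in the thread, wrapped in a hypothetical helper; nntot = 8 is an assumption, as noted there.

```python
def m_matrix_orig_bytes(num_bands, nntot, num_kpts, bytes_per_complex=16):
    """Size of the complex array m_matrix_orig(num_bands, num_bands,
    nntot, num_kpts), allocated only on the ROOT rank while reading Mmn."""
    return bytes_per_complex * num_bands * num_bands * nntot * num_kpts

# Jonathan's case: 2688 Bloch states, 27 k-points, assumed nntot = 8
size = m_matrix_orig_bytes(num_bands=2688, nntot=8, num_kpts=27)
print(f"{size / 1e9:.1f} GB")  # about 25 GB on the ROOT rank alone
```

Any per-rank MPI or file-system buffers then come on top of this root-rank allocation, which is consistent with the footprint growing with the core count.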

Sincerely,

Hyungjun Lee
UT Austin.


On Fri, Feb 26, 2021 at 5:31 PM Jonathan Backman <jbackman at iis.ee.ethz.ch>
wrote:

> Dear Hyungjun Lee,
>
> Here is the output of the serial run:
>
> ----- START OF OUTPUT -----
>
> Time to get kmesh              0.217 (sec)
>
>  *============================================================================*
>  |                              MEMORY ESTIMATE                               |
>  |         Maximum RAM allocated during each phase of the calculation         |
>  *============================================================================*
>  |                        Disentanglement        34494.77 Mb                  |
>  |                            Wannierise:        30250.47 Mb                  |
>  |                          plot_wannier:        30250.47 Mb                  |
>  *----------------------------------------------------------------------------*
>
>  Starting a new Wannier90 calculation ...
>
>
>  Reading overlaps from wannier90.mmn    : File generated by VASP: unknown system
>
>  Reading projections from wannier90.amn : Projections from Vasp, concatenated by Python.
>
>  Time to read overlaps       2195.901 (sec)
>
> ----- END OF OUTPUT -----
>
> Best,
>
> Jonathan
>
>
> On 26/02/2021 21:35, H. Lee wrote:
>
> Dear Jonathan Backman:
>
> Thank you for providing your output.
>
> I think that even if your run had not stopped during the printing of the
> information on the b-vectors, you would have encountered the issue in the
> next steps. In a normal run, the W90 output looks like the following:
>
> ----- START OF EXAMPLE OUTPUT -----
> ...
>
>  +----------------------------------------------------------------------------+
>  | The b-vectors are chosen automatically                                     |
>  | The following shells are used:   1,  2                                     |
>  +----------------------------------------------------------------------------+
>  |                        Shell   # Nearest-Neighbours                        |
>  |                        -----   --------------------                        |
>  |                          1               2                                 |
>  |                          2               6                                 |
>  +----------------------------------------------------------------------------+
>  | Completeness relation is fully satisfied [Eq. (B1), PRB 56, 12847 (1997)]  |
>  +----------------------------------------------------------------------------+
>  |                  b_k Vectors (Ang^-1) and Weights (Ang^2)                  |
>  |                  ----------------------------------------                  |
>  |            No.         b_k(x)      b_k(y)      b_k(z)        w_b           |
>  |            ---        --------------------------------     --------        |
>  |             1         0.000000    0.000000    0.079153    71.124740        |
>  |             2         0.000000    0.000000   -0.079153    71.124740        |
>  |             3         0.113136   -0.000000    0.026384    26.042079        |
>  |             4        -0.113136    0.000000   -0.026384    26.042079        |
>  |             5        -0.056568    0.097979    0.026384    26.042079        |
>  |             6         0.056568   -0.097979   -0.026384    26.042079        |
>  |             7        -0.056568   -0.097979    0.026384    26.042079        |
>  |             8         0.056568    0.097979   -0.026384    26.042079        |
>  +----------------------------------------------------------------------------+
>  |                           b_k Directions (Ang^-1)                          |
>  |                           -----------------------                          |
>  |            No.           x           y           z                         |
>  |            ---        --------------------------------                     |
>  |             1         0.000000    0.000000    0.079153                     |
>  |             2         0.113136   -0.000000    0.026384                     |
>  |             3        -0.056568    0.097979    0.026384                     |
>  |             4        -0.056568   -0.097979    0.026384                     |
>  +----------------------------------------------------------------------------+
>
>  *Time to get kmesh              ..... (sec)*
>
>  *============================================================================*
>  |                              MEMORY ESTIMATE                               |
>  |         Maximum RAM allocated during each phase of the calculation         |
>  *============================================================================*
>  |                        Disentanglement         9404.64 Mb                  |
>  |                            Wannierise:         4942.84 Mb                  |
>  |                          plot_wannier:         4942.84 Mb                  |
>  *----------------------------------------------------------------------------*
>
>  Starting a new Wannier90 calculation ...
>
>
>  Reading overlaps from wannier90.mmn    : File generated by ...
>
>  Reading projections from wannier90.amn : File generated by ...
>
>  *Time to read overlaps        ..... (sec)*
>
>  *------------------------------- DISENTANGLE --------------------------------*
>
>  +----------------------------------------------------------------------------+
>
> ...
> ----- END OF EXAMPLE OUTPUT -----
>
> You told me that your serial run proceeded to the disentanglement step.
> Could you let me know (1) the time to get the kmesh and (2) the time to
> read the overlaps (the two lines marked with asterisks in the example
> output above) from the W90 output of your serial run?
>
> Sincerely,
>
> Hyungjun Lee
> UT Austin
>
> On Fri, Feb 26, 2021 at 12:23 PM Jonathan Backman <jbackman at iis.ee.ethz.ch>
> wrote:
>
>> Dear Hyungjun Lee,
>>
>> Thank you for the help.
>>
>> I have attached the W90 output file from my calculation. As you can see,
>> it stops printing output after picking the shells.
>>
>> When running the calculation with the serial version, the output also
>> stops at this point for a long time. Then, after a few days, it shows that
>> it has done a few steps of the disentanglement; this, however, never
>> happens for the parallel run.
>>
>> Best,
>>
>> Jonathan
>>
>>
>> On 26/02/2021 18:38, H. Lee wrote:
>>
>> Dear Jonathan Backman:
>>
>> Could you show me your Wannier90 (W90) output at high verbosity so that I
>> can identify the step at which W90 gets stuck? In particular, I would like
>> to know whether your run got past the reading of the relevant input
>> matrices, for instance Mmn.
>>
>> I assume that you use disentanglement.
>> In this case, the largest array is the global complex-valued array
>> m_matrix_orig (with the SAVE and PUBLIC attributes); it is allocated only
>> on the ROOT node, with a size (in your case) of 16 x 2688 x 2688 x 8 x 27
>> bytes, i.e. about 25 GB (I assume nntot is 8 in your case).
>>
>> One issue in the current implementation is that even though this matrix is
>> no longer used outside of the overlap_read subroutine when gamma_only is
>> false, it is not deallocated immediately after reading Mmn and scattering
>> it in overlap_read, which leads to a very large (unbalanced) memory
>> footprint on the ROOT node.
>>
>> I understand that in your case, memory might not be a problem, but I
>> would like to confirm it by looking at your W90 output.
>>
>> Sincerely,
>>
>> Hyungjun Lee
>> UT Austin
>>
>> On Fri, Feb 26, 2021 at 4:22 AM Jonathan Backman <jbackman at iis.ee.ethz.ch>
>> wrote:
>>
>>> Dear All,
>>>
>>> I'm trying to run Wannier90 using MPI for a large system.
>>>
>>> 2688 Bloch states, 2048 Wannier functions, and 27 K-points. (3x3x3 grid).
>>>
>>> AMN file size: 6 GB
>>>
>>> MMN file size: 42 GB
>>>
>>> My system does not run out of memory during the parallel run (512 GB
>>> available).
>>>
>>> When using one MPI process, the calculation progresses, but very slowly
>>> due to the size of the system. However, when running with multiple MPI
>>> processes, the calculation runs but does not progress at all; I have
>>> tried waiting over 2 weeks. I have tried different numbers of MPI
>>> processes, but I would assume 27 would be best since I have 27 k-points.
>>>
>>> Does anyone have experience with the MPI version of the code for large
>>> systems? Are there any specific settings that should be used when running
>>> with MPI?
>>>
>>> Best regards,
>>>
>>> Jonathan Backman, ETH Zürich
>>>
>>>
>>>
>>> _______________________________________________
>>> Wannier mailing list
>>> Wannier at lists.quantum-espresso.org
>>> https://lists.quantum-espresso.org/mailman/listinfo/wannier
>>>
>>

