[Pw_forum] Error "Not diagonalizing because representation xx is not done" in "image" parallelization by ph.x
Coiby Xu
coiby.xu at gmail.com
Thu May 5 15:42:38 CEST 2016
Dear Dr. Luo,
Thank you for your detailed reply!
I'm sorry: I had disabled mail delivery earlier, so I didn't see your reply
until I checked the mailing list archive.
I've successfully run the phonon calculation without *wf_collect=.true.*,
following your advice. This reduced the size of *outdir* from 142 GB to
48 GB.
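In case it is useful to others, the sequence I ended up with looks roughly
like the following sketch (the input file names are placeholders, and the
counts reflect my 24-core nodes, 48 images and 3 k-point pools):

  # 1) scf run: one node, 24 MPI processes, 3 k-point pools
  mpirun -np 24 pw.x -nk 3 -inp scf.in > scf.out

  # 2) phonon run over 48 images, keeping 24 processes and -nk 3 per image,
  #    so wf_collect=.true. is not needed (1152 = 48 images x 24 processes)
  mpirun -np 1152 ph.x -nimage 48 -nk 3 -inp ph.in > ph.out

  # 3) once all representations of a q point are done, add recover=.true.
  #    to ph.in and rerun without images to assemble the dynamical matrix
  mpirun -np 24 ph.x -nk 3 -inp ph.in > ph.recover.out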
For threaded MKL and FFT, I tested one case (-nimage 48 -npool 3 -ntg 2
-ndiag 4); a sketch of the full command line is given after the excerpts
below. To my surprise, it was marginally slower than the same calculation
without *-ntg 2 -ndiag 4*. In PHonon/examples/Image_example I didn't find
any useful info about these flags; the example only defines
> PH_IMAGE_COMMAND="$PARA_IMAGE_PREFIX $BIN_DIR/ph.x $PARA_IMAGE_POSTFIX"
>
and in the file environment_variables no -ntg or -ndiag settings are given:
> PARA_POSTFIX=" -nk 1 -nd 1 -nb 1 -nt 1 "
> PARA_IMAGE_POSTFIX="-ni 2 $PARA_POSTFIX"
> PARA_IMAGE_PREFIX="mpirun -np 4"
>
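I suppose the way to try -nt and -nd with the image example would be to
edit those variables by hand. The lines below are only my guess at what
such an edit could look like, not anything from the distributed file; 1152
follows the -np NUMBER_OF_NODESx24 scheme from my first message (48 nodes
x 24 cores):

  # hypothetical edit of environment_variables for a 48-image run
  PARA_POSTFIX=" -nk 3 -nd 4 -nb 1 -nt 2 "
  PARA_IMAGE_POSTFIX="-ni 48 $PARA_POSTFIX"
  PARA_IMAGE_PREFIX="mpirun -np 1152"

With these values PH_IMAGE_COMMAND expands to essentially the command line
I tested directly:

  mpirun -np 1152 $BIN_DIR/ph.x -ni 48 -nk 3 -nd 4 -nb 1 -nt 2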
I also checked the job log of the failed calculation (the one that stopped
with "Not diagonalizing because representation xx is not done"). Maybe ph.x
crashed because of an I/O problem (the size of outdir was 142 GB):
> forrtl: No such file or directory
> forrtl: severe (28): CLOSE error, unit 20, file "Unknown"
> Image              PC                Routine            Line      Source
> ph.x               000000000088A00F  Unknown            Unknown   Unknown
> ph.x               0000000000517B26  buffers_mp_close_  620       buffers.f90
> ph.x               00000000004B85E8  close_phq_         39        close_phq.f90
> ph.x               00000000004B7888  clean_pw_ph_       41        clean_pw_ph.f90
> ph.x               000000000042E5EF  do_phonon_         126       do_phonon.f90
> ph.x               000000000042A554  MAIN__             78        phonon.f90
> ph.x               000000000042A4B6  Unknown            Unknown   Unknown
> libc.so.6          0000003921A1ED1D  Unknown            Unknown   Unknown
> ph.x               000000000042A3A9  Unknown            Unknown   Unknown
>
Btw, I'm from the School of Earth and Space Sciences at USTC.
On Wed, May 4, 2016 at 07:41:30 CEST, Ye Luo <xw111luoye at gmail.com> wrote:
> Hi Coiby,
>
> "it seems to be one requirement to let ph.x and pw.x have the same number
> of processors."
> This is not true.
>
> If you are using image parallelization in your phonon calculation, you need
> to maintain the same number of processes per image as in your pw calculation.
> In this way, wf_collect=.true. is not needed.
>
> Here is an example. I assume you use k point parallelization (-nk).
> 1, mpirun -np 48 pw.x -nk 12 -inp your_pw.input
> 2, mpirun -np 192 ph.x -ni 4 -nk 12 -inp your_ph.input
> In this step, you might notice "Not diagonalizing because representation
> xx is not done", which is normal.
> The code should not abort because of this.
> 3, After calculating all the representations belonging to a given q or
> q-mesh, just add "recover = .true." in your_ph.input and run
> mpirun -np 48 ph.x -nk 12 -inp your_ph.input
> The dynamical matrix will be computed for that q.
>
> If you are confident with threaded pw.x, ph.x also benefits from
> threaded MKL and FFT, and the time to solution is further reduced.
>
> For more details, you can look into PHonon/examples/Image_example.
>
> P.S.
> Your affiliation is missing.
>
> ===================
> Ye Luo, Ph.D.
> Leadership Computing Facility
> Argonne National Laboratory
>
>
>
> On Wed, May 4, 2016 at 11:33 AM, Coiby Xu <coiby.xu at gmail.com> wrote:
>
>> Dear Quantum Espresso Developers and Users,
>>
>>
>> I'm running a phonon calculation parallelized over the representations/q
>> vectors. On my cluster there are 24 cores per node, and I want to use as
>> many nodes as possible to speed up the calculation.
>>
>> I set the number of images to be the number of nodes,
>>
>>> mpirun -np NUMBER_OF_NODESx24 ph.x -nimage NUMBER_OF_NODES
>>>
>>
>>
>> If I use only 4 nodes (4 images) or 8 nodes (8 images), the calculation
>> finishes successfully. However, when more than 8 nodes are used, say 16
>> or 32, every run ends with the following error:
>>
>>> Not diagonalizing because representation xx is not done
>>>
>>
>> Btw, I want to reduce the I/O overhead by dropping the `wf_collect` option,
>> but the following doesn't work (the number of processors and pools for the
>> scf calculation is the same as in the phonon calculation):
>>
>> mpirun -np NUMBER_OF_NODESx24 pw.x
>>>
>>
>> ph.x complains,
>>
>>> Error in routine phq_readin (1):pw.x run with a different number of
>>> processors.
>>> Use wf_collect=.true.
>>>
>>
>> The beginning of the pw.x output:
>>
>>> Parallel version (MPI), running on 96 processors
>>> R & G space division: proc/nbgrp/npool/nimage = 96
>>> Waiting for input...
>>> Reading input from standard input
>>>
>>
>> and the beginning of the ph.x output:
>>
>>> Parallel version (MPI), running on 96 processors
>>> path-images division: nimage = 4
>>> R & G space division: proc/nbgrp/npool/nimage = 24
>>>
>>
>> Am I missing something? I know it's inefficient to let pw.x use so many
>> processors, but it seems to be a requirement that ph.x and pw.x use the
>> same number of processors.
>>
>> Thank you!
>>
>> --
>> *Best regards,*
>> *Coiby*
>>
>>
--
*Best regards,*
*Coiby*