[Pw_forum] Error "Not diagonalizing because representation xx is not done" in "image" parallelization by ph.x
Coiby Xu
coiby.xu at gmail.com
Thu May 5 15:42:38 CEST 2016
Dear Dr. Luo,
Thank you for your detailed reply!
I'm sorry: I had disabled mail delivery earlier, so I didn't see your reply
until I checked the mailing list archive.
I've successfully run the phonon calculation without *wf_collect=.true.*,
following your advice. This reduced the size of *outdir* from 142 GB to
48 GB.
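In case it is useful to others, the sequence I ended up with looks roughly
like the following sketch (the input file names are placeholders, and the
counts reflect my 24-core nodes, 48 images and 3 k-point pools):

  # 1) scf run: one node, 24 MPI processes, 3 k-point pools
  mpirun -np 24 pw.x -nk 3 -inp scf.in > scf.out

  # 2) phonon run over 48 images, keeping 24 processes and -nk 3 per image,
  #    so wf_collect=.true. is not needed (1152 = 48 images x 24 processes)
  mpirun -np 1152 ph.x -nimage 48 -nk 3 -inp ph.in > ph.out

  # 3) once all representations of a q point are done, add recover=.true.
  #    to ph.in and rerun without images to assemble the dynamical matrix
  mpirun -np 24 ph.x -nk 3 -inp ph.in > ph.recover.out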
For threaded MKL and FFT, I tested one case (-nimage 48 -npool 3 -ntg 2
-ndiag 4); a sketch of the full command line is given after the excerpts
below. To my surprise, it was marginally slower than the same calculation
without *-ntg 2 -ndiag 4*. In PHonon/examples/Image_example I didn't find
any useful info about these flags; the example only defines
> PH_IMAGE_COMMAND="$PARA_IMAGE_PREFIX $BIN_DIR/ph.x $PARA_IMAGE_POSTFIX"
>
and in the file environment_variables no -ntg or -ndiag settings are given:
> PARA_POSTFIX=" -nk 1 -nd 1 -nb 1 -nt 1 "
> PARA_IMAGE_POSTFIX="-ni 2 $PARA_POSTFIX"
> PARA_IMAGE_PREFIX="mpirun -np 4"
>
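I suppose the way to try -nt and -nd with the image example would be to
edit those variables by hand. The lines below are only my guess at what
such an edit could look like, not anything from the distributed file; 1152
follows the -np NUMBER_OF_NODESx24 scheme from my first message (48 nodes
x 24 cores):

  # hypothetical edit of environment_variables for a 48-image run
  PARA_POSTFIX=" -nk 3 -nd 4 -nb 1 -nt 2 "
  PARA_IMAGE_POSTFIX="-ni 48 $PARA_POSTFIX"
  PARA_IMAGE_PREFIX="mpirun -np 1152"

With these values PH_IMAGE_COMMAND expands to essentially the command line
I tested directly:

  mpirun -np 1152 $BIN_DIR/ph.x -ni 48 -nk 3 -nd 4 -nb 1 -nt 2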
I also checked the job log of the failed calculation (the one that stopped
with "Not diagonalizing because representation xx is not done"). Maybe ph.x
crashed because of an I/O problem (the size of outdir was 142 GB):
> forrtl: No such file or directory
> forrtl: severe (28): CLOSE error, unit 20, file "Unknown"
> Image              PC                Routine            Line      Source
> ph.x               000000000088A00F  Unknown            Unknown   Unknown
> ph.x               0000000000517B26  buffers_mp_close_  620       buffers.f90
> ph.x               00000000004B85E8  close_phq_         39        close_phq.f90
> ph.x               00000000004B7888  clean_pw_ph_       41        clean_pw_ph.f90
> ph.x               000000000042E5EF  do_phonon_         126       do_phonon.f90
> ph.x               000000000042A554  MAIN__             78        phonon.f90
> ph.x               000000000042A4B6  Unknown            Unknown   Unknown
> libc.so.6          0000003921A1ED1D  Unknown            Unknown   Unknown
> ph.x               000000000042A3A9  Unknown            Unknown   Unknown
>
Btw, I'm from the School of Earth and Space Sciences at USTC.
On Wed, May 4, 2016 at 07:41:30 CEST, Ye Luo <xw111luoye at gmail.com> wrote:
> Hi Coiby,
>
> "it seems to be one requirement to let ph.x and pw.x have the same number
> of processors."
> This is not true.
>
> If you are using image parallelization in your phonon calculation, you need
> to maintain the same number of processes per image as in your pw calculation.
> In this way, wf_collect=.true. is not needed.
>
> Here is an example. I assume you use k point parallelization (-nk).
> 1, mpirun -np 48 pw.x -nk 12 -inp your_pw.input
> 2, mpirun -np 192 ph.x -ni 4 -nk 12 -inp your_ph.input
> In this step, you might notice "Not diagonalizing because representation
> xx is not done", which is normal.
> The code should not abort because of this.
> 3, After calculating all the representations belonging to a given q or
> q-mesh, just add "recover = .true." in your_ph.input and run
> mpirun -np 48 ph.x -nk 12 -inp your_ph.input
> The dynamical matrix will be computed for that q.
>
> If you are confident with threaded pw.x, ph.x also benefits from
> threaded MKL and FFT, and the time to solution is further reduced.
>
> For more details, you can look into PHonon/examples/Image_example.
>
> P.S.
> Your affiliation is missing.
>
> ===================
> Ye Luo, Ph.D.
> Leadership Computing Facility
> Argonne National Laboratory
>
>
>
> On Wed, May 4, 2016 at 11:33 AM, Coiby Xu <coiby.xu at gmail.com> wrote:
>
>> Dear Quantum Espresso Developers and Users,
>>
>>
>> I'm running a phonon calculation parallelized over the representations/q
>> vectors. On my cluster there are 24 cores per node, and I want to use as
>> many nodes as possible to speed up the calculation.
>>
>> I set the number of images to be the number of nodes,
>>
>>> mpirun -np NUMBER_OF_NODESx24 ph.x -nimage NUMBER_OF_NODES
>>>
>>
>>
>> If I use only 4 nodes (4 images) or 8 nodes (8 images), the calculation
>> finishes successfully. However, when more than 8 nodes are used, say 16
>> or 32, every run ends with the following error:
>>
>>> Not diagonalizing because representation xx is not done
>>>
>>
>> Btw, I want to reduce the I/O overhead by dropping the `wf_collect` option,
>> but the following doesn't work (the number of processors and pools for the
>> scf calculation is the same as in the phonon calculation):
>>
>> mpirun -np NUMBER_OF_NODESx24 pw.x
>>>
>>
>> ph.x complains,
>>
>>> Error in routine phq_readin (1):pw.x run with a different number of
>>> processors.
>>> Use wf_collect=.true.
>>>
>>
>> The beginning of the pw.x output:
>>
>>> Parallel version (MPI), running on 96 processors
>>> R & G space division: proc/nbgrp/npool/nimage = 96
>>> Waiting for input...
>>> Reading input from standard input
>>>
>>
>> and the beginning of the ph.x output:
>>
>>> Parallel version (MPI), running on 96 processors
>>> path-images division: nimage = 4
>>> R & G space division: proc/nbgrp/npool/nimage = 24
>>>
>>
>> Am I missing something? I know it's inefficient to let pw.x use so many
>> processors, but it seems to be a requirement that ph.x and pw.x use the
>> same number of processors.
>>
>> Thank you!
>>
>> --
>> *Best regards,*
>> *Coiby*
>>
>>
--
*Best regards,*
*Coiby*