[Pw_forum] Pw_forum Digest, Vol 14, Issue 46

L.F.Huang lfhuang at theory.issp.ac.cn
Wed Aug 27 03:38:08 CEST 2008


Dear Stewart:
  Thank you for your advice and your question! I am sorry, I made a
mistake: when I found that the calculation on 8 nodes was not fast
enough for me, I increased it to 12 nodes, but I forgot this and still
thought I was using 8 nodes (ah, so forgetful!). The number of reduced
k-points (52*2 = 104) cannot be divided evenly by 12, but it can by 8 or
13. When I changed to 13 nodes, it worked well. It may be clear now that
the error is due to the communication failure, as you said.
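The divisibility check described above can be sketched in a few lines of Python. This is a minimal illustration that assumes the goal is an even split of the 104 spin-resolved k-points over the MPI processes; the helper name `even_splits` is hypothetical and is not part of Quantum ESPRESSO.

```python
def even_splits(n_kpoints, max_procs):
    """Hypothetical helper: map each process count up to max_procs that
    divides n_kpoints evenly to the number of k-points per process."""
    return {p: n_kpoints // p for p in range(1, max_procs + 1) if n_kpoints % p == 0}

# 52 reduced k-points, doubled for the two spin channels (as in the original post)
n_kpoints = 52 * 2
print(even_splits(n_kpoints, 16))  # {1: 104, 2: 52, 4: 26, 8: 13, 13: 8}
```

With 12 processes, 104 % 12 = 8, so the k-points cannot be shared equally, which is why 12 does not appear in the result while 8 and 13 do.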

Thank you for your kindness!
Best Wishes!

Yours Sincerely
Huang


> Date: Tue, 26 Aug 2008 08:32:52 -0400
> From: stewart at cnf.cornell.edu
> Subject: Re: [Pw_forum] about one "fatal error"
> To: PWSCF Forum <pw_forum at pwscf.org>
>
> Dear L. F. Huang,
>
> I don't think that this is a Quantum Espresso error. It looks like your
> calculation was working well and then node 10 on your cluster failed or
> lost communication with the rest of the network. Are you still able to
> log into node 10?
>
> Best regards,
>
> Derek
>
>
>
> L.F.Huang writes:
>
>> Hello everyone:
>>   I am calculating a 4*4*1 graphene supercell with one vacancy. It is
>> a magnetic system with a magnetization of 0.8 Bohr magnetons. There are
>> 31 atoms and 93 representations, with one mode each. At first ph.x ran
>> well; however, when the 23rd representation was being computed,
>> something strange came out:
>> **********************************************************************
>> FATAL ERROR on MPI node 3 (ganode054): GM send to MPI node 10 (???
>> [00:60:dd:49:08:2a]) failed: status 18 (target node was unreachable)
>> check the target host, mapping or cables
>> Small/Ctrl message completion error!
>> forrtl: error (76): IOT trap signal
>> FATAL ERROR on MPI node 9 (ganode057): GM send to MPI node 10 (???
>> [00:60:dd:49:08:2a]) failed: status 18 (target node was unreachable)
>> check the target host, mapping or cables
>> Small/Ctrl message completion error!
>> forrtl: error (76): IOT trap signal
>> **********************************************************************
>> Other details:
>> PWSCF version: QE 3.2.3
>> k-point grid: 10*10*1, with 52*2 reduced k-points (the factor of 2 is
>> due to spin polarization)
>> Number of nodes: -np 8
>>
>> Does anyone know the reason for this error, and how can I solve it?
>> I would very much appreciate any advice!
>>
>> Yours Sincerely
>> Huang
>>
>> ======================================================================
>> L.F.Huang lfhuang at theory.issp.ac.cn
>> ======================================================================
>>  Add: Research Laboratory for Computational Materials Sciences,
>>       Institute of Solid State Physics, the Chinese Academy of Sciences,
>> P.O.Box 1129, Hefei 230031, P.R.China
>>  Tel: 86-551-5591464-328(office)
>>  Fax: 86-551-5591434
>>  Web: http://theory.issp.ac.cn (website of our theory group)
>>       http://www.issp.ac.cn    (website of our institute)
>> ======================================================================
