[Pw_forum] PHONON errors vary when I use 6 or 2 CPUs?
Xunlei Ding
ding at sissa.it
Fri Jun 8 18:36:00 CEST 2007
Dear Xu,
I am confused by your questions.
I suggest you do the following:
1. Run an scf calculation from scratch (restart_mode='from_scratch') in parallel
on 4 CPUs.
2. Run a relax calculation from scratch in parallel on 4 CPUs.
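A minimal sketch of the &control namelist for the step-1 test, assuming the prefix and outdir from the ph.x input quoted further down, and assuming the &system, &electrons and card sections of the existing pw.x input are reused unchanged. It also sets wf_collect=.true., the option mentioned in the quoted reply for changing the number of CPUs between the scf and ph.x runs. This is an illustrative fragment, not a complete input.

```fortran
&control
  calculation  = 'scf'                            ! step 1; step 2 uses 'relax'
  restart_mode = 'from_scratch'                   ! do not reuse files from earlier runs
  prefix       = 'fxx_specify_ibra_500_12+force'  ! assumed: same prefix as the ph.x input
  outdir       = '/raid/xx/pwscf/tmp/'            ! assumed: same outdir as the ph.x input
  wf_collect   = .true.                           ! save wavefunctions independently of the CPU count
/
```

For step 2, the same fragment would change calculation to 'relax' and add an &ions namelist after &electrons (an empty "&ions /" keeps the default settings). Both runs are meant to be launched on 4 processes with whatever MPI command is normally used on the cluster.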
If there is an error, please post your input and output files here, and tell
me how you submit your job.
Best wishes,
Ding
xu yuehua wrote:
> Dear Ding,
> Thanks for your help. Following your suggestion, I did some small tests
> on these questions, but I find that if I use more than 2 CPUs for relax,
> there is always an error. For example, in the relax output file:
>
> ATOMIC_POSITIONS (crystal)
> H -0.198038036 0.118658320 0.038439471
> H -0.156238802 0.183672385 0.037451256
> O -0.201997079 0.167103737 0.035109275
>
>
>
> Writing output data file gash2o.save
> Check: negative starting charge= -0.027067
>
> second order charge density extrapolation
> p3_4229: p4_error: net_recv read: probable EOF on socket: 1
> p2_4108: p4_error: net_recv read: probable EOF on socket: 1
> p1_4369: p4_error: : 8097
> [1] MPI Abort by user Aborting program !
> [1] Aborting program!
> rm_l_3_4346: (16650.921875) net_send: could not write to fd=5, errno = 32
> rm_l_2_4225: (16651.347656) net_send: could not write to fd=7, errno = 32
> p2_4108: (16657.351562) net_send: could not write to fd=5, errno = 32
> p3_4229: (16656.933594) net_send: could not write to fd=5, errno = 32
> Fri Jun 8 21:58:05 CST 2007
>
> I checked the CPUs; they are OK, not down, because other jobs are running
> on them. I wonder what happened. Is this a parallel issue, or something
> else? I need your help. Thanks again.
>
> 2007/6/8, Xunlei Ding <ding at sissa.it>:
>
> Dear Xu,
> Yes, you are right, because ph.x needs to read the wfc (wavefunction) files
> of the scf calculation, so the number of CPUs should be the same.
> If you want to change the number of CPUs, you can try setting
> wf_collect=.true. in the scf calculation.
>
> I also suggest that you do some small tests on these questions.
>
> Best wishes,
> Ding
>
> xu yuehua wrote:
>
> > Dear Ding,
> > I thought about your idea. If your idea is correct, it means that if I
> > use 6 CPUs for scf, then I must use the same number of CPUs to continue
> > to the phonon calculation. Have I understood your idea correctly?
> > I need your help, thanks a lot.
> >
> >
> > 2007/6/8, Xunlei Ding <ding at sissa.it>:
> >
> > Dear Xu,
> > I think the error in the 6-CPU calculation is simply because one of the
> > six nodes is down, and the error in the 4-CPU calculation is because you
> > changed from 6 CPUs to 4 CPUs.
> > So my suggestion is to do the ph calculation with 6 CPUs again.
> >
> > I hope it will work.
> >
> > Yours,
> > ding
> >
> >
> >
> > xu yuehua wrote:
> >
> > > Hi everyone,
> > > Today I ran into a problem when computing phonons. First I did the scf
> > > calculation using 6 CPUs, then I also used 6 CPUs for the phonon
> > > calculation at Gamma, but a problem appeared in out.file:
> > >
> > >
> > >
> > > Proc/ planes   cols       G planes   cols       G columns      G
> > > Pool      (dense grid)          (smooth grid)     (wavefct grid)
> > >     1      5   3284   53988      4   2408   34052     719   5577
> > >     2      4   3283   53987      4   2407   34051     719   5577
> > >     3      4   3283   53987      4   2407   34049     719   5577
> > >     4      4   3283   53987      4   2407   34051     719   5577
> > >     5      4   3283   53987      4   2407   34049     719   5577
> > >     6      4   3283   53987      4   2407   34051     720   5576
> > >     0     25  19699  323923     24  14443  204303    4315  33461
> > >
> > >
> > > nbndx = 20 nbnd = 20 natomwfc = 30 npwx = 4282
> > > nelec = 40.00 nkb = 50 ngl = 10269
> > > p0_9381: p4_error: net_recv read: probable EOF on socket: 1
> > > Killed by signal 2.
> > > forrtl: error (69): process interrupted (SIGINT)
> > > Killed by signal 2.
> > > Killed by signal 2.
> > > Killed by signal 2.
> > > Killed by signal 2.
> > > p0_9381: (12.363281) net_send: could not write to fd=4, errno = 32
> > > Fri Jun 8 09:41:35 CST 2007
> > >
> > > Because I did not know the reason, I then tried using 4 CPUs to
> > > compute the phonons. This time the error was:
> > >
> > >
> > >
> > >
> > > Representation 44 1 modes - To be done
> > >
> > > Representation 45 1 modes - To be done
> > > IOS = 36
> > >
> > > %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> > > from davcio : error # 20
> > > i/o error in davcio
> > > %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> > >
> > > stopping ...
> > >
> > > %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> > > from davcio : error # 20
> > > i/o error in davcio
> > > %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> > >
> > > stopping ...
> > > [0] MPI Abort by user Aborting program !
> > > [0] Aborting program!
> > > p0_11006: p4_error: : 0
> > > Killed by signal 2.
> > > forrtl: error (69): process interrupted (SIGINT)
> > > p0_11006: (18.296875 ) net_send: could not write to fd=4, errno = 32
> > > Fri Jun 8 09:57:22 CST 2007
> > >
> > > In both cases above, the input was the same:
> > > phonons of fiveringwater at Gamma
> > > &inputph
> > > tr2_ph=1.0d-14,
> > > prefix='fxx_specify_ibra_500_12+force',
> > > epsil=.true.,
> > > amass(1)=1.0,
> > > amass(2)= 15.999,
> > > outdir='/raid/xx/pwscf/tmp/',
> > > fildyn='fxx.dynG',
> > > /
> > > 0.0 0.0 0.0
> > >
> > >
> > >
> > >
> > >
> > > So my question is: why does a different number of CPUs change the
> > > error? A few days ago I used 2 CPUs to run relax, scf and phonon
> > > calculations for another case, and everything went well, but now...?
> > > I need your help. Thanks.
> > >
> > > --
> > > Xu Yuehua
> > > physics Department of Nanjing university
> > > China
> >
> > _______________________________________________
> > Pw_forum mailing list
> > Pw_forum at pwscf.org
> > http://www.democritos.it/mailman/listinfo/pw_forum
> >
> >
> >
> >
> > --
> > Xu Yuehua
> > physics Department of Nanjing university
> > China
>
>
> --
> Xu Yuehua
> physics Department of Nanjing university
> China