[Pw_forum] PHONON errors vary when I use 6 or 2 CPUs?

xu yuehua njuxuyuehua at gmail.com
Fri Jun 8 16:27:49 CEST 2007


Dear Ding,
Thanks for your help. Following your suggestion, I ran some small tests on
these questions.
However, I find that whenever I use more than 2 CPUs, starting from the relax
step, there is always an error. For example, in the relax output file:



ATOMIC_POSITIONS (crystal)
H       -0.198038036   0.118658320   0.038439471
H       -0.156238802   0.183672385   0.037451256
O       -0.201997079   0.167103737   0.035109275



     Writing output data file gash2o.save
     Check: negative starting charge=   -0.027067

     second order charge density extrapolation
p3_4229:  p4_error: net_recv read:  probable EOF on socket: 1
p2_4108:  p4_error: net_recv read:  probable EOF on socket: 1
p1_4369:  p4_error: : 8097
[1] MPI Abort by user Aborting program !
[1] Aborting program!
rm_l_3_4346: (16650.921875) net_send: could not write to fd=5, errno = 32
rm_l_2_4225: (16651.347656) net_send: could not write to fd=7, errno = 32
p2_4108: (16657.351562) net_send: could not write to fd=5, errno = 32
p3_4229: (16656.933594) net_send: could not write to fd=5, errno = 32
Fri Jun  8 21:58:05 CST 2007



I checked the CPUs: they are fine and not down, because other jobs are running
on them. I wonder what happened. Is this a parallelization issue, or something
else? I need your help. Thanks again.


2007/6/8, Xunlei Ding <ding at sissa.it>:
>
> Dear xu,
> Yes, you are right: ph.x needs to read the wfc files of the scf
> calculation, so the number of CPUs should be the same.
> Maybe you can try wf_collect=.true. in the scf calculation if you want to
> change the number of CPUs.
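>
> A minimal sketch of that setting, assuming the scf input uses the same
> prefix and outdir as the ph.x input quoted further down (only the
> wf_collect line is new; everything else in the scf input is assumed
> unchanged and is not shown here):
>
>  &control
>   calculation='scf',
>   prefix='fxx_specify_ibra_500_12+force',
>   outdir='/raid/xx/pwscf/tmp/',
>   ! assumption: collect the wavefunctions into one set of files so that
>   ! a later run can read them with a different number of CPUs
>   wf_collect=.true.,
>  /
>
> The scf run has to be repeated with this flag before ph.x is started on
> a different number of CPUs.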
>
> And I suggest you do some small tests on these questions.
>
> Best wishes,
> Ding
>
> xu yuehua wrote:
>
> > Dear Ding,
> > I thought about your idea. If it is correct, it means that if I use
> > 6 CPUs for the scf, I must use the same number of CPUs for the
> > phonon calculation. Have I understood you correctly?
> > I need your help. Thanks a lot.
> >
> >
> > 2007/6/8, Xunlei Ding <ding at sissa.it>:
> >
> >     Dear Xu,
> >     I think the error in the 6-CPU calculation occurs simply because
> >     one of the six nodes is down, and the error in the 4-CPU
> >     calculation occurs because you changed from 6 CPUs to 4.
> >     So my suggestion is to run the ph calculation with 6 CPUs again.
> >
> >     Hope it works.
> >
> >     Yours,
> >     ding
> >
> >
> >
> >     xu yuehua wrote:
> >
> >     > Hi everyone,
> >     > Today I ran into a problem when computing phonons. First I did
> >     > the scf using 6 CPUs, then I also used 6 CPUs to do the phonon
> >     > calculation at Gamma, but a problem appeared in the output file:
> >     >
> >     >
> >     >
> >     >  Proc/  planes cols    G   planes cols    G    columns  G
> >     >  Pool       (dense grid)      (smooth grid)   (wavefct grid)
> >     >   1      5   3284  53988    4   2408  34052  719   5577
> >     >   2      4   3283  53987    4   2407  34051  719   5577
> >     >   3      4   3283  53987    4   2407  34049  719   5577
> >     >   4      4   3283  53987    4   2407  34051  719   5577
> >     >   5      4   3283  53987    4   2407  34049  719   5577
> >     >   6      4   3283  53987    4   2407  34051  720   5576
> >     >   0     25  19699 323923   24  14443 204303 4315  33461
> >     >
> >     >
> >     >      nbndx  =    20  nbnd   =    20  natomwfc =    30  npwx   =    4282
> >     >      nelec  =  40.00  nkb   =    50  ngl    =   10269
> >     > p0_9381:  p4_error: net_recv read:  probable EOF on socket: 1
> >     > Killed by signal 2.
> >     > forrtl: error (69): process interrupted (SIGINT)
> >     > Killed by signal 2.
> >     > Killed by signal 2.
> >     > Killed by signal 2.
> >     > Killed by signal 2.
> >     > p0_9381: (12.363281) net_send: could not write to fd=4, errno = 32
> >     > Fri Jun  8 09:41:35 CST 2007
> >     >
> >     > Because I did not know the reason, I then tried to use 4 CPUs to
> >     > compute the phonons. This time the error looks like this:
> >     >
> >     >
> >     >
> >     >
> >     >      Representation    44      1 modes - To be done
> >     >
> >     >      Representation    45      1 modes - To be done
> >     >  IOS = 36
> >     >
> >     >  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> >     >      from davcio : error #        20
> >     >      i/o error in davcio
> >     >  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> >     >
> >     >      stopping ...
> >     >
> >     >  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> >     >      from davcio : error #        20
> >     >      i/o error in davcio
> >     >  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> >     >
> >     >      stopping ...
> >     > [0] MPI Abort by user Aborting program !
> >     > [0] Aborting program!
> >     > p0_11006:  p4_error: : 0
> >     > Killed by signal 2.
> >     > forrtl: error (69): process interrupted (SIGINT)
> >     > p0_11006: (18.296875 ) net_send: could not write to fd=4, errno = 32
> >     > Fri Jun  8 09:57:22 CST 2007
> >     >
> >     > The two cases above use the same input:
> >     > phonons of fiveringwater at Gamma
> >     >  &inputph
> >     >   tr2_ph=1.0d-14,
> >     >   prefix='fxx_specify_ibra_500_12+force',
> >     >   epsil=.true.,
> >     >   amass(1)=1.0,
> >     >   amass(2)=15.999,
> >     >   outdir='/raid/xx/pwscf/tmp/',
> >     >   fildyn='fxx.dynG',
> >     >  /
> >     > 0.0 0.0 0.0
> >     >
> >     >
> >     >
> >     >
> >     >
> >     > So my question is: why does a different number of CPUs change
> >     > the error? A few days ago I used 2 CPUs to do the relax, scf and
> >     > phonon calculations for another case, and everything went well,
> >     > but now .....?
> >     > I need your help. Thanks.
> >     >
> >     > --
> >     > Xu Yuehua
> >     > physics Department of Nanjing university
> >     > China
> >
> >     _______________________________________________
> >     Pw_forum mailing list
> >     Pw_forum at pwscf.org
> >     http://www.democritos.it/mailman/listinfo/pw_forum
> >
> >
> >
> >
> > --
> > Xu Yuehua
> > physics Department of Nanjing university
> > China
>



-- 
Xu Yuehua
physics Department of Nanjing university
China