[Pw_forum] PHONON errors vary when I use 6 or 2 CPUs?

Xunlei Ding ding at sissa.it
Fri Jun 8 18:36:00 CEST 2007


Dear xu,
I am confused by your questions.
I suggest you run two tests:
1. A parallel scf calculation on 4 CPUs, from scratch
(restart_mode='from_scratch').
2. A parallel relax calculation on 4 CPUs, also from scratch.
If either one fails, please post your input and output files here, and tell
me how you submit your job.
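
As a rough sketch of test 1 (not a tested input), the &control namelist of
the pw.x input might look like the fragment below. The prefix is guessed from
the "gash2o.save" line in your output, and the outdir is only an assumption
copied from your ph.x input; all other namelists and cards are omitted.

```fortran
 &control
   calculation  = 'scf',            ! test 1; test 2 would use 'relax' instead
   restart_mode = 'from_scratch',   ! start fresh rather than restarting an earlier run
   prefix       = 'gash2o',         ! assumed from "gash2o.save" in the relax output
   outdir       = '/raid/xx/pwscf/tmp/',   ! assumption: same scratch directory as the ph.x input
 /
```

For test 2 the same fragment applies with calculation = 'relax', which also
needs an &ions namelist in the input.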

Best wishes,
Ding

xu yuehua wrote:

> Dear Ding,
> Thanks for your help. Following your suggestion, I ran some small tests
> on these questions.
> I find that whenever I use more than 2 CPUs for relax, there is always an
> error. For example, in the relax output file:
>
> ATOMIC_POSITIONS (crystal)
> H       -0.198038036   0.118658320   0.038439471
> H       -0.156238802   0.183672385   0.037451256
> O       -0.201997079   0.167103737   0.035109275
>
>      Writing output data file gash2o.save
>      Check: negative starting charge=   -0.027067
>
>      second order charge density extrapolation
> p3_4229:  p4_error: net_recv read:  probable EOF on socket: 1
> p2_4108:  p4_error: net_recv read:  probable EOF on socket: 1
> p1_4369:  p4_error: : 8097
> [1] MPI Abort by user Aborting program !
> [1] Aborting program!
> rm_l_3_4346: (16650.921875) net_send: could not write to fd=5, errno = 32
> rm_l_2_4225: (16651.347656) net_send: could not write to fd=7, errno = 32
> p2_4108: (16657.351562) net_send: could not write to fd=5, errno = 32
> p3_4229: (16656.933594) net_send: could not write to fd=5, errno = 32
> Fri Jun  8 21:58:05 CST 2007
>
> I checked the CPUs and they are not down, because other jobs are running on
> them. I wonder what happened. Is this a parallelization issue, or something
> else? I need your help. Thanks again.
>
>  
> 2007/6/8, Xunlei Ding <ding at sissa.it>:
>
>     Dear xu,
>     Yes, you are right: ph.x needs to read the wfc files written by the scf
>     calculation, so the number of CPUs should be the same in both runs.
>     If you want to change the number of CPUs, you could try setting
>     wf_collect=.true. in the scf calculation.
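>
>     As a minimal sketch (not a tested input), the flag goes in the &control
>     namelist of the scf input. The prefix and outdir below are copied from
>     your ph.x input; all other namelists and cards are omitted:
>
>     ```fortran
>      &control
>        calculation = 'scf',
>        prefix      = 'fxx_specify_ibra_500_12+force',
>        outdir      = '/raid/xx/pwscf/tmp/',
>        wf_collect  = .true.,   ! gather wavefunctions from all processors into the data directory
>      /
>     ```
>
>     This is only something to try, not a guaranteed fix.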
>
>     I also suggest you run some small tests on these questions.
>
>     Best wishes,
>     Ding
>
>     xu yuehua wrote:
>
>     > Dear Ding,
>     > I have thought about your idea. If it is correct, it means that if I
>     > use 6 CPUs for the scf calculation, I must use the same number of CPUs
>     > for the subsequent phonon calculation. Have I understood you correctly?
>     > I need your help. Thanks a lot.
>     >
>     >
>     > 2007/6/8, Xunlei Ding <ding at sissa.it>:
>     >
>     >     Dear Xu,
>     >     I think the error in the 6-CPU calculation occurred simply because
>     >     one of the six nodes was down, and the error in the 4-CPU calculation
>     >     occurred because you changed from 6 CPUs to 4.
>     >     So my suggestion is to run the ph calculation with 6 CPUs again.
>     >
>     >     I hope it works.
>     >
>     >     Yours,
>     >     ding
>     >
>     >
>     >
>     >     xu yuehua wrote:
>     >
>     >     > Hi everyone,
>     >     > Today I ran into a problem with a phonon calculation. I first ran
>     >     > scf on 6 CPUs, then also used 6 CPUs for the phonon calculation at
>     >     > Gamma, but an error appeared in the output file:
>     >     >
>     >     >
>     >     >
>     >     >  Proc/  planes cols    G   planes cols    G    columns  G
>     >     >  Pool       (dense grid)      (smooth grid)   (wavefct grid)
>     >     >   1      5   3284  53988    4   2408  34052  719   5577
>     >     >   2      4   3283  53987    4   2407  34051  719   5577
>     >     >   3      4   3283  53987    4   2407  34049  719   5577
>     >     >   4      4   3283  53987    4   2407  34051  719   5577
>     >     >   5      4   3283  53987    4   2407  34049  719   5577
>     >     >   6      4   3283  53987    4   2407  34051  720   5576
>     >     >   0     25  19699 323923   24  14443 204303 4315  33461
>     >     >
>     >     >
>     >     >      nbndx  =    20  nbnd   =    20  natomwfc =    30  npwx   =    4282
>     >     >      nelec  =  40.00  nkb   =    50  ngl    =   10269
>     >     > p0_9381:  p4_error: net_recv read:  probable EOF on socket: 1
>     >     > Killed by signal 2.
>     >     > forrtl: error (69): process interrupted (SIGINT)
>     >     > Killed by signal 2.
>     >     > Killed by signal 2.
>     >     > Killed by signal 2.
>     >     > Killed by signal 2.
>     >     > p0_9381: (12.363281) net_send: could not write to fd=4, errno = 32
>     >     > Fri Jun  8 09:41:35 CST 2007
>     >     >
>     >     > I did not know the reason, so I then tried to run the phonon
>     >     > calculation on 4 CPUs. This time the error was:
>     >     >
>     >     >
>     >     >
>     >     >
>     >     >      Representation    44      1 modes - To be done
>     >     >
>     >     >      Representation    45      1 modes - To be done
>     >     >  IOS = 36
>     >     >
>     >     >  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>     >     >      from davcio : error #        20
>     >     >      i/o error in davcio
>     >     >  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>     >     >
>     >     >      stopping ...
>     >     >
>     >     >  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>     >     >      from davcio : error #        20
>     >     >      i/o error in davcio
>     >     >  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>     >     >
>     >     >      stopping ...
>     >     > [0] MPI Abort by user Aborting program !
>     >     > [0] Aborting program!
>     >     > p0_11006:  p4_error: : 0
>     >     > Killed by signal 2.
>     >     > forrtl: error (69): process interrupted (SIGINT)
>     >     > p0_11006: (18.296875 ) net_send: could not write to fd=4, errno = 32
>     >     > Fri Jun  8 09:57:22 CST 2007
>     >     >
>     >     > Both cases used the same input:
>     >     > phonons of fiveringwater at Gamma
>     >     >  &inputph
>     >     >   tr2_ph=1.0d-14,
>     >     >   prefix='fxx_specify_ibra_500_12+force',
>     >     >   epsil=.true.,
>     >     >   amass(1)=1.0,
>     >     >   amass(2)= 15.999,
>     >     >   outdir='/raid/xx/pwscf/tmp/',
>     >     >   fildyn='fxx.dynG',
>     >     >  /
>     >     > 0.0 0.0 0.0
>     >     >
>     >     >
>     >     >
>     >     >
>     >     >
>     >     > So my question is: why does the error change with the number of CPUs?
>     >     > A few days ago I used 2 CPUs to run relax, scf and phonon for
>     >     > another system, and everything went well. But now?
>     >     > I need your help. Thanks.
>     >     >
>     >     > --
>     >     > Xu Yuehua
>     >     > physics Department of Nanjing university
>     >     > China
>     >
>     >     _______________________________________________
>     >     Pw_forum mailing list
>     >     Pw_forum at pwscf.org
>     >     http://www.democritos.it/mailman/listinfo/pw_forum
>     >
>     >
>     >
>     >
>     > --
>     > Xu Yuehua
>     > physics Department of Nanjing university
>     > China
>
>
>
>
>
> -- 
> Xu Yuehua
> physics Department of Nanjing university
> China 



