[Pw_forum] Distributing phonon calculations to different machines
Huiqun Zhou
hqzhou at nju.edu.cn
Thu Jul 8 05:11:12 CEST 2010
Andrea,
I checked the directory, and all the files are there. See below:
ls /gpfsTMP/hqzhou/tmp/sill_1.04v0/q1/_ph0sill_1.04v0.phsave/
data-file.xml data-file.xml.2 data-file.xml.5 data-file.xml.8
data-file.xml.1 data-file.xml.3 data-file.xml.6
data-file.xml.1.0 data-file.xml.4 data-file.xml.7
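A listing like this confirms the file names but not that every copy is complete. As a minimal sketch (not part of my original script), a byte-by-byte comparison of a source directory with its per-q copy could look like the following; the function name `verify_copy` and its two arguments are hypothetical:

```shell
# Hypothetical helper: succeed only if every regular file under SRC
# also exists under DST with identical contents.
verify_copy() {
    src=${1%/}
    dst=${2%/}
    # Collect the relative paths of files that are missing or differ.
    bad=$(find "$src" -type f | while IFS= read -r f; do
        rel=${f#"$src"/}
        cmp -s "$f" "$dst/$rel" 2>/dev/null || printf '%s\n' "$rel"
    done)
    if [ -n "$bad" ]; then
        printf 'missing or different in %s:\n%s\n' "$dst" "$bad" >&2
        return 1
    fi
    return 0
}
```

It would be called with the original directory first and the copy second, for instance the `_ph0${system}.phsave` directory under `$TMP_DIR/${system}` and the one under `$TMP_DIR/${system}/q1`. Files that exist only in the copy are ignored, so anything a job has already written there does not count as an error.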
Strangely, this error occurs only when I compute the first q point
(gamma); ph.x works fine for the other q points. As the script snippet
in my second post in this thread shows, I created the necessary
directories and copied all the needed files before distributing the
jobs.
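The directory preparation could also be written so that a failed copy stops the script instead of passing unnoticed. The following is only a sketch under that assumption, not a diagnosis of the error; `prepare_qdir` and its three arguments are hypothetical names:

```shell
# Hypothetical helper: populate BASE/q<IQ> with the pw.x output
# (PREFIX.*) and the _ph0PREFIX.phsave directory, failing loudly.
prepare_qdir() {
    base=$1      # e.g. $TMP_DIR/$system
    prefix=$2    # the prefix used by pw.x and ph.x
    iq=$3        # index of the q point
    qdir=$base/q$iq
    mkdir -p "$qdir/_ph0$prefix.phsave" || return 1
    # Copy everything named PREFIX.* ; cp fails if nothing matches.
    cp -r "$base/$prefix".* "$qdir/" || return 1
    # Copy the contents of the preparatory .phsave directory.
    cp -r "$base/_ph0$prefix.phsave/." "$qdir/_ph0$prefix.phsave/" || return 1
}
```

Unlike a `test ! -d` guard, this version copies every time it is called, so a directory left half-filled by an interrupted earlier run is refreshed rather than skipped. A caller would check the return value before submitting the job for that q point.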
Any ideas?
huiqun zhou
@earth sciences, nanjing university, china
----- Original Message -----
From: "Dal Corso Andrea" <dalcorso at sissa.it>
To: "PWSCF Forum" <pw_forum at pwscf.org>
Sent: Wednesday, July 07, 2010 8:06 PM
Subject: Re: [Pw_forum] Distributing phonon calculations to different machines
> On Tue, 2010-07-06 at 17:55 +0800, Huiqun Zhou wrote:
>> Thanks, Andrea.
>>
>> As indicated in the script below, I have copied all files and directories
>> created by the pw.x run.
>>
>> if test ! -d $TMP_DIR/${system}/q${i} ; then
>> mkdir $TMP_DIR/${system}/q${i}
>> cp -r $TMP_DIR/${system}/${system}.* $TMP_DIR/${system}/q${i}
>> fi
>>
>
> Are the files really there? If all the files are in the directory
> $TMP_DIR/${system}/q${i} there is no reason why the ph.x code stops.
> If they are not, I cannot help you. It is not a problem of ph.x.
>
>
>> BTW, could you please describe in more detail the newly added information
>> in INPUT_PH.html about distributing phonon calculations on a cluster? If I
>> set wf_collect to true, there should be no relation between the nproc and
>> npool of the pw.x run and those of the two later ph.x runs, right? Taking
>> AlAs in the GRID_example as an example, if I want to use one server with
>> 8 CPU cores for the calculation at each q point (8 servers in total),
>> what are the values of images and pools?
>>
>>
> You are right, the explanation refers only to the case
> wf_collect=.false.. However, image parallelization of ph.x is very
> experimental, so be patient. At the moment it divides both q and irreps.
> Load balancing on q only is not implemented.
>
>
> Andrea
>
>
>
>> Huiqun Zhou
>> @Earth Sciences, Nanjing University, China
>>
>> ----- Original Message -----
>> From: "Dal Corso Andrea" <dalcorso at sissa.it>
>> To: "PWSCF Forum" <pw_forum at pwscf.org>
>> Sent: Tuesday, July 06, 2010 4:14 PM
>> Subject: Re: [Pw_forum] Distributing phonon calculations to different
>> machines
>>
>>
>> > On Tue, 2010-07-06 at 12:13 +0800, Huiqun Zhou wrote:
>> >> Sorry, I sent an unfinished message.
>> >>
>> >> When using _ph0{prefix}.phsave, I got the error message shown in the
>> >> previous message.
>> >>
>> >> Here is the snippet of my script for distributing lsf tasks:
>> >>
>> >> ......
>> >> nq=`sed -n '2p' ./${system}_q${nq1}${nq2}${nq3}.dyn0`
>> >>
>> >> for ((i=1; i<=$nq; i++))
>> >> do
>> >> if test ! -d $TMP_DIR/${system}/q${i} ; then
>> >> mkdir $TMP_DIR/${system}/q${i}
>> >> cp -r $TMP_DIR/${system}/${system}.* $TMP_DIR/${system}/q${i}
>> >> fi
>> >> if test ! -d $TMP_DIR/${system}/q${i}/_ph0${system}.phsave ; then
>> >> mkdir $TMP_DIR/${system}/q${i}/_ph0${system}.phsave
>> >> cp -r $TMP_DIR/${system}/_ph0${system}.phsave/* \
>> >> $TMP_DIR/${system}/q${i}/_ph0${system}.phsave
>> >> fi
>> >> done
>> >>
>> >> for ((i=1; i<=$nq; i++))
>> >> do
>> >> cat > ${system}_q${i}.in << EOF
>> >> phonons of ${system}
>> >> &inputph
>> >> tr2_ph = 1.0d-13,
>> >> alpha_mix(1) = 0.2,
>> >> prefix = '${system}',
>> >> ldisp = .true.,
>> >> recover = .true.
>> >> nq1 = ${nq1}, nq2 = ${nq2}, nq3 = ${nq3}
>> >> start_q = $i, last_q = $i
>> >> outdir = '$TMP_DIR/${system}/q${i}',
>> >> fildyn = '${system}_q${nq1}${nq2}${nq3}.dyn'
>> >> ......
>> >> EOF
>> >> $ECHO "calculation of q point $i"
>> >> bsub -a intelmpi -n $processes \
>> >> -R "span[ptile=8]" \
>> >> -J ${r}q${i}anda \
>> >> -oo ${system}_q${i}.out \
>> >> -eo ${system}_q${i}.err \
>> >> $PH_COMMAND -input ./${system}_q${i}.in
>> >> done
>> >>
>> >>
>> >> Huiqun Zhou
>> >> @Earth Sciences, Nanjing University, China
>> >> ----- Original Message -----
>> >> From: Huiqun Zhou
>> >> To: pw_forum at pwscf.org
>> >> Sent: Tuesday, July 06, 2010 12:00 PM
>> >> Subject: [Pw_forum] Distributing phonon calculations to
>> >> different machines
>> >>
>> >>
>> >> Dear developers,
>> >>
>> >> Please clarify which directory should be copied when distributing
>> >> phonon calculations to different machines: _ph{prefix}.phsave or
>> >> _ph0{prefix}.phsave? The former is described in the manual
>> >> INPUT_PH.html; the latter is used in the GRID_example.
>> >> Although no _ph{prefix}.phsave exists after the preparatory run with
>> >> start_irr=0 and last_irr=0, using the former works OK at the cost of
>> >> redundant calculations.
>> >>
>> >> Representation # 1 mode # 1
>> >>
>> >> Self-consistent Calculation
>> >>
>> >> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>> >> from davcio : error # 25
>> >> error while reading from file
>> >> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>> >>
>> >> stopping ...
>> >
>> > Thank you for the message. I will correct the INPUT_PH documentation.
>> > The correct directory is _ph0{prefix}.phsave.
>> >
>> > This message usually means that you have not copied all the required
>> > files. Did you copy all the files produced by pw.x?
>> >
>> > HTH,
>> >
>> > Andrea
>> >
>> >
>> >
>> >>
>> >> ______________________________________________________________
>> >>
>> >> _______________________________________________
>> >> Pw_forum mailing list
>> >> Pw_forum at pwscf.org
>> >> http://www.democritos.it/mailman/listinfo/pw_forum
>> > --
>> > Andrea Dal Corso Tel. 0039-040-3787428
>> > SISSA, Via Beirut 2/4 Fax. 0039-040-3787528
>> > 34151 Trieste (Italy) e-mail: dalcorso at sissa.it
>> >
>> >
>> >
>>
>>
>
>
>