[QE-users] Issue with running parallel version
Martina Lessio
ml4132 at columbia.edu
Thu May 24 18:59:15 CEST 2018
Dear Paolo,
Thanks so much for your response and for testing my input file on your
machine with the latest version of Quantum ESPRESSO. I should compile the
latest version on my machine.
Regarding the memory: that number is quite close to my memory limit (60 GB),
and such a large memory requirement probably explains why my calculations are
so slow. I will consider using a smaller unit cell and using
pseudopotentials with fewer valence electrons to reduce the memory
requirement. I have actually been searching for a norm-conserving, fully
relativistic pseudopotential for tellurium with only s and p valence
electrons (this would reduce the number of valence electrons from 16 to 6),
but I have not been able to find one. I would appreciate any suggestions
on this point.
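Just to put rough numbers on this (assuming the Mo ONCV fully relativistic
pseudopotential keeps 14 valence electrons, which I would double check in the
UPF header): the 5x5 supercell contains 50 Te and 25 Mo atoms, so the current
setup has about 50*16 + 25*14 = 1150 valence electrons, while a 6-electron Te
pseudopotential would bring this down to about 50*6 + 25*14 = 650, cutting
the number of bands to be stored and diagonalized almost in half.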
Thank you so much,
Martina
On Thu, May 24, 2018 at 4:30 AM, Paolo Giannozzi <p.giannozzi at gmail.com>
wrote:
> On 16 processors with the latest QE version, I get
>
> Estimated max dynamical RAM per process > 3.18 GB
> Estimated total dynamical RAM > 50.87 GB
>
> Do you have that much memory?
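>
> (For reference, this estimate is printed near the top of the pw.x output,
> at least in recent QE versions, so it can be checked for any run with
> something like
>   grep "dynamical RAM" MoTe2ml_super551OPT.out
> using your own output file name.)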
>
> Paolo
>
>
> On Tue, May 22, 2018 at 1:14 AM, Martina Lessio <ml4132 at columbia.edu>
> wrote:
>
>> Dear Quantum Espresso community,
>>
>> I recently started running the parallel version of QE 5.4.0 and I am
>> getting the following error message in my output file (the error never
>> appeared when I ran the code serially on one processor):
>>
>> [compute-0-6.local:31540] 23 more processes have sent help message
>> help-mpi-btl-base.txt / btl:no-nics
>>
>> [compute-0-6.local:31540] Set MCA parameter "orte_base_help_aggregate" to
>> 0 to see all help / error messages
>>
>> where "compute-0-1" is the name of the node I run my calculation on.
>> After the message is printed in the output, the calculation typically
>> continues normally and in some cases gets successfully to the end. In other
>> cases, usually when I have a large supercell with about 100 atoms, the
>> calculation becomes extremely slow and does not get to the end in a
>> reasonable time. Therefore, I suspect that the message also
>> signals that the calculation ends up running on only one processor.
>> However, upon checking, I noticed that all the processors I requested are
>> still busy with the calculation after the error message shows up.
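>>
>> If it is relevant: the help file mentioned in the message,
>> help-mpi-btl-base.txt, belongs to Open MPI's btl framework, so this seems
>> to be an MPI-level warning about a network interface it could not find
>> (for instance, no InfiniBand card for the openib component) rather than a
>> message from pw.x itself. If so, I would guess that something along the
>> lines of
>>   mpirun --mca btl ^openib -np 24 pw.x < MoTe2ml_super551OPT.in
>> (or setting the MCA parameter mentioned in the message) might silence or
>> expand it, though I am not sure this is the right fix. I also see that the
>> header of the pw.x output prints a line like
>>   Parallel version (MPI), running on    24 processors
>> which should show whether all requested processes were actually used.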
>>
>> I tried to search the forum and the FAQs for this type of issue but did
>> not find much. I would really appreciate it if anybody could share their
>> experience with this type of error.
>> I am providing below my submission script and input file (for a large
>> calculation that runs very slowly after the error message is printed).
>>
>> Thanks so much,
>> Martina
>>
>> --
>> Martina Lessio, Ph.D.
>> Frontiers of Science Lecturer in Discipline
>> Postdoctoral Research Scientist
>> Department of Chemistry
>> Columbia University
>>
>> *Submission script:*
>> #!/bin/bash
>> #SBATCH --job-name=QErun
>> #SBATCH -n 24 # number of MPI tasks
>> #SBATCH -p New # partition
>> #SBATCH -o MoTe2ml_super551OPT.out
>> #SBATCH --mem=60000
>> module load openmpi
>> module load mkl
>> module load compilers
>> mpirun -np 24 pw.x < MoTe2ml_super551OPT.in
>>
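>> (As a side note, I have seen it recommended to pass the input file with
>> pw.x's -inp option rather than with shell redirection when running under
>> MPI, e.g.
>>   mpirun -np 24 pw.x -inp MoTe2ml_super551OPT.in
>> although I do not know whether that has anything to do with this warning.)
>>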
>> *Input file:*
>> &control
>> calculation = 'relax'
>> restart_mode='from_scratch',
>> prefix='MoTe2ml_super5x5relax',
>> pseudo_dir = '/home/mlessio/espresso-5.4.0/pseudo/',
>> outdir='/home/mlessio/espresso-5.4.0/tempdir/'
>> /
>>
>> &system
>> ibrav= 4, A=17.65, B=17.65, C=16.882, cosAB=-0.5, cosAC=0, cosBC=0,
>> nat= 75, ntyp= 2,
>> ecutwfc =60.
>> lspinorb =.true., noncolin=.true.
>> /
>>
>> &electrons
>> mixing_mode = 'plain'
>> mixing_beta = 0.7
>> conv_thr = 1.0d-10
>> diago_david_ndim=2
>> diagonalization='cg'
>> /
>>
>> &ions
>> /
>>
>> ATOMIC_SPECIES
>> Te 127.6 Te_ONCV_PBE_FR-1.1.upf
>> Mo 95.96 Mo_ONCV_PBE_FR-1.0.upf
>>
>> ATOMIC_POSITIONS {crystal}
>> Te 0.133333330 0.066666657 0.313489588
>> Te 0.133333336 0.266666661 0.313489588
>> Te 0.133333334 0.466666672 0.313489588
>> Te 0.133333325 0.666666683 0.313489588
>> Te 0.133333336 0.866666694 0.313489588
>> Te 0.333333312 0.066666657 0.313489588
>> Te 0.333333306 0.266666661 0.313489588
>> Te 0.333333310 0.466666672 0.313489588
>> Te 0.333333304 0.666666683 0.313489588
>> Te 0.333333319 0.866666694 0.313489588
>> Te 0.533333329 0.066666657 0.313489588
>> Te 0.533333336 0.266666661 0.313489588
>> Te 0.533333320 0.466666672 0.313489588
>> Te 0.533333317 0.666666683 0.313489588
>> Te 0.533333335 0.866666694 0.313489588
>> Te 0.733333372 0.066666657 0.313489588
>> Te 0.733333352 0.266666661 0.313489588
>> Te 0.733333390 0.466666672 0.313489588
>> Te 0.733333374 0.666666683 0.313489588
>> Te 0.733333385 0.866666694 0.313489588
>> Te 0.933333361 0.066666657 0.313489588
>> Te 0.933333341 0.266666661 0.313489588
>> Te 0.933333379 0.466666672 0.313489588
>> Te 0.933333363 0.666666683 0.313489588
>> Te 0.933333347 0.866666694 0.313489588
>> Te 0.133333330 0.066666657 0.097661430
>> Te 0.133333336 0.266666661 0.097661430
>> Te 0.133333334 0.466666672 0.097661430
>> Te 0.133333325 0.666666683 0.097661430
>> Te 0.133333336 0.866666694 0.097661430
>> Te 0.333333312 0.066666657 0.097661430
>> Te 0.333333306 0.266666661 0.097661430
>> Te 0.333333310 0.466666672 0.097661430
>> Te 0.333333304 0.666666683 0.097661430
>> Te 0.333333319 0.866666694 0.097661430
>> Te 0.533333329 0.066666657 0.097661430
>> Te 0.533333336 0.266666661 0.097661430
>> Te 0.533333320 0.466666672 0.097661430
>> Te 0.533333317 0.666666683 0.097661430
>> Te 0.533333335 0.866666694 0.097661430
>> Te 0.733333372 0.066666657 0.097661430
>> Te 0.733333352 0.266666661 0.097661430
>> Te 0.733333390 0.466666672 0.097661430
>> Te 0.733333374 0.666666683 0.097661430
>> Te 0.733333385 0.866666694 0.097661430
>> Te 0.933333361 0.066666657 0.097661430
>> Te 0.933333341 0.266666661 0.097661430
>> Te 0.933333379 0.466666672 0.097661430
>> Te 0.933333363 0.666666683 0.097661430
>> Te 0.933333347 0.866666694 0.097661430
>> Mo 0.066666675 0.133333330 0.205570934
>> Mo 0.066666667 0.333333310 0.205570934
>> Mo 0.066666685 0.533333321 0.205570934
>> Mo 0.066666655 0.733333332 0.205570934
>> Mo 0.066666666 0.933333343 0.205570934
>> Mo 0.266666695 0.133333330 0.205570934
>> Mo 0.266666683 0.333333310 0.205570934
>> Mo 0.266666698 0.533333321 0.205570934
>> Mo 0.266666678 0.733333332 0.205570934
>> Mo 0.266666696 0.933333343 0.205570934
>> Mo 0.466666671 0.133333330 0.205570934
>> Mo 0.466666666 0.333333310 0.205570934
>> Mo 0.466666690 0.533333321 0.205570934
>> Mo 0.466666668 0.733333332 0.205570934
>> Mo 0.466666681 0.933333343 0.205570934
>> Mo 0.666666687 0.133333330 0.205570934
>> Mo 0.666666655 0.333333310 0.205570934
>> Mo 0.666666666 0.533333321 0.205570934
>> Mo 0.666666650 0.733333332 0.205570934
>> Mo 0.666666674 0.933333343 0.205570934
>> Mo 0.866666676 0.133333330 0.205570934
>> Mo 0.866666644 0.333333310 0.205570934
>> Mo 0.866666682 0.533333321 0.205570934
>> Mo 0.866666666 0.733333332 0.205570934
>> Mo 0.866666650 0.933333343 0.205570934
>>
>> K_POINTS {automatic}
>> 2 2 1 0 0 0
>>
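>> (One more thought on the speed: since the 2 2 1 mesh above should give more
>> than one k point, it might also be worth trying k-point parallelization,
>> e.g.
>>   mpirun -np 24 pw.x -nk 2 -inp MoTe2ml_super551OPT.in
>> assuming the run has at least two irreducible k points; I am not sure how
>> this interacts with the memory requirement, which I believe grows per
>> process as the number of pools increases.)
>>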
>
>
>
> --
> Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
> Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
> Phone +39-0432-558216, fax +39-0432-558222
>
>
--
Martina Lessio, Ph.D.
Frontiers of Science Lecturer in Discipline
Postdoctoral Research Scientist
Department of Chemistry
Columbia University