[Q-e-developers] QE random number generator (randy) error: j out of range
LIDing
dingmaotu at 126.com
Sun Nov 11 13:31:11 CET 2012
Hi,
It seems that PLUMED is the cause of the problem.
I ran ten jobs with the same input, five of them with the -plumed option. Four of the five -plumed jobs crashed
with the message "From randy: error #1038 j out of range", while those without -plumed ran with no problem.
So with -plumed, PWscf will crash randomly (with a large probability). I don't understand why.
Here is another piece of info that may help:
I found that set_random_seed uses current time components to generate a seed:
83 ! itime contains: year, month, day, time difference in minutes, hours,
84 ! minutes, seconds and milliseconds.
85 iseed = ( itime(8) + itime(6) ) * ( itime(7) + itime(4) )
Here in China itime(4) is 480 (timezone UTC+8), a constant for any particular region,
so iseed is quite likely to be larger than ic = 150889. If iseed is negative or smaller than ic,
which is the case in most European countries, everything will be fine.
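To make the size argument concrete, here is a minimal sketch of the seed formula in Python (chosen only so it can be checked quickly; the original is Fortran). The function names and the sample clock reading are hypothetical. The field meanings follow the comment quoted above, and the ranges assume minutes and seconds in 0..59 and milliseconds in 0..999.

```python
# Sketch of: iseed = ( itime(8) + itime(6) ) * ( itime(7) + itime(4) )
# itime(4) = UTC offset in minutes, itime(6) = minutes,
# itime(7) = seconds, itime(8) = milliseconds.

IC = 150889  # constant ic from randy, as quoted in this thread


def iseed(utc_offset_min, minutes, seconds, millis):
    """Seed value for one clock reading (helper name is hypothetical)."""
    return (millis + minutes) * (seconds + utc_offset_min)


def max_iseed(utc_offset_min):
    """Largest seed reachable for a non-negative UTC offset."""
    return iseed(utc_offset_min, 59, 59, 999)


if __name__ == "__main__":
    for offset in (0, 60, 480):  # UTC, UTC+1, UTC+8
        top = max_iseed(offset)
        print(f"offset {offset:3d} min: max iseed = {top}, exceeds ic: {top > IC}")
    # A made-up clock reading at UTC+8: minute 31, second 11, 500 ms.
    print(iseed(480, 31, 11, 500))
```

With an offset of 0 or 60 minutes the seed can never reach ic, while with 480 minutes it exceeds ic for a large share of clock readings.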
In randy, it first processes the seed as idum using:
53 idum = MOD( ic - idum, m )
I think this is the problem: it generates a negative idum whenever the seed is larger than ic.
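The sign behaviour can be sketched as follows. Fortran's MOD returns a remainder with the sign of its first argument, so the sketch uses a small helper, fortran_mod (a hypothetical name), instead of Python's % operator, which behaves differently for negative operands. The value m = 714025 is an assumption, since it is not quoted in this thread. Both seeds are made-up examples.

```python
IC = 150889   # ic, as quoted in this thread
M = 714025    # ASSUMED value of m; not quoted in this thread


def fortran_mod(a, p):
    """Integer remainder with the sign of a, like Fortran MOD(a, p)."""
    r = abs(a) % abs(p)
    return r if a >= 0 else -r


def first_idum(seed):
    """Mirror of line 53: idum = MOD( ic - idum, m )."""
    return fortran_mod(IC - seed, M)


if __name__ == "__main__":
    print(first_idum(62422))    # seed below ic: result stays positive
    print(first_idum(260721))   # seed above ic: result is negative
```

Any seed between ic and ic + m gives a negative idum under these assumptions, which matches the behaviour described above.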
If I remove the itime(4) in set_random_seed, PWscf never crashes with or without -plumed.
I believe this change is safe, since it gives exactly what a British user (UTC time, with itime(4) = 0) gets. I tested it with many runs.
By the way, if j is negative, errore will do nothing (in error_handler.f90):
42 IF ( ierr <= 0 ) RETURN
It means that many incorrect random numbers will be returned until a positive j stops the program.
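A minimal sketch of this interaction, in Python for brevity. It models only the two quoted lines (the range test in randy and the early return in errore). The exception class, the helper check_j and the table size NTAB = 97 are assumptions made for illustration.

```python
NTAB = 97  # ASSUMED table size; the value of ntab is not quoted in this thread


class FatalError(Exception):
    """Stands in for the program stop triggered by errore (hypothetical)."""


def errore(routine, message, ierr):
    """Models the guard 'IF ( ierr <= 0 ) RETURN' followed by a stop."""
    if ierr <= 0:
        return  # non-positive codes are ignored, nothing is reported
    raise FatalError(f"from {routine} : error # {ierr} {message}")


def check_j(j):
    """Models line 63 of randy; returns j if execution continues."""
    if j > NTAB or j < 1:
        errore("randy", "j out of range", j)
    return j


if __name__ == "__main__":
    print(check_j(-5))      # out of range, yet passes silently
    try:
        check_j(1038)       # positive out-of-range value stops the run
    except FatalError as exc:
        print(exc)
```

In this model a negative j passes the check unnoticed, and only a positive out-of-range j stops the run.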
So here are the problems:
1. It seems that randy (and set_random_seed) does have a problem,
and removing itime(4) in set_random_seed fixes it.
2. But why is everything OK without -plumed?
3. Every time it crashes (with -plumed), j is 1038, never any other value.
Any suggestion?
Thank you for your help!
Regards,
Li Ding
Institute of Geology and Geophysics, Chinese Academy of Science
Email: dingmaotu at 126.com
At 2012-11-11 00:43:49,"Paolo Giannozzi" <giannozz at democritos.it> wrote:
>It seems to me exceedingly unlikely that there is a bug in a simple
>routine like randy. It is more likely that there is either a bug in
>the compiler, or some array going out of bounds. In any case, it would
>be important to know whether this happens only in conjunction with
>PLUMED or not, and to have an input that produces this problem.
>
>Paolo
>
>On Nov 10, 2012, at 15:57 , LIDing wrote:
>
>> Dear QE developers,
>>
>> I am using QE 4.3.2 with PLUMED 1.3 (PWscf with metadynamics), and
>> encountered a problem. I think it may be a bug.
>> The error occurs in most runs; only occasionally does a run
>> complete without it.
>> The message was always the same:
>>
>> Molecular Dynamics Calculation
>>
>> Starting temperature = 9000.00 K
>>
>> temperature is set once at start
>>
>> mass Mg = 24.30
>> mass Si = 28.09
>> mass O = 16.00
>> Time step = 20.00 a.u., 0.9676 femto-seconds
>>
>> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>> from randy : error # 1038
>> j out of range
>> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>>
>> stopping ...
>>
>> I checked the relevant sources, and found that it occurred right in
>> the md_init call.
>> The error was reported by the randy function in the random_numbers.f90
>> file when it was first called by set_random_seed.
>> In the random_numbers.f90:
>>
>> 50 IF ( first ) THEN
>> 51 !
>> 52 first = .false.
>> 53 idum = MOD( ic - idum, m )
>> 54 !
>>
>> ic = 150889; if you pass a random seed that is larger than ic, then
>> idum will be a negative number after line 53.
>> I wrote a little program to call set_random_seed; most of the time
>> idum is a negative number and the test in randy
>> 63 IF( j > ntab .OR. j < 1 ) call errore('randy','j out of range',j)
>> fails. In my tests j is a negative number.
>>
>> But in an actual run, j is always 1038 whenever the error occurs,
>> which does not match what I see in my little test program. It is
>> very strange.
>>
>> If I remove line 53, everything seems OK in my test program, but
>> I am not sure whether this line should be changed in QE.
>> So is it a bug in QE, or did I just make a mistake (the random seed
>> can be negative, and the error is caused by something else)?
>>
>> I work on a Linux PC cluster with intel compilers and openmpi, and
>> QE is linked with the mkl library.
>>
>> I am not a subscriber of this mailing list. Contact me by email at
>> dingmaotu at 126.com. Thank you!
>>
>> Regards,
>> Li Ding
>> Institute of Geology and Geophysics, Chinese Academy of Science
>>
>>
>> _______________________________________________
>> Q-e-developers mailing list
>> Q-e-developers at qe-forge.org
>> http://qe-forge.org/mailman/listinfo/q-e-developers
>
>---
>Paolo Giannozzi, Dept of Chemistry&Physics&Environment,
>Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
>Phone +39-0432-558216, fax +39-0432-558222
>
>
>
>