[Q-e-developers] QE random number generator (randy) error: j out of range

LIDing dingmaotu at 126.com
Sun Nov 11 13:31:11 CET 2012


Hi,


It seems that PLUMED is the cause of the problem.


I ran ten jobs with the same input, five of them with the -plumed option. Four of the five PLUMED jobs crashed
with the message "From randy: error #1038 j out of range", while those without -plumed ran with no problem.


So with -plumed, PWscf will crash randomly (with a large probability). I don't understand why.


Here is another piece of info that may help:


I found that set_random_seed uses current time components to generate a seed:
   83       ! itime contains: year, month, day, time difference in minutes, hours,
   84       !                 minutes, seconds and milliseconds. 
   85       iseed = ( itime(8) + itime(6) ) * ( itime(7) + itime(4) )
Here in China itime(4) is 480 (time zone UTC+8), a constant for any given region,
so iseed is quite likely to be larger than ic = 150889. If iseed is negative or smaller than ic,
which is the case in most European countries, everything is fine.
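As a rough check of this argument, the sketch below evaluates the largest value the seed formula can produce for a few UTC offsets and compares it with ic. It assumes milliseconds in 0..999 and minutes and seconds in 0..59; the program name and the choice of offsets are illustrative only and are not part of QE.

```fortran
program seed_range
  implicit none
  integer, parameter :: ic = 150889                    ! constant quoted from randy
  integer, parameter :: offsets(3) = (/ 0, 60, 480 /)  ! UTC, UTC+1, UTC+8 in minutes
  integer :: k, max_seed

  do k = 1, size(offsets)
     ! largest possible (milliseconds + minutes) * (seconds + offset)
     max_seed = (999 + 59) * (59 + offsets(k))
     print '(A,I4,A,I7,A,L2)', 'itime(4) =', offsets(k), &
           '  max iseed =', max_seed, '  can exceed ic:', max_seed > ic
  end do
end program seed_range
```

Under these assumptions the upper bounds are 62422 for UTC, 125902 for UTC+1 and 570262 for UTC+8, so only the last one can exceed ic = 150889.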


In randy, it first processes the seed as idum using:
   53          idum = MOD( ic - idum, m )
I think this is what generates the many negative numbers.
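The sign behaviour can be illustrated in isolation. In Fortran, MOD returns a result with the sign of its first argument, whereas MODULO returns a result with the sign of the second. The sketch below uses the largest seed the formula above allows for itime(4) = 480; the value of m is an assumption, since it is not quoted in this thread, and MODULO is shown only for comparison, not as the fix adopted in QE.

```fortran
program mod_sign
  implicit none
  integer, parameter :: ic = 150889   ! quoted in this thread
  integer, parameter :: m  = 714025   ! ASSUMED modulus, not quoted in this thread
  integer :: idum

  ! (999 + 59) * (59 + 480): largest seed from the formula with itime(4) = 480
  idum = 570262
  print *, 'MOD    :', mod( ic - idum, m )      ! sign follows ic - idum
  print *, 'MODULO :', modulo( ic - idum, m )   ! sign follows m
end program mod_sign
```

With these numbers ic - idum is -419373, so MOD gives -419373 while MODULO gives 294652.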


If I remove itime(4) from set_random_seed, PWscf never crashes, with or without -plumed.
I believe this change is harmless, since it gives what a user in Britain (UTC, with itime(4) = 0) would get. I tested this with many runs.


By the way, if j is negative, errore will do nothing (in error_handler.f90):
   42   IF ( ierr <= 0 ) RETURN
It means that many incorrect random numbers will be returned until a positive j stops the program.
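A minimal stand-alone sketch of this behaviour follows. The subroutine fake_errore is a hypothetical stand-in that copies only the early return quoted above; it is not the QE error handler. The value ntab = 97 is an assumption, and j = -56 is an arbitrary negative index chosen for illustration.

```fortran
program silent_error_demo
  implicit none
  integer, parameter :: ntab = 97   ! ASSUMED table size, not quoted in this thread
  integer :: j

  j = -56                           ! arbitrary negative index for illustration
  if ( j > ntab .or. j < 1 ) call fake_errore( 'randy', 'j out of range', j )
  print *, 'execution continues with j =', j

contains

  ! Hypothetical stand-in for errore: returns silently for ierr <= 0
  subroutine fake_errore( routine, message, ierr )
    character(len=*), intent(in) :: routine, message
    integer, intent(in) :: ierr
    if ( ierr <= 0 ) return
    print *, 'from ', routine, ' : error #', ierr, ' ', message
    stop 1
  end subroutine fake_errore

end program silent_error_demo
```

Because the error code passed is the negative j itself, the range check fires but the handler returns immediately and the program carries on.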


So here are the problems:
1. It seems that randy (and set_random_seed) does have some problems,
    and removing itime(4) from set_random_seed fixes them. But why is it OK without -plumed?
2. Every time it crashes (with -plumed), j is 1038, never any other value.


Any suggestion?


Thank you for your help!


Regards,
Li Ding
Institute of Geology and Geophysics, Chinese Academy of Science
Email: dingmaotu at 126.com







At 2012-11-11 00:43:49,"Paolo Giannozzi" <giannozz at democritos.it> wrote:
>It seems to me exceedingly unlikely that there is a bug in a simple
>routine like randy. It is more likely that there is either a bug in
>the compiler, or some array going out of bounds. In any case, it would
>be important to know whether this happens only in conjunction with
>PLUMED or not, and to have an input that produces this problem.
>
>Paolo
>
>On Nov 10, 2012, at 15:57 , LIDing wrote:
>
>> Dear QE developers,
>>
>> I am using QE 4.3.2 with PLUMED 1.3 (PWscf with metadynamics), and  
>> encountered a problem. I think it may be a bug.
>> The problem occurs most of the time, and only occasionally the  
>> error did not occur.
>> The message was always the same:
>>
>>      Molecular Dynamics Calculation
>>
>>       Starting temperature  =  9000.00 K
>>
>>       temperature is set once at start
>>
>>       mass Mg               =    24.30
>>       mass Si               =    28.09
>>       mass O                =    16.00
>>       Time step             =    20.00 a.u.,  0.9676 femto-seconds
>>
>>   %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>>       from randy : error #      1038
>>       j out of range
>>   %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>>
>>       stopping ...
>>
>> I checked the relevant sources, and found that it occurred right in  
>> the md_init call.
>> The error was reported by the randy function in the random_numbers.f90
>> file when it was first called by set_random_seed.
>> In the random_numbers.f90:
>>
>>      50       IF ( first ) THEN
>>      51          !
>>      52          first = .false.
>>      53          idum = MOD( ic - idum, m )
>>      54          !
>>
>> ic = 150889, if you pass a random seed that is larger than ic, then  
>> idum will be a negative number after line 53.
>> I wrote a little program to call set_random_seed; most of the time
>> idum is a negative number and the check in randy
>>    63       IF( j > ntab .OR. j <  1 ) call errore('randy','j out of range',j)
>> is triggered. j is a negative number when I test this.
>>
>> But in an actual run, j is always 1038 whenever the error occurs,
>> which does not match what I see in my little test program. It is
>> very strange.
>>
>> If I remove line 53, everything seems OK in my test program, but
>> I am not sure whether this line should be corrected in QE.
>> So is it a bug in QE, or did I just make a mistake (the random seed
>> can be negative, and the error is caused by something else)?
>>
>> I work on a Linux PC cluster with intel compilers and openmpi, and  
>> QE is linked with the mkl library.
>>
>> I am not a subscriber of this mail list. Contact me by email to  
>> dingmaotu at 126.com. Thank you!
>>
>> Regards,
>> Li Ding
>> Institute of Geology and Geophysics, Chinese Academy of Science
>>
>>
>
>---
>Paolo Giannozzi, Dept of Chemistry&Physics&Environment,
>Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
>Phone +39-0432-558216, fax +39-0432-558222
>
>
>
>