[Pw_forum] wrong record length

Mon Mar 2 21:50:47 CET 2009

Hi Marton,

Just in case you didn't run across this page:

http://www.nersc.gov/nusers/systems/bassi/programming.php

Cheers,

Lex Kemper
Department of Physics
University of Florida

Marci wrote:
> Hi Axel,
> 
>> marton,
>>
>> are you trying to run the postprocessing on your local
>> machine or on the IBM machine?
> 
> On the IBM machine. I have had bad experiences with postprocessing on
> a different machine: with the iotk package, converting binary files
> to text files and back is quite time-consuming (and I hate ssh-ing
> gigabytes of files).
> 
>> that depends on what is causing this. it could just be that you
>> have an integer overflow, due to the size of your system, or it
>> could be that you try to read unformatted data on a different
>> endian machine. i would suggest you insert a print statement into
>> the code that prints out the values of DIRECT_IO_FACTOR and recl
>> as well as unf_recl and then get back to us with the information
>> about the architectures and these numbers (ideally also for the
>> smaller test, where it worked).
> 
> Unfortunately, the espresso I'm using on BASSI was not compiled by
> myself, and now I'm scared of compiling mine because I'm not sure that
> it will be able to read the binary that was made with an espresso
> probably compiled with different compilers and/or compiler options.
> Yeah, I know... I should have compiled my own version of quantum
> espresso before running serious calculations to avoid these
> situations.
> 
> So... I made some changes to diropn.f90 in espresso4.0/PW and compiled
> my own version of espresso to print the values below for the big run
> (with this build I get the same error). Honestly, I do not know much
> about this cluster, but I'm sure I'm using the XL Fortran compiler,
> version 11.1.0.3, and the ESSL library, version 4.2.0.3.
> 
> recl:  415578000
> DIRECT_IO_FACTOR:          8
> unf_recl: -970343296
> 
> On my home cluster I used a parallel build of espresso-4.0.3 on an
> "Intel Xeon E5410 @ 2.33GHz, 16 GB RAM" machine with ifort 10.1.015,
> Intel MKL 10.0.1.014 and openmpi-1.2.6, with a smaller but similar
> system (same pseudos, same cutoff, only the gamma point). As I said,
> there is no "wrong record length" error there, and I got the
> following values:
> 
> recl:   97079200
> DIRECT_IO_FACTOR:          8
> unf_recl:  776633600
> 
> If I'm right, 415578000*8 = 3324624000, which is bigger than the
> largest value of a signed 32-bit integer (2147483647). Maybe that
> causes the problem?
> 
> Thanks for your help,
> Marton
> _______________________________________________
> Pw_forum mailing list
> Pw_forum at pwscf.org
> http://www.democritos.it/mailman/listinfo/pw_forum
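
A quick way to check the arithmetic in the quoted message, as a minimal
sketch rather than anything taken from the espresso sources. It assumes
that unf_recl is computed as recl * DIRECT_IO_FACTOR in a default 32-bit
signed integer that wraps around on overflow (common compiler behaviour,
but not something the Fortran standard guarantees). The helper names
wrap_int32 and record_length_bytes are made up for illustration.

```python
INT32_MAX = 2**31 - 1  # largest signed 32-bit integer, 2147483647


def wrap_int32(value: int) -> int:
    """Return the value a signed 32-bit integer would hold after wraparound."""
    return (value + 2**31) % 2**32 - 2**31


def record_length_bytes(recl: int, direct_io_factor: int = 8):
    """Hypothetical helper: exact product, 32-bit wrapped product, overflow flag.

    Assumes unf_recl = recl * DIRECT_IO_FACTOR, as the printed values suggest.
    """
    exact = recl * direct_io_factor
    return exact, wrap_int32(exact), exact > INT32_MAX


if __name__ == "__main__":
    # recl values copied from the two runs reported in the quoted message
    runs = (
        ("BASSI, large run", 415578000),
        ("home cluster, small run", 97079200),
    )
    for label, recl in runs:
        exact, wrapped, overflow = record_length_bytes(recl)
        print(f"{label}: exact={exact} wrapped={wrapped} overflow={overflow}")
```

Under that assumption the wrapped value for the large run comes out as
-970343296, which is exactly the unf_recl printed on BASSI, while the
small run stays at 776633600 and does not overflow.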