[Pw_forum] questions about intel CPU and vc-relax using bfgs cell optimization

Thu Jun 26 05:20:43 CEST 2008

Dear all,

I built a cluster of 5 computers with Intel Core 2 Quad Q6600 CPUs (quad-core) and 40 GB of memory in total (8 GB each) on S3000AH system boards. The network is 1 Gbit Ethernet. I also checked that the EM64T option in the BIOS is on, so I think the Q6600 is a CPU using EM64T technology. For more information about my CPU, see http://processorfinder.intel.com/details.aspx?sSpec=SL9UM 
I also typed "more /proc/cpuinfo"; the information about my CPU and OS is as follows:
LSB Version:    :core-3.0-amd64:core-3.0-ia32:core-3.0-noarch:graphics-3.0-amd64:graphics-3.0-ia32:graphics-3.0-noarch
Distributor ID:    RedHatEnterpriseAS
Description:    Red Hat Enterprise Linux AS release 4 (Nahant Update 4)
Release:    4
Codename:    NahantUpdate4
processor    : 0
vendor_id    : GenuineIntel
cpu family    : 6
model        : 15
model name    : Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz
stepping    : 11
cpu MHz        : 2400.150
cache size    : 4096 KB
physical id    : 0
siblings    : 4
core id        : 0
cpu cores    : 4
fpu        : yes
fpu_exception    : yes
cpuid level    : 10
wp        : yes
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm pni monitor ds_cpl est tm2 cx16 xtpr
bogomips    : 4806.03
clflush size    : 64
cache_alignment    : 64
address sizes    : 36 bits physical, 48 bits virtual
power management:

processor    : 1
vendor_id    : GenuineIntel
cpu family    : 6
model        : 15
model name    : Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz
stepping    : 11
cpu MHz        : 2400.150
cache size    : 4096 KB
physical id    : 0
siblings    : 4
core id        : 2
cpu cores    : 4
fpu        : yes
fpu_exception    : yes
cpuid level    : 10
wp        : yes
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm pni monitor ds_cpl est tm2 cx16 xtpr
bogomips    : 4798.75
clflush size    : 64
cache_alignment    : 64
address sizes    : 36 bits physical, 48 bits virtual
power management:

processor    : 2
vendor_id    : GenuineIntel
cpu family    : 6
model        : 15
model name    : Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz
stepping    : 11
cpu MHz        : 2400.150
cache size    : 4096 KB
physical id    : 0
siblings    : 4
core id        : 1
cpu cores    : 4
fpu        : yes
fpu_exception    : yes
cpuid level    : 10
wp        : yes
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm pni monitor ds_cpl est tm2 cx16 xtpr
bogomips    : 4799.49
clflush size    : 64
cache_alignment    : 64
address sizes    : 36 bits physical, 48 bits virtual
power management:

processor    : 3
vendor_id    : GenuineIntel
cpu family    : 6
model        : 15
model name    : Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz
stepping    : 11
cpu MHz        : 2400.150
cache size    : 4096 KB
physical id    : 0
siblings    : 4
core id        : 3
cpu cores    : 4
fpu        : yes
fpu_exception    : yes
cpuid level    : 10
wp        : yes
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm pni monitor ds_cpl est tm2 cx16 xtpr
bogomips    : 4799.52
clflush size    : 64
cache_alignment    : 64
address sizes    : 36 bits physical, 48 bits virtual
power management:
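
The flags lines above include "lm" (long mode), which is how the Linux kernel reports 64-bit x86 support; Intel's EM64T is its implementation of essentially the same x86-64 instruction set as AMD64. A minimal sketch that checks for this flag, assuming the /proc/cpuinfo layout shown above (the function names are hypothetical):

```python
import os


def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def supports_x86_64(cpuinfo_text):
    """True if the 'lm' (long mode, i.e. x86-64 / EM64T) flag is reported."""
    return "lm" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    path = "/proc/cpuinfo"  # Linux-specific; skipped on other systems
    if os.path.exists(path):
        with open(path) as f:
            print("lm (x86-64 long mode) present:", supports_x86_64(f.read()))
```

Reading only the first flags line is enough here because all four cores report identical flags.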

Therefore, I updated my Intel C++ and Fortran compilers for Intel(R) 64 from version 10.1.008 to the latest version, 10.1.017, and MKL from 10.0.011 to the latest version, 10.0.3.020. The file names shown on the website were l_cc_p_10.1.017_intel64.tar.gz, l_cc_p_10.1.017_intel64.tar.gz and l_mkl_p_10.0.3.020.tgz. After installing all three, I compiled the EM64T versions of blas95 and lapack95 in /opt/intel/mkl/10.0.3.020/interfaces/ using the ifort in /opt/intel/fce/10.1.017/bin/. Then I compiled mpich2 using ifort and icc. However, an error occurred when I compiled fftw 2.1.5, so I compiled fftw 2.1.5 with the 10.1.008 ifort and icc on another node with the same hardware, then copied it to the master node with scp. After all of the above was done, I moved on to compiling QE.

To my surprise, however, QE detected my architecture as amd64, not ia32 or ia64. My first question is: does QE support Intel EM64T technology and take advantage of it?

Finally, I compiled QE using the amd64 architecture setting with the Intel C++ and Fortran 10.1.017 compilers and the MKL 10.0.3.020 library, but I found it less efficient than the QE compiled with the Intel C++ and Fortran 10.1.008 compilers and MKL 10.0.011. The efficiency of QE compiled with the 10.1.008 compilers and MKL 10.0.011 is about 60%, but that of QE compiled with the 10.1.017 compilers is only 10%, tested with an input file like this:
 &CONTROL
                       title = 'Anatase lattice BFGS' ,
                 calculation = 'vc-relax' ,
                restart_mode = 'from_scratch' ,
                      outdir = '/home/vega/tmp/' ,
                  pseudo_dir = '/home/vega/espresso-4.0/pseudo/' ,
                      prefix = 'Anatase lattice default' ,
               etot_conv_thr = 0.000000735 ,
               forc_conv_thr = 0.0011668141375 ,
                       nstep = 1000 ,
 /
 &SYSTEM
                       ibrav = 6,
                   celldm(1) = 7.135605333,
                   celldm(3) = 2.5121822033898305084745762711864,
                         nat = 12,
                        ntyp = 2,
                     ecutwfc = 25 ,
                     ecutrho = 200 ,
 /
 &ELECTRONS
                    conv_thr = 7.3D-8 ,
 /
 &IONS
                ion_dynamics = 'bfgs' ,
 /
 &CELL
               cell_dynamics = 'bfgs' ,
                 cell_dofree = 'xyz' ,
 /
ATOMIC_SPECIES
   Ti   47.86700  Ti.pw91-sp-van_ak.UPF 
    O   15.99940  O.pw91-van_ak.UPF 
ATOMIC_POSITIONS angstrom 
   Ti      0.000000000    0.000000000    0.000000000    
   Ti      1.888000000    1.888000000    4.743000000    
   Ti      0.000000000    1.888000000    2.372000000    
   Ti      1.888000000    0.000000000    7.115000000    
    O      0.000000000    0.000000000    1.973000000    
    O      1.888000000    1.888000000    6.716000000    
    O      0.000000000    1.888000000    4.345000000    
    O      1.888000000    0.000000000    9.088000000    
    O      1.888000000    0.000000000    5.141000000    
    O      0.000000000    1.888000000    0.398000000    
    O      1.888000000    1.888000000    2.770000000    
    O      0.000000000    0.000000000    7.513000000    
K_POINTS automatic 
  7 7 3   1 1 1 
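
Several numbers in the input above look like conversions from Angstrom and eV: celldm(3) = 2.51218220338983... equals 9.486/3.776, the positions use a/2 = 1.888 and c/2 = 4.743 Angstrom, etot_conv_thr matches 1e-5 eV and forc_conv_thr matches 0.03 eV/Angstrom. This is inferred from the numbers, not stated in the post. A small sketch reproducing those values, assuming standard bohr and Rydberg constants (the function names are hypothetical):

```python
# Standard conversion constants; QE may use slightly different digits,
# so expect agreement only to about 6 significant figures.
BOHR_IN_ANGSTROM = 0.52917721
RY_IN_EV = 13.6056923


def celldm_tetragonal(a_ang, c_ang):
    """celldm(1) in bohr and celldm(3) = c/a for ibrav = 6 (simple tetragonal)."""
    return a_ang / BOHR_IN_ANGSTROM, c_ang / a_ang


def ev_to_ry(energy_ev):
    """Convert an energy from eV to Ry."""
    return energy_ev / RY_IN_EV


def ev_per_ang_to_ry_per_bohr(force_ev_ang):
    """F[Ry/bohr] = F[eV/Angstrom] * (Angstrom per bohr) / (eV per Ry)."""
    return force_ev_ang * BOHR_IN_ANGSTROM / RY_IN_EV


if __name__ == "__main__":
    # a = 3.776 and c = 9.486 Angstrom are inferred from the input, not given in it.
    celldm1, celldm3 = celldm_tetragonal(3.776, 9.486)
    print(f"celldm(1) = {celldm1:.6f}  celldm(3) = {celldm3:.10f}")
    print(f"etot_conv_thr = {ev_to_ry(1e-5):.3e} Ry")
    print(f"forc_conv_thr = {ev_per_ang_to_ry_per_bohr(0.03):.7f} Ry/bohr")
```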

My second question is about efficiency:
Which compiler and MKL version is best for my cluster?
Why did updating MKL and the compilers reduce the efficiency?
What is the best efficiency my cluster can reach? Is 60% low or high for QE?
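
The post does not define "efficiency". Two common measures are the ratio of CPU time to wall-clock time (pw.x prints both in its timing summary) and the parallel efficiency T1/(N*TN), where T1 is the single-process run time and TN the run time on N processes. A minimal sketch of both, assuming one of these is what is meant (the function names are hypothetical):

```python
def cpu_wall_ratio(cpu_seconds, wall_seconds):
    """Fraction of wall-clock time the process spent computing on the CPU."""
    if wall_seconds <= 0:
        raise ValueError("wall time must be positive")
    return cpu_seconds / wall_seconds


def parallel_efficiency(t_serial, t_parallel, n_procs):
    """Speedup divided by process count: T1 / (N * TN)."""
    if t_parallel <= 0 or n_procs <= 0:
        raise ValueError("time and process count must be positive")
    return t_serial / (n_procs * t_parallel)


if __name__ == "__main__":
    # Hypothetical timings for illustration only; not measurements from this cluster.
    print(f"CPU/wall ratio: {cpu_wall_ratio(600.0, 1000.0):.0%}")
    print(f"parallel efficiency: {parallel_efficiency(4000.0, 250.0, 20):.0%}")
```

A low CPU/wall ratio usually means processes are waiting (for example on communication or I/O), whereas parallel efficiency compares against a serial baseline.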

My third question is about BFGS cell optimization. I ran the above input file with cell_dofree = 'xyz' in the &CELL section. According to PWgui, I think this means only a, b and c of the lattice can change, while the three angles alpha, beta and gamma stay fixed at 90 degrees, so the lattice should remain orthogonal. But the results showed that the angles were still changing. The output is as follows:

......
     entering subroutine stress ...

          total   stress  (Ry/bohr**3)                   (kbar)     P=   -1.85
   0.00004989   0.00000000   0.00000000          7.34      0.00      0.00
   0.00000000   0.00005560   0.00000000          0.00      8.18      0.00
   0.00000000   0.00000000  -0.00014314          0.00      0.00    -21.06

     number of scf cycles    =   5
     number of bfgs steps    =   2

     enthalpy old            =    -725.4093146855 Ry
     enthalpy new            =    -725.4093492895 Ry

     CASE: enthalpy_new < enthalpy_old

     new trust radius        =       0.0190701467 bohr
     new conv_thr            =       0.0000000074 Ry

CELL_PARAMETERS (alat)
   0.992368528   0.000000000  -0.000000009
   0.000000000   0.992410788   0.000000037
  -0.000000021   0.000000091   2.503790203

ATOMIC_POSITIONS (angstrom)
Ti       0.000000000   0.000000000   0.000040364
Ti       1.873591740   1.873671740   4.727209100
Ti      -0.000000020   1.873671654   2.362997426
Ti       1.873591720   0.000000258   7.091885462
O       -0.000000017   0.000000072   1.972736887
O        1.873591723   1.873671812   6.700607058
O       -0.000000037   1.873671726   4.336885456
O        1.873591703   0.000000330   9.064014758
O        1.873591737   0.000000186   5.117531758
O       -0.000000004   1.873671582   0.390380409
O        1.873591756   1.873671668   2.753778039
O       -0.000000064   0.000000273   7.481645190

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     from checkallsym : error #         2
     not orthogonal operation
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

     stopping ...
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0[cli_0]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0
rank 0 in job 3  node5_32785   caused collective abort of all ranks
  exit status of rank 0: killed by signal 9 
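
The off-diagonal entries of CELL_PARAMETERS above are of order 1e-8 (in units of alat), so the cell angles did move, but only very slightly. A minimal sketch that turns the three printed vectors into alpha, beta and gamma (the helper name is hypothetical):

```python
import math


def cell_angles(a1, a2, a3):
    """Return (alpha, beta, gamma) in degrees for lattice vectors a1, a2, a3.

    alpha is the angle between a2 and a3, beta between a1 and a3,
    gamma between a1 and a2 (the usual crystallographic convention).
    """
    def angle(u, v):
        dot = sum(x * y for x, y in zip(u, v))
        norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
        # Clamp to [-1, 1] to guard acos against rounding error.
        return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

    return angle(a2, a3), angle(a1, a3), angle(a1, a2)


if __name__ == "__main__":
    # CELL_PARAMETERS (alat) copied from the output above.
    a1 = (0.992368528, 0.000000000, -0.000000009)
    a2 = (0.000000000, 0.992410788, 0.000000037)
    a3 = (-0.000000021, 0.000000091, 2.503790203)
    for name, value in zip(("alpha", "beta", "gamma"), cell_angles(a1, a2, a3)):
        print(f"{name} = {value:.9f} deg (deviation {value - 90.0:+.2e})")
```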

Does this mean I should never use BFGS to optimize the anatase lattice? CASTEP can do so; why?

Thanks for reading. I look forward to your responses.

Vega Lew
Ph.D. Candidate in Chemical Engineering
College of Chemistry and Chemical Engineering
Nanjing University of Technology, 210009, Nanjing, Jiangsu, China