Jerkwin's blog http://blog.sciencenet.cn/u/Jerkwin

Compiling VASP.5.2.12: Intel Fortran+MPI+MKL

Posted 2014-5-21 06:03 | Category: My Toolbox | Section: Research Notes | VASP

2014–05–19 09:56:26

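Building the parallel version of VASP

  • Compiler: Intel Fortran

  • MPI library: Intel MPI

  • Math library: Intel MKL

Preparation

1. Install the Fortran compiler, the MPI library, and MKL. If the system uses environment modules, loading the corresponding modules is all that is needed.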

module show intel/13.1.0

-------------------------------------------------------------------
/share/apps/modules/Modules/modulefiles/intel/13.1.0:

module-whatis    Intel Compiler
module-whatis    Version: 13.1.0
module-whatis    Category: compiler, runtime support
module-whatis    Description: Intel Compiler Family (C/C++/Fortran for x86_64)
module-whatis    URL: http://www.intel.com/cd/software/products/asmo-na/eng/compilers/284132.htm
prepend-path     PATH /share/apps/intel/composer_xe_2013.2.146/bin
prepend-path     MANPATH /share/apps/intel/composer_xe_2013.2.146/man/en_US
prepend-path     INCLUDE /share/apps/intel/composer_xe_2013.2.146/mkl/include:/share/apps/intel/composer_xe_2013.2.146/ipp/include
prepend-path     LD_LIBRARY_PATH /share/apps/intel/composer_xe_2013.2.146/lib/intel64
prepend-path     LIBRARY_PATH /share/apps/intel/composer_xe_2013.2.146/lib/intel64
prepend-path     NLS_PATH /share/apps/intel/composer_xe_2013.2.146/lib/intel64/locale/%l_%t/%N
setenv           COMPILER_TYPE intel
setenv           COMPILER_VERSION 13.1.0
setenv           INTEL_LICENSE_FILE 28518@192.168.100.1
-------------------------------------------------------------------

module show impi/4.1.0

-------------------------------------------------------------------
/share/apps/modules/Modules/modulefiles/impi/4.1.0:

module-whatis    Intel Compiler
module-whatis    Version: 4.1.0
module-whatis    Category: compiler, runtime support
module-whatis    Description: Intel Compiler Family (C/C++/Fortran for x86_64)
module-whatis    URL: http://www.intel.com/cd/software/products/asmo-na/eng/compilers/284132.htm
setenv           version 4.1.0
setenv           Intel_FC_Home /share/apps/intel/impi/4.1.0.030
prepend-path     PATH /share/apps/intel/impi/4.1.0.030/intel64/bin
prepend-path     MANPATH /share/apps/intel/impi/4.1.0.030/man
prepend-path     LD_LIBRARY_PATH /share/apps/intel/impi/4.1.0.030/intel64/lib
setenv           I_MPI_ROOT /share/apps/intel/impi/4.1.0.030
setenv           I_MPI_FABRICS shm:tmi
setenv           TMI_CONFIG /share/apps/intel/impi/4.1.0.030/intel64/etc/tmi.conf
setenv           INTEL_LICENSE_FILE 28518@192.168.100.1
-------------------------------------------------------------------

module show mkl/13.1.0

-------------------------------------------------------------------
/share/apps/modules/Modules/modulefiles/mkl/13.1.0:

module-whatis    Intel MKL
module-whatis    Version: 13.1.0
module-whatis    Category: compiler, runtime support
module-whatis    Description: Intel Compiler Family (C/C++/Fortran for x86_64)
module-whatis    URL: http://www.intel.com/cd/software/products/asmo-na/eng/compilers/284132.htm
setenv           MKL_ROOT /share/apps/intel/composer_xe_2013.2.146/composer_xe_2013.2.146/mkl
-------------------------------------------------------------------

2. Check that the compilers, runtime libraries, and paths are correct.

which ifort gives

/share/apps/intel/composer_xe_2013.2.146/bin/ifort

which mpiifort gives

/share/apps/intel/impi/4.1.0.030/intel64/bin/mpiifort

which mpirun gives

/share/apps/intel/impi/4.1.0.030/intel64/bin/mpirun
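The three which checks above can be wrapped in a small helper. This is only a sketch: the function name check_tools is made up for illustration, and the module names in the comment are taken from the module show commands above and will differ on other clusters.

```shell
#!/bin/sh
# Report where each required tool lives, or flag it as missing.
# check_tools is a hypothetical helper name, not part of VASP or Intel's tools.
# Typical use, after loading the modules shown above:
#   module load intel/13.1.0 impi/4.1.0 mkl/13.1.0
#   check_tools ifort mpiifort mpirun
check_tools() {
    missing=0
    for tool in "$@"; do
        if path=$(command -v "$tool" 2>/dev/null); then
            printf '%-10s %s\n' "$tool" "$path"
        else
            printf '%-10s NOT FOUND\n' "$tool"
            missing=1
        fi
    done
    # Non-zero exit status if at least one tool was not found.
    return "$missing"
}
```

The function returns a non-zero status when anything is missing, so it can guard a build script, for example check_tools ifort mpiifort mpirun || exit 1.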

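Compilation

  1. Download the VASP source: vasp.5.2.12.tar.gz and vasp.5.lib.tar.gz

  2. Unpack

    tar -xzvf vasp.5.2.12.tar.gz, which gives the folder vasp.5.2

    tar -xzvf vasp.5.lib.tar.gz, which gives the folder vasp.5.lib

  3. Build the library. This is simple: use makefile.linux_ifc_P4 directly.

    On line 19, change FC=ifc to FC=mpiifort

    make -f makefile.linux_ifc_P4

    If libdmy.a and linpack_double.o are produced, the build succeeded

  4. Build the main program. This is more involved, because it requires choosing the math library, the FFT library, and the MPI library, so makefile.linux_ifc_P4 has to be modified.
    The guiding principle is to use Intel's own components wherever possible, since that is simple and performs well: MKL together with its bundled FFTW interface, and Intel MPI as the MPI library.
    The compiler options can be looked up on Intel's website.
    A modified makefile with brief comments is listed after these steps.

    Save it as makefile

    make

    If the vasp executable is produced, the build succeeded. The compiler emits warnings, but none are fatal.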
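The library build in step 3 can be scripted roughly as follows. This is a sketch under assumptions: set_fc is a hypothetical helper name, and it matches the FC=ifc line by pattern rather than by its line number, so it should be checked against the makefile actually shipped in vasp.5.lib.

```shell
#!/bin/sh
# Switch the Fortran compiler in a VASP library makefile from ifc to mpiifort.
# set_fc is a hypothetical helper name used only for this illustration.
# Usage: set_fc path/to/makefile.linux_ifc_P4
set_fc() {
    makefile=$1
    # Keep a backup, then rewrite only a line that sets FC to ifc.
    cp "$makefile" "$makefile.orig"
    sed 's/^FC[[:space:]]*=[[:space:]]*ifc[[:space:]]*$/FC=mpiifort/' \
        "$makefile.orig" > "$makefile"
    # Show the resulting FC line so the change can be checked by eye.
    grep -n '^FC=' "$makefile"
}

# Typical use, run inside vasp.5.lib:
#   set_fc makefile.linux_ifc_P4
#   make -f makefile.linux_ifc_P4
#   ls -l libdmy.a linpack_double.o
```

The final ls is the success check described in step 3: both files must exist before the main program is built.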

# Paths to MKL and its FFTW interface
MKLROOT=/share/apps/intel/composer_xe_2013.2.146/composer_xe_2013.2.146/mkl
FFTWROOT=${MKLROOT}/include/fftw

# Suffixes
.SUFFIXES: .inc .f .f90 .F

# Suffix of the preprocessed files
SUFFIX=.f90

# MPI Fortran compiler and linker; absolute paths may be used
# The FFTW include path is added so that the FFTW interface bundled with MKL can be used
FC=mpiifort -I${FFTWROOT}
FCL=$(FC)

# fpp preprocessor options
CPP_=fpp -f_com=no -free -w0 $*.F $*$(SUFFIX)

# Compiler options. Note:
# 1. The line must end with at least one space, because some rules
#    below concatenate $(FFLAGS)$(DEBUG) without a separator
# 2. -assume byterecl is required, otherwise the WAVECAR file becomes huge
FFLAGS =  -FR -lowercase -assume byterecl  

# Optimization options
# -xHost -axAVX added
OFLAG=-O2 -ip -ftz -xHost -axAVX

# Other compiler options
OFLAG_HIGH = $(OFLAG)
OBJ_HIGH =
OBJ_NOOPT =
DEBUG  = -FR -O0
INLINE = $(OFLAG)

# MKL, BLAS, LAPACK options
# 1. -DRPROMU_DGEMV -DRACCMU_DGEMV must be set in the CPP options
# 2. Static libraries are used; they may be slightly faster
# 3. See https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor
BLAS=$(MKLROOT)/lib/intel64/libmkl_blas95_lp64.a
LAPACK=$(MKLROOT)/lib/intel64/libmkl_lapack95_lp64.a

# Link options, using static libraries
LINK=-Wl,--start-group \
$(MKLROOT)/lib/intel64/libmkl_intel_lp64.a \
$(MKLROOT)/lib/intel64/libmkl_core.a \
$(MKLROOT)/lib/intel64/libmkl_intel_thread.a \
-Wl,--end-group \
-lpthread -liomp5 -lmpi -lm

# CPP options for the parallel version
# NGZhalf               charge density   reduced in Z direction
# wNGZhalf              gamma point only reduced in Z direction
# scaLAPACK             use scaLAPACK (usually slower on 100 Mbit Net)
# avoidalloc          avoid ALLOCATE if possible
# PGF90               work around some PGF90 / IFC bugs
# CACHE_SIZE          1000 for PII,PIII, 5000 for Athlon, 8000-12000 P4, PD
# RPROMU_DGEMV        use DGEMV instead of DGEMM in RPRO (depends on used BLAS)
# RACCMU_DGEMV        use DGEMV instead of DGEMM in RACC (depends on used BLAS)
# tbdyn                 MD package of Tomas  Bucko
CPP = $(CPP_) -DMPI  -DHOST=\"LinuxIFC\" -DIFC \
-DCACHE_SIZE=4000 -DPGF90 -Davoidalloc -DNGZhalf \
-DMPI_BLOCK=8000 \
-DRPROMU_DGEMV  -DRACCMU_DGEMV

# Libraries: the VASP library plus LAPACK and BLAS
LIB = -L../vasp.5.lib -ldmy  \
../vasp.5.lib/linpack_double.o $(LAPACK) $(BLAS)

# Use the FFTW interface bundled with MKL
FFT3D   = fftmpiw.o fftmpi_map.o fftw3d.o fft3dlib.o
# Or use the FFT routines shipped with VASP
#FFT3D   = fftmpi.o fftmpi_map.o fft3dfurth.o fft3dlib.o


# General rules and compile commands; do not modify anything below
BASIC=   symmetry.o symlib.o   lattlib.o  random.o

SOURCE=  base.o     mpi.o      smart_allocate.o      xml.o  \
        constant.o jacobi.o   main_mpi.o  scala.o   \
        asa.o      lattice.o  poscar.o   ini.o  mgrid.o  xclib.o  vdw_nl.o  xclib_grad.o \
        radial.o   pseudo.o   gridq.o     ebs.o  \
        mkpoints.o wave.o     wave_mpi.o  wave_high.o  \
        $(BASIC)   nonl.o     nonlr.o    nonl_high.o dfast.o    choleski2.o \
        mix.o      hamil.o    xcgrad.o   xcspin.o    potex1.o   potex2.o  \
        constrmag.o cl_shift.o relativistic.o LDApU.o \
        paw_base.o metagga.o  egrad.o    pawsym.o   pawfock.o  pawlhf.o   rhfatm.o  paw.o   \
        mkpoints_full.o       charge.o   Lebedev-Laikov.o  stockholder.o dipol.o    pot.o \
        dos.o      elf.o      tet.o      tetweight.o hamil_rot.o \
        steep.o    chain.o    dyna.o     sphpro.o    us.o  core_rel.o \
        aedens.o   wavpre.o   wavpre_noio.o broyden.o \
        dynbr.o    rmm-diis.o reader.o   writer.o   tutor.o xml_writer.o \
        brent.o    stufak.o   fileio.o   opergrid.o stepver.o  \
        chgloc.o   fast_aug.o fock.o     mkpoints_change.o sym_grad.o \
        mymath.o   internals.o dynconstr.o dimer_heyden.o dvvtrajectory.o vdwforcefield.o \
        hamil_high.o nmr.o    pead.o     mlwf.o     subrot.o   subrot_scf.o \
        force.o    pwlhf.o  gw_model.o optreal.o   davidson.o  david_inner.o \
        electron.o rot.o  electron_all.o shm.o    pardens.o  paircorrection.o \
        optics.o   constr_cell_relax.o   stm.o    finite_diff.o elpol.o    \
        hamil_lr.o rmm-diis_lr.o  subrot_cluster.o subrot_lr.o \
        lr_helper.o hamil_lrf.o   elinear_response.o ilinear_response.o \
        linear_optics.o linear_response.o   \
        setlocalpp.o  wannier.o electron_OEP.o electron_lhf.o twoelectron4o.o \
        ratpol.o screened_2e.o wave_cacher.o chi_base.o wpot.o local_field.o \
        ump2.o bse_te.o bse.o acfdt.o chi.o sydmat.o dmft.o \
        rmm-diis_mlr.o  linear_response_NMR.o


vasp: $(SOURCE) $(FFT3D) $(INC) main.o
	rm -f vasp
	$(FCL) -o vasp main.o  $(SOURCE)   $(FFT3D) $(LIB) $(LINK)
makeparam: $(SOURCE) $(FFT3D) makeparam.o main.F $(INC)
	$(FCL) -o makeparam  $(LINK) makeparam.o $(SOURCE) $(FFT3D) $(LIB)
zgemmtest: zgemmtest.o base.o random.o $(INC)
	$(FCL) -o zgemmtest $(LINK) zgemmtest.o random.o base.o $(LIB)
dgemmtest: dgemmtest.o base.o random.o $(INC)
	$(FCL) -o dgemmtest $(LINK) dgemmtest.o random.o base.o $(LIB)
ffttest: base.o smart_allocate.o mpi.o mgrid.o random.o ffttest.o $(FFT3D) $(INC)
	$(FCL) -o ffttest $(LINK) ffttest.o mpi.o mgrid.o random.o smart_allocate.o base.o $(FFT3D) $(LIB)
kpoints: $(SOURCE) $(FFT3D) makekpoints.o main.F $(INC)
	$(FCL) -o kpoints $(LINK) makekpoints.o $(SOURCE) $(FFT3D) $(LIB)

clean:
	-rm -f *.g *.f *.o *.L *.mod *.f90; touch *.F

main.o: main$(SUFFIX)
	$(FC) $(FFLAGS)$(DEBUG)  $(INCS) -c main$(SUFFIX)
xcgrad.o: xcgrad$(SUFFIX)
	$(FC) $(FFLAGS) $(INLINE)  $(INCS) -c xcgrad$(SUFFIX)
xcspin.o: xcspin$(SUFFIX)
	$(FC) $(FFLAGS) $(INLINE)  $(INCS) -c xcspin$(SUFFIX)

makeparam.o: makeparam$(SUFFIX)
	$(FC) $(FFLAGS)$(DEBUG)  $(INCS) -c makeparam$(SUFFIX)

makeparam$(SUFFIX): makeparam.F main.F
#
# MIND: I do not have a full dependency list for the include
# and MODULES: here are only the minimal basic dependencies
# if one structure is changed then touch_dep must be called
# with the corresponding name of the structure
#
base.o: base.inc base.F
mgrid.o: mgrid.inc mgrid.F
constant.o: constant.inc constant.F
lattice.o: lattice.inc lattice.F
setex.o: setexm.inc setex.F
pseudo.o: pseudo.inc pseudo.F
poscar.o: poscar.inc poscar.F
mkpoints.o: mkpoints.inc mkpoints.F
wave.o: wave.F
nonl.o: nonl.inc nonl.F
nonlr.o: nonlr.inc nonlr.F

$(OBJ_HIGH):
	$(CPP)
	$(FC) $(FFLAGS) $(OFLAG_HIGH) $(INCS) -c $*$(SUFFIX)
$(OBJ_NOOPT):
	$(CPP)
	$(FC) $(FFLAGS) $(INCS) -c $*$(SUFFIX)

fft3dlib_f77.o: fft3dlib_f77.F
	$(CPP)
	$(F77) $(FFLAGS_F77) -c $*$(SUFFIX)

.F.o:
	$(CPP)
	$(FC) $(FFLAGS) $(OFLAG) $(INCS) -c $*$(SUFFIX)
.F$(SUFFIX):
	$(CPP)
$(SUFFIX).o:
	$(FC) $(FFLAGS) $(OFLAG) $(INCS) -c $*$(SUFFIX)


# special rules
#-----------------------------------------------------------------------
# these special rules are cumulative (that is once failed
#   in one compiler version, stays in the list forever)
# -tpp5|6|7 P, PII-PIII, PIV
# -xW use SIMD (does not pay off on PII, since fft3d uses double prec)
# all other options do not affect the code performance since -O1 is used

fft3dlib.o : fft3dlib.F
	$(CPP)
	$(FC) -FR -lowercase -O2 -c $*$(SUFFIX)

fft3dfurth.o : fft3dfurth.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)

fftw3d.o : fftw3d.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)

wave_high.o : wave_high.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)

radial.o : radial.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)

symlib.o : symlib.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)

symmetry.o : symmetry.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)

wave_mpi.o : wave_mpi.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)

wave.o : wave.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)

dynbr.o : dynbr.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)

asa.o : asa.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)

broyden.o : broyden.F
	$(CPP)
	$(FC) -FR -lowercase -O2 -c $*$(SUFFIX)

us.o : us.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)

LDApU.o : LDApU.F
	$(CPP)
	$(FC) -FR -lowercase -O2 -c $*$(SUFFIX)
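Because the makefile links MKL statically by absolute path, a wrong MKLROOT only shows up late, at link time. A quick pre-check along the following lines may save a full rebuild. It is a sketch: check_mkl_libs is a hypothetical helper name, and the archive list simply mirrors the BLAS, LAPACK, and LINK lines of the makefile above.

```shell
#!/bin/sh
# Check that the static MKL archives referenced by the makefile exist.
# check_mkl_libs is a hypothetical helper name used for illustration.
# Usage: check_mkl_libs /path/to/mkl    (the value of MKLROOT in the makefile)
check_mkl_libs() {
    libdir=$1/lib/intel64
    status=0
    for lib in libmkl_intel_lp64.a libmkl_core.a libmkl_intel_thread.a \
               libmkl_blas95_lp64.a libmkl_lapack95_lp64.a; do
        if [ -f "$libdir/$lib" ]; then
            echo "ok       $libdir/$lib"
        else
            echo "missing  $libdir/$lib"
            status=1
        fi
    done
    # Non-zero exit status if any archive is absent.
    return "$status"
}
```

If any archive is reported as missing, MKLROOT in the makefile most likely needs adjusting before make is run.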

Test run

Test with bench.Hg.tar.gz, which ships with VASP

1. Unpack: tar -xzvf bench.Hg.tar.gz

2. Copy the four files INCAR, KPOINTS, POSCAR, and POTCAR into the vasp.5.2 folder

3. Serial run: ./vasp, which took 45.221 s. Screen output:

running on    1 nodes
distr:  one band on    1 nodes,    1 groups
vasp.5.2.12 11Nov11 complex
......
entering main loop
       N       E                     dE             d eps       ncg     rms          rms(c)
RMM:   1    -0.514507058760E+05   -0.51451E+05   -0.13177E+05   316   0.780E+02
RMM:   2    -0.527604338595E+05   -0.13097E+04   -0.23675E+04   316   0.234E+02
RMM:   3    -0.529743353776E+05   -0.21390E+03   -0.41254E+03   316   0.116E+02
RMM:   4    -0.531145169975E+05   -0.14018E+03   -0.15769E+03   316   0.784E+01
RMM:   5    -0.531789029672E+05   -0.64386E+02   -0.67142E+02   316   0.452E+01
RMM:   6    -0.532264453365E+05   -0.47542E+02   -0.47991E+02   720   0.309E+01
RMM:   7    -0.532330334403E+05   -0.65881E+01   -0.94371E+01   762   0.919E+00    0.871E+00
RMM:   8    -0.532322794427E+05    0.75400E+00   -0.37182E+01   697   0.816E+00    0.265E+00
RMM:   9    -0.532327283030E+05   -0.44886E+00   -0.88476E+00   702   0.383E+00    0.129E+00
RMM:  10    -0.532327148448E+05    0.13458E-01   -0.69686E-01   695   0.120E+00    0.550E-01
RMM:  11    -0.532327089541E+05    0.58908E-02   -0.18550E-01   693   0.501E-01    0.247E-01
RMM:  12    -0.532327075118E+05    0.14423E-02   -0.34613E-02   691   0.226E-01    0.756E-02
RMM:  13    -0.532327075990E+05   -0.87187E-04   -0.65477E-03   688   0.823E-02
   1 F= -.53232708E+05 E0= -.53232710E+05  d E =0.749678E-02

4. Parallel run: mpirun -np 12 ./vasp, which took 7.931 s. Screen output:

running on   12 nodes
distr:  one band on    3 nodes,    4 groups
vasp.5.2.12 11Nov11 complex
......
entering main loop
       N       E                     dE             d eps       ncg     rms          rms(c)
RMM:   1    -0.514507058760E+05   -0.51451E+05   -0.13177E+05   316   0.780E+02
RMM:   2    -0.527604338595E+05   -0.13097E+04   -0.23675E+04   316   0.234E+02
RMM:   3    -0.529743353776E+05   -0.21390E+03   -0.41254E+03   316   0.116E+02
RMM:   4    -0.531145169975E+05   -0.14018E+03   -0.15769E+03   316   0.784E+01
RMM:   5    -0.531789029672E+05   -0.64386E+02   -0.67142E+02   316   0.452E+01
RMM:   6    -0.532264453365E+05   -0.47542E+02   -0.47991E+02   720   0.309E+01
RMM:   7    -0.532330334403E+05   -0.65881E+01   -0.94371E+01   762   0.919E+00    0.871E+00
RMM:   8    -0.532322794427E+05    0.75400E+00   -0.37182E+01   697   0.816E+00    0.265E+00
RMM:   9    -0.532327283030E+05   -0.44886E+00   -0.88476E+00   702   0.383E+00    0.129E+00
RMM:  10    -0.532327148448E+05    0.13458E-01   -0.69686E-01   695   0.120E+00    0.550E-01
RMM:  11    -0.532327089541E+05    0.58908E-02   -0.18550E-01   693   0.501E-01    0.247E-01
RMM:  12    -0.532327075118E+05    0.14423E-02   -0.34613E-02   691   0.226E-01    0.756E-02
RMM:  13    -0.532327075990E+05   -0.87187E-04   -0.65477E-03   688   0.823E-02
   1 F= -.53232708E+05 E0= -.53232710E+05  d E =0.749678E-02

The parallel and serial results agree, which indicates that the parallel build works correctly.
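The comparison can also be done mechanically instead of by eye. The sketch below assumes the screen output of each run was saved to a file; the file names serial.log and parallel.log and the function names final_energy and speedup are made up for illustration.

```shell
#!/bin/sh
# Print the last "F=" energy found in a saved VASP screen log or OSZICAR file.
# final_energy and speedup are hypothetical helper names.
final_energy() {
    awk '/F=/ { for (i = 1; i < NF; i++) if ($i == "F=") e = $(i + 1) }
         END  { print e }' "$1"
}

# Print speedup and parallel efficiency from two wall-clock times.
# Usage: speedup SERIAL_SECONDS PARALLEL_SECONDS NPROCS
speedup() {
    awk -v ts="$1" -v tp="$2" -v n="$3" \
        'BEGIN { s = ts / tp
                 printf "speedup %.2f efficiency %.1f%%\n", s, 100 * s / n }'
}

# Typical use (log file names are assumptions):
#   ./vasp > serial.log
#   mpirun -np 12 ./vasp > parallel.log
#   [ "$(final_energy serial.log)" = "$(final_energy parallel.log)" ] && echo same
#   speedup 45.221 7.931 12
```

For the timings reported above, the last call gives a speedup of about 5.7 on 12 processes, that is, a parallel efficiency of roughly 47.5%, which is plausible for a benchmark this small.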

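5. Compare OSZICAR with OSZICAR.ref, and OUTCAR with OUTCAR.ref. The results differ somewhat, because OSZICAR.ref and OUTCAR.ref were produced with a different version, vasp.4.4.4 10Jan99.

Notes

  1. Compiling on Windows follows the same principles, but there are more pitfalls to watch for, so it is not recommended for now.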



https://wap.sciencenet.cn/blog-548663-796287.html
