VASP.5.2.12编译:Intel Fortran+MPI+MKL

类别:    标签: vasp   阅读次数:   版权: (CC) BY-NC-SA

2014-05-19 09:56:26

并行版本VASP编译

准备工作

1 . Fortran编译器, MPI库与MKL库安装好. 若系统使用module, 只须load即可.

module show intel/13.1.0

-------------------------------------------------------------------
/share/apps/modules/Modules/modulefiles/intel/13.1.0:

module-whatis    Intel Compiler 
module-whatis    Version: 13.1.0 
module-whatis    Category: compiler, runtime support 
module-whatis    Description: Intel Compiler Family (C/C++/Fortran for x86_64) 
module-whatis    URL: http://www.intel.com/cd/software/products/asmo-na/eng/compilers/284132.htm 
prepend-path     PATH /share/apps/intel/composer_xe_2013.2.146/bin 
prepend-path     MANPATH /share/apps/intel/composer_xe_2013.2.146/man/en_US 
prepend-path     INCLUDE /share/apps/intel/composer_xe_2013.2.146/mkl/include:/share/apps/intel/composer_xe_2013.2.146/ipp/include 
prepend-path     LD_LIBRARY_PATH /share/apps/intel/composer_xe_2013.2.146/lib/intel64 
prepend-path     LIBRARY_PATH /share/apps/intel/composer_xe_2013.2.146/lib/intel64 
prepend-path     NLS_PATH /share/apps/intel/composer_xe_2013.2.146/lib/intel64/locale/%l_%t/%N 
setenv           COMPILER_TYPE intel 
setenv           COMPILER_VERSION 13.1.0 
setenv           INTEL_LICENSE_FILE 28518@192.168.100.1 
-------------------------------------------------------------------

module show impi/4.1.0

-------------------------------------------------------------------
/share/apps/modules/Modules/modulefiles/impi/4.1.0:

module-whatis    Intel Compiler 
module-whatis    Version: 4.1.0 
module-whatis    Category: compiler, runtime support 
module-whatis    Description: Intel Compiler Family (C/C++/Fortran for x86_64) 
module-whatis    URL: http://www.intel.com/cd/software/products/asmo-na/eng/compilers/284132.htm 
setenv           version 4.1.0 
setenv           Intel_FC_Home /share/apps/intel/impi/4.1.0.030 
prepend-path     PATH /share/apps/intel/impi/4.1.0.030/intel64/bin 
prepend-path     MANPATH /share/apps/intel/impi/4.1.0.030/man 
prepend-path     LD_LIBRARY_PATH /share/apps/intel/impi/4.1.0.030/intel64/lib 
setenv           I_MPI_ROOT /share/apps/intel/impi/4.1.0.030 
setenv           I_MPI_FABRICS shm:tmi 
setenv           TMI_CONFIG /share/apps/intel/impi/4.1.0.030/intel64/etc/tmi.conf 
setenv           INTEL_LICENSE_FILE 28518@192.168.100.1 
-------------------------------------------------------------------

module show mkl/13.1.0

-------------------------------------------------------------------
/share/apps/modules/Modules/modulefiles/mkl/13.1.0:

module-whatis    Intel MKL 
module-whatis    Version: 13.1.0 
module-whatis    Category: compiler, runtime support 
module-whatis    Description: Intel Compiler Family (C/C++/Fortran for x86_64) 
module-whatis    URL: http://www.intel.com/cd/software/products/asmo-na/eng/compilers/284132.htm 
setenv           MKL_ROOT /share/apps/intel/composer_xe_2013.2.146/composer_xe_2013.2.146/mkl 
-------------------------------------------------------------------

2 . 检查编译器, 运行库, 路径无误

which ifort 给出

/share/apps/intel/composer_xe_2013.2.146/bin/ifort

which mpiifort 给出

/share/apps/intel/impi/4.1.0.030/intel64/bin/mpiifort

which mpirun 给出

/share/apps/intel/impi/4.1.0.030/intel64/bin/mpirun

编译

  1. 下载VASP源码, vasp.5.2.12.tar.gzvasp.5.lib.tar.gz

  2. 解压

    tar -xzvf vasp.5.2.12.tar.gz, 得文件夹 vasp.5.2

    tar -xzvf vasp.5.lib.tar.gz, 得文件夹 vasp.5.lib

  3. 编译库文件, 简单, 直接使用makefile.linux_ifc_P4

    将19行FC=ifc改为FC=mpiifort

    make -f makefile.linux_ifc_P4

    libdmy.alinpack_double.o, 即成功

  4. 编译主程序, 复杂, 牵涉到数学库, FFT库, 并行库的选择, 需要修改makefile.linux_ifc_P4.
    原则是尽可能使用Intel自家的东西, 简单且效率好, 故使用MKL及其自带的FFTW, 并行库使用IntelMPI
    编译器选项可在Intel官网查询
    一份修改好的makefile及其简单注释如下

    将其保存为makefile

    make

    vasp即成功, 编译中有警告, 但不致命

makefile
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
# MKL及其FFTW路径
MKLROOT=/share/apps/intel/composer_xe_2013.2.146/composer_xe_2013.2.146/mkl
FFTWROOT=${MKLROOT}/include/fftw

# 扩展名
.SUFFIXES: .inc .f .f90 .F

# 预处理扩展名
SUFFIX=.f90

# MPI Fortran编译器, 链接器, 可使用绝对路径
# 增加FFTW路径, 以便使用MKL自带的FFTW
FC=mpiifort -I${FFTWROOT}
FCL=$(FC)

# fpp预处理选项
CPP_=fpp -f_com=no -free -w0 $*.F $*$(SUFFIX)

# 编译选项. 注意
# 1. 行尾必须有空格, 数目不限
# 2. byterecl必须使用, 否则WAVECAR文件极大
FFLAGS =  -FR -lowercase -assume byterecl

# 优化选项
# 增加 -xHost -axAVX
OFLAG=-O2 -ip -ftz -xHost -axAVX

# 其他编译选项
OFLAG_HIGH = $(OFLAG)
OBJ_HIGH =
OBJ_NOOPT =
DEBUG  = -FR -O0
INLINE = $(OFLAG)

# MKL, BLAS, LAPACK使用选项
# 1. CPP选项中必须设置 -DRPROMU_DGEMV -DRACCMU-DGEMV
# 2. 使用静态库, 速度可能稍快
# 3. 可参考https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor
BLAS=$(MKLROOT)/lib/intel64/libmkl_blas95_lp64.a
LAPACK=$(MKLROOT)/lib/intel64/libmkl_lapack95_lp64.a

# 链接选项, 使用静态库
LINK=-Wl,--start-group \
	$(MKLROOT)/lib/intel64/libmkl_intel_lp64.a \
	$(MKLROOT)/lib/intel64/libmkl_core.a \
	$(MKLROOT)/lib/intel64/libmkl_intel_thread.a \
	-Wl,--end-group \
	-lpthread -liomp5 -lmpi -lm

# CPP并行选项
# NGZhalf               charge density   reduced in Z direction
# wNGZhalf              gamma point only reduced in Z direction
# scaLAPACK             use scaLAPACK (usually slower on 100 Mbit Net)
# avoidalloc          avoid ALLOCATE if possible
# PGF90               work around some for some PGF90 / IFC bugs
# CACHE-SIZE          1000 for PII,PIII, 5000 for Athlon, 8000-12000 P4, PD
# RPROMU-DGEMV        use DGEMV instead of DGEMM in RPRO (depends on used BLAS)
# RACCMU-DGEMV        use DGEMV instead of DGEMM in RACC (depends on used BLAS)
# tbdyn                 MD package of Tomas  Bucko
CPP = $(CPP_) -DMPI  -DHOST=\"LinuxIFC\" -DIFC \
	-DCACHE_SIZE=4000 -DPGF90 -Davoidalloc -DNGZhalf \
	-DMPI_BLOCK=8000 \
	-DRPROMU_DGEMV  -DRACCMU_DGEMV

# MPI库
LIB = -L../vasp.5.lib -ldmy  \
	../vasp.5.lib/linpack_double.o $(LAPACK) $(BLAS)

# 使用MKL自带的FFTW
FFT3D   = fftmpiw.o fftmpi_map.o fftw3d.o fft3dlib.o
# 或使用VASP自带的FFTW
#FFT3D   = fftmpi.o fftmpi_map.o fft3dfurth.o fft3dlib.o

# 一般规则, 编译命令行, 以下不可修改
BASIC=   symmetry.o symlib.o   lattlib.o  random.o

SOURCE=  base.o     mpi.o      smart_allocate.o      xml.o  \
         constant.o jacobi.o   main_mpi.o  scala.o   \
         asa.o      lattice.o  poscar.o   ini.o  mgrid.o  xclib.o  vdw_nl.o  xclib_grad.o \
         radial.o   pseudo.o   gridq.o     ebs.o  \
         mkpoints.o wave.o     wave_mpi.o  wave_high.o  \
         $(BASIC)   nonl.o     nonlr.o    nonl_high.o dfast.o    choleski2.o \
         mix.o      hamil.o    xcgrad.o   xcspin.o    potex1.o   potex2.o  \
         constrmag.o cl_shift.o relativistic.o LDApU.o \
         paw_base.o metagga.o  egrad.o    pawsym.o   pawfock.o  pawlhf.o   rhfatm.o  paw.o   \
         mkpoints_full.o       charge.o   Lebedev-Laikov.o  stockholder.o dipol.o    pot.o \
         dos.o      elf.o      tet.o      tetweight.o hamil_rot.o \
         steep.o    chain.o    dyna.o     sphpro.o    us.o  core_rel.o \
         aedens.o   wavpre.o   wavpre_noio.o broyden.o \
         dynbr.o    rmm-diis.o reader.o   writer.o   tutor.o xml_writer.o \
         brent.o    stufak.o   fileio.o   opergrid.o stepver.o  \
         chgloc.o   fast_aug.o fock.o     mkpoints_change.o sym_grad.o \
         mymath.o   internals.o dynconstr.o dimer_heyden.o dvvtrajectory.o vdwforcefield.o \
         hamil_high.o nmr.o    pead.o     mlwf.o     subrot.o   subrot_scf.o \
         force.o    pwlhf.o  gw_model.o optreal.o   davidson.o  david_inner.o \
         electron.o rot.o  electron_all.o shm.o    pardens.o  paircorrection.o \
         optics.o   constr_cell_relax.o   stm.o    finite_diff.o elpol.o    \
         hamil_lr.o rmm-diis_lr.o  subrot_cluster.o subrot_lr.o \
         lr_helper.o hamil_lrf.o   elinear_response.o ilinear_response.o \
         linear_optics.o linear_response.o   \
         setlocalpp.o  wannier.o electron_OEP.o electron_lhf.o twoelectron4o.o \
         ratpol.o screened_2e.o wave_cacher.o chi_base.o wpot.o local_field.o \
         ump2.o bse_te.o bse.o acfdt.o chi.o sydmat.o dmft.o \
         rmm-diis_mlr.o  linear_response_NMR.o

vasp: $(SOURCE) $(FFT3D) $(INC) main.o
	rm -f vasp
	$(FCL) -o vasp main.o  $(SOURCE)   $(FFT3D) $(LIB) $(LINK)
makeparam: $(SOURCE) $(FFT3D) makeparam.o main.F $(INC)
	$(FCL) -o makeparam  $(LINK) makeparam.o $(SOURCE) $(FFT3D) $(LIB)
zgemmtest: zgemmtest.o base.o random.o $(INC)
	$(FCL) -o zgemmtest $(LINK) zgemmtest.o random.o base.o $(LIB)
dgemmtest: dgemmtest.o base.o random.o $(INC)
	$(FCL) -o dgemmtest $(LINK) dgemmtest.o random.o base.o $(LIB)
ffttest: base.o smart_allocate.o mpi.o mgrid.o random.o ffttest.o $(FFT3D) $(INC)
	$(FCL) -o ffttest $(LINK) ffttest.o mpi.o mgrid.o random.o smart_allocate.o base.o $(FFT3D) $(LIB)
kpoints: $(SOURCE) $(FFT3D) makekpoints.o main.F $(INC)
	$(FCL) -o kpoints $(LINK) makekpoints.o $(SOURCE) $(FFT3D) $(LIB)

clean:
	-rm -f *.g *.f *.o *.L *.mod *.f90; touch *.F

main.o: main$(SUFFIX)
	$(FC) $(FFLAGS)$(DEBUG)  $(INCS) -c main$(SUFFIX)
xcgrad.o: xcgrad$(SUFFIX)
	$(FC) $(FFLAGS) $(INLINE)  $(INCS) -c xcgrad$(SUFFIX)
xcspin.o: xcspin$(SUFFIX)
	$(FC) $(FFLAGS) $(INLINE)  $(INCS) -c xcspin$(SUFFIX)

makeparam.o: makeparam$(SUFFIX)
	$(FC) $(FFLAGS)$(DEBUG)  $(INCS) -c makeparam$(SUFFIX)

makeparam$(SUFFIX): makeparam.F main.F
#
# MIND: I do not have a full dependency list for the include
# and MODULES: here are only the minimal basic dependencies
# if one strucuture is changed then touch-dep must be called
# with the corresponding name of the structure
#
base.o: base.inc base.F
mgrid.o: mgrid.inc mgrid.F
constant.o: constant.inc constant.F
lattice.o: lattice.inc lattice.F
setex.o: setexm.inc setex.F
pseudo.o: pseudo.inc pseudo.F
poscar.o: poscar.inc poscar.F
mkpoints.o: mkpoints.inc mkpoints.F
wave.o: wave.F
nonl.o: nonl.inc nonl.F
nonlr.o: nonlr.inc nonlr.F

$(OBJ_HIGH):
	$(CPP)
	$(FC) $(FFLAGS) $(OFLAG_HIGH) $(INCS) -c $*$(SUFFIX)
$(OBJ_NOOPT):
	$(CPP)
	$(FC) $(FFLAGS) $(INCS) -c $*$(SUFFIX)

fft3dlib_f77.o: fft3dlib_f77.F
	$(CPP)
	$(F77) $(FFLAGS_F77) -c $*$(SUFFIX)

.F.o:
	$(CPP)
	$(FC) $(FFLAGS) $(OFLAG) $(INCS) -c $*$(SUFFIX)
.F$(SUFFIX):
	$(CPP)
$(SUFFIX).o:
	$(FC) $(FFLAGS) $(OFLAG) $(INCS) -c $*$(SUFFIX)

# special rules
#-----------------------------------------------------------------------
# these special rules are cummulative (that is once failed
#   in one compiler version, stays in the list forever)
# -tpp5|6|7 P, PII-PIII, PIV
# -xW use SIMD (does not pay of on PII, since fft3d uses double prec)
# all other options do no affect the code performance since -O1 is used

fft3dlib.o : fft3dlib.F
	$(CPP)
	$(FC) -FR -lowercase -O2 -c $*$(SUFFIX)

fft3dfurth.o : fft3dfurth.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)

fftw3d.o : fftw3d.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)

wave_high.o : wave_high.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)

radial.o : radial.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)

symlib.o : symlib.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)

symmetry.o : symmetry.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)

wave_mpi.o : wave_mpi.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)

wave.o : wave.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)

dynbr.o : dynbr.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)

asa.o : asa.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)

broyden.o : broyden.F
	$(CPP)
	$(FC) -FR -lowercase -O2 -c $*$(SUFFIX)

us.o : us.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)

LDApU.o : LDApU.F
	$(CPP)
	$(FC) -FR -lowercase -O2 -c $*$(SUFFIX)

运行测试

利用VASP自带的bench.Hg.tar.gz进行测试

1 . 解压 tar -xzvf bench.Hg.tar.gz

2 . 复制INCAR, KPOINTS, POSCAR, POTCAR四个文件到vasp.5.2文件夹下

3 . 单核运行 ./vasp, 耗时45.221s, 屏幕输出

running on    1 nodes
distr:  one band on    1 nodes,    1 groups
vasp.5.2.12 11Nov11 complex
......
entering main loop
       N       E                     dE             d eps       ncg     rms          rms(c)
RMM:   1    -0.514507058760E+05   -0.51451E+05   -0.13177E+05   316   0.780E+02
RMM:   2    -0.527604338595E+05   -0.13097E+04   -0.23675E+04   316   0.234E+02
RMM:   3    -0.529743353776E+05   -0.21390E+03   -0.41254E+03   316   0.116E+02
RMM:   4    -0.531145169975E+05   -0.14018E+03   -0.15769E+03   316   0.784E+01
RMM:   5    -0.531789029672E+05   -0.64386E+02   -0.67142E+02   316   0.452E+01
RMM:   6    -0.532264453365E+05   -0.47542E+02   -0.47991E+02   720   0.309E+01
RMM:   7    -0.532330334403E+05   -0.65881E+01   -0.94371E+01   762   0.919E+00    0.871E+00
RMM:   8    -0.532322794427E+05    0.75400E+00   -0.37182E+01   697   0.816E+00    0.265E+00
RMM:   9    -0.532327283030E+05   -0.44886E+00   -0.88476E+00   702   0.383E+00    0.129E+00
RMM:  10    -0.532327148448E+05    0.13458E-01   -0.69686E-01   695   0.120E+00    0.550E-01
RMM:  11    -0.532327089541E+05    0.58908E-02   -0.18550E-01   693   0.501E-01    0.247E-01
RMM:  12    -0.532327075118E+05    0.14423E-02   -0.34613E-02   691   0.226E-01    0.756E-02
RMM:  13    -0.532327075990E+05   -0.87187E-04   -0.65477E-03   688   0.823E-02
   1 F= -.53232708E+05 E0= -.53232710E+05  d E =0.749678E-02

4 . 多核并行 mpirun -np 12 ./vasp, 耗时7.931s, 屏幕输出

running on   12 nodes
distr:  one band on    3 nodes,    4 groups
vasp.5.2.12 11Nov11 complex
......
entering main loop
       N       E                     dE             d eps       ncg     rms          rms(c)
RMM:   1    -0.514507058760E+05   -0.51451E+05   -0.13177E+05   316   0.780E+02
RMM:   2    -0.527604338595E+05   -0.13097E+04   -0.23675E+04   316   0.234E+02
RMM:   3    -0.529743353776E+05   -0.21390E+03   -0.41254E+03   316   0.116E+02
RMM:   4    -0.531145169975E+05   -0.14018E+03   -0.15769E+03   316   0.784E+01
RMM:   5    -0.531789029672E+05   -0.64386E+02   -0.67142E+02   316   0.452E+01
RMM:   6    -0.532264453365E+05   -0.47542E+02   -0.47991E+02   720   0.309E+01
RMM:   7    -0.532330334403E+05   -0.65881E+01   -0.94371E+01   762   0.919E+00    0.871E+00
RMM:   8    -0.532322794427E+05    0.75400E+00   -0.37182E+01   697   0.816E+00    0.265E+00
RMM:   9    -0.532327283030E+05   -0.44886E+00   -0.88476E+00   702   0.383E+00    0.129E+00
RMM:  10    -0.532327148448E+05    0.13458E-01   -0.69686E-01   695   0.120E+00    0.550E-01
RMM:  11    -0.532327089541E+05    0.58908E-02   -0.18550E-01   693   0.501E-01    0.247E-01
RMM:  12    -0.532327075118E+05    0.14423E-02   -0.34613E-02   691   0.226E-01    0.756E-02
RMM:  13    -0.532327075990E+05   -0.87187E-04   -0.65477E-03   688   0.823E-02
   1 F= -.53232708E+05 E0= -.53232710E+05  d E =0.749678E-02

多核与单核结果一致, 说明并行无误

5 . 将OSZICAROSZICAR.ref, OUTCAROUTCAR.ref做比较, 所得结果有所不同, 原因在于OSZICAR.refOUTCAR.ref所用版本为 vasp.4.4.4 10Jan99

说明

  1. Windows下面的编译, 原则方法相同, 但需要注意的地方更多, 暂不推荐
  2. 只计算gamma点的版本编译时, 在cpp中打开-DwNGZhalf即可

评论

随意赞赏

微信

支付宝
◆本文地址: , 转载请注明◆
◆评论问题: https://jerkwin.herokuapp.com/category/3/博客, 欢迎留言◆


前一篇: 空间分布函数SDF的计算及三维图示
后一篇: VASP使用

访问人次(2015年7月 9日起): | 最后更新: 2017-08-15 19:57:07 UTC | 版权所有 © 2008 - 2017 Jerkwin