# MODELLER基础教程---基于单模板的同源模建

MODELLER

Step1

>P1;TvLDH

sequence:TvLDH:::::::0.00: 0.00

MSEAAHVLITGAAGQIGYILSHWIASGELYGDRQVYLHLLDIPPAMNRLTALTMELEDCAFPHLAGFVATTDPKA

GHRAWDILEHRGFTSAASPTKAAIQHMKAWLFGTAPGEVLSMGIPVPEGNPYGIKPGVVFSFPCNVDKEGKIHVV

EGFKVNDWLREKLDFTEKDLFHEKEIALNHLAQGG*

from modeller import *

log.verbose()

env = environ()

#-- Prepare the input files

#-- Read in the sequence database

sdb = sequence_db(env)

chains_list='ALL', minmax_db_seq_len=(30, 4000), clean_sequences=True)

#-- Write the sequence database in binaryform

sdb.write(seq_database_file='pdb_95.bin',seq_database_format='BINARY',

chains_list='ALL')

#-- Now, read in the binary database

chains_list='ALL')

#-- Read in the target sequence/alignment

aln = alignment(env)

aln.append(file='TvLDH.ali',alignment_format='PIR', align_codes='ALL')

#-- Convert the input sequence/alignmentinto

#  profile format

prf = aln.to_profile()

#-- Scan sequence database to pick uphomologous sequences

prf.build(sdb, matrix_offset=-450,rr_file='{LIB}/blosum62.sim.mat', gap_penalties_1d=(-500, -50), n_prof_iterations=1, check_profile=False, max_aln_evalue=0.01) #-- Write out the profile in text format prf.write(file='build_profile.prf',profile_format='TEXT') #-- Convert the profile back to alignmentformat aln = prf.to_alignment() #-- Write out the alignment file aln.write(file='build_profile.ali',alignment_format='PIR') 输入文件：pdb_95.pir #这是一个结构数据库的序列文件，查询的序列文件TvLDH.ali; 利用到的氨基酸相似度矩阵：blosum62.sim.mat 输出文件：build_profile.ali,build_profile.prf 整个脚本的思路：读结构序列的数据库文件，然后将其转成机器码； 读要查询的序列文件（ali,将其转成Prf格式。 设置查询条件（1氨基酸相似度的矩阵blosum62.sim.mat 2 ‘’’ The profile.build()command has many options. In this example rr_file is set to use the BLOSUM62similarity matrix (file "blosum62.sim.mat" provided in the MODELLER distribution). Accordingly, the parameters matrix_offset and gap_penalties_1d areset to the appropriate values for the BLOSUM62 matrix. For this example, wewill run only one search iteration by setting the parameter n_prof_iterationsequal to 1. Thus, there is no need for checking the profile for deviation (check_profile set toFalse). Finally, the parameter max_aln_evalueis set to 0.01, indicating that only sequences with e-values smaller than orequal to 0.01 will be included in the final profile. ‘’’’ 注意：ali即为PIR格式，序列比对格式 Prf序列搜索格式 直接用python searchSTR.py就可以运行脚本，不推荐使用python2.3库的mod9.12解释器，因为这需要另外安装python2.3的库。 Step2挑选合适的模板 首先分析build_profile.prf文件， 11列（倒数第二列）是序列的相似度，越大越好；最后一列是显著性，越小越好。 第二列是对应的pdbcode（这时一组pdb的代表）。 根据显著性来挑选4-10pdbcode 这里我们首先挑选了显著性为06pdb出来分析（显著性的指标更加好）。 基于显著性进行排序。 得到(1bdm:A, 5mdh:A, 1b8p:A,1civ:A, 7mdh:A, and 1smk:A).6个结果。 再从这6个结构当中选择最合适的作为模板（6个结构之间结构相似度序列相似度）。 借助于脚本compare.py from modeller import * env = environ() aln = alignment(env) for (pdb, chain) in (('1b8p', 'A'), ('1bdm','A'), ('1civ', 'A'), ('5mdh', 'A'), ('7mdh','A'), ('1smk', 'A')): m =model(env, file=pdb, model_segment=('FIRST:'+chain, 'LAST:'+chain)) aln.append_model(m, atom_files=pdb, align_codes=pdb+chain) aln.malign() aln.malign3d() aln.compare_structures() aln.id_table(matrix_file='family.mat') env.dendrogram(matrix_file='family.mat',cluster_cut=-1.0) 在运行这个脚本之前，需要把相关的蛋白下载到你的工作目录中，我利用pymol中的fetch进行下载。 这个脚本不能自动产生log文件，所以需要你主动保存为log文件 pythoncompare.py >compare.log 找相近的，分组找代表的。 ?? 不知道这里的（compare.log39 50.5 55.37是什么意思？？不清楚， 知道的人请和我联系，744891290@qq.com，感觉好像是越小越好。 一共聚了3类，第一类有3第二类有2个，第3类有1；所以第一类是优先的，然后在 从分辨率上考虑的化选1bdm。这里只挑选了1个模板，所以是基于单模板的同源模建。 step3 比对 在进行基于模板的同源模建之前需要进行序列比对。 把模板的序列和要模建的序列进行比对。 利用脚本align2d.py from modeller import * env = environ() aln = alignment(env) mdl = model(env, file='1bdm',model_segment=('FIRST:A','LAST:A')) aln.append_model(mdl, align_codes='1bdmA',atom_files='1bdm.pdb') aln.append(file='TvLDH.ali',align_codes='TvLDH') aln.align2d() aln.write(file='TvLDH-1bdmA.ali',alignment_format='PIR') aln.write(file='TvLDH-1bdmA.pap',alignment_format='PAP') Step4 同源模建，根据得到的比对，进行模建，产生不同的模型； Generatemodel.py from modeller import * from modeller.automodel import * env = environ() a = automodel(env, alnfile='TvLDH-1bdmA.ali', knowns='1bdmA', sequence='TvLDH', assess_methods=(assess.DOPE, assess.GA341)) a.starting_model = 1 a.ending_model = 5 a.make() 这里是产生5个模建的结果。 注意这里也不会自动产生log文件，需要你主动生成 pythongeneratemodel.py >genemodes.log Step5 对产生的模型进行评价，挑选最好的模型 方法1：根据打分进行挑选 DOPE打分越低越好，负的越大越好 GA341打分越高越好 GA341 is not as good as DOPE at distinguishing 'good' modelsfrom 'bad' models molPDF是默认打分函数，越小越好 Yun He wrote: Currently, I am working on a model based on a very lowsequence-identity template (about 13%). I use an alignment to produce 30 models,and then I sort these models by their molpdf or GA341 or DOPE. However, thesorts turn out very different results, some models with higher GA341 scorestakes higher DOPE scores. I am confused, which one should I choose? No model assessment score is perfect, soyou should not expect them to be perfectly correlated. A 'good' model will have a GA341 score near1.0 (larger is better) but a small DOPE score (DOPE scores, like molpdf, are"energies", so smaller or negative is better). Neither DOPE nor GA341 show a lot of discrimination forpoor models - e.g. you can't really say that a model with a GA341 score of 0.1is 'worse' than one with 0.2 (the score was developed to tell between good andbad models, not to rank bad models). Anything worse than 0.6 or so is a badmodel. 对挑选的模板再进行评价； 看看那些区域是合理的，那些区域是不合理的。 借助于脚本evaluate_mode.py from modeller import * from modeller.scripts import complete_pdb log.verbose() # request verbose output env = environ() env.libs.topology.read(file='(LIB)/top_heav.lib')# read topology

mdl = complete_pdb(env,'TvLDH.B99990002.pdb')

# Assess with DOPE:

s = selection(mdl)   # all atom selection

s.assess_dope(output='ENERGY_PROFILENO_REPORT', file='TvLDH.profile',

normalize_profile=True, smoothing_window=15)

Qq 744891290@qq.com

(国外网站要用代理看)

https://wap.sciencenet.cn/blog-950202-726944.html

## 相关博文

GMT+8, 2021-12-5 03:09