|
A matrix is arectangular table of values divided into rows and columns. An m×n matrixhas m rows and n columns. Given a matrix A, wewrite Ai,j to indicate the value found at the intersection ofrow i and column j.
Say that we have a collection of DNA strings, all having thesame length n. Their profile matrix isa 4×n matrix P inwhich P1,j represents the number of times that 'A' occurs inthe jthposition ofone of the strings, P2,j represents the number of times that C occursin the jth position, and so on (see below).
A consensus string c isa string of length n formed from our collection by taking the mostcommon symbol at each position; the jth symbol of c thereforecorresponds to the symbol having the maximum value in the j-th column ofthe profile matrix. Of course, there may be more than one most common symbol,leading to multiple possible consensus strings.
A T C C A G C T | |
G G G C A A C T | |
A T G G A T C T | |
A A G C A A C C | |
T T G G A A C T | |
A T G C C A T T | |
A T G G C A C T | |
A 5 1 0 0 5 5 0 0 | |
C 0 0 1 4 2 0 6 1 | |
G 1 1 6 3 0 1 0 0 | |
T 1 5 0 0 0 1 1 6 | |
A T G C A A C T |
Given: Acollection of at most 10 DNA strings of equallength (at most 1 kbp)in FASTA format.
Return: Aconsensus string and profile matrix for the collection. (If several possibleconsensus strings exist, then you may return any one of them.)
Sample Dataset
>Rosalind_1
ATCCAGCT
>Rosalind_2
GGGCAACT
>Rosalind_3
ATGGATCT
>Rosalind_4
AAGCAACC
>Rosalind_5
TTGGAACT
>Rosalind_6
ATGCCATT
>Rosalind_7
ATGGCACT
Sample Output
ATGCAACT
A: 5 1 0 0 5 5 0 0
C: 0 0 1 4 2 0 6 1
G: 1 1 6 3 0 1 0 0
T: 1 5 0 0 0 1 1 6
针对以上案例,我写了如下代码:
#!/usr/bin/python
f=open("data1.txt","r")
f1=f.readlines()
titleID=[]
sequence=[]
for i in f1:
i.strip()
if i.startswith(">"):
titleID.append(i)
else:
sequence.append(i)
b=zip(*sequence)
j=0
g=0
h=0
k=0
l=0
A=[]
C=[]
G=[]
T=[]
while j<len(b):
for c in b[j]:
if c=='A':
g+=1
elif c=='C':
h+=1
elif c=='G':
k+=1
else:
l+=1
A.append(g)
C.append(h)
G.append(k)
T.append(l)
g=0
h=0
k=0
l=0
j+=1
H=zip(A,C,G,T)
consensus=[]
m=0
while m<len(H):
if max(H[m])==H[m][0]:
consensus.append('A')
elif max(H[m])==H[m][1]:
consensus.append('C')
elif max(H[m])==H[m][2]:
consensus.append('G')
elif max(H[m])==H[m][3]:
consensus.append('T')
m+=1
print consensus
print 'A: ', A
print 'G: ', G
print 'C: ', C
print 'T: ', T
Archiver|手机版|科学网 ( 京ICP备07017567号-12 )
GMT+8, 2024-3-29 05:33
Powered by ScienceNet.cn
Copyright © 2007- 中国科学报社