haiferry's personal blog http://blog.sciencenet.cn/u/haiferry


Counting Point Mutations

1774 views | 2015-11-4 20:54 | Category: Research notes


 

Evolution as a Sequence of Mistakes

 

 

Figure 1. A point mutation in DNA changing a C-G pair to an A-T pair.

A mutation is simply a mistake that occurs during the creation or copying of a nucleic acid, in particular DNA. Because nucleic acids are vital to cellular functions, mutations tend to cause a ripple effect throughout the cell. Although mutations are technically mistakes, a very rare mutation may equip the cell with a beneficial attribute. In fact, the macro effects of evolution are attributable to the accumulated result of beneficial microscopic mutations over many generations.

 

The simplest and most common type of nucleic acid mutation is a point mutation, which replaces one base with another at a single nucleotide. In the case of DNA, a point mutation must change the complementary base accordingly; see Figure 1.
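As a rough illustration of the paragraph above, the sketch below applies a point mutation to one strand and recomputes the paired strand base by base. The names COMPLEMENT, complement and point_mutate are hypothetical helpers introduced only for this example, and the three-base strand is made up; the complement here is position-wise (not the reverse complement).

```python
# Watson-Crick pairing for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand):
    """Return the base-by-base complement of a DNA strand (not reversed)."""
    return "".join(COMPLEMENT[base] for base in strand)


def point_mutate(strand, position, new_base):
    """Return a copy of strand with the base at position replaced by new_base."""
    return strand[:position] + new_base + strand[position + 1:]


original = "GCT"                           # made-up example strand
mutated = point_mutate(original, 1, "A")   # C -> A at index 1

# The middle pair changes from C-G to A-T, as in Figure 1.
print(original, complement(original))
print(mutated, complement(mutated))
```

Only the mutated position and its partner on the paired strand change; every other pair is untouched.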

 

Two DNA strands taken from the genomes of different organisms or species are homologous if they share a recent ancestor; thus, counting the number of bases at which homologous strands differ provides us with the minimum number of point mutations that could have occurred on the evolutionary path between the two strands.

 

We are interested in minimizing the number of (point) mutations separating two species because of the biological principle of parsimony, which demands that evolutionary histories should be as simply explained as possible.

 

Problem

 

 

Figure 2. The Hamming distance between these two strings is 7. Mismatched symbols are colored red.

Given two strings s and t of equal length, the Hamming distance between s and t, denoted dH(s,t), is the number of corresponding symbols that differ in s and t. See Figure 2.

Given: Two DNA strings s and t of equal length (not exceeding 1 kbp).
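The definition above translates almost directly into code. The following is a minimal sketch, not the post's own solution: hamming_distance is a hypothetical function name, and raising ValueError on unequal lengths is an assumption made here because the definition only covers strings of equal length. The two example strings are made up.

```python
def hamming_distance(s, t):
    """Count the positions at which equal-length strings s and t differ."""
    if len(s) != len(t):
        # The Hamming distance is only defined for strings of equal length.
        raise ValueError("strings must have equal length")
    # zip pairs up corresponding symbols; each mismatch contributes 1.
    return sum(1 for a, b in zip(s, t) if a != b)


# Made-up example: the strings differ at indices 2 and 5.
print(hamming_distance("GATTACA", "GACTATA"))
```

Checking the lengths first matters because zip silently stops at the shorter string, which would otherwise hide a length mismatch.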

 

Return: The Hamming distance dH(s,t).

 

Sample Dataset

GAGCCTACTAACGGGAT

CATCGTAATGACGGCCT

 

Sample Output

 

7

For the sample above, the following short code is used:

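#!/usr/bin/python

# The two sample DNA strings (equal length).
s1 = 'GAGCCTACTAACGGGAT'
s2 = 'CATCGTAATGACGGCCT'

# Walk both strings in step and count the positions where they differ.
c = 0
for a, b in zip(s1, s2):
    if a != b:
        c += 1

# Prints 7 for the sample strings; print(c) works in Python 2 and 3.
print(c)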




https://wap.sciencenet.cn/blog-2887147-933374.html
