4

String alignment

instructor: Ross A. Lippert

http://www-math.mit.edu/~lippert/18.417/

For class:
  • modification to the schedule
  • we have a grader/TA
[Figure: dot plot of gacttaagactttat (columns) against accaagggtt (rows); each * marks a position where the two characters match.]


Making change


Noticing a recurrence

A solution to the change problem for an amount M can be expressed in terms of solutions for smaller amounts.
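Written as a formula, let C(M) be the fewest coins that sum to M and let c_1, ..., c_d be the coin values. This notation is assumed here and does not come from the slide:

$$
C(M) = \min_{1 \le i \le d,\; c_i \le M} \bigl( C(M - c_i) + 1 \bigr), \qquad C(0) = 0.
$$

For example, with coins (1, 3, 7), C(10) = min(C(9), C(7), C(3)) + 1 = 2, since C(3) = C(7) = 1.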


Making Change Recursively

In pseudocode
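The following is a minimal Python sketch of the recursion, not the slide's own pseudocode. It assumes every coin value is a positive integer. The name recursive_change is hypothetical, and an amount that cannot be made comes back as math.inf.

```python
import math
from typing import Sequence


def recursive_change(m: int, coins: Sequence[int]) -> float:
    """Fewest coins summing to m, by direct recursion (exponential time).

    Assumes every coin value is a positive integer. Returns math.inf
    when no combination of coins sums to m.
    """
    if m == 0:
        return 0  # zero coins make zero change
    best = math.inf
    for c in coins:
        if c <= m:
            # Use one coin c, then solve the smaller problem m - c.
            best = min(best, recursive_change(m - c, coins) + 1)
    return best
```

recursive_change(10, (1, 3, 7)) returns 2. The call tree grows exponentially in m, however, so calling it on 77 is not practical.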

Just one problem: still O(M^d) !!!


A look at the Change search tree:

For M = 77 with coin denominations (1, 3, 7)
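The blow-up comes from solving the same amounts over and over. The sketch below counts how many times each amount m is reached in the search tree rooted at M, and it does so without running the naive recursion. The helper subproblem_visits is a hypothetical name, and it assumes positive coin values. It relies on one observation: a call on amount m + c spawns one call on m. The counts therefore satisfy visits[m] = sum of visits[m + c] over the coins.

```python
from typing import List, Sequence


def subproblem_visits(M: int, coins: Sequence[int]) -> List[int]:
    """Count how often the naive recursion is called on each amount 0..M.

    Every call on a positive amount a makes one child call on a - c for each
    coin c <= a. So the number of calls on amount m is the sum, over coins c
    with m + c <= M, of the number of calls on m + c. Assumes positive coins.
    """
    visits = [0] * (M + 1)
    visits[M] = 1  # the root call
    for m in range(M - 1, -1, -1):
        visits[m] = sum(visits[m + c] for c in coins if m + c <= M)
    return visits
```

For M = 7 and coins (1, 3, 7), the amount 0 is reached 10 times, once for each ordered way of writing 7 as a sum of 1s, 3s and 7s. Inspecting subproblem_visits(77, (1, 3, 7)) shows how quickly this repetition grows.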


Solution: store intermediate results

This is dynamic programming
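Storing C(m) for every m from 0 up to M turns the recursion into a table that is filled in increasing order. This takes O(Md) time for d coin values. Here is a sketch, using the hypothetical name dp_change:

```python
import math
from typing import Sequence


def dp_change(M: int, coins: Sequence[int]) -> float:
    """Fewest coins summing to M, filling a table of answers for 0..M.

    best[m] holds C(m); each entry is computed once from entries for
    smaller amounts, so the running time is O(M * len(coins)).
    Returns math.inf if M cannot be made from the coins.
    """
    best = [0] + [math.inf] * M  # best[0] = 0; other amounts not solved yet
    for m in range(1, M + 1):
        for c in coins:
            if c <= m and best[m - c] + 1 < best[m]:
                best[m] = best[m - c] + 1
    return best[M]
```

dp_change(77, (1, 3, 7)) returns 11 (eleven 7s), and it needs only 77 table entries.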


Files of strings (FASTA format)

FASTA (pronounced "fast-A") format is ubiquitous

>sequence_id comments
atgc...
atgc...
...
>sequence_id comments
atgc...
atgc...
...
...

The lines beginning with > are called deflines (definition lines); besides the sequence ID, they sometimes get used to store all sorts of extra information.

Some programs impose or expect a maximum length for the sequence lines.
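Below is a small reader and writer sketch. It assumes the layout above: each record starts with a line beginning with >, and its sequence may span several lines. The function names and the 60-character default width are illustrative choices, not part of the format.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (defline, sequence) pairs from FASTA-formatted lines.

    The defline is returned without its leading '>'. Sequence lines are
    concatenated, so any line-length limit in the file does not matter here.
    Blank lines, and any sequence lines before the first defline, are skipped.
    """
    defline = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if defline is not None:
                yield defline, "".join(chunks)
            defline, chunks = line[1:], []
        elif defline is not None:
            chunks.append(line)
    if defline is not None:
        yield defline, "".join(chunks)


def format_fasta(defline: str, seq: str, width: int = 60) -> str:
    """Render one record, wrapping the sequence at `width` characters."""
    rows = [">" + defline]
    rows += [seq[k:k + width] for k in range(0, len(seq), width)]
    return "\n".join(rows)
```

Since read_fasta accepts any iterable of lines, an open file object can be passed to it directly.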


Alignments

An example alignment of ATCTGATG and TGCATAC


Global Alignment by Recurrence

I found a cute demo at this site
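Global alignment follows the same pattern as making change. The best score s[i][j] for the prefixes v[:i] and w[:j] is the maximum over three smaller subproblems: align v[i-1] with w[j-1], or put either character against a gap. The sketch below assumes a simple scoring scheme (match +1, mismatch -1, gap -1), chosen only for illustration. It recovers one optimal alignment by tracing back through the table. The name global_align is hypothetical.

```python
from typing import Tuple


def global_align(v: str, w: str, match: int = 1, mismatch: int = -1,
                 gap: int = -1) -> Tuple[int, str, str]:
    """Best global alignment score of v and w, plus one optimal alignment.

    s[i][j] is the best score for aligning v[:i] with w[:j]. The scoring
    parameters are illustrative defaults, not a standard scheme.
    """
    n, m = len(v), len(w)

    def sub(a: str, b: str) -> int:
        return match if a == b else mismatch

    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s[i][0] = i * gap  # v[:i] against nothing: all gaps
    for j in range(1, m + 1):
        s[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s[i][j] = max(s[i - 1][j - 1] + sub(v[i - 1], w[j - 1]),  # align pair
                          s[i - 1][j] + gap,   # v[i-1] against a gap
                          s[i][j - 1] + gap)   # w[j-1] against a gap

    # Trace back from s[n][m], redoing whichever choice produced each cell.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and s[i][j] == s[i - 1][j - 1] + sub(v[i - 1], w[j - 1]):
            top.append(v[i - 1])
            bottom.append(w[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and s[i][j] == s[i - 1][j] + gap:
            top.append(v[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(w[j - 1])
            j -= 1
    return s[n][m], "".join(reversed(top)), "".join(reversed(bottom))
```

For instance, global_align("ATCTGATG", "TGCATAC") returns the optimal score followed by the two gapped rows. When several alignments tie, the one returned depends on the order of the traceback checks.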


Variation 1: Longest common subsequence
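The longest common subsequence is the special case of global alignment in which matches score 1, gaps cost nothing and mismatches are not allowed. The sketch below uses a hypothetical function lcs that returns one longest common subsequence:

```python
def lcs(v: str, w: str) -> str:
    """Return one longest common subsequence of v and w."""
    n, m = len(v), len(w)
    # L[i][j] = length of an LCS of v[:i] and w[:j]
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if v[i - 1] == w[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1  # extend with a match
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])  # skip a character
    # Walk back from L[n][m], collecting matched characters.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if v[i - 1] == w[j - 1]:
            out.append(v[i - 1])
            i, j = i - 1, j - 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))
```

lcs("ABCBDAB", "BDCABA") returns a common subsequence of length 4.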


Variation 2: Sparse alignments

The problem:

Score = [Scores] - [Gaps]
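One common way to read this formula is fragment chaining. Given a set of matching diagonal fragments between the two sequences, pick a chain of fragments that moves forward in both sequences, maximizing the total fragment score minus the gap penalties. The sketch below is an O(k^2) version for k fragments. The Fragment fields and the linear gap penalty gap_cost * (gap in v + gap in w) are assumptions made for illustration, not the lecture's definitions.

```python
from typing import List, NamedTuple, Tuple


class Fragment(NamedTuple):
    i: int        # start position in the first sequence (assumed layout)
    j: int        # start position in the second sequence
    length: int   # diagonal length of the match, assumed >= 1
    score: float  # score of the fragment on its own


def chain_fragments(frags: List[Fragment],
                    gap_cost: float = 1.0) -> Tuple[float, List[Fragment]]:
    """Best chain: sum of fragment scores minus an assumed linear gap penalty."""
    if not frags:
        return 0.0, []
    # A predecessor always starts earlier in both sequences, so sorting by
    # start position puts every possible predecessor before its successor.
    order = sorted(frags, key=lambda f: (f.i, f.j))
    best = [f.score for f in order]  # best chain ending at each fragment
    prev = [-1] * len(order)
    for b, fb in enumerate(order):
        for a in range(b):
            fa = order[a]
            end_i, end_j = fa.i + fa.length, fa.j + fa.length
            if end_i <= fb.i and end_j <= fb.j:  # fa lies wholly before fb
                gap = (fb.i - end_i) + (fb.j - end_j)
                cand = best[a] + fb.score - gap_cost * gap
                if cand > best[b]:
                    best[b], prev[b] = cand, a
    k = max(range(len(order)), key=best.__getitem__)
    total, chain = best[k], []
    while k != -1:
        chain.append(order[k])
        k = prev[k]
    return total, chain[::-1]
```

With gap_cost set to 0, the fragments starting at (0, 0) and (5, 5) chain together. Raising gap_cost to 0.5 makes the single fragment (1, 10, 4, 4.0) the better choice.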