17

Hidden Markov Models

instructor: Ross A. Lippert

http://www-math.mit.edu/~lippert/18.417/

Announcements:
  • Start chapter 11
  • Problem set 6 available


The CG island phenomenon

Nucleotide frequencies in the human genome

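Single-nucleotide frequencies (%):

  a      c      g      t
  29.5   20.4   20.5   29.6

Dinucleotide frequencies, next to the product of the two single-nucleotide frequencies:

  pair   frequency   product
  aa     0.0978      0.0872
  ac     0.0503      0.0604
  ag     0.0699      0.0604
  at     0.0773      0.0873
  ca     0.0725      0.0604
  cc     0.0521      0.0418
  cg     0.0098      0.0418
  ct     0.07        0.0605
  ga     0.0593      0.0604
  gc     0.0426      0.0418
  gg     0.0521      0.0418
  gt     0.0505      0.0605
  ta     0.0657      0.0873
  tc     0.0594      0.0605
  tg     0.0727      0.0605
  tt     0.098       0.0875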

Explanation: CG frequently mutates into TG
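
The depletion of CG can be made concrete as a ratio of observed to expected frequency. The sketch below copies a few values from the table above and assumes that "expected" means the product of the two single-nucleotide frequencies; the function name obs_over_exp is only illustrative.

```python
# Single-nucleotide frequencies (percent) and a few observed dinucleotide
# frequencies, copied from the table above.
mono = {"a": 29.5, "c": 20.4, "g": 20.5, "t": 29.6}
observed = {"cg": 0.0098, "gc": 0.0426, "tg": 0.0727, "ca": 0.0725}

def obs_over_exp(dinuc):
    """Observed frequency divided by the independent-sites expectation."""
    first, second = dinuc
    expected = (mono[first] / 100.0) * (mono[second] / 100.0)
    return observed[dinuc] / expected

for d in sorted(observed):
    print(d, round(obs_over_exp(d), 2))
```

A ratio well below 1 for cg, together with ratios above 1 for tg and its reverse complement ca, is what one would expect if CG is frequently lost to TG.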


How can we definitively locate high CG regions?

Numerous heuristics

None of these is satisfying

Investigate two models of sequence

What can help us?


Simplification: two coins Bernoulli sequence

Suppose we have two coins, differently biased

coin used (s):  X X X Y Y Y X X
result (x):     1 1 0 0 0 0 1 0

Probability problem 1: likelihood of result given hidden information
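
When the coin used at every toss is known, the likelihood is just a product of per-toss probabilities. A minimal sketch, using the tosses from the table above; the biases in p_one are hypothetical values chosen for illustration, since the slides do not give any.

```python
# Hypothetical biases: probability that each coin shows 1.
p_one = {"X": 0.5, "Y": 0.1}

def likelihood_given_coins(coins, results):
    """P(results | coins): one independent factor per toss."""
    prob = 1.0
    for coin, r in zip(coins, results):
        prob *= p_one[coin] if r == 1 else 1.0 - p_one[coin]
    return prob

coins = "XXXYYYXX"
results = [1, 1, 0, 0, 0, 0, 1, 0]
print(likelihood_given_coins(coins, results))
```

With these assumed biases the five X tosses each contribute 0.5 and the three Y tosses each contribute 0.9.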

What if we don't have the s's, but just have the x's?

A probabilistic model for the s's


Markov Models

Markov models generate sequences over an alphabet

Alternative description/notation

Transition matrix T =
    [ P(F->F)  P(F->B) ]
    [ P(B->F)  P(B->B) ]
Emission matrix E =
    [ P(F~>H)  P(F~>T) ]
    [ P(B~>H)  P(B~>T) ]
Starting vector pi =
    [ P(0->F) ]
    [ P(0->B) ]
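
To see how pi, T, and E generate a sequence, here is a small sampling sketch. The numeric values are hypothetical (the slides give none), and reading F and B as a fair and a biased coin emitting H or T is an assumption about the notation.

```python
import random

STATES = ["F", "B"]      # assumed meaning: fair coin, biased coin
SYMBOLS = ["H", "T"]     # heads, tails

# Hypothetical parameter values, for illustration only.
PI = [0.5, 0.5]          # P(0->F), P(0->B)
T = [[0.9, 0.1],         # P(F->F), P(F->B)
     [0.2, 0.8]]         # P(B->F), P(B->B)
E = [[0.5, 0.5],         # P(F~>H), P(F~>T)
     [0.9, 0.1]]         # P(B~>H), P(B~>T)

def draw(weights, rng):
    """Pick an index with probability proportional to its weight."""
    return rng.choices(range(len(weights)), weights=weights)[0]

def sample_hmm(n, rng):
    """Return (hidden state string, emitted symbol string), each of length n."""
    states, symbols = [], []
    s = draw(PI, rng)
    for _ in range(n):
        states.append(STATES[s])
        symbols.append(SYMBOLS[draw(E[s], rng)])
        s = draw(T[s], rng)
    return "".join(states), "".join(symbols)

hidden, emitted = sample_hmm(20, random.Random(0))
print(hidden)
print(emitted)
```

Only the second string would be visible to an observer; the first is the hidden information.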


Using Hidden Markov Models for sequence scoring

Probability problem 2: likelihood of the result x1...xn when the states s1...sn are hidden

Formally reduces to a sum of problem 1 over all possible s strings, weighted by the Markov model

This sum is very big, but can be cast in terms of matrix multiplies

P(x1...xn) = pi D(x1) T D(x2) T ... D(xn) 1
where T = transition matrix, D(x) = diag(E(:,x)), pi is used as a row vector, and 1 is a column vector of 1s
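
The matrix product can be evaluated left to right as a sequence of vector updates, which costs time linear in n instead of exponential. The sketch below does this in plain Python and checks it against the brute-force sum over all state strings; the parameter values are hypothetical.

```python
from itertools import product

# Hypothetical two-state model; columns of E are the symbols H and T.
PI = [0.5, 0.5]
T = [[0.9, 0.1], [0.2, 0.8]]
E = [[0.5, 0.5], [0.9, 0.1]]
COL = {"H": 0, "T": 1}

def sequence_probability(x):
    """Evaluate pi D(x1) T D(x2) T ... D(xn) 1 from left to right."""
    n_states = len(PI)
    # v = pi D(x1)
    v = [PI[s] * E[s][COL[x[0]]] for s in range(n_states)]
    for symbol in x[1:]:
        # v = v T D(symbol)
        v = [sum(v[r] * T[r][s] for r in range(n_states)) * E[s][COL[symbol]]
             for s in range(n_states)]
    return sum(v)  # right-multiply by the vector of 1s

def brute_force_probability(x):
    """Sum P(x, s) over every state string s; exponential in len(x)."""
    total = 0.0
    for s in product(range(len(PI)), repeat=len(x)):
        p = PI[s[0]] * E[s[0]][COL[x[0]]]
        for k in range(1, len(x)):
            p *= T[s[k - 1]][s[k]] * E[s[k]][COL[x[k]]]
        total += p
    return total

print(sequence_probability("HHTHT"), brute_force_probability("HHTHT"))
```

For long sequences the running vector underflows; practical implementations rescale it at each step or work with logarithms.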


Obtaining conditional probabilities with HMMs

Two traditionally important auxiliary quantities

Can now calculate P(sk=s|x1...xn)
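
A sketch of this calculation, assuming the two auxiliary quantities are the usual forward values f_k(s) = P(x1...xk, sk=s) and backward values b_k(s) = P(x(k+1)...xn | sk=s), so that P(sk=s|x1...xn) = f_k(s) b_k(s) / P(x1...xn). The parameter values are hypothetical.

```python
# Hypothetical two-state model; columns of E are the symbols H and T.
PI = [0.5, 0.5]
T = [[0.9, 0.1], [0.2, 0.8]]
E = [[0.5, 0.5], [0.9, 0.1]]
COL = {"H": 0, "T": 1}
N = len(PI)

def forward(x):
    """f[k][s] = P(x_1..x_{k+1}, state at position k is s), k zero-based."""
    f = [[PI[s] * E[s][COL[x[0]]] for s in range(N)]]
    for sym in x[1:]:
        prev = f[-1]
        f.append([sum(prev[r] * T[r][s] for r in range(N)) * E[s][COL[sym]]
                  for s in range(N)])
    return f

def backward(x):
    """b[k][s] = P(symbols after position k | state at position k is s)."""
    b = [[1.0] * N]
    for sym in reversed(x[1:]):
        nxt = b[0]
        b.insert(0, [sum(T[s][r] * E[r][COL[sym]] * nxt[r] for r in range(N))
                     for s in range(N)])
    return b

def posterior(x):
    """post[k][s] = P(state at position k is s | whole sequence x)."""
    f, b = forward(x), backward(x)
    px = sum(f[-1])
    return [[f[k][s] * b[k][s] / px for s in range(N)] for k in range(len(x))]

for row in posterior("HHHHTTHT"):
    print([round(p, 3) for p in row])
```

Each printed row sums to 1 and gives the probability of each state at that position given the whole sequence.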

Can we use P(sk=s|x1...xn) to infer a sequence of states?


Inferring best state sequence

Maximum likelihood problem 1: find the state string s1...sn that maximizes P(x1...xn, s1...sn)

It is really a tropicalized version of P(x1...xn): the same recursion, with each sum over states replaced by a max

This is called the Viterbi algorithm
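
A minimal Viterbi sketch in log space, where products become sums and the sum over states becomes a max. The parameter values are hypothetical, and back pointers are kept so that the best path can be read off at the end.

```python
import math

# Hypothetical two-state model; columns of E are the symbols H and T.
STATES = ["F", "B"]
PI = [0.5, 0.5]
T = [[0.9, 0.1], [0.2, 0.8]]
E = [[0.5, 0.5], [0.9, 0.1]]
COL = {"H": 0, "T": 1}
N = len(PI)

def viterbi(x):
    """Return (most likely state string, its log joint probability)."""
    score = [math.log(PI[s]) + math.log(E[s][COL[x[0]]]) for s in range(N)]
    back = []  # back[k][s] = best predecessor of state s at step k + 1
    for sym in x[1:]:
        new_score, pointers = [], []
        for s in range(N):
            # max over predecessors replaces the sum of the forward recursion
            best = max(range(N), key=lambda r: score[r] + math.log(T[r][s]))
            pointers.append(best)
            new_score.append(score[best] + math.log(T[best][s])
                             + math.log(E[s][COL[sym]]))
        score = new_score
        back.append(pointers)
    last = max(range(N), key=lambda s: score[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return "".join(STATES[s] for s in path), score[last]

print(viterbi("HHHHTTTTHHHH"))
```

The sketch assumes every probability in the model is strictly positive; zero entries would need a guard before taking logarithms.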


Training an HMM

We can score sequences, we can tag states, but where do HMMs come from?

Maximum likelihood problem 2: find the parameters T and E that maximize P(x1...xn)

Nonlinear optimization problem with constraints: T >= 0, E >= 0, T 1 = 1, E 1 = 1


Expectation maximization

Very commonly used in machine learning

An outgrowth of general considerations of gradient search for constrained polynomials

a^ = expected number of times parameter a is used, normalized so that each row sums to 1

Baum-Welch EM algorithm

Converges to a local optimum
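
One Baum-Welch iteration can be sketched as follows: compute expected usage counts for every transition and emission from the forward and backward values, then normalize each row. This is an unscaled, single-sequence sketch for short inputs; the initial parameter values are a hypothetical guess, and symbols are given as column indices into E.

```python
def forward(x, pi, T, E):
    """f[k][s] = P(symbols up to position k, state at k is s)."""
    n = len(pi)
    f = [[pi[s] * E[s][x[0]] for s in range(n)]]
    for sym in x[1:]:
        prev = f[-1]
        f.append([sum(prev[r] * T[r][s] for r in range(n)) * E[s][sym]
                  for s in range(n)])
    return f

def backward(x, T, E):
    """b[k][s] = P(symbols after position k | state at k is s)."""
    n = len(T)
    b = [[1.0] * n]
    for sym in reversed(x[1:]):
        nxt = b[0]
        b.insert(0, [sum(T[s][r] * E[r][sym] * nxt[r] for r in range(n))
                     for s in range(n)])
    return b

def likelihood(x, pi, T, E):
    return sum(forward(x, pi, T, E)[-1])

def normalize(row):
    total = sum(row)
    return [v / total for v in row]

def baum_welch_step(x, pi, T, E):
    """One EM update: expected counts (E-step), then row normalization (M-step)."""
    n, m = len(pi), len(E[0])
    f, b = forward(x, pi, T, E), backward(x, T, E)
    px = sum(f[-1])
    # gamma[k][s] = P(state at k is s | x)
    gamma = [[f[k][s] * b[k][s] / px for s in range(n)] for k in range(len(x))]
    trans = [[0.0] * n for _ in range(n)]   # expected transition counts
    for k in range(len(x) - 1):
        for r in range(n):
            for s in range(n):
                trans[r][s] += f[k][r] * T[r][s] * E[s][x[k + 1]] * b[k + 1][s] / px
    emit = [[0.0] * m for _ in range(n)]    # expected emission counts
    for k, sym in enumerate(x):
        for s in range(n):
            emit[s][sym] += gamma[k][s]
    return gamma[0][:], [normalize(r) for r in trans], [normalize(r) for r in emit]

# Hypothetical initial guess and a short observed sequence (0 = H, 1 = T).
params = ([0.5, 0.5], [[0.7, 0.3], [0.4, 0.6]], [[0.6, 0.4], [0.3, 0.7]])
x = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
for _ in range(5):
    print(likelihood(x, *params))
    params = baum_welch_step(x, *params)
```

The printed likelihoods should never decrease from one iteration to the next, but the limit depends on the starting guess, which is the local-optimum behavior noted above.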


GENSCAN

leading HMM for gene discovery


Global alignment

The maximum likelihood path in this HMM is equivalent to global alignment with affine gap penalties

Can be generalized to sequence profiles