17
Hidden Markov Models: applications
instructor: Ross A. Lippert
http://www-math.mit.edu/~lippert/18.417/
Review of hidden Markov model quantities
T(i,j) : prob(state j from state i)
E(i,x) : prob(letter x emitted from state i)
Two traditionally important auxiliary quantities
- f(k,s) = P(x{1}...x{k}, s{k}=s), (in vectors) f(k,:) = pi D(x{1}) T ... T D(x{k})
- b(k,s) = P(x{k+1}...x{n} | s{k}=s), (in vectors) b(k,:) = T D(x{k+1}) ... T D(x{n}) 1
- here pi is the initial state distribution, D(x) = diag(E(:,x)), and 1 is the all-ones vector
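The two recursions can be sketched in a few lines. This is a minimal illustration, not the course's code: the 2-state model and all of its numbers are made up, pi is assumed to be the initial state distribution, and list index k corresponds to position k+1 in the notation above. A useful sanity check is that SUM_s f(k,s) b(k,s) gives the same likelihood P(x{1}...x{n}) at every k.

```python
# Toy 2-state HMM over the alphabet {0, 1}; every number here is an assumption.
pi = [0.6, 0.4]                   # initial state distribution
T = [[0.7, 0.3], [0.4, 0.6]]      # T[i][j] = prob(state j from state i)
E = [[0.9, 0.1], [0.2, 0.8]]      # E[i][x] = prob(letter x emitted from state i)

def forward(x):
    """Return f, where f[k][s] = P(x[0..k], state at position k is s)."""
    n = len(pi)
    f = [[pi[s] * E[s][x[0]] for s in range(n)]]
    for k in range(1, len(x)):
        prev = f[-1]
        f.append([sum(prev[i] * T[i][s] for i in range(n)) * E[s][x[k]]
                  for s in range(n)])
    return f

def backward(x):
    """Return b, where b[k][s] = P(x[k+1..] | state at position k is s)."""
    n = len(pi)
    b = [[1.0] * n]               # nothing left to emit after the last position
    for k in range(len(x) - 1, 0, -1):
        nxt = b[0]                # b at position k
        b.insert(0, [sum(T[s][j] * E[j][x[k]] * nxt[j] for j in range(n))
                     for s in range(n)])
    return b

x = [0, 1, 1, 0]
f, b = forward(x), backward(x)
likelihood = sum(f[-1])
# The same likelihood is recovered from f and b at every position.
per_position = [sum(fs * bs for fs, bs in zip(f[k], b[k])) for k in range(len(x))]
print(likelihood, per_position)
```

For long sequences these products underflow; working with logs or rescaling each f(k,:) is the usual remedy.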
Viterbi algorithm, tropicalized f and b
A short trip to the tropical semi-ring
Statistical mechanics motivation: partition functions
Definition of a semi-ring: (Set, id+, id*, op+, op*), closed under op+ and op*, with op* distributing over op+
Example: (R>=0, 0, 1, +, *), the non-negative reals
Example: ({R,inf}, inf, 0, min_T, +), the ``Boltzmann'' semi-ring
- min_T(a,b) = -T log( exp(-a/T) + exp(-b/T) ), which tends to min(a,b) as T -> 0
Example: ({R,-inf}, -inf, 0, max, +), the tropical semi-ring
- tropical quantity = log(non-neg real quantity)
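A small numerical sketch of the Boltzmann "addition", using the convention min_T(a,b) = -T log( exp(-a/T) + exp(-b/T) ) so that inf is the additive identity and the T -> 0 limit is the ordinary min. The function name and the shifted (numerically stable) evaluation are choices made here for illustration.

```python
import math

def min_T(a, b, temp):
    """Boltzmann semi-ring addition: -T log(exp(-a/T) + exp(-b/T))."""
    if math.isinf(a):             # inf is the identity for op+
        return b
    if math.isinf(b):
        return a
    m = min(a, b)                 # shift so both exponents are <= 0 (no overflow)
    return m - temp * math.log(math.exp(-(a - m) / temp) + math.exp(-(b - m) / temp))

# As the temperature drops, min_T approaches the hard min of the two arguments.
for temp in (1.0, 0.1, 0.001):
    print(temp, min_T(1.0, 3.0, temp))

# Distributive law: op* is ordinary +, so c + min_T(a, b) == min_T(c + a, c + b).
print(5.0 + min_T(1.0, 3.0, 0.7), min_T(6.0, 8.0, 0.7))
```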
Interesting results:
- Viterbi as a tropicalization of the likelihood calculation
- approximation of random gapped alignment scores
- tropical ``determinants'' of distance matrices tell us things about trees
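The first result can be made concrete by writing the forward recursion once, with the semi-ring operations passed in as arguments. With (+, *) on probabilities it returns the likelihood; with (max, +) on log-probabilities the same code returns the log-probability of the best path, which is what Viterbi computes. This is a sketch on a made-up 2-state model; the names forward_semiring and lift are hypothetical, and it assumes all probabilities are strictly positive so that log is defined.

```python
import math
from itertools import product

# Toy 2-state HMM over {0, 1}; every number here is an assumption.
pi = [0.6, 0.4]
T = [[0.7, 0.3], [0.4, 0.6]]
E = [[0.9, 0.1], [0.2, 0.8]]

def forward_semiring(x, plus, times, lift):
    """Forward recursion with (+, *) replaced by (plus, times).
    lift maps a probability into the semi-ring (identity or log)."""
    n = len(pi)
    f = [times(lift(pi[s]), lift(E[s][x[0]])) for s in range(n)]
    for k in range(1, len(x)):
        g = []
        for s in range(n):
            acc = times(f[0], lift(T[0][s]))
            for i in range(1, n):
                acc = plus(acc, times(f[i], lift(T[i][s])))
            g.append(times(acc, lift(E[s][x[k]])))
        f = g
    total = f[0]
    for s in range(1, n):
        total = plus(total, f[s])
    return total

def path_prob(x, path):
    """Joint probability of one state path and the observations."""
    p = pi[path[0]] * E[path[0]][x[0]]
    for k in range(1, len(x)):
        p *= T[path[k - 1]][path[k]] * E[path[k]][x[k]]
    return p

x = [0, 1, 1, 0]
likelihood = forward_semiring(x, lambda a, b: a + b, lambda a, b: a * b, lambda p: p)
viterbi_log = forward_semiring(x, max, lambda a, b: a + b, math.log)

# Brute-force check over all 2^n paths (only feasible for tiny examples).
all_probs = [path_prob(x, path) for path in product(range(len(pi)), repeat=len(x))]
print(likelihood, sum(all_probs))
print(viterbi_log, math.log(max(all_probs)))
```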
Training an HMM
Maximum likelihood estimation (ML):
- input: a (set of) sequence(s) x{1}...x{n} and a parameter #states
- output: the #states-order HMM with max P(x{1}...x{n})
Maximum a posteriori (MAP) also used
Nonlinear optimization problem with constraints: T >= 0, E >= 0, T 1 = 1, E 1 = 1 (T = transition matrix, E = emission matrix, 1 = all-ones vector)
- P(x{1}...x{n}) is a polynomial in T,E
- polynomial optimization is NP-hard in general
Expectation maximization
Very commonly used in machine learning
An outgrowth of general considerations of gradient search for constrained polynomials
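a^ = expected # of times 'a' happens, given the data and the current parameters
- P(i~>x) <-- (1/C) SUM_{k: x{k}=x} f(k,i) b(k,i)
- P(i->j) <-- (1/C) SUM_{k} f(k,i) P(i->j) P(j~>x{k+1}) b(k+1,j)
- here P(i->j) = T(i,j), P(i~>x) = E(i,x), and C normalizes each row to sum to 1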
Baum-Welch EM algorithm
Converges to a local optimum
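One Baum-Welch iteration, written directly from the two update rules, might look like the sketch below. The model, its starting values, and the training sequence are all made up; the initial distribution pi is held fixed because no update for it is given here; and the code assumes every state keeps a nonzero expected count so the row normalization never divides by zero. The likelihood should not decrease from one iteration to the next.

```python
def forward(x, pi, T, E):
    """f[k][s] = P(x[0..k], state at position k is s)."""
    n = len(pi)
    f = [[pi[s] * E[s][x[0]] for s in range(n)]]
    for k in range(1, len(x)):
        f.append([sum(f[-1][i] * T[i][s] for i in range(n)) * E[s][x[k]]
                  for s in range(n)])
    return f

def backward(x, T, E):
    """b[k][s] = P(x[k+1..] | state at position k is s)."""
    n = len(T)
    b = [[1.0] * n]
    for k in range(len(x) - 1, 0, -1):
        nxt = b[0]
        b.insert(0, [sum(T[s][j] * E[j][x[k]] * nxt[j] for j in range(n))
                     for s in range(n)])
    return b

def baum_welch_step(x, pi, T, E):
    """One EM update of T and E (pi is kept fixed in this sketch)."""
    n, m = len(T), len(E[0])
    f, b = forward(x, pi, T, E), backward(x, T, E)
    # Expected emission counts: SUM over k with x{k} = a of f(k,i) b(k,i).
    e_cnt = [[0.0] * m for _ in range(n)]
    for k, a in enumerate(x):
        for i in range(n):
            e_cnt[i][a] += f[k][i] * b[k][i]
    # Expected transition counts: SUM over k of f(k,i) T(i,j) E(j,x{k+1}) b(k+1,j).
    t_cnt = [[0.0] * n for _ in range(n)]
    for k in range(len(x) - 1):
        for i in range(n):
            for j in range(n):
                t_cnt[i][j] += f[k][i] * T[i][j] * E[j][x[k + 1]] * b[k + 1][j]
    # Row normalization plays the role of the 1/C factor.
    new_T = [[c / sum(row) for c in row] for row in t_cnt]
    new_E = [[c / sum(row) for c in row] for row in e_cnt]
    return new_T, new_E

x = [0, 0, 1, 0, 1, 1, 1, 0, 0, 1]
pi = [0.5, 0.5]
T = [[0.6, 0.4], [0.3, 0.7]]
E = [[0.7, 0.3], [0.4, 0.6]]
history = [sum(forward(x, pi, T, E)[-1])]
for _ in range(10):
    T, E = baum_welch_step(x, pi, T, E)
    history.append(sum(forward(x, pi, T, E)[-1]))
print(history)   # likelihood after each iteration
```

Different starting values can end at different local optima, so in practice the procedure is often restarted from several initializations.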
Global alignment
The maximum likelihood path in this HMM is equivalent to global alignment with affine gap penalties
HMM for some sequence y:
- I = insertion states (cyclic, emitting background bases)
- D = deletion states (no emissions)
- M = matching/mismatching states (emitting one base, very biased)

Equivalence:
- log(max{ f(i,M{j}), f(i,I{j}), f(i,D{j}) }) = s(i,j), the alignment score of the i-prefix of x vs the j-prefix of y
log(P(s~>x)) =
|      | a             | c             | g             | t             |
| I{i} | s(a,_) - B    | s(c,_) - B    | s(g,_) - B    | s(t,_) - B    |
| M{i} | s(a,y{i}) - C | s(c,y{i}) - C | s(g,y{i}) - C | s(t,y{i}) - C |

log(P(s->s)) =
|      | D{i+1}          | I{i} | M{i+1} |
| D{i} | s(_,y{i+1}) + A | B    | A + C  |
| I{i} | s(_,y{i+1}) + A | B    | A + C  |
| M{i} | s(_,y{i+1}) + A | B    | A + C  |
- A, B, and C are normalizing constants
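For comparison, the alignment side of the equivalence can be sketched as the standard three-table dynamic program for global alignment with affine gaps, one table per state type (M, I, D). It works directly in score space rather than building the HMM probabilities from the tables above, and the scoring values (match, mismatch, gap_open, gap_extend) are assumptions chosen only for illustration. A gap of length L costs gap_open + (L-1) * gap_extend.

```python
NEG = float("-inf")

def affine_global_score(x, y, match=1, mismatch=-1, gap_open=2, gap_extend=1):
    """Best global alignment score of x vs y with affine gap penalties.
    M: x[i-1] aligned to y[j-1]; I: x[i-1] inserted (aligned to a gap);
    D: y[j-1] deleted (skipped without emitting a letter of x)."""
    n, m = len(x), len(y)
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    I = [[NEG] * (m + 1) for _ in range(n + 1)]
    D = [[NEG] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):     # a prefix of x against nothing: one long insertion
        I[i][0] = -gap_open - (i - 1) * gap_extend
    for j in range(1, m + 1):     # a prefix of y against nothing: one long deletion
        D[0][j] = -gap_open - (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], I[i - 1][j - 1], D[i - 1][j - 1])
            I[i][j] = max(M[i - 1][j] - gap_open,
                          I[i - 1][j] - gap_extend,
                          D[i - 1][j] - gap_open)
            D[i][j] = max(M[i][j - 1] - gap_open,
                          D[i][j - 1] - gap_extend,
                          I[i][j - 1] - gap_open)
    return max(M[n][m], I[n][m], D[n][m])

print(affine_global_score("ACGT", "ACGT"))
print(affine_global_score("ACGT", "AGT"))
print(affine_global_score("AAAA", "A"))
```

Each max over predecessors is the tropical counterpart of a sum over incoming transitions in the forward recursion.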
A graphical depiction of the ML path

Extensions of alignment via HMMs
Using HMMs we can align to profiles:
Krogh et al. took this idea to an extreme:
- Took families of related sequences
- Trained HMMs for each family - HMMs that give high scores to all members
What can a multisequence HMM tell us?
- Viterbi on the training sequences produces good multialignments
- probabilities give position dependent alignment penalties
- probabilities highlight regions of high and low conservation
- pairs of HMMs can provide a means for divisive clustering
GENSCAN
Chris Burge et al.
leading HMM for gene discovery

GenScan's HMM
A generalized HMM
- Informally: an HMM with random strings as emissions
- Emissions: E(s,x) where s is a state and x is some finite string
Double stranded: model divided into forward and reverse strands
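To illustrate what "strings as emissions" does to the forward recursion, here is a minimal sketch of a generalized-HMM forward pass. It is not GENSCAN's model: the two states, their string emissions, and all probabilities are invented, and the name ghmm_forward is hypothetical. The only change from the ordinary recursion is an extra sum over the length of the last emitted segment.

```python
# Toy generalized HMM; states, emitted strings and probabilities are all assumptions.
pi = {"A": 0.5, "B": 0.5}                       # initial state distribution
T = {"A": {"A": 0.2, "B": 0.8},                 # T[r][s] = prob(state s from state r)
     "B": {"A": 0.6, "B": 0.4}}
E = {"A": {"a": 0.5, "ab": 0.5},                # E[s][string] = prob(s emits string)
     "B": {"b": 0.7, "ba": 0.3}}

def ghmm_forward(x):
    """f[k][s] = total weight of all ways to emit x[:k] as whole segments
    with the last segment emitted by state s (segments are non-empty)."""
    n = len(x)
    f = [{s: 0.0 for s in pi} for _ in range(n + 1)]
    for k in range(1, n + 1):
        for s in pi:
            for seg, p_emit in E[s].items():
                d = len(seg)
                if d > k or x[k - d:k] != seg:
                    continue                    # seg cannot be the segment ending at k
                if d == k:                      # seg is the first segment
                    f[k][s] += pi[s] * p_emit
                else:                           # sum over the previous state
                    f[k][s] += sum(f[k - d][r] * T[r][s] for r in pi) * p_emit
    return f

def ghmm_total(x):
    """Sum over all segmentations and state paths that emit exactly x."""
    return sum(ghmm_forward(x)[-1].values())

print(ghmm_total("ab"))
print(ghmm_total("abab"))
```

With a maximum segment length d_max the cost is O(n * d_max * #states^2), compared with O(n * #states^2) for an ordinary HMM.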