19

Probabilistic methods: Gibbs Sampling

instructor: Ross A. Lippert

http://www-math.mit.edu/~lippert/18.417/

Announcements:
  • Start chapter 12
  • Problem sets to hand back


Review of the motif finding problem

A motif logo

Motifs are captured by profiles

gtatac START
atataa START
gaataa START
gtctta START

      1  2  3  4  5  6
a     1  1  3  0  3  3
c     0  0  1  0  0  1
g     3  0  0  0  0  0
t     0  3  0  4  1  0
-     G  T  A  T  A  A

Scoring function to reflect nearness to consensus
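A minimal sketch of how the profile above can be rebuilt from the four aligned sites, with one possible consensus score. The helper names (`profile_counts`, `consensus`, `consensus_score`) are hypothetical, and the score used here (the sum of the largest count in each column) is an assumption of this sketch rather than necessarily the scoring function of the lecture.

```python
ALPHABET = "acgt"

def profile_counts(motifs):
    """Return {letter: [count at position 1, ..., count at position k]}."""
    k = len(motifs[0])
    counts = {x: [0] * k for x in ALPHABET}
    for motif in motifs:
        for i, x in enumerate(motif):
            counts[x][i] += 1
    return counts

def consensus(counts):
    """Most frequent letter in each column (ties go to the earlier letter in ALPHABET)."""
    k = len(counts["a"])
    return "".join(max(ALPHABET, key=lambda x: counts[x][i]) for i in range(k))

def consensus_score(counts):
    """Assumed score: sum over columns of the largest count (higher is closer to consensus)."""
    k = len(counts["a"])
    return sum(max(counts[x][i] for x in ALPHABET) for i in range(k))

motifs = ["gtatac", "atataa", "gaataa", "gtctta"]
counts = profile_counts(motifs)
for x in ALPHABET:
    print(x, counts[x])
print("-", consensus(counts).upper())
print("score", consensus_score(counts))
```

With four sites of length six, a perfect match to the consensus would score 24 under this assumed score.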

Brute force enumeration can take a great deal of time


The basic problem

This is a ridiculously general version of the problem:

This is a ridiculous sounding solution:

Candidate probabilistic systems

Tropics: sampling from prob =~ f^(1/T) as T->0 should concentrate on the optimum, giving f(opt)
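A small illustration of the f^(1/T) idea on a finite set of states, assuming f is strictly positive. The function name `tempered` and the example values are hypothetical; the point is only that the normalized weights pile up on the largest f as T shrinks.

```python
def tempered(f_values, T):
    """Normalize f^(1/T) over a finite list of positive values."""
    # Divide by the maximum first so every power stays in [0, 1] and cannot overflow.
    top = max(f_values)
    weights = [(f / top) ** (1.0 / T) for f in f_values]
    total = sum(weights)
    return [w / total for w in weights]

f_values = [1.0, 2.0, 4.0, 3.0]
for T in (1.0, 0.5, 0.1, 0.01):
    print(T, [round(p, 4) for p in tempered(f_values, T)])
```

At T = 1 the distribution is just f normalized; at small T nearly all of the mass sits on the state where f is largest.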


Gibbs sampling

Solves the following problem:

How do we do it?

A specialized instance of a more general idea: Markov chain Monte Carlo (MCMC)


Recasting the motif problem as a probability problem

Given letter frequencies p(x) and profile positions s{1}...s{n}

This is incidental motivation for this scoring function -- it will have nice properties


Gibbs sampling of motifs

  1. generate start state: s{1}...s{n}
  2. pick uniformly m from 1...n
  3. replace s{m} <-- s, picked with weights 1/prob(C(s{1}...s...s{n}))
  4. <do whatever with the sample>
  5. goto 2
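The loop above can be sketched as follows. This is a sketch under stated assumptions, not the lecture's implementation: prob(C) is taken to be a product of independent multinomial column probabilities under the background frequencies p(x), and the names `log_inverse_chance` and `gibbs_motif_samples`, the toy sequences, and the uniform background are all hypothetical. Weights are computed in log space and shifted by their maximum before exponentiating.

```python
import math
import random

ALPHABET = "acgt"

def log_inverse_chance(seqs, starts, k, p):
    """log(1/prob(C)) for the profile C of the k-mers beginning at `starts`.

    Assumption: each profile column is multinomial under background p.
    """
    n = len(seqs)
    total = 0.0
    for i in range(k):
        column = [seq[s + i] for seq, s in zip(seqs, starts)]
        total -= math.lgamma(n + 1)  # log n!
        for x in ALPHABET:
            c = column.count(x)
            total += math.lgamma(c + 1) - c * math.log(p[x])
    return total

def gibbs_motif_samples(seqs, k, p, steps, rng):
    """Yield one state (a list of start positions) per Gibbs update."""
    # 1. generate start state
    starts = [rng.randrange(len(seq) - k + 1) for seq in seqs]
    for _ in range(steps):
        # 2. pick m uniformly
        m = rng.randrange(len(seqs))
        # 3. weigh every candidate start s in sequence m by 1/prob(C)
        candidates = range(len(seqs[m]) - k + 1)
        logw = []
        for s in candidates:
            trial = starts[:m] + [s] + starts[m + 1:]
            logw.append(log_inverse_chance(seqs, trial, k, p))
        top = max(logw)
        weights = [math.exp(lw - top) for lw in logw]
        starts[m] = rng.choices(candidates, weights=weights)[0]
        # 4. hand the sample to the caller, then 5. loop back to step 2
        yield list(starts)

# Toy input: "gtataa" planted in each sequence (hypothetical data).
seqs = ["ccgtataacg", "gtataaccgc", "cgcgtataac", "gccgtataag"]
p = {x: 0.25 for x in ALPHABET}
samples = list(gibbs_motif_samples(seqs, 6, p, 200, random.Random(0)))
print(samples[-1])
```

Recomputing the whole profile for every candidate is deliberately naive; it keeps step 3 literal at the cost of extra work per update.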

At this point, the algorithm is essentially done.

For the chosen objective function, something cute happens

prob(s) =~ prod_i C(s{1}...s...s{n}){T[s+i],i} / p(T[s+i])
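A short derivation of why the conditional weight factors over motif positions. It assumes that prob(C) is a product of independent multinomial column probabilities under the background frequencies p(x); that model, the motif length k, and the symbol C' (the counts from the other n-1 sequences) are assumptions of this sketch. T is read here as the sequence being resampled.

```latex
\mathrm{prob}(C) \;=\; \prod_{i=1}^{k} \frac{n!}{\prod_{x} C_{x,i}!} \prod_{x} p(x)^{C_{x,i}}

% Placing the site at s adds one count to the letter T[s+i] in column i:
C_{x,i} \;=\; C'_{x,i} + [\,x = T[s+i]\,]

% so C_{x,i}! = C'_{x,i}! for x \neq T[s+i], and C_{x,i}! = C'_{x,i}! \, C_{x,i} for x = T[s+i]:
\frac{1}{\mathrm{prob}(C)}
  \;=\; \underbrace{\prod_{i=1}^{k} \frac{1}{n!} \prod_{x} \frac{C'_{x,i}!}{p(x)^{C'_{x,i}}}}_{\text{independent of } s}
  \;\times\; \prod_{i=1}^{k} \frac{C_{T[s+i],\,i}}{p(T[s+i])}
```

Under these assumptions only the last product depends on s, so the candidate weights need one profile lookup per motif position instead of a full recomputation of prob(C). Because C includes the candidate site itself, every factor is at least 1/p(T[s+i]) and no weight is zero.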

Typical hack:


What can we do with this?

Other applications

  1. measure relative uncertainties in the profile quantities
  2. in general: establish sensitivity coefficients or find moments
  3. what structures are stable over time?

Simulated Annealing

We found a way to sample from prob =~ f(y{1}...y{k})

Annealing puts glass into a highly durable state by cooling it slowly

Same idea here: prob(T) =~ f^(1/T) and T-->0
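A minimal sketch of annealing built on coordinate-wise Gibbs updates, assuming f is strictly positive and each y{j} takes values in a small finite set. The function `anneal`, the toy objective, and the geometric cooling schedule are hypothetical choices for illustration, not taken from the lecture.

```python
import random

def anneal(f, domains, schedule, rng):
    """Gibbs-sample from f^(1/T) while T follows `schedule`; return the final state."""
    y = [rng.choice(d) for d in domains]
    for T in schedule:
        m = rng.randrange(len(domains))
        # Score every value of coordinate m with the other coordinates held fixed.
        values = [f(y[:m] + [v] + y[m + 1:]) for v in domains[m]]
        top = max(values)
        # Divide by the best value so the powers stay in [0, 1].
        weights = [(val / top) ** (1.0 / T) for val in values]
        y[m] = rng.choices(domains[m], weights=weights)[0]
    return y

# Toy objective: doubles for every coordinate that matches a hidden target.
target = [3, 0, 4, 1]

def f(y):
    return 2.0 ** sum(1 for a, b in zip(y, target) if a == b)

domains = [list(range(5)) for _ in target]
# Geometric cooling from T = 2.0 down to T = 0.01 over 400 updates.
schedule = [2.0 * (0.005 ** (j / 399)) for j in range(400)]
best = anneal(f, domains, schedule, random.Random(1))
print(best, f(best))
```

The toy objective has a single peak and no coupling between coordinates, so it shows the mechanics only; it does not exercise the harder cases where the chain has to escape a local optimum before T gets small.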


What can go wrong?