5

Local alignment/multi-alignment

instructor: Ross A. Lippert

http://www-math.mit.edu/~lippert/18.417/



Last time: global alignment

>HXB5_HUMAN|P09067 P09069|HOXB5 OR HOX2A|HOMEOBOX PROTEIN HOX-B5 (HOX-2A) (HHO.C10) (HU-1)|Homo sapiens (Human)
MSSYFVNSFSGRYPNGPDYQLLNYGSGSSLSGSYRDPAAMHTGSYGYNYN
GMDLSVNRSSASSSHFGAVGESSRAFPAPAQEPRFRQAASSCSLSSPESL
PCTNGDSHGAKPSASSPSDQATSASSSANFTEIDEASASSEPEEAASQLS
SPSLARAQPEPMATSTAAPEGQTPQIFPWMRKLHISHDMTGPDGKRARTA
YTRYQTLELEKEFHFNRYLTRRRRIEIAHALCLSERQIKIWFQNRRMKWK
KDNKLKSMSLATAGSAFQP
>HMAA_DROME|P29555|ABD-A|HOMEOBOX PROTEIN ABDOMINAL-A|Drosophila melanogaster (Fruit fly)
MSKFVFDSMLPKYPQFQPFISSHHLTTTPPNSSSAAVAAALAAAAASASA
SVSASSSSNNNSSNTIAGSNTSNTNNSSSSPSSSSNNNSNLNLSGGSLSP
SHLSQHLGQSPHSPVSSSSPFQQHHPQVQQQHLNHQQQQHLHHQQQQHHH
QYSSLSAALQLQQQQHHISKLAAAAVASHGHAHQQLLLTPPSAGNSQAGD
SSCSPSPSASGSSSLHRSLNDNSPGSASASASASAASSVAAAAAAAAAAA
SSSFAIPTSKMYPYVSNHPSSHGGLSGMAGFTGLEDKSCSRYTDTVMNSY
QSMSVPASASAQFAQFYQHATAAASAVSAASAGAIGVDSLGNACTQPASG
VMPGAGGAGGAGIADLPRYPWMTLTDWMGSPFERVVCGDFNGPNGCPRRR
GRQTYTRFQTLELEKEFHFNHYLTRRRRIEIAHALCLTERQIKIWFQNRR
MKLKKELRAVKEINEQARRDREEQEKMKAQETMKSAQQNKQVQQQQQQQQ
QQQQQQQQQHQQQQQQPQDHHSIIAHNPGHLHHSVVGQNDLKLGLGMGVG
VGVGGIGPGIGGGLGGNLGMMSALDKSNHDLLKAVSKVNS

Global alignment can miss something here

Most of each protein is unrelated to the other

But a short piece is highly conserved

405 YTRFQTLELEKEFHFNHYLTRRRRIEIAHALCLTERQIKIWFQNRRMKLKK 455
----YTR-QTLELEKEFHFN-YLTRRRRIEIAHALCL-ERQIKIWFQNRRMK-KK----
201 YTRYQTLELEKEFHFNRYLTRRRRIEIAHALCLSERQIKIWFQNRRMKWKK 251

Homeobox genes


A schematic view of the issue

The interesting piece is just a substring
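One way to find such a substring is the standard Smith-Waterman local alignment recurrence, sketched below. The slides do not give scoring parameters, so the match, mismatch, and gap values here are illustrative assumptions, as is the function name `smith_waterman`. The key difference from global alignment is the `0` option in the maximum, which lets an alignment start anywhere, and the traceback that starts from the best cell instead of the corner.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for one best local alignment.

    Scoring values are illustrative assumptions, not taken from the lecture.
    """
    n, m = len(a), len(b)
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a fresh alignment here
                H[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                H[i - 1][j] + gap,    # gap in b
                H[i][j - 1] + gap,    # gap in a
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("XXXXHELLOYYYY", "ZZHELLOWW"))
```

For protein sequences such as the two homeobox proteins above, the match/mismatch constants would normally be replaced by a substitution matrix; that substitution is left out of this sketch.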


Affine gap penalties, a common modification

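Example

First alignment (one gap of length 2):

ATGC--A
ATGCCTA

Second alignment (two gaps of length 1):

ATG-C-A
ATGACTA

The first alignment is evolutionarily more likely than the second one: a single insertion or deletion event explains it, while the second needs two.

Consider penalizing the number of gaps (gap opening) as well as the gap length (gap extension).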
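To make the difference concrete, the two example alignments can be scored under an affine gap model. The sketch below scores an alignment that is already given; it does not search for the best one. The convention that a gap of length k costs `gap_open + k * gap_extend`, the numeric values, and the function name `affine_alignment_score` are all assumptions chosen for illustration.

```python
def affine_alignment_score(row_a, row_b, match=1, mismatch=-1,
                           gap_open=-3, gap_extend=-1):
    """Score a given pairwise alignment with affine gap penalties.

    A run of k gap characters in one row costs gap_open + k * gap_extend.
    All numeric defaults are illustrative assumptions.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    score = 0
    prev_gap = None  # which row carried a gap in the previous column
    for x, y in zip(row_a, row_b):
        if x == "-" and y == "-":
            raise ValueError("column with two gaps")
        if x == "-" or y == "-":
            cur = "a" if x == "-" else "b"
            if cur != prev_gap:
                score += gap_open  # a new gap starts in this column
            score += gap_extend    # every gap column pays the extension cost
            prev_gap = cur
        else:
            score += match if x == y else mismatch
            prev_gap = None
    return score


if __name__ == "__main__":
    print(affine_alignment_score("ATGC--A", "ATGCCTA"))  # one gap of length 2
    print(affine_alignment_score("ATG-C-A", "ATGACTA"))  # two gaps of length 1
```

With `gap_open=0` the model reduces to a linear gap penalty and both alignments receive the same score, which is why an opening penalty is needed to prefer the single long gap. Finding the optimal alignment under affine penalties is usually done with a three-table dynamic program (often attributed to Gotoh), which is not shown here.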

Multiple alignment

Recurrence


Multiple alignment

Grid
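The recurrence and grid from these slides are not reproduced in the text, so the following is only a sketch of the usual construction for three sequences: a three-dimensional grid in which each cell has seven predecessors, one for every non-empty choice of which sequences contribute a letter to the column. The sum-of-pairs column score and its numeric values are assumptions, as are the names `pair_score` and `align3_score`.

```python
from itertools import product


def pair_score(x, y, match=1, mismatch=-1, gap=-1):
    """Score one pair of symbols in a column (values are assumptions)."""
    if x == "-" and y == "-":
        return 0
    if x == "-" or y == "-":
        return gap
    return match if x == y else mismatch


def align3_score(a, b, c):
    """Optimal sum-of-pairs score for a global alignment of three sequences."""
    n, m, p = len(a), len(b), len(c)
    neg_inf = float("-inf")
    # S[i][j][k] = best score aligning a[:i], b[:j], c[:k]
    S = [[[neg_inf] * (p + 1) for _ in range(m + 1)] for _ in range(n + 1)]
    S[0][0][0] = 0
    # The 7 moves: each sequence either advances (1) or receives a gap (0).
    moves = [d for d in product((0, 1), repeat=3) if any(d)]
    for i in range(n + 1):
        for j in range(m + 1):
            for k in range(p + 1):
                for di, dj, dk in moves:
                    pi, pj, pk = i - di, j - dj, k - dk
                    if pi < 0 or pj < 0 or pk < 0:
                        continue
                    x = a[pi] if di else "-"
                    y = b[pj] if dj else "-"
                    z = c[pk] if dk else "-"
                    col = pair_score(x, y) + pair_score(x, z) + pair_score(y, z)
                    S[i][j][k] = max(S[i][j][k], S[pi][pj][pk] + col)
    return S[n][m][p]


if __name__ == "__main__":
    print(align3_score("ACGT", "ACGT", "ACGT"))
```

The table has (n+1)(m+1)(p+1) cells, and for k sequences the grid has k dimensions with 2^k - 1 predecessors per cell, which is why exact multiple alignment quickly becomes impractical.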


Progressive alignment: e.g. CLUSTAL

Based on aligning a sequence (or an alignment) to an existing alignment

Compute all pairwise alignment scores

Align the sequences in the order given by a tree built from the pairwise comparisons.
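The first two steps, forming pairwise scores and deriving a merge order from them, can be sketched as follows. This is not CLUSTAL's actual procedure: the greedy average-score clustering, the simple match/mismatch/gap scoring, and the names `nw_score` and `guide_order` are assumptions made for illustration. The final step, aligning an alignment to an alignment, is not shown.

```python
from itertools import combinations


def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global (Needleman-Wunsch) alignment score; values are assumptions."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur.append(max(prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def guide_order(seqs):
    """Return the list of merges, most similar clusters first.

    Each cluster is a tuple of sequence indices. At every step the two
    clusters with the highest average pairwise score are joined.
    """
    score = {
        frozenset((i, j)): nw_score(seqs[i], seqs[j])
        for i, j in combinations(range(len(seqs)), 2)
    }

    def avg(x, y):
        total = sum(score[frozenset((i, j))] for i in x for j in y)
        return total / (len(x) * len(y))

    clusters = [(i,) for i in range(len(seqs))]
    merges = []
    while len(clusters) > 1:
        x, y = max(combinations(clusters, 2), key=lambda pair: avg(*pair))
        clusters.remove(x)
        clusters.remove(y)
        clusters.append(x + y)
        merges.append((x, y))
    return merges


if __name__ == "__main__":
    print(guide_order(["ACGT", "ACGT", "TTTT"]))
```

Because each merge is fixed once made, an early join of two sequences that only look similar cannot be undone later, which is the weakness of the progressive approach.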


A multiple alignment with CLUSTAL

MULTIPLE ALIGNMENT:
HXA1_HUMAN|P49639                   PNAVRTNFTTKQLTELEKEFHFN---KYLTRARRVEIAASLQLNETQVKIWFQNRRMKQ-KKRE
HM13_CAEEL|P17488|CEH-13|HOMEO      NGTNRTNFTTHQLTELEKEFHTA---KYVNRTRRTEIASNLKLQEAQVKIWFQNRRMKE-KKRE
DLX1_MOUSE|Q64317|DLX1|HOMEOBO      IRKPRTIYSSLQLQALNRRFQQT---QYLALPERAELAASLGLTQTQVKIWFQNKRSKF-KKLM
LX10_HELTR|P42584|LOX10|HOMEOB      RRKRRILFSQAQIYELERRFRQQ---KYLSAPEREHLATFIGLTPTQVKIWFQNHRYKT-KKSK
GSC_BRARE|P53544|GSC|HOMEOBOX       KRRHRTIFTDEQLEALENLFQET---KYPDVGTREQLARKVHLREEKVEVWFKNRRAKW-RRQK
HKLA_MAIZE|P56667|KNOX10|HOMEO      RKKKKGKLPRDARQKLLHWWQLHYRWPYPSELEKAALAESTGLEAKQINNWFINQRKRH-WKQA
MB11_COPCI|P40333||MATING-TYPE      DKNEPTSPTPAYVEPCARWLKDNWYNPYPSGEVRTQIARQTRTSRKDIDAWFIDARRRIGWNE-
                                            .        .  :      *     :  :*        .:. ** : * :   .  

Homeo-domains site

Clustal site


Where do things go wrong?

AAAATTTT
TTTTGGGG
AAAAGGGG

Sometimes a bad initial alignment forces bad decisions for the rest of the alignments.

AAAATTTT
TTTTGGGG
GGGGAAAA