Bioinformatics Seminar
The Bioinformatics Seminar is co-sponsored by the Department of Mathematics at the Massachusetts Institute of Technology and the Theory of Computation group at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL). The seminar series focuses on highlighting areas of research in the field of computational biology. This year, we are hoping to highlight three topics: (1) evolution and computational approaches to modeling and understanding it, (2) generative AI for biology/biomedicine, and (3) algorithms for computational biology/genomics.
Fall 2024
Lectures are on Wednesdays, 11:30am - 1:00pm ET
Location: 32G-575 (Stata Center at MIT; Gates Tower; 5th Floor)
Zoom link for virtual attendees: https://harvard.zoom.us/j/99103715484
Date | Speaker | Title/Abstract |
---|---|---|
Sept. 11 | Roshan Rao (EvolutionaryScale) | Multimodal Protein Foundation Models. How can multimodality improve representations of proteins? Foundation models have shown promise in building powerful representations for many domains. Language models are able to access a vast quantity of human knowledge and are able to perform limited reasoning over this body of knowledge. Protein models learn the evolutionary patterns in proteins, enabling prediction of protein structure and function. This talk will cover the development of protein foundation models, understanding the representations they build, and how they scale. Finally, it will cover incorporating modalities beyond protein sequences, and how additional data could be added to produce better representations in the future. |
Sept. 18 | Ben Langmead (JHU) | Pan-genomic advances for fighting reference bias. Sequencing data analysis often begins with aligning reads to a reference genome, where the reference takes the form of a linear string of bases. But linearity leads to reference bias, a tendency to miss or misreport alignments containing non-reference alleles, which can confound downstream statistical and biological results. This is a major concern in human genomics; we don't want to live in a world where diagnostics and therapeutics are differentially effective depending on whether and where our genetic variants happen to match the reference. Fortunately, computer science and bioinformatics are meeting the moment. We can now index and align sequencing reads to references that include many population variants. I will present some of the major insights that have shaped this journey from the early days of efficient genome indexing -- especially the Burrows-Wheeler Transform -- continuing through recent methods for indexing graph-shaped references and references that include many genomes. I will emphasize recent results that show how to optimize simple and complex pan-genome representations for effective avoidance of reference bias. Finally, I will outline promising methods for addressing the bias, including new ideas for how to measure bias, new proposals in compressed indexing, and new workflows that integrate genotype imputation to reduce reference bias. |
Sept. 25 | Ard Louis (Oxford)* | Does evolution have an inbuilt bias towards highly compressible phenotypes? Darwinian evolution proceeds by natural selection acting on random variation. I will argue that, although mutations are random, the novel phenotypes they produce can be highly biased towards simple or more compressible forms. This bias is so strong that it can dramatically shape the spectrum of adaptive outcomes. The basic intuition follows from an algorithmic twist on the infinite monkey theorem inspired by the fact that natural selection doesn’t act directly on mutations, but rather on the phenotypes that are generated by developmental programmes. If monkeys type at random in a computer language, they are much more likely to generate outputs derived from shorter algorithms. This intuition can be formalised with the coding theorem of algorithmic information theory, predicting that random mutations are exponentially more likely to result in simpler, more compressible phenotypes with low descriptional (Kolmogorov) complexity. Evidence for this evolutionary Occam’s razor can be found in the symmetry in protein complexes [1], and in the simplicity of RNA secondary structures [2], gene regulatory networks, leaf shape, and Richard Dawkins’ biomorphs model of development [3]. This principle may also extend to machine learning, offering insights into why neural networks generalize well on typical datasets [4]. [1] Symmetry and simplicity spontaneously emerge from the algorithmic nature of evolution, IG Johnston, et al., PNAS 119 (11), e2113883119 (2022); |
Oct. 2 | Smita Krishnaswamy (Yale)* | Inferring and Characterizing Cellular and Neural Dynamics with Geometric and Topological Deep Learning. In the last decade there has been a data revolution in biology with the advent of high-throughput, high-dimensional data modalities such as single-cell RNA-sequencing, fMRI data, molecular structure data and other modalities. A key issue in these data types is that they provide static snapshots of highly dynamic biological entities. In this talk I will cover our work inferring and characterizing cellular and neural dynamics during various processes. First, I will cover how to infer cell state dynamics during differentiation and disease with a neural ODE framework called MIOFlow that is regularized with data geometric and manifold priors. Then I will discuss RITINI, our recent graph ODE network which allows us to learn gene regulation that underlies cellular dynamics, and potentially find new targets for treatments of disease. I will showcase applications of these in triple negative breast cancer and human embryonic stem cell differentiation. Once these dynamics are available, I will showcase tools to quantify and classify these dynamics based on graph signal processing and topological data analysis. This will involve our learnable geometric scattering transform to capture spatial signal patterns, as well as persistent homology and other tools to quantify time-varying patterns. Applications to characterization of brain activity data will be presented. |
Oct. 9 | Sriram Sankararaman (UCLA) | Understanding the genetic basis of complex traits from Biobank-scale data: Statistical and Computational challenges. The quest to understand the interplay between evolution, genes and traits has been revolutionized by the collection of rich phenotypic and genetic data across millions of individuals in diverse populations. However, analyses of these Biobank-scale datasets present substantial statistical and computational challenges. I will describe how we bring together statistical and computational insights to design accurate and highly scalable algorithms for a suite of problems that arise in the analysis of Biobank data: highly scalable randomized inference algorithms to dissect the genetic architecture of complex traits and deep-learning based phenotype imputation to deal with complex patterns of missingness. By applying these methods to about half a million individuals from the UK Biobank, we obtain novel insights into how genetic effects are distributed across the genome, the relative contributions of additive, dominance and gene-environment interaction effects to trait variation, and new genes that confer risk for hard-to-measure diseases. |
Oct. 16 | Kevin K. Yang (Microsoft Research) | Deep generative models for protein engineering. Deep generative models are increasingly powerful tools for the in silico design of novel proteins. Recently, a family of generative models called diffusion models has demonstrated the ability to generate biologically plausible proteins that are dissimilar to any actual proteins seen in nature, enabling unprecedented capability and control in de novo protein design. However, current state-of-the-art models generate protein structures, which limits the scope of their training data and restricts generations to a small and biased subset of protein design space. Here, we introduce a general-purpose diffusion framework, EvoDiff, that combines evolutionary-scale data with the distinct conditioning capabilities of diffusion models for controllable protein generation in sequence space. EvoDiff generates high-fidelity, diverse, and structurally-plausible proteins that cover natural sequence and functional space. Critically, EvoDiff can generate proteins inaccessible to structure-based models, such as those with disordered regions, while maintaining the ability to design scaffolds for functional structural motifs, demonstrating the universality of our sequence-based formulation. We envision that EvoDiff will expand capabilities in protein engineering beyond the structure-function paradigm toward programmable, sequence-first design. |
Oct. 23 | Bin Yu (UC Berkeley)* | TBA |
Oct. 30 | Adam Phillippy (NIH) | TBA |
Nov. 6 | Ava Amini (Microsoft Research) | TBA |
Nov. 13 | Michael Desai (Harvard) | TBA |
Nov. 20 | Jesse Bloom (Fred Hutchinson Cancer Center)* | TBA |
Nov. 27 | Aleksandra Walczak (Ecole Normale Supérieure)* | TBA |
Dec. 4 | Tristan Bepler (New York Structural Biology Center) | TBA |
Dec. 11 | David Van Valen (Caltech)* | TBA |
*Indicates the speaker will be presenting over Zoom. Otherwise, they will be presenting in person.
Past Terms
A listing of the Bioinformatics Seminar series home pages from prior terms.
- Fall 2024
- Fall 2023
- Fall 2022
- Spring 2022
- Fall 2021
- Spring 2021
- Spring 2020
- Spring 2019
- Spring 2018
- Spring 2017
- Spring 2016
- Spring 2015
- Spring 2013
- Spring 2011
- Spring 2010
- Spring 2009
- Fall 2008
- Fall 2007
- Spring 2007
- Fall 2006
- Spring 2006
- Fall 2005
- Spring 2005
- Fall 2004
- Spring 2004
- Fall 2003
- Spring 2003
- Spring 2001
Organizers and Information
The Bioinformatics Seminar is hosted by MIT Simons Professor of Mathematics and head of the Computation and Biology group at CSAIL Bonnie Berger. Professor Berger is also Faculty of Harvard-MIT Health Sciences & Technology, Associate Member of the Broad Institute of MIT and Harvard, Faculty of MIT CSB, and Affiliated Faculty of Harvard Medical School.
The seminar is announced weekly via email to members of the seminar's mailing list and to those on CSAIL's event calendar list. It is also posted in the BioWeek calendar.
Bonnie Berger: bab@mit.edu
Anna Sappington (TA): asapp@mit.edu
To be added to the seminar's email announcement list, or for any questions about the seminar, please email bioinfo@csail.mit.edu and cc TA Anna Sappington (asapp@mit.edu).
If you plan to enroll in the associated course, 18.418/HST.504: Topics in Computational Molecular Biology, please contact Professor Berger (bab@mit.edu) and cc TA Anna Sappington (asapp@mit.edu) for more information.