The Bioinformatics Seminar is co-sponsored by the Department of Mathematics at the Massachusetts Institute of Technology and the Theory of Computation group at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL). The seminar series focuses on highlighting areas of research in the field of computational biology. This year, we are hoping to highlight three topics: (1) language models and their uses in biology and biomedicine, (2) ethical and societal issues in biomedical data (e.g. privacy, fairness), and (3) single-cell biology.
Lectures are on Wednesdays, 11:30am - 1:00pm ET.
Hybrid lectures will be held in Stata (32) G-575.
Zoom link for virtual attendees: https://mit.zoom.us/j/93513735220
|Sep. 13 (hybrid)||Manolis Kellis
From genomics to therapeutics: single cell dissection of disease circuitry
Disease-associated variants lie primarily in non-coding regions, increasing the urgency of understanding how gene-regulatory circuitry impacts human disease. To address this challenge, we generate comparative genomic, epigenomic, and transcriptional maps, spanning 823 human tissues, 1500 individuals, and 20 million single cells. We link variants to target genes, upstream regulators, cell types of action, and perturbed pathways, and predict causal genes and regions to provide unbiased views of disease mechanisms, sometimes re-shaping our understanding. We find that Alzheimer’s variants act primarily through immune processes, rather than neuronal processes, and that the strongest genetic association with obesity acts via energy storage/dissipation rather than appetite/exercise decisions. We combine single-cell profiles, tissue-level variation, and genetic variation across healthy and diseased individuals to map genetic effects into epigenomic, transcriptional, and functional changes at single-cell resolution, to recognize cell-type-specific disease-associated somatic mutations indicative of mosaicism, and to recognize multi-tissue single-cell effects. We expand these methods to electronic health records to recognize multi-phenotype effects of genetics, environment, and disease, combining clinical notes, lab tests, and diverse data modalities despite missing data. We integrate large cohorts to factorize phenotype-genotype correlations to reveal distinct biological contributors of complex diseases and traits, to partition disease complexity, and to stratify patients for pathway-matched treatments. Lastly, we develop massively parallel, programmable, and modular technologies for manipulating these pathways by high-throughput reporter assays, genome editing, and gene targeting in human cells and mice, to propose new therapeutic hypotheses in Alzheimer’s, ALS, obesity, cardiac disease, schizophrenia, aging, and cancer.
These results provide a roadmap for translating genetic findings into mechanistic insights and ultimately new therapeutic avenues for complex disease and cancer.
|Sep. 20 (hybrid)||Ellen Zhong
Machine learning for determining protein structure and dynamics from cryo-EM images
Major technological advances in cryo-electron microscopy (cryo-EM) have produced new opportunities to study the structure and dynamics of proteins and other biomolecular complexes. However, this structural heterogeneity complicates the algorithmic task of 3D reconstruction from the collected dataset of 2D cryo-EM images. In this seminar, I will give an overview of cryoDRGN and related methods that leverage the representation power of deep neural networks for cryo-EM reconstruction. Underpinning the cryoDRGN method is a deep generative model parameterized by an implicit neural representation of 3D volumes and a learning algorithm to optimize this representation from unlabeled 2D cryo-EM images. Extended to real datasets and released as an open-source tool, these methods have been used to discover new protein structures and visualize continuous trajectories of protein motion. I will discuss various extensions of the method for scalable and robust reconstruction, analyzing the learned generative model, and visualizing dynamic protein structures in situ.
|Sep. 27 (hybrid)||Stephen Quake
A Decade of Molecular Cell Atlases
|Oct. 4 (virtual)||David Knowles
|Oct. 11 (virtual)||Bogdan Pasaniuc
|Oct. 18 (hybrid)||William Yu
Augmenting k-mer sketching for (meta)genomic sequence comparisons
Over the last decade, k-mer sketching (e.g. minimizers or MinHash) to create succinct summaries of long sequences has proven effective at improving the speed of sequence comparisons. However, rigorously characterizing the accuracy of these techniques has been more difficult. In this talk, I'll touch on three results that showcase some of the modern theoretical developments and practical applications of theory to building faster sequence comparison tools for metagenomics.
We begin by rigorously providing average-case guarantees for the popular seed-chain-extend heuristic for pairwise sequence alignment under a random substitution model, showing that it is accurate and runs in close to O(n log n) time for similar sequences. Then, we will turn our focus to metagenomics: our new tool skani computes average nucleotide identity (ANI) using sparse approximate alignments, and is both more accurate and over 20 times faster than the current state-of-the-art FastANI for comparing incomplete, fragmented MAGs (metagenome-assembled genomes). This was enabled by Belbasi et al.'s work showing that minimizers are biased Jaccard estimators, whereas other k-mer sketching methods do not have that drawback. Finally, we will introduce sylph (unpublished work), which enables fast and accurate database search to find nearest neighbor genomes (in ANI space) of low-coverage sequenced samples by using a combination of k-mer sketching with a zero-inflated Poisson correction (45x faster than MetaPhlAn for screening databases).
All of the work in this talk is joint with my brilliant PhD student Jim Shaw.
Shaw J, Yu YW. Proving sequence aligners can guarantee accuracy in almost O(m log n) time through an average-case analysis of the seed-chain-extend heuristic. Genome Research (2023) 33 (7), 1175-1187.
Shaw J, Yu YW. Fast and robust metagenomic sequence comparison through sparse chaining with skani. Nature Methods (2023).
|Oct. 25 (hybrid)||Hoon Cho
|Nov. 1 (virtual)||Brian Hie
|Nov. 8 (hybrid)||Cory McLean
|Nov. 15 (virtual)||Rohit Singh
|Nov. 22 (virtual)||JP Hubaux
|Nov. 29 (hybrid)||Barbara Engelhardt
|Dec. 6 (virtual)||Emma Pierson
Using machine learning to increase equity in healthcare and public health
Our society remains profoundly unequal. This talk discusses how data science and machine learning can be used to combat inequality in health care and public health by presenting several vignettes from domains like policing, women's health, and cancer risk prediction.
|Dec. 13 (hybrid)||Yun Song
A listing of the Bioinformatics Seminar series home pages from prior terms.
- Fall 2023 (Current)
- Fall 2022
- Spring 2022
- Fall 2021
- Spring 2021
- Spring 2020
- Spring 2019
- Spring 2018
- Spring 2017
- Spring 2016
- Spring 2015
- Spring 2013
- Spring 2011
- Spring 2010
- Spring 2009
- Fall 2008
- Fall 2007
- Spring 2007
- Fall 2006
- Spring 2006
- Fall 2005
- Spring 2005
- Fall 2004
- Spring 2004
- Fall 2003
- Spring 2003
- Spring 2001
Organizers and Information
The Bioinformatics Seminar is hosted by Bonnie Berger, Simons Professor of Mathematics at MIT and head of the Computation and Biology group at CSAIL. Professor Berger is also Faculty of Harvard-MIT Health Sciences & Technology, Associate Member of the Broad Institute of MIT and Harvard, Faculty of MIT CSB, and Affiliated Faculty of Harvard Medical School.
Bonnie Berger: email@example.com
Shuvom Sadhuka (TA): firstname.lastname@example.org
To be added to the seminar's email announcement list or for any questions you have about the seminar, please mail email@example.com.
If you plan to enroll in the associated course, 18.418/HST.504: Topics in Computational Molecular Biology, please contact Professor Berger (firstname.lastname@example.org) and cc TA Shuvom Sadhuka (email@example.com) for more information.