PhD, University of Cambridge
At A Glance
Changes in gene regulation often underlie the mechanism of genetic disorders and cancer. These changes can arise from variation in genomic DNA sequence. They can also come from alterations in epigenomic properties, such as DNA methylation, chromatin packaging, histone modifications, or 3D chromosome conformation. New sequencing technology reveals a forest of genomic and epigenomic variation, but we are hindered by insufficient understanding of the variation's consequences. As a result, we can apply these data to diagnosis or personalized drug therapy only in limited cases.
Our research program has three themes, organized around closing this knowledge gap and understanding the interactions between genome, epigenome, and phenotype in human cancers.
Theme 1. Computational predictive models of gene regulation. We apply a systematic framework to create and validate predictive models of (1) how genetic variants cause epigenomic changes, and (2) the effect of epigenomic changes on gene regulation and phenotype. First, we start with data from collaborators or public resources, using cancer cell lines and cancer patient primary tissue. Second, we develop machine learning models of how a genomic or epigenomic input leads to an epigenomic or phenotypic output. Third, we perturb input data and predict changes in output. Fourth, we validate predictions with targeted experiments.
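The third step, perturbing inputs and predicting changes in output, can be illustrated with a minimal Python sketch. It shifts each input feature of a trained model by a fixed amount, one feature at a time, and reports the resulting change in prediction. The toy linear model and its weights are hypothetical placeholders. They are not one of the lab's actual models.

```python
from typing import Callable, Sequence

# Hypothetical weights for three epigenomic input signals (illustrative only).
WEIGHTS = [0.8, -0.5, 0.1]


def toy_expression_model(features: Sequence[float]) -> float:
    """Stand-in predictive model: predicted expression as a weighted sum of input signals."""
    return sum(w * f for w, f in zip(WEIGHTS, features))


def perturbation_effects(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    delta: float = 1.0,
) -> list[float]:
    """Change in model output when each input feature, in turn, is shifted by delta."""
    baseline = model(x)
    effects = []
    for i in range(len(x)):
        perturbed = list(x)  # copy so the original input is untouched
        perturbed[i] += delta
        effects.append(model(perturbed) - baseline)
    return effects


if __name__ == "__main__":
    print(perturbation_effects(toy_expression_model, [2.0, 1.0, 0.5]))
```

For a linear model, these effects simply recover the weights. The approach becomes informative for nonlinear models, where the predicted effect of a change depends on the rest of the input.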
Theme 2. Epigenomic liquid biopsy. Working with several other Medical Biophysics faculty members, we are developing improvements and new applications for cell-free methylated DNA immunoprecipitation and high-throughput sequencing (cfMeDIP-seq), an epigenomic liquid biopsy technique developed at Princess Margaret Cancer Centre. These improvements will lead to more reliable identification of gene expression programs and diagnosis of cancer from a minimally invasive blood draw, replacing invasive tissue biopsies. We are also applying cfMeDIP-seq in new domains, such as diagnosing preterm birth disorders through maternal blood draws.
Theme 3. Robustness, reproducibility, and transparency in biological research. Like many computational biology and genomics researchers, we rely on, and contribute to, a common base of shared computational tools and data. We work to establish practices that ensure data and code are shared in ways that maximize the benefit of publicly funded research.
Michael Hoffman creates predictive computational models to understand interactions between genome, epigenome, and phenotype in human cancers. His influential machine learning approaches have reshaped researchers' analysis of gene regulation. These approaches include the genome annotation method Segway, which enables simple interpretation of large multivariate genomic data. He is a Senior Scientist at Princess Margaret Cancer Centre and Associate Professor in the Departments of Medical Biophysics and Computer Science, University of Toronto. He was named a CIHR New Investigator and has received several awards for his academic work, including the NIH K99/R00 Pathway to Independence Award and the Ontario Early Researcher Award.
Dr. Hoffman is an established international leader in the increasingly important field of computational biology, where his widely used methods for analyzing epigenomics data have captured the attention of the community. Dr. Hoffman’s commitment to sharing his research through reusable software and genome annotations has amplified the impact of his work, which has enabled scientists around the world to address fundamental questions in biological and biomedical research.
Dr. Hoffman’s research program focuses on creating and validating predictive models of how genetic variants cause epigenomic changes and the effect of epigenomic changes on gene regulation and phenotype. Within this broad theme, he has excelled in several areas of concentration.
Integrative analysis of multiple epigenomic data types. Dr. Hoffman created the genomic annotation method Segway (Hoffman et al., Nature Methods 2012). Segway’s machine learning model integrates multiple epigenomic datasets and assigns each base in a genome to a category, such as transcription start site, enhancer, insulator, or repressed region.
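To make per-base labeling concrete, here is a minimal Python sketch. It assigns an integer label to every position of a multi-track signal by Viterbi decoding of a hidden Markov model with Gaussian emissions. This is a deliberately simplified stand-in. Segway itself uses a dynamic Bayesian network and learns its parameters from the data. Here, by contrast, the label means, standard deviation, and transition probability are fixed, hypothetical values.

```python
import math
from typing import Sequence


def log_gaussian(x: float, mean: float, sd: float) -> float:
    """Log density of a univariate normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi_segment(
    signal: Sequence[Sequence[float]],  # signal[t][k]: value of track k at position t
    means: Sequence[Sequence[float]],  # means[s][k]: assumed mean of track k under label s
    sd: float = 1.0,
    stay_prob: float = 0.9,
) -> list[int]:
    """Most probable label at each position under a toy Gaussian HMM (needs >= 2 labels)."""
    n = len(means)
    switch_prob = (1 - stay_prob) / (n - 1)
    log_trans = [
        [math.log(stay_prob if i == j else switch_prob) for j in range(n)] for i in range(n)
    ]

    def log_emit(label: int, obs: Sequence[float]) -> float:
        # Tracks are treated as independent given the label (diagonal covariance).
        return sum(log_gaussian(x, m, sd) for x, m in zip(obs, means[label]))

    # Uniform prior over the label at the first position.
    scores = [-math.log(n) + log_emit(s, signal[0]) for s in range(n)]
    backpointers = []
    for obs in signal[1:]:
        prev = scores
        step_ptrs = []
        scores = []
        for j in range(n):
            best_i = max(range(n), key=lambda i: prev[i] + log_trans[i][j])
            step_ptrs.append(best_i)
            scores.append(prev[best_i] + log_trans[best_i][j] + log_emit(j, obs))
        backpointers.append(step_ptrs)

    # Trace back from the best final label.
    label = max(range(n), key=lambda s: scores[s])
    path = [label]
    for step_ptrs in reversed(backpointers):
        label = step_ptrs[label]
        path.append(label)
    return path[::-1]


if __name__ == "__main__":
    tracks = [[5.0, 0.0], [4.8, 0.3], [5.2, 0.1], [0.2, 4.9], [0.1, 5.1]]
    print(viterbi_segment(tracks, means=[[5.0, 0.0], [0.0, 5.0]], sd=1.0))
```

As in Segway, the output labels are just integers. Giving them names such as "enhancer" or "repressed" is a separate interpretation step.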
Segway was the first method to enable simple interpretation and visualization of multivariate data across the entire genome at single-base resolution. It helped establish an area of research continued by Dr. Hoffman (Roberts et al., bioRxiv 2016; Chan et al., Bioinformatics 2018; Chan et al., bioRxiv 2020; Mendez et al., bioRxiv 2021) and many other groups (reviewed in Libbrecht, Chan, and Hoffman, PLOS Computational Biology 2021).
Coordinating an international network of scientists, Dr. Hoffman spearheaded an effort to label the human genome using Segway (Hoffman et al., Nucleic Acids Research 2013)—a major part of the ENCODE analysis, which transformed our thinking about the role of noncoding DNA (ENCODE Project Consortium, PLOS Biology 2011; ENCODE Project Consortium, Nature 2012). He used Segway for cross-species genome annotation in human, worm, and fly (Ho et al., Nature 2014).
Segway's global impact is demonstrated by the many scientists who run the software or use Segway annotations. These annotations are displayed by both the Ensembl and UCSC genome browsers. They also form a building block for widely used noncoding interpretation tools like CADD and the Ensembl Regulatory Build, and for other large genome analysis efforts like FANTOM6 (Ramilowski et al., Genome Research 2019; Agrawal et al., bioRxiv 2022).
In related work, Dr. Hoffman helped develop dynamic Bayesian network methods to identify chromatin footprints (Chen et al., Bioinformatics 2010), incorporate 3D genome organization into automated genome annotation (Libbrecht et al., ICML 2015; Libbrecht et al., Genome Research 2015), and make Segway fully automated (Libbrecht et al., Genome Biology 2019). He also developed new methods for benchmarking machine learning classifiers (Cao, Chicco, and Hoffman, arXiv 2020).
Analysis of DNA methylation. Dr. Hoffman developed the first method to directly model how sequence-specific transcription factors bind covalently modified DNA (Viner et al., bioRxiv 2022) and the DNAmod database of DNA modifications (Sood et al., Journal of Cheminformatics 2019). He helped develop machine learning approaches for the cell-free methylated DNA immunoprecipitation (cfMeDIP) liquid biopsy technique (Shen et al., Nature 2018), played an important role in demonstrating its utility in head and neck cancer (Burgener et al., Clinical Cancer Research 2021), and contributed new standards that improve its robustness (Wilson et al., bioRxiv 2021). He was lead inventor on a patent application (PCT/CA2020/051507) instrumental in the creation of the startup company Adela.
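The idea of modeling binding to modified DNA can be sketched in two steps. First, modified cytosines are rewritten as extra letters in an expanded alphabet. Second, motifs are scored over the larger alphabet. The letter convention below ("m" for a methylated C, "1" for the G paired with a methylated C on the opposite strand) is an assumption chosen for illustration. The same holds for the hypothetical two-position scoring matrix. This sketch does not reproduce the published model.

```python
# Assumed letter convention (illustrative): "m" marks a methylated C on the given strand,
# and "1" marks a G whose paired C on the opposite strand is methylated.


def to_expanded_alphabet(seq: str, methyl_plus: list[int], methyl_minus: list[int]) -> str:
    """Rewrite seq with extra letters at methylated cytosine positions."""
    out = list(seq.upper())
    for i in methyl_plus:
        if out[i] != "C":
            raise ValueError(f"position {i} is {out[i]!r}, not C")
        out[i] = "m"
    for i in methyl_minus:
        if out[i] != "G":
            raise ValueError(f"position {i} is {out[i]!r}, not G")
        out[i] = "1"
    return "".join(out)


# Hypothetical 2-position scores for a factor that prefers a CpG methylated on both strands.
METHYL_CPG_PWM = [
    {"A": -2.0, "C": 0.5, "G": -2.0, "T": -2.0, "m": 2.0, "1": -2.0},
    {"A": -2.0, "C": -2.0, "G": 0.5, "T": -2.0, "m": -2.0, "1": 2.0},
]


def best_motif_score(seq: str, pwm: list[dict[str, float]]) -> float:
    """Highest score over all windows of seq, using letters from the expanded alphabet."""
    width = len(pwm)
    return max(
        sum(pwm[i][seq[start + i]] for i in range(width))
        for start in range(len(seq) - width + 1)
    )


if __name__ == "__main__":
    plain = "ACGT"
    methylated = to_expanded_alphabet(plain, methyl_plus=[1], methyl_minus=[2])
    print(methylated)  # Am1T
    print(best_motif_score(plain, METHYL_CPG_PWM), best_motif_score(methylated, METHYL_CPG_PWM))
```

Because the methylated and unmethylated forms of the same site are now different strings, a single matrix can assign them different binding scores.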
Understanding chromatin biology. Dr. Hoffman’s Virtual ChIP-seq tool predicts transcription factor binding sites far more accurately than previous methods (Karimzadeh and Hoffman, Genome Biology 2022). His method for combining knockout and control chromatin immunoprecipitation-sequencing (ChIP-seq) experiments identifies better transcription factor motifs (Denisko, Viner, and Hoffman, bioRxiv 2021). He created the first method that uses 3D genome organization data to identify the function of sets of intergenic genomic regions (Chicco et al., bioRxiv 2019).
Dr. Hoffman created a method for finding chimeric host-integrated human papillomavirus (HPV) in multiple types of cancer epigenomics data (Karimzadeh et al., bioRxiv 2021). He worked on ChromNet, which identifies the network of interactions among transcription factors and histones (Lundberg et al., Genome Biology 2016).
Comparative genomics of gene regulation. Dr. Hoffman established several models and computational methods that characterize differences in gene regulation across species and help explain noncoding sequence evolution. He created Sunflower, which predicts the effects of genetic variation on transcription factor binding and on competition between transcription factors for the same piece of DNA, originating the widespread "motif-breaker" approach (Hoffman and Birney, Genome Research 2010). Dr. Hoffman also developed methods to quantify the rate of evolution in vertebrate intronic sequence (Hoffman and Birney, Molecular Biology and Evolution 2007).
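Here is a minimal sketch of the general motif-breaker idea. It uses a simple position weight matrix scan, not Sunflower's own model of binding and competition. It scores the best motif match before and after a single-nucleotide variant and reports the difference. The GATA-like count matrix is hypothetical. Unlike real scanners, this toy checks only the forward strand.

```python
import math

BASES = "ACGT"

# Hypothetical counts for a 4-position "GATA"-like motif (illustrative only).
GATA_LIKE_PFM = [
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 10, "C": 0, "G": 0, "T": 0},
]


def log_odds_matrix(pfm, background=0.25, pseudocount=0.1):
    """Convert per-position base counts into log2-odds scores against a uniform background."""
    pwm = []
    for column in pfm:
        total = sum(column[b] for b in BASES) + len(BASES) * pseudocount
        pwm.append({b: math.log2((column[b] + pseudocount) / total / background) for b in BASES})
    return pwm


def best_score(seq, pwm):
    """Highest motif score over all windows of seq (forward strand only)."""
    width = len(pwm)
    return max(
        sum(pwm[i][seq[start + i]] for i in range(width))
        for start in range(len(seq) - width + 1)
    )


def variant_effect(ref_seq, pos, alt, pwm):
    """Change in best motif score when the base at pos becomes alt (negative = weaker match)."""
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    return best_score(alt_seq, pwm) - best_score(ref_seq, pwm)


if __name__ == "__main__":
    pwm = log_odds_matrix(GATA_LIKE_PFM)
    print(variant_effect("TTGATATT", 3, "C", pwm))  # A-to-C inside the GATA site
    print(variant_effect("TTGATATT", 0, "C", pwm))  # change outside the site
```

A strongly negative difference flags a variant that may disrupt binding. A difference near zero suggests the motif match is unaffected.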
Fundamental software and standards for genomic and epigenomic analysis. Dr. Hoffman develops robust and reusable software packages and databases to accelerate and facilitate research in the broader computational genomics community. Software packages include Umap and Bismap (Karimzadeh et al., Nucleic Acids Research 2018), which quantify the mappability of the genome and the bisulfite-converted genome, an important standard for functional genomics inference; Segtools (Buske et al., BMC Bioinformatics 2011), which provides summary statistics and visualizations of genomic annotations; and Genomedata (Hoffman et al., Bioinformatics 2010), which provides high-performance access to dense genomic signal. He also helped develop standards for ChIP-seq data (Landt et al., Genome Research 2012), and led the standardization of a widely used file format by the Global Alliance for Genomics and Health (Niu, Denisko, and Hoffman, Bioinformatics 2022; Rehm et al., Cell Genomics 2021).
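Umap and Bismap address mappability: whether a sequencing read starting at a given position could be aligned there unambiguously. As a toy illustration of what such a track encodes, the sketch below marks each position whose k-mer occurs exactly once across both strands of a short sequence. It is not Umap's algorithm, which operates at whole-genome scale. The function name and parameters here are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def unique_kmer_starts(genome: str, k: int) -> list[bool]:
    """For each start position, True if its k-mer occurs exactly once across both strands
    and contains no N (a toy notion of single-read mappability)."""
    genome = genome.upper()
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    counts = Counter(kmers)
    # Count reverse-strand occurrences too, since reads can come from either strand.
    counts.update(reverse_complement(kmer) for kmer in kmers)
    return [counts[kmer] == 1 and "N" not in kmer for kmer in kmers]


if __name__ == "__main__":
    print(unique_kmer_starts("ACGTTTACGA", k=3))
```

Here the repeated ACG and its reverse complement CGT are flagged as non-unique, because a 3-base read drawn from either copy could not be placed unambiguously.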
Review and commentary. Dr. Hoffman has helped define the direction of machine learning research in biology and computational genomics through his review and commentary articles. He has co-written highly cited articles that survey work in deep learning (Ching et al., Journal of the Royal Society Interface 2018), integrating multiple types of biological and medical data (Zitnik et al., Information Fusion 2019), machine learning reproducibility (Haibe-Kains et al., Nature 2020; Heil et al., Nature Methods 2021), sharing biological data (Wilson et al., FEBS Letters 2021), documenting bioinformatics software (Karimzadeh and Hoffman, Briefings in Bioinformatics 2018), and statistical learning methods (Franke et al., International Statistical Review 2016). He helped write an influential article on new representations of the human genome assembly (Church et al., Genome Biology 2015).
Karimzadeh M, Hoffman MM. “Virtual ChIP-seq: predicting transcription factor binding by learning from the transcriptome.” Genome Biology 2022; 23:126.
Karimzadeh M, Arlidge C, Rostami A, Lupien M, Bratman SV, Hoffman MM. “Human papillomavirus integration transforms chromatin to drive oncogenesis.” 2022. Preprint: https://doi.org/10.1101/2020.02.12.942755
Viner C, Ishak CA, Johnson J, Walker NJ, Shi H, Sjöberg-Herrera MK, Shen SY, Lardo SM, Adams DJ, Ferguson-Smith AC, De Carvalho DD, Hainer SJ, Bailey TL, Hoffman MM. “Modeling methyl-sensitive transcription factor motifs with an expanded epigenetic alphabet.” 2022. Preprint: https://doi.org/10.1101/043794
Wilson SL, Shen SY, Harmon L, Burgener JM, Triche T Jr, Bratman SV, De Carvalho DD, Hoffman MM. “Sensitive and reproducible cell-free methylome quantification with synthetic spike-in controls.” 2021. Preprint: https://doi.org/10.1101/2021.02.12.430289
Denisko D, Viner C, Hoffman MM. “Motif elucidation in ChIP-seq datasets with a knockout control.” 2021. Preprint: https://doi.org/10.1101/721720
Mendez M, FANTOM Consortium Main Contributors, Scott MS, Hoffman MM. “Unsupervised analysis of multi-experiment transcriptomic patterns with SegRNA identifies unannotated transcripts.” 2021. Preprint: https://doi.org/10.1101/2020.07.28.225193