Posts by Collection

portfolio

Jump Rope Tricktionary: Democratizing Jump Rope Knowledge Around the World

While jump rope is one of the most physically accessible sports in the world, very few resources exist for learning advanced skills, especially those needed to compete at an international level. I developed a simple video dictionary of hundreds of skills, from basics to impossible, which has reached tens of thousands of users in more than 60 countries.

publications

DeepLoop robustly maps chromatin interactions from sparse allele-resolved or single-cell Hi-C data at kilobase resolution

Published in Nature Genetics, 2022

Mapping chromatin loops from noisy Hi-C heatmaps remains a major challenge. Here we present DeepLoop, which performs rigorous bias correction followed by deep-learning-based signal enhancement for robust chromatin interaction mapping from low-depth Hi-C data. DeepLoop enables loop-resolution, single-cell Hi-C analysis. It also achieves a cross-platform convergence between different Hi-C protocols and micrococcal nuclease (micro-C). DeepLoop allowed us to map the genetic and epigenetic determinants of allele-specific chromatin interactions in the human genome. We nominate new loci with allele-specific interactions governed by imprinting or allelic DNA methylation. We also discovered that, in the inactivated X chromosome (Xi), local loops at the DXZ4 ‘megadomain’ boundary escape X-inactivation but the FIRRE ‘superloop’ locus does not. Importantly, DeepLoop can pinpoint heterozygous single-nucleotide polymorphisms and large structure variants that cause allelic chromatin loops, many of which rewire enhancers with transcription consequences. Taken together, DeepLoop expands the use of Hi-C to provide loop-resolution insights into the genetics of the three-dimensional genome.

Recommended citation: Zhang, S., Plummer, D., Lu, L. et al. DeepLoop robustly maps chromatin interactions from sparse allele-resolved or single-cell Hi-C data at kilobase resolution. Nat Genet 54, 1013–1025 (2022). https://doi.org/10.1038/s41588-022-01116-w https://www.nature.com/articles/s41588-022-01116-w

Single cell multiomic analysis reveals diabetes-associated β-cell heterogeneity driven by HNF1A

Published in Nature Communications, 2023

Broad heterogeneity in pancreatic β-cell function and morphology has been widely reported. However, determining which components of this cellular heterogeneity serve a diabetes-relevant function remains challenging. Here, we integrate single-cell transcriptome, single-nuclei chromatin accessibility, and cell-type specific 3D genome profiles from human islets and identify Type II Diabetes (T2D)-associated β-cell heterogeneity at both transcriptomic and epigenomic levels. We develop a computational method to explicitly dissect the intra-donor and inter-donor heterogeneity between single β-cells, which reflect distinct mechanisms of T2D pathogenesis. Integrative transcriptomic and epigenomic analysis identifies HNF1A as a principal driver of intra-donor heterogeneity between β-cells from the same donors; HNF1A expression is also reduced in β-cells from T2D donors. Interestingly, HNF1A activity in single β-cells is significantly associated with lower Na+ currents and we nominate a HNF1A target, FXYD2, as the primary mitigator. Our study demonstrates the value of investigating disease-associated single-cell heterogeneity and provides new insights into the pathogenesis of T2D.

Recommended citation: Weng, C., Gu, A., Zhang, S. et al. Single cell multiomic analysis reveals diabetes-associated β-cell heterogeneity driven by HNF1A. Nat Commun 14, 5400 (2023). https://doi.org/10.1038/s41467-023-41228-3 https://www.nature.com/articles/s41467-023-41228-3

Loop2Loop: Representation Learning of the 3D Genome for Multimodal Single-Cell Integration and In-Silico Chromatin Rewiring

Published in Case Western Reserve University ProQuest Dissertations & Theses, 2024

Genome sequencing has unlocked the ability to understand the genetic basis of human disease and development. However, the genome is not a linear sequence of bases, but a complex 3D structure of interacting regulatory elements. This structure is dynamic and varies between cell types. By leveraging Hi-C data, a powerful tool for studying the 3D genome, we can begin to understand the interactions that regulate gene expression. But Hi-C data is high-dimensional, noisy, and sparse, and the 3D genome is difficult to interpret in relation to other genomic data. To address this, we take a representation learning approach to the 3D genome. We introduce a set of autoencoding models: DeepLoop for bulk data, Va3DE for single-cell data, and HiGLUE for multimodal learning. We demonstrate that simply training neural networks to compress and reconstruct chromatin loop information from Hi-C datasets can lead to the discovery of biologically relevant features. We produce some of the first-ever allele-specific chromatin loop maps as well as the first Hi-C maps of cell types in the pancreatic islet through multimodal integration of single-cell Hi-C data. After learning a representation of the 3D genome and its relationship to other genomic data, we can then simulate the effects of genetic perturbations and verify regulatory relationships or identify disease pathways. Due to the newly formed symbiotic data-driven relationship between genomics and machine learning, our field will continue developing and iterating upon these various models in tandem with new protocols for generating data, eventually converging on comprehensive models of the dynamic genome which can be queried, perturbed, and simulated in larger biological systems. We present a comprehensive exploration into how the 3D genome should be included in this new generation of research.

Recommended citation: Plummer, D. Loop2Loop: Representation Learning of the 3D Genome for Multimodal Single-Cell Integration and In-Silico Chromatin Rewiring. Doctoral dissertation, Case Western Reserve University. (2024) https://www.proquest.com/openview/c814f710061a3c2a4caddc5b9a8e012c/

A comprehensive benchmark of single-cell Hi-C embedding tools

Published in Nature Communications, 2025

Embedding is the key step in single-cell Hi-C (scHi-C) analysis which relies on capturing biologically meaningful heterogeneity at various levels of genome architecture. To understand the strengths and limitations of existing tools in various applications, here we use ten scHi-C datasets to benchmark thirteen embedding tools including Va3DE, a new convolutional neural network model that can accommodate large cell numbers. We built a software framework to decouple the preprocessing options of existing tools and found that no single tool works best across all datasets under default settings. The difficulty levels and preferred resolutions differ between benchmark datasets, and the choice of data representation and preprocessing strongly impacts the embedding performance. Embedding cells from early embryonic stages relies on long-range compartment-scale contacts, but resolving cell cycle phases and complex tissue requires short-range loop-scale contacts. Both random-walk and inverse document frequency (IDF) transformations prefer long-range “compartment-scale” over short-range “loop-scale” embedding, while deep-learning methods better overcome sparsity at both scales and are more versatile with different resolutions. Finally, “diagonal integration” with an independent data modality is a promising approach to distinguish similar cell subpopulations. Our findings underscore the significance of appropriate priors for scHi-C embedding and also offer insights into genome architecture heterogeneity.

Recommended citation: Plummer, D., Lang, X., Zhang, S. et al. A comprehensive benchmark of single-cell Hi-C embedding tools. Nat Commun 16, 9119 (2025). https://doi.org/10.1038/s41467-025-64186-4 https://www.nature.com/articles/s41467-025-64186-4

teaching

Teaching Assistant: EECS 458: Intro to Bioinformatics

Graduate course, Case Western Reserve University, Electrical Engineering and Computer Science Department, 2019

Held office hours to answer student questions and graded assignments on basic concepts in bioinformatics.

Thesis Copromoter: Hogeschool Gent

Mentorship, HOGENT University of Applied Sciences and Arts, Ghent, Belgium, 2024

  • Advised an undergraduate student in developing a computer vision pipeline for jump rope judging
    • Athlete localization and tracking using a fine-tuned YOLOv8 model
    • Trick segmentation and classification using a fine-tuned video transformer

Instructor: GENE 520: Computational Human Genomics and Epigenomics

Graduate course, Case Western Reserve University, Computer and Data Science Department, 2025

Taught units starting from the basics of Python programming and building up to dimension reduction and representation learning techniques for biological data, with a focus on single-cell genomics and epigenomics.