Specifying Embedding Methods and Preprocessing Steps
====================================================

The main functionality of scloop is to provide a unified interface for
embedding and clustering methods with a common set of preprocessing steps.
The ``embedding_algs`` argument is a list of strings, each of which specifies
an embedding method and the preprocessing steps to run with it. Each string
consists of the name of the method, followed by a ``+`` and then a
comma-separated list of preprocessing steps.

List of Embedding Methods
-------------------------

**Baselines:**

+--------------+--------------------------------------------------------------------------------+
| Name         | Description                                                                    |
+==============+================================================================================+
| 1d_pca       | 1-dimensional PCA baseline. Takes each contact matrix or preprocessed matrix   |
|              | and aggregates over rows to produce a 1D vector for each cell. Embed using     |
|              | PCA.                                                                           |
+--------------+--------------------------------------------------------------------------------+
| 1d_lsi       | 1-dimensional LSI baseline. Same as the 1D PCA baseline, but embed using LSI   |
|              | instead.                                                                       |
+--------------+--------------------------------------------------------------------------------+
| 2d_pca       | PCA baseline. Takes each contact matrix or preprocessed matrix and unravels    |
|              | the specified number of strata into a single vector representation for each    |
|              | cell. Embed using PCA.                                                         |
+--------------+--------------------------------------------------------------------------------+
| 2d_lsi       | LSI baseline. Same as the PCA baseline, but embed using LSI instead.           |
+--------------+--------------------------------------------------------------------------------+
| scHiCluster  | scHiCluster method for embedding scHi-C based on VCSQRT normalization,         |
|              | convolution, and random-walk imputation prior to PCA embedding. By default,    |
|              | this method runs with these preprocessing steps unless otherwise specified.    |
+--------------+--------------------------------------------------------------------------------+
| fastHiCRep   | Similarity-based method that relies on the HiCRep stratum-adjusted correlation |
|              | coefficient (SCC) as a distance metric for MDS embedding.                      |
+--------------+--------------------------------------------------------------------------------+
| InnerProduct | Generalization of fastHiCRep that ignores distance in the correlation          |
|              | computation and simply computes cosine similarities of each stratum vector.    |
+--------------+--------------------------------------------------------------------------------+
| cisTopic     | Convert dataset into a set of discrete locus-pair occurrences (bag-of-words    |
|              | representation) and run Latent Dirichlet Allocation for topic modeling.        |
+--------------+--------------------------------------------------------------------------------+

**Conventional scRNA-seq/scATAC-seq methods:**

+--------------+--------------------------------------------------------------------------------+
| Name         | Description                                                                    |
+==============+================================================================================+
| scVI         | Aggregate each contact matrix into a 1D vector and treat it like a             |
|              | transcription vector. Embed using the default scVI model.                      |
+--------------+--------------------------------------------------------------------------------+
| scVI_2d      | Unravel each contact matrix into a 1D vector and embed using the default scVI  |
|              | model.                                                                         |
+--------------+--------------------------------------------------------------------------------+
| peakvi       | Aggregate each contact matrix into a 1D vector and treat it like a binary peak |
|              | vector. Embed using PeakVI.                                                    |
+--------------+--------------------------------------------------------------------------------+
| peakvi_2d    | Unravel each contact matrix into a 1D vector and embed using PeakVI.           |
+--------------+--------------------------------------------------------------------------------+

**Deep learning methods:**

+--------------+--------------------------------------------------------------------------------+
| Name         | Description                                                                    |
+==============+================================================================================+
| Higashi      | Represent the entire dataset as a hypergraph and learn cell node embeddings by |
|              | training a hypergraph neural network.                                          |
+--------------+--------------------------------------------------------------------------------+
| Fast-Higashi | Higashi model based on tensor decomposition rather than training a hypergraph  |
|              | neural network.                                                                |
+--------------+--------------------------------------------------------------------------------+
| 3DVI         | Train an scVI model on each stratum independently and concatenate to obtain    |
|              | final cell embeddings.                                                         |
+--------------+--------------------------------------------------------------------------------+
| VaDE         | Variational deep embedding model. Trains a VAE with a Gaussian mixture prior   |
|              | (if the number of clusters is specified). Similar to 3DVI, but embeds the      |
|              | entire matrix instead of independent strata.                                   |
+--------------+--------------------------------------------------------------------------------+

**Biological feature representations:**

+--------------+--------------------------------------------------------------------------------+
| Name         | Description                                                                    |
+==============+================================================================================+
| deTOKI/deDOC | Identify TADs and domain boundaries in each contact matrix and embed using PCA |
|              | of domain density vectors.                                                     |
+--------------+--------------------------------------------------------------------------------+
| InsScore     | Identify domain boundaries by computing the Insulation Score over a sliding    |
|              | window and embed using PCA of domain density vectors.                          |
+--------------+--------------------------------------------------------------------------------+
| scGAD        | Map each cell to a gene score vector based on a set of known loci. Scores      |
|              | represent z-scores from the BandNorm normalization method.                     |
+--------------+--------------------------------------------------------------------------------+

List of Preprocessing Steps
---------------------------

**Filtering operations:**

+------------------+--------------------------------------------------------------------------------+
| Name             | Description                                                                    |
+==================+================================================================================+
| ``quantile_``    | Filter low values based on a quantile cutoff specified by ``q``.               |
+------------------+--------------------------------------------------------------------------------+
| ``min_count_``   | Filter values with a count lower than ``N``.                                   |
+------------------+--------------------------------------------------------------------------------+

**Normalization operations:**

+------------------+--------------------------------------------------------------------------------+
| Name             | Description                                                                    |
+==================+================================================================================+
| vc_sqrt_norm     | Vanilla square-root coverage correction. Normalize by the square root of the   |
|                  | row sums and column sums.                                                      |
+------------------+--------------------------------------------------------------------------------+
| oe_norm          | Distance correction. Compute the average of each distance stratum as the       |
|                  | expected value and return the observed/expected ratios.                        |
+------------------+--------------------------------------------------------------------------------+
| kr_norm          | KR normalization. Convert the contact matrix into a doubly-stochastic matrix   |
|                  | using the KR algorithm.                                                        |
+------------------+--------------------------------------------------------------------------------+

**Graph operations:**

+------------------+--------------------------------------------------------------------------------+
| Name             | Description                                                                    |
+==================+================================================================================+
| convolution      | Perform simple neighbor averaging with a box filter.                           |
+------------------+--------------------------------------------------------------------------------+
| random_walk      | Perform random-walk imputation on the contact matrices.                        |
+------------------+--------------------------------------------------------------------------------+
| network_enhance  | Compute a KNN transition matrix for each contact matrix.                       |
+------------------+--------------------------------------------------------------------------------+
| google           | Compute a PageRank transition matrix for each contact matrix.                  |
+------------------+--------------------------------------------------------------------------------+
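
As a rough illustration of the ``embedding_algs`` string format described at
the top of this page (method name, then ``+``, then a comma-separated list of
preprocessing steps), the sketch below splits such strings into a method and
its steps. The helper ``parse_embedding_spec`` is hypothetical and is not part
of scloop; the example strings only combine names from the tables above, and
the step order shown is an assumption.

```python
from typing import List, Tuple


def parse_embedding_spec(spec: str) -> Tuple[str, List[str]]:
    """Split 'method+step1,step2' into (method, [step1, step2]).

    Hypothetical helper for illustration only; scloop's own parsing
    may differ.
    """
    # Everything before the first '+' is the method name.
    method, sep, steps = spec.partition("+")
    method = method.strip()
    if not method:
        raise ValueError(f"missing embedding method name in {spec!r}")
    # Without a '+', no preprocessing steps were requested.
    if not sep:
        return method, []
    # Steps are comma-separated; ignore surrounding whitespace and blanks.
    return method, [s.strip() for s in steps.split(",") if s.strip()]


# Example list in the documented format (the combinations are assumptions).
embedding_algs = [
    "1d_pca",
    "2d_pca+vc_sqrt_norm,convolution,random_walk",
    "InnerProduct+oe_norm",
]

parsed = [parse_embedding_spec(spec) for spec in embedding_algs]
for method, steps in parsed:
    print(method, steps)
```

A string without a ``+`` yields an empty step list in this sketch; how scloop
itself treats that case (for example, the scHiCluster defaults) is not modeled
here.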