Specifying Embedding Methods and Preprocessing Steps
The main functionality of scloop is a unified interface to embedding and clustering methods that share a common set of preprocessing steps. The embedding_algs argument is a list of strings, each of which specifies an embedding method and the preprocessing steps to run before it. Each string consists of the method name followed by a + and a comma-separated list of preprocessing steps.
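
The string format described above can be illustrated with a small parser. This is only a sketch: `parse_embedding_spec` is a hypothetical helper and not part of scloop, and treating a string without a + as "no explicit preprocessing" is an assumption made for illustration.

```python
from typing import List, Tuple


def parse_embedding_spec(spec: str) -> Tuple[str, List[str]]:
    """Split 'method+step1,step2' into (method, [step1, step2]).

    Hypothetical helper for illustration; scloop's own parsing may differ.
    """
    method, sep, steps = spec.partition("+")
    if not sep:
        # Assumption: no '+' means no preprocessing steps were given.
        return method.strip(), []
    preprocessing = [s.strip() for s in steps.split(",") if s.strip()]
    return method.strip(), preprocessing


# Example specification strings built from names listed in this document.
embedding_algs = [
    "2d_pca+vc_sqrt_norm,convolution",
    "1d_lsi+oe_norm",
    "scHiCluster",
]

for spec in embedding_algs:
    print(parse_embedding_spec(spec))
```

Preprocessing steps are kept in the order they are written, which seems the natural reading of a comma-separated list, although the order in which scloop applies them is not stated here.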
List of Embedding Methods
Baselines:
Name | Description |
---|---|
1d_pca | 1-dimensional PCA baseline. Takes each contact matrix or preprocessed matrix and aggregates over rows to produce a 1D vector for each cell. Embeds using PCA. |
1d_lsi | 1-dimensional LSI baseline. Same as the 1D PCA baseline, but embeds using LSI instead. |
2d_pca | PCA baseline. Takes each contact matrix or preprocessed matrix and unravels the specified number of strata into a single vector representation for each cell. Embeds using PCA. |
2d_lsi | LSI baseline. Same as the PCA baseline, but embeds using LSI instead. |
scHiCluster | scHiCluster method for embedding scHi-C data, based on VCSQRT normalization, convolution, and random-walk imputation prior to PCA embedding. By default this method runs with these preprocessing steps unless otherwise specified. |
fastHiCRep | Similarity-based method which relies on the HiCRep stratum-adjusted correlation coefficient (SCC) as a distance metric for MDS embedding. |
InnerProduct | Generalization of fastHiCRep which ignores distance in the correlation computation and simply computes cosine similarities of each stratum vector. |
cisTopic | Converts the dataset into a set of discrete locus-pair occurrences (bag-of-words representation) and runs Latent Dirichlet Allocation for topic modeling. |
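
The difference between the 1D and 2D baselines can be sketched with NumPy. The function names below (`aggregate_1d`, `unravel_strata`, `pca_embed`) are hypothetical and are not scloop's API; the row sum as the aggregation and the SVD-based PCA are assumptions chosen to keep the example self-contained.

```python
import numpy as np


def aggregate_1d(mat):
    """1D representation: aggregate over rows (here, a row sum) to get one value per bin."""
    return mat.sum(axis=1)


def unravel_strata(mat, n_strata):
    """2D representation: concatenate the first n_strata diagonals (strata) into one vector."""
    return np.concatenate([np.diagonal(mat, offset=k) for k in range(n_strata)])


def pca_embed(features, n_components):
    """Plain PCA via SVD of the centered cells-by-features matrix."""
    centered = features - features.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Toy dataset: 5 cells, each a symmetric 6x6 contact matrix.
rng = np.random.default_rng(0)
cells = []
for _ in range(5):
    m = rng.poisson(1.0, size=(6, 6))
    cells.append(m + m.T)  # symmetrize

features_1d = np.stack([aggregate_1d(m) for m in cells]).astype(float)
features_2d = np.stack([unravel_strata(m, 3) for m in cells]).astype(float)

embedding = pca_embed(features_2d, n_components=2)
print(features_1d.shape, features_2d.shape, embedding.shape)
```

With 6 bins and 3 strata, the unraveled vector has 6 + 5 + 4 = 15 entries per cell, whereas the aggregated vector has only 6.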
Conventional scRNA-seq/scATAC-seq methods:
Name | Description |
---|---|
scVI | Aggregate each contact matrix into a 1D vector and treat it like a transcription vector. Embed using default scVI model |
scVI_2d | Unravel each contact matrix into a 1D vector and embed using default scVI model |
peakvi | Aggregate each contact matrix into a 1D vector and treat it like a binary peak vector. Embed using PeakVI |
peakvi_2d | Unravel each contact matrix into a 1D vector and embed using PeakVI |
Deep learning methods:
Name | Description |
---|---|
Higashi | Represent the entire dataset as a hypergraph and learn cell node embeddings by training a hypergraph neural network |
Fast-Higashi | Higashi model based on tensor decomposition rather than training a hypergraph neural network |
3DVI | Train an scVI model on each stratum independently and concatenate to obtain final cell embeddings |
VaDE | Variational deep embedding model; trains a VAE with a Gaussian mixture prior (if the number of clusters is specified). Similar to 3DVI but embeds the entire matrix instead of independent strata |
Biological feature representations:
Name | Description |
---|---|
deTOKI/deDOC | Identify TADs and domain boundaries in each contact matrix and embed using PCA of domain density vectors |
InsScore | Identify domain boundaries by computing the Insulation Score over a sliding window and embed using PCA of domain density vectors |
scGAD | Map each cell to a gene score vector based on a set of known loci. Scores represent z-scores from the BandNorm normalization method |
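
As an illustration of the sliding-window idea behind InsScore, the sketch below computes one common form of the insulation score: the mean contact between the `w` bins upstream and the `w` bins downstream of each bin. The function name, the window parameter `w`, and this exact definition are assumptions; the score used by scloop and the subsequent PCA of domain density vectors are not reproduced here.

```python
import numpy as np


def insulation_score(mat, w):
    """Mean contact between the w bins before and the w bins after each bin.

    Bins too close to the matrix edge for a full window are left as NaN.
    """
    n = mat.shape[0]
    scores = np.full(n, np.nan)
    for i in range(w, n - w):
        scores[i] = mat[i - w:i, i + 1:i + w + 1].mean()
    return scores


# Toy matrix with two domains (bins 0-3 and 4-7) and no contacts between them.
toy = np.zeros((8, 8))
toy[:4, :4] = 1.0
toy[4:, 4:] = 1.0

scores = insulation_score(toy, w=2)
print(scores)
```

Low scores mark bins across which few contacts pass, so the minimum around bins 3 and 4 in the toy matrix corresponds to the boundary between the two domains.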
List of Preprocessing Steps
Filtering operations:
Name | Description |
---|---|
quantile_<q> | Filter out low values based on a quantile cutoff specified by q |
min_count_<N> | Filter out values with count lower than N |
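
The two filters can be sketched as follows. Both functions are hypothetical illustrations rather than scloop's implementation. In particular, computing the quantile over nonzero entries only, treating q as a fraction in [0, 1], and setting filtered entries to zero are all assumptions.

```python
import numpy as np


def quantile_filter(mat, q):
    """Zero out entries below the q-th quantile of the nonzero entries (assumed behavior)."""
    out = mat.astype(float)  # astype returns a copy, so the input is not modified
    nonzero = out[out > 0]
    if nonzero.size == 0:
        return out
    cutoff = np.quantile(nonzero, q)
    out[out < cutoff] = 0.0
    return out


def min_count_filter(mat, n):
    """Zero out entries whose count is lower than n."""
    out = mat.copy()
    out[out < n] = 0
    return out


contacts = np.array([
    [0, 1, 5],
    [1, 0, 2],
    [5, 2, 3],
])

print(quantile_filter(contacts, 0.5))
print(min_count_filter(contacts, 2))
```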
Normalization operations:
Name | Description |
---|---|
vc_sqrt_norm | Square-root vanilla coverage (VCSQRT) correction. Normalize by the square roots of the row sums and column sums. |
oe_norm | Distance correction. Compute the average of each distance stratum as the expected value and return the observed/expected ratios |
kr_norm | KR normalization. Convert the contact matrix into a doubly stochastic matrix using the KR algorithm |
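
The first two normalizations follow directly from their descriptions and can be sketched in NumPy. The function names mirror the step names for readability only; this is not scloop's code. The handling of empty rows and empty strata, and the assumption that the matrix is symmetric, are choices made for this example. KR normalization is omitted because it needs an iterative solver.

```python
import numpy as np


def vc_sqrt_norm(mat):
    """Divide entry (i, j) by sqrt(row_sum_i) * sqrt(col_sum_j)."""
    mat = mat.astype(float)
    rows = mat.sum(axis=1)
    cols = mat.sum(axis=0)
    # Assumption: leave empty rows/columns untouched instead of dividing by zero.
    rows[rows == 0] = 1.0
    cols[cols == 0] = 1.0
    return mat / np.sqrt(np.outer(rows, cols))


def oe_norm(mat):
    """Observed/expected: divide each stratum (diagonal) by its own mean."""
    mat = mat.astype(float)
    n = mat.shape[0]
    out = np.zeros_like(mat)
    for k in range(n):
        diag = np.diagonal(mat, offset=k)
        expected = diag.mean()
        if expected > 0:
            i = np.arange(n - k)
            out[i, i + k] = diag / expected
            out[i + k, i] = out[i, i + k]  # assumes a symmetric contact matrix
    return out


contacts = np.array([
    [2, 1],
    [1, 2],
])

print(vc_sqrt_norm(contacts))
print(oe_norm(contacts))
```

In the toy matrix every stratum is constant, so the observed/expected ratio is 1 everywhere, which is the expected result when contact frequency depends on distance alone.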
Graph operations:
Name | Description |
---|---|
convolution | Perform simple neighbor averaging with a box filter |
random_walk | Perform random-walk imputation on the contact matrices |
network_enhance | Compute KNN transition matrix for each contact matrix |
Compute PageRank transition matrix for each contact matrix |
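
The convolution and random-walk steps can be sketched as below. The names `box_convolution` and `random_walk_impute`, the window half-width `w`, the zero padding at the matrix edges, and the restart probability are assumptions for illustration. The random walk is written as a random walk with restart, which is one common form of this imputation, and may not match scloop's implementation exactly.

```python
import numpy as np


def box_convolution(mat, w=1):
    """Average each entry with its neighbors in a (2w+1) x (2w+1) box (zero-padded edges)."""
    mat = mat.astype(float)
    n, m = mat.shape
    padded = np.pad(mat, w, mode="constant")
    out = np.zeros_like(mat)
    for di in range(2 * w + 1):
        for dj in range(2 * w + 1):
            out += padded[di:di + n, dj:dj + m]
    return out / (2 * w + 1) ** 2


def random_walk_impute(mat, restart=0.5, tol=1e-6, max_iter=100):
    """Iterate Q <- (1 - restart) * Q @ P + restart * I until Q stops changing."""
    mat = mat.astype(float)
    n = mat.shape[0]
    row_sums = mat.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0  # avoid division by zero for empty rows
    transition = mat / row_sums    # row-normalized transition matrix P
    identity = np.eye(n)
    q = identity.copy()
    for _ in range(max_iter):
        q_next = (1 - restart) * q @ transition + restart * identity
        if np.linalg.norm(q_next - q) < tol:
            return q_next
        q = q_next
    return q


contacts = np.array([
    [1.0, 2.0, 0.0],
    [2.0, 1.0, 1.0],
    [0.0, 1.0, 1.0],
])

smoothed = box_convolution(contacts, w=1)
imputed = random_walk_impute(smoothed, restart=0.5)
print(imputed)
```

Chaining the two steps in this order mirrors the scHiCluster description earlier in this document, where convolution precedes random-walk imputation.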