Find optimal interaction distance for your data

Distance Sweep: Mouse Embryo Dataset

For this dataset we need to account for long-range interactions. This can be done using random-walk-based preprocessing, or by setting the maximum interaction distance to a sufficiently large value. We can test this by running a couple of methods using the --distance_sweep option:

{
   "embedding_algs": [
      "scHiCluster"
   ],
   "dset": "embryo_mm10",
   "distance_sweep": true,
   "scool": "data/scools/embryo_mm10_1M.scool",
   "n_runs": 5
}
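If you prefer to generate this configuration from a script, the following is a minimal sketch, assuming the configuration is stored as a plain JSON file. The helper name build_sweep_config and the output file name are hypothetical; only the keys and values shown in the configuration above come from this page.

```python
import json


def build_sweep_config(methods, dset, scool, n_runs=5):
    """Return a distance-sweep configuration as a plain dict.

    The keys mirror the example configuration above; this helper itself
    is hypothetical and not part of the tool.
    """
    return {
        "embedding_algs": list(methods),
        "dset": dset,
        "distance_sweep": True,
        "scool": scool,
        "n_runs": n_runs,
    }


config = build_sweep_config(
    methods=["scHiCluster"],
    dset="embryo_mm10",
    scool="data/scools/embryo_mm10_1M.scool",
)

# Serialize with the same layout as the hand-written example.
config_text = json.dumps(config, indent=3)
print(config_text)

# To save it, write config_text to a file of your choice, for example:
# with open("embryo_distance_sweep.json", "w") as fh:  # hypothetical file name
#     fh.write(config_text)
```

Building the dict in code makes it easy to list more than one method in embedding_algs so that several methods can be compared in the same sweep.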

With the configuration above, each method is run over a series of increasing maximum interaction distances. We start with only the first stratum, then increase to short/mid-range (<2Mb) and long-range (>10Mb) interactions, and finally use either a maximum of 50Mb or the full contact matrices (if the method allows). The methods are then compared with each other and across the different distance settings to determine whether there is a general trend across interaction distances, or whether one method in particular can capture the required short/long-range differences.
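To make the idea of a maximum interaction distance concrete, here is a rough sketch of how a single contact matrix could be restricted to contacts within a given genomic distance. It is an illustration only, not the tool's implementation: the function name limit_interaction_distance, the toy matrix, and the list of sweep distances are all assumptions, and the 1Mb bin size is inferred from the "1M" in the scool file name.

```python
import numpy as np


def limit_interaction_distance(contacts, resolution_bp, max_distance_bp=None):
    """Zero out contacts between bins farther apart than max_distance_bp.

    contacts: square intra-chromosomal contact matrix (bins x bins).
    resolution_bp: bin size in base pairs.
    max_distance_bp: None keeps the full matrix.
    """
    if max_distance_bp is None:
        return contacts.copy()
    idx = np.arange(contacts.shape[0])
    # Genomic separation of every bin pair, in bins.
    bin_offset = np.abs(idx[:, None] - idx[None, :])
    keep = bin_offset * resolution_bp <= max_distance_bp
    return np.where(keep, contacts, 0)


# Toy symmetric contact matrix for one chromosome (60 bins at 1Mb).
rng = np.random.default_rng(0)
n_bins = 60
upper = np.triu(rng.poisson(2.0, size=(n_bins, n_bins)))
contacts = upper + np.triu(upper, k=1).T

resolution = 1_000_000
# Illustrative distances only; the actual sweep values are chosen by the tool.
sweep = [1_000_000, 2_000_000, 10_000_000, 50_000_000, None]

for max_dist in sweep:
    limited = limit_interaction_distance(contacts, resolution, max_dist)
    label = "full" if max_dist is None else f"{max_dist // 1_000_000}Mb"
    print(label, "contacts kept:", int(limited.sum()))
```

Each step of the sweep keeps a wider band around the main diagonal, so the number of retained contacts never decreases as the maximum distance grows.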

In the results directory, you will find the results of each individual run, as well as comparisons across clustering metrics (accuracy_compare by default). Looking at the accuracy_distance figure, we can see a clear trend of increasing accuracy as we include interactions beyond 10Mb:

../_images/embryo_distance_accuracy.png

The results directory also contains a <method>_distance_umap figure (and similarly for PCA and t-SNE) for each method tested, so we can see how each method is influenced by increasing interaction distance. For example, here is how the scHiCluster plots change as we include longer-range interactions:

../_images/embryo_distance_schicluster.png