Find optimal interaction distance for your data
===============================================

Distance Sweep: Mouse Embryo Dataset
------------------------------------

This is a dataset in which we need to account for long-range interactions. This can be done using random-walk based preprocessing, or by setting the maximum interaction distance to a sufficiently large range. We can test this by running one or more methods using the ``--distance_sweep`` option:

.. code-block:: json

    {
        "embedding_algs": [
            "scHiCluster"
        ],
        "dset": "embryo_mm10",
        "distance_sweep": true,
        "scool": "data/scools/embryo_mm10_1M.scool",
        "n_runs": 5
    }

Using the configuration above will run each method over a series of increasing maximum interaction distances. We start with only the first stratum, then increase to short/mid-range (<2Mb) and long-range (>10Mb) interactions, eventually using either a maximum of 50Mb or the full contact matrices (if the method allows).

The methods are compared with each other and across the different distance settings to determine whether there is a general trend across interaction distances, or whether one method in particular captures the required short/long-range differences.

In the ``results`` directory, you will find the results of each individual run, as well as comparisons across clustering metrics (``accuracy_compare`` by default).

Looking at the ``accuracy_distance`` figure, we can see a clear trend of increasing accuracy as we include interactions beyond 10Mb:

.. image:: ../_static/embryo_distance_accuracy.png
    :width: 360

We can also find a ``_distance_umap`` figure (and similarly for PCA and t-SNE) for each method tested, which shows how that method is influenced by increasing interaction distance. For example, here is how the ``scHiCluster`` plots change as we include longer-range interactions:

.. image:: ../_static/embryo_distance_schicluster.png
    :width: 720
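
To make the idea of a maximum interaction distance concrete, the following is a minimal sketch of what restricting a contact matrix to a given distance might look like. It is not the implementation used by the distance sweep: the function name ``limit_interaction_distance``, the toy matrix, and the specific distances in the loop are assumptions chosen for illustration, with a 1Mb bin size assumed to match the ``_1M`` suffix of the scool file above.

.. code-block:: python

    def limit_interaction_distance(matrix, max_distance_bp, resolution_bp):
        """Zero out contacts between bins farther apart than max_distance_bp.

        Hypothetical helper: `matrix` is a square list of lists of contact
        counts, binned at `resolution_bp` base pairs per bin.
        """
        # Number of off-diagonal strata to keep (0 keeps only the main diagonal).
        max_offset = max_distance_bp // resolution_bp
        n = len(matrix)
        return [
            [matrix[i][j] if abs(i - j) <= max_offset else 0 for j in range(n)]
            for i in range(n)
        ]


    # Toy 5x5 contact matrix at an assumed 1Mb resolution.
    toy = [[1] * 5 for _ in range(5)]

    # Illustrative sweep over increasing maximum distances (values are assumptions).
    for max_distance in (0, 2_000_000, 10_000_000):
        banded = limit_interaction_distance(toy, max_distance, 1_000_000)
        retained = sum(sum(row) for row in banded)
        print(max_distance, retained)

Each step of the sweep keeps a wider band around the main diagonal, so a method that benefits from long-range contacts should improve as the band grows.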