Find optimal resolution for your data
=====================================

Resolution Sweep: Human PFC
---------------------------

In this example, we explore a dataset whose cell types differ in ways that are only detectable when cells are represented at high enough resolution (bins smaller than 500kb).
Setting ``--resolution_sweep`` starts from your initial resolution and uses the ``cooler.coarsen_cooler`` function to re-bin the data at successively lower resolutions.
Starting from ~200kb resolution, this sweep covers 200kb, 400kb, 600kb, 800kb, 1Mb, 2Mb, 3Mb, and 4Mb.

.. code-block:: json

   {
      "embedding_algs": [
         "InnerProduct",
         "fastHiCRep",
         "1d_pca"
      ],
      "dset": "pfc",
      "resolution_sweep": true,
      "scool": "data/scools/pfc_200kb.scool",
      "n_runs": 5
   }
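
The resolutions listed above are all integer multiples of the starting bin size, so each step of the sweep corresponds to a single coarsening factor. The following is a minimal sketch of that arithmetic, assuming the base resolution is exactly 200,000 bp (the text only says ~200kb); the names ``BASE_RESOLUTION``, ``SWEEP_RESOLUTIONS``, and ``coarsening_factors`` are illustrative and not part of the tool:

.. code-block:: python

   # Hypothetical helper: maps each sweep resolution to the integer factor
   # by which the base bins would need to be merged.
   BASE_RESOLUTION = 200_000  # assumed exact value for "~200kb"
   SWEEP_RESOLUTIONS = [
       200_000, 400_000, 600_000, 800_000,
       1_000_000, 2_000_000, 3_000_000, 4_000_000,
   ]


   def coarsening_factors(base, targets):
       """Return {target_resolution: factor} for targets that are multiples of base."""
       factors = {}
       for target in targets:
           if target % base != 0:
               raise ValueError(f"{target} is not a multiple of {base}")
           factors[target] = target // base
       return factors


   FACTORS = coarsening_factors(BASE_RESOLUTION, SWEEP_RESOLUTIONS)
   for resolution, factor in FACTORS.items():
       print(f"{resolution:>9} bp -> merge {factor} base bins")

Each factor is the kind of integer a coarsening routine such as ``cooler.coarsen_cooler`` takes as its ``factor`` argument. How the tool applies it to the individual cells of the ``.scool`` file is not shown here.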

Since this dataset has ground-truth cell type labels to compare against, the main plots of interest are the clustering accuracy vs. resolution plot and the per-resolution effect size plot:

.. image:: ../_static/pfc_res_accuracy.png
  :width: 480

.. image:: ../_static/pfc_res_effect_size.png
  :width: 540

The clustering accuracy plot shows performance dropping as we decrease the resolution, and we can inspect the embedding visualizations to see which cell types are responsible.
The best embedding in this sweep was achieved by ``InnerProduct`` at 200kb:

.. image:: ../_static/pfc_innerproduct.png
  :width: 720

Already we can see that L2/3, L4, L5, and L6 are highly similar to one another, as are Vip, Pvalb, Sst, and Ndnf.
We can inspect the ``innerproduct_resolution_tsne`` figure and see that as we decrease resolution (use larger bins), it is these groups of cell types that become harder to separate:

.. image:: ../_static/pfc_res_tsne.png
  :width: 720

This is because the L2/3, L4, L5, and L6 cell types are all very similar in their large-scale domain organization, but at higher resolution we can identify short-range intra-domain differences that help distinguish them.
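
To make this concrete, here is a toy illustration (not drawn from the PFC data) of how coarsening can erase short-range differences. Two small contact matrices share the same two-domain layout but distribute contacts differently within each domain. After summing 2x2 blocks of bins they become identical. The matrices and the ``coarsen`` helper are invented for this sketch, which assumes coarsening simply sums the contacts of merged bins:

.. code-block:: python

   def coarsen(matrix, factor):
       """Sum-pool a square contact matrix into factor x factor blocks.

       Trailing bins that do not fill a whole block are dropped in this toy.
       """
       size = len(matrix) // factor
       pooled = [[0] * size for _ in range(size)]
       for i in range(size * factor):
           for j in range(size * factor):
               pooled[i // factor][j // factor] += matrix[i][j]
       return pooled


   # Same two domains (bins 0-1 and bins 2-3), different intra-domain contacts.
   fine_a = [
       [4, 1, 0, 0],
       [1, 4, 0, 0],
       [0, 0, 4, 1],
       [0, 0, 1, 4],
   ]
   fine_b = [
       [3, 2, 0, 0],
       [2, 3, 0, 0],
       [0, 0, 3, 2],
       [0, 0, 2, 3],
   ]

   coarse_a = coarsen(fine_a, 2)
   coarse_b = coarsen(fine_b, 2)

   print(fine_a != fine_b)      # distinguishable at fine resolution
   print(coarse_a == coarse_b)  # indistinguishable after coarsening

At the coarse scale only the domain-level totals survive, so any method comparing the two cells sees the same input.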