Prepare your data
=================

Required arguments
------------------

``scloop`` requires two arguments to run:

1. a single-cell Hi-C dataset (either a ``.scool`` file or a directory of ``bin-to-bin`` files)
2. a reference file

That's it! If you are following along with the provided example datasets, you can skip to the next section. Otherwise, the following sections will help you prepare your own data.

Reference file format
---------------------

The reference file is a tab-separated file. The only required column is a ``cell`` column containing the cell file names. You can also provide additional metadata columns to be used for plotting and analysis. For details, see the `metadata` section.

``oocyte_zygote_ref``:

+------------------------------------------------+--------+----------+
| cell                                           | depth  | celltype |
+================================================+========+==========+
| anchor_loop.55_oocyte_NSN                      | 130771 | oocyte   |
+------------------------------------------------+--------+----------+
| anchor_loop.65_pronucleus-w-o_nucl_extr-female | 155306 | ZygM     |
+------------------------------------------------+--------+----------+
| ...                                            | ...    | ...      |
+------------------------------------------------+--------+----------+

The dataset can be provided either as a single ``.scool`` file or as a directory containing the ``bin-to-bin`` files together with an additional anchor/bin reference file. We recommend working with ``.scool`` files because they are faster to load and more memory-efficient.

Convert your data to .scool format
----------------------------------

You can convert any ``bin-to-bin`` dataset to ``.scool`` format using the ``scloop cooler`` command. Each cell file should be a tab-separated file with columns ``bin1``, ``bin2``, and ``count``. The anchor reference file should be a tab-separated file with columns ``chr``, ``start``, ``end``, and ``bin_id``.

.. code-block:: bash

    # --dset:        name of the dataset, used for the output filename
    # --data_dir:    path to the dataset directory
    # --anchor_file: path to the anchor reference file
    # --reference:   reference file containing file names and metadata
    # --assembly:    optional; stores assembly metadata in the cooler files, required by some embedding methods
    scloop cooler --dset oocyte_zygote_mm10 \
        --data_dir data/example_datasets/oocyte_zygote_mm10/1M \
        --anchor_file data/mm10.genome_split_1M \
        --reference data/oocyte_zygote_ref \
        --assembly mm10

``mm10.genome_split_1M``:

+------+---------+---------+-----+
| chr1 | 1       | 1000001 | 0   |
+------+---------+---------+-----+
| chr1 | 1000001 | 2000001 | 1   |
+------+---------+---------+-----+
| chr1 | 2000001 | 3000001 | 2   |
+------+---------+---------+-----+
| ...  | ...     | ...     | ... |
+------+---------+---------+-----+

Now you are ready to run your first embedding tests.
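
If you do not yet have a reference file for your own dataset, the format described above can be produced with a short script. The following is a minimal sketch and not part of ``scloop``: the helper name ``write_reference``, the ``celltypes`` mapping, and the ``NA`` placeholder are hypothetical. It assumes that every regular file in the dataset directory is a cell file, that the file name is used verbatim in the ``cell`` column, and that the reference file starts with a header row of column names.

.. code-block:: python

    import csv
    import tempfile
    from pathlib import Path


    def write_reference(data_dir, out_path, celltypes=None):
        """Write a tab-separated reference file listing the files in data_dir.

        celltypes is an optional (hypothetical) dict mapping a cell file name
        to a label; cells missing from it get the placeholder "NA".
        Returns the sorted list of cell names that were written.
        """
        celltypes = celltypes or {}
        # Assumption: every regular file in data_dir is one cell.
        cells = sorted(p.name for p in Path(data_dir).iterdir() if p.is_file())
        with open(out_path, "w", newline="") as fh:
            writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
            # "cell" is the only required column; "celltype" is optional metadata.
            writer.writerow(["cell", "celltype"] if celltypes else ["cell"])
            for cell in cells:
                row = [cell]
                if celltypes:
                    row.append(celltypes.get(cell, "NA"))
                writer.writerow(row)
        return cells


    if __name__ == "__main__":
        # Demo on a throwaway directory with two empty placeholder cell files.
        with tempfile.TemporaryDirectory() as tmp:
            data_dir = Path(tmp) / "cells"
            data_dir.mkdir()
            for name in ("cell_a", "cell_b"):
                (data_dir / name).touch()
            ref_path = Path(tmp) / "example_ref"
            write_reference(data_dir, ref_path, {"cell_a": "oocyte"})
            print(ref_path.read_text())

The resulting file can then be supplied as the reference file, for example through the ``--reference`` option of ``scloop cooler``. Check the generated ``cell`` values against your actual file names before running, since the naming assumption above may not hold for every dataset.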