Prepare your data

Required arguments

scloop requires two arguments to run:

  1. a single-cell Hi-C dataset (either a .scool file or a directory of bin-to-bin files)

  2. a reference file.

That’s it! If you are following along with the provided example datasets, you can skip to the next section. Otherwise, the following sections will help you prepare your own data.

Reference file format

The reference file is a tab-separated file. The only required column is a cell column containing the cell file names. However, you can also provide additional metadata columns to be used for plotting and analysis. For details, see the metadata section.

oocyte_zygote_ref:

cell                                              depth     celltype
anchor_loop.55_oocyte_NSN                         130771    oocyte
anchor_loop.65_pronucleus-w-o_nucl_extr-female    155306    ZygM
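If you do not yet have a reference file, a minimal one can be generated from the dataset directory. The sketch below is not part of scloop: the function name write_reference and its metadata argument are hypothetical, and it assumes that the cell column should hold each file name exactly as it appears on disk and that the file starts with a header row. Adjust it if your cell names are stored differently.

```python
import csv
from pathlib import Path


def write_reference(data_dir, out_path, metadata=None):
    """Write a tab-separated reference file with one row per cell file.

    metadata is an optional dict mapping a file name to a dict of extra
    columns (for example {"cell_a": {"celltype": "oocyte"}}).
    Assumption: the 'cell' value is the file name as it appears on disk.
    """
    metadata = metadata or {}
    # One cell per regular file in the dataset directory, in sorted order.
    cells = sorted(p.name for p in Path(data_dir).iterdir() if p.is_file())
    # Collect every extra column name that appears in the metadata.
    extra_cols = sorted({key for cols in metadata.values() for key in cols})
    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerow(["cell", *extra_cols])
        for cell in cells:
            cols = metadata.get(cell, {})
            # Cells without metadata get empty values in the extra columns.
            writer.writerow([cell, *(cols.get(c, "") for c in extra_cols)])
    return cells
```

Write the output outside the dataset directory so that a later run does not pick up the reference file as a cell. If the directory also contains non-cell files, filter them out before writing.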

The dataset can be provided either as a single .scool file or as a directory containing the bin-to-bin files, together with an additional anchor/bin reference file. We recommend working with .scool files because they are faster to load and more memory-efficient.

Convert your data to .scool format

You can convert any bin-to-bin dataset to .scool format using the scloop cooler command.

Each cell file should be a tab-separated file with columns bin1, bin2, and count. The anchor reference file should be a tab-separated file with columns chr, start, end, and bin_id.
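Before converting, it can help to confirm that each cell file matches the three-column layout described above. The following is a rough sketch rather than a scloop feature: check_cell_file is a hypothetical helper, and it assumes that all three fields are non-negative integers. Because it is not stated here whether the files carry a header row, the sketch tolerates a first line reading bin1, bin2, count.

```python
import csv


def check_cell_file(path):
    """Return (n_rows, total_count) for a bin-to-bin file, or raise ValueError."""
    n_rows, total = 0, 0
    with open(path, newline="") as fh:
        for line_no, row in enumerate(csv.reader(fh, delimiter="\t"), start=1):
            if not row:
                continue  # tolerate blank lines
            if line_no == 1 and row == ["bin1", "bin2", "count"]:
                continue  # tolerate a header row, in case the file has one
            if len(row) != 3:
                raise ValueError(
                    f"{path}:{line_no}: expected 3 columns, got {len(row)}"
                )
            try:
                bin1, bin2, count = (int(x) for x in row)
            except ValueError:
                raise ValueError(
                    f"{path}:{line_no}: non-integer value in {row}"
                ) from None
            if bin1 < 0 or bin2 < 0 or count < 0:
                raise ValueError(f"{path}:{line_no}: negative value in {row}")
            n_rows += 1
            total += count
    return n_rows, total
```

Running this over every file in the dataset directory catches stray delimiters and truncated files early, before the conversion step has to deal with them.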

# --dset: name of dataset, used for output filename
# --data_dir: path to dataset directory
# --anchor_file: path to anchor reference file
# --reference: reference file containing filenames and metadata
# --assembly: optional; stores assembly metadata in cooler files (required by some embedding methods)
scloop cooler --dset oocyte_zygote_mm10 \
              --data_dir data/example_datasets/oocyte_zygote_mm10/1M \
              --anchor_file data/mm10.genome_split_1M \
              --reference data/oocyte_zygote_ref \
              --assembly mm10

mm10.genome_split_1M:

chr1    1          1000001    0
chr1    1000001    2000001    1
chr1    2000001    3000001    2
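The anchor file can be sanity-checked in the same spirit. This sketch assumes the four-column layout (chr, start, end, bin_id) with no header row, as in the excerpt above. load_anchors is a hypothetical helper, not a scloop function.

```python
import csv


def load_anchors(path):
    """Map bin_id -> (chr, start, end) from a tab-separated anchor file.

    Assumption: four columns per line and no header row.
    """
    anchors = {}
    with open(path, newline="") as fh:
        for line_no, row in enumerate(csv.reader(fh, delimiter="\t"), start=1):
            if not row:
                continue  # tolerate blank lines
            if len(row) != 4:
                raise ValueError(
                    f"{path}:{line_no}: expected 4 columns, got {len(row)}"
                )
            chrom = row[0]
            start, end, bin_id = int(row[1]), int(row[2]), int(row[3])
            if start >= end:
                raise ValueError(f"{path}:{line_no}: start must be below end")
            if bin_id in anchors:
                raise ValueError(f"{path}:{line_no}: duplicate bin_id {bin_id}")
            anchors[bin_id] = (chrom, start, end)
    return anchors
```

The returned dictionary can then be used to confirm that every bin1 and bin2 value in the cell files refers to a bin that exists in the anchor file.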

Now you are ready to run your first embedding tests.