Prepare your data
Required arguments
scloop requires two arguments to run: a single-cell Hi-C dataset (either a .scool file or a directory of bin-to-bin files) and a reference file.
That’s it! If you are following along with the provided example datasets you can skip to the next section. Otherwise, the following sections will help you prepare your own data.
Reference file format
The reference file is a tab-separated file.
The only required column is a cell column containing the cell file names.
However, you can also provide additional metadata columns to be used for plotting and analysis.
For details, see the metadata section.
oocyte_zygote_ref:
cell | depth | celltype |
---|---|---|
anchor_loop.55_oocyte_NSN | 130771 | oocyte |
anchor_loop.65_pronucleus-w-o_nucl_extr-female | 155306 | ZygM |
… | … | … |
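If you do not have a reference file yet, one way to build it is to list the cell files in your dataset directory and write them into a cell column. The following is a minimal sketch, not part of scloop: the function name write_reference and the metadata argument are hypothetical, and it assumes that the cell column should hold the file names exactly as they appear on disk.

```python
import csv
from pathlib import Path


def write_reference(cell_dir, out_path, metadata=None):
    """Write a tab-separated reference file with a required `cell` column.

    metadata (hypothetical helper argument): optional dict mapping a cell
    file name to a dict of extra columns, e.g. {"celltype": "oocyte"}.
    Returns the sorted list of cell file names that were written.
    """
    metadata = metadata or {}
    # Assumption: every regular file in cell_dir is one cell.
    cells = sorted(p.name for p in Path(cell_dir).iterdir() if p.is_file())

    # Collect the extra column names in first-seen order.
    extra = []
    for name in cells:
        for key in metadata.get(name, {}):
            if key not in extra:
                extra.append(key)

    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerow(["cell", *extra])
        for name in cells:
            row = metadata.get(name, {})
            # Cells without a value for a column get an empty field.
            writer.writerow([name] + [row.get(key, "") for key in extra])
    return cells
```

If the anchor/bin reference file lives in the same directory as the cell files, move it elsewhere or filter it out first, since this sketch treats every file in the directory as a cell.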
The dataset can be provided either as a single .scool file or as a directory containing the bin-to-bin files and an additional anchor/bin reference file.
We recommend working with .scool files, as they are faster and more memory efficient.
Convert your data to .scool format
You can convert any bin-to-bin dataset to .scool format using the scloop cooler command.
Each cell file should be a tab-separated file with columns bin1, bin2, and count.
The anchor reference file should be a tab-separated file with columns chr, start, end, and bin_id.
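Before converting, it can help to check that your files match the column layouts described above. The sketch below is an illustration only and is not part of scloop; the function names are hypothetical. It assumes that neither file has a header row and that the bin1 and bin2 values refer to bin_id values in the anchor reference file.

```python
import csv


def read_anchor_bin_ids(anchor_path):
    """Check each row is chr, start, end, bin_id and return the set of bin_ids."""
    bin_ids = set()
    with open(anchor_path, newline="") as fh:
        for line_no, row in enumerate(csv.reader(fh, delimiter="\t"), start=1):
            if len(row) != 4:
                raise ValueError(f"{anchor_path}:{line_no}: expected 4 columns, got {len(row)}")
            _chrom, start, end, bin_id = row
            if int(start) >= int(end):
                raise ValueError(f"{anchor_path}:{line_no}: start must be smaller than end")
            if bin_id in bin_ids:
                raise ValueError(f"{anchor_path}:{line_no}: duplicate bin_id {bin_id}")
            bin_ids.add(bin_id)
    return bin_ids


def check_cell_file(cell_path, bin_ids):
    """Check each row is bin1, bin2, count and return the number of rows."""
    n_rows = 0
    with open(cell_path, newline="") as fh:
        for line_no, row in enumerate(csv.reader(fh, delimiter="\t"), start=1):
            if len(row) != 3:
                raise ValueError(f"{cell_path}:{line_no}: expected 3 columns, got {len(row)}")
            bin1, bin2, count = row
            for bin_value in (bin1, bin2):
                # Assumption: bins are identified by the anchor file's bin_id.
                if bin_value not in bin_ids:
                    raise ValueError(f"{cell_path}:{line_no}: unknown bin {bin_value}")
            float(count)  # raises ValueError if the count is not numeric
            n_rows += 1
    return n_rows
```

A typical use would be to call read_anchor_bin_ids once on the anchor file and then run check_cell_file on every cell file in the dataset directory.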
# --dset: name of dataset, used for output filename
# --data_dir: path to dataset directory
# --anchor_file: path to anchor reference file
# --reference: reference file containing filenames and metadata
# --assembly: optional, stores assembly metadata in cooler files; required by some embedding methods
scloop cooler --dset oocyte_zygote_mm10 \
  --data_dir data/example_datasets/oocyte_zygote_mm10/1M \
  --anchor_file data/mm10.genome_split_1M \
  --reference data/oocyte_zygote_ref \
  --assembly mm10
mm10.genome_split_1M:
chr1 | 1 | 1000001 | 0 |
chr1 | 1000001 | 2000001 | 1 |
chr1 | 2000001 | 3000001 | 2 |
… | … | … | … |
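If you need to create an anchor reference file for your own genome or resolution, the rows shown above suggest a simple fixed-width binning. The sketch below reproduces that pattern under several assumptions that the example does not confirm: make_bins is a hypothetical helper, bin_id keeps counting across chromosomes, and the last bin of each chromosome is not clipped to the chromosome length. The chromosome sizes must come from your own assembly.

```python
def make_bins(chrom_sizes, resolution=1_000_000):
    """Split chromosomes into fixed-width bins as (chr, start, end, bin_id) rows.

    chrom_sizes: list of (chromosome name, length in bp) pairs.
    Follows the coordinate pattern of the example rows: the first bin
    starts at 1 and each bin ends where the next one starts.
    """
    rows = []
    bin_id = 0  # assumption: ids run continuously over all chromosomes
    for chrom, size in chrom_sizes:
        start = 1
        while start <= size:
            end = start + resolution  # assumption: last bin is not clipped
            rows.append((chrom, start, end, bin_id))
            bin_id += 1
            start = end
    return rows


def write_bins(rows, out_path):
    """Write the rows as a headerless tab-separated file."""
    with open(out_path, "w") as fh:
        for row in rows:
            fh.write("\t".join(str(value) for value in row) + "\n")
```

Compare the output against an anchor file that is known to work with your dataset before relying on it, since the bin ids in your cell files have to match the ones written here.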
Now you are ready to run your first embedding tests.