pipeline_velocyto.py

Overview

This pipeline performs the following steps:

sort bam file by cell barcode
estimate intronic and exonic reads using velocyto (on selected barcodes)

Usage

See Installation and Usage for general information on how to use CGAT pipelines.

Configuration

The pipeline requires a configured pipeline_velocyto.yml file.

Default configuration files can be generated by executing:

python <srcdir>/pipeline_velocyto.py config

Input files

The pipeline is run from bam files generated by cellranger count.

The pipeline expects a tsv file with one row per sample. Each row gives the path to the cellranger bam file (path), the identifier of the sample (sample_id) and a list of barcodes (barcodes). The barcode list can be the filtered barcodes from cellranger or a custom input, and may be a gzipped file. Any further metadata can be added to the file as extra columns. The required columns are sample_id, barcodes and path.
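
A sample table of this shape can be checked before the pipeline is run. The sketch below is an illustration only, not the pipeline's own check: the function name and the example rows are hypothetical, and the requirement that sample_id values be unique is an assumption.

```python
import csv
import io

REQUIRED_COLUMNS = ("sample_id", "barcodes", "path")


def read_samples(handle):
    """Parse a tab-separated sample table and check the required columns."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError("missing required column(s): " + ", ".join(missing))
    rows = list(reader)
    # Assumption: each sample_id should appear only once.
    ids = [row["sample_id"] for row in rows]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate sample_id values")
    return rows


# Hypothetical table with one extra metadata column (condition).
example = io.StringIO(
    "sample_id\tbarcodes\tpath\tcondition\n"
    "s1\ts1/barcodes.tsv.gz\ts1/possorted_genome_bam.bam\tcontrol\n"
    "s2\ts2/barcodes.tsv.gz\ts2/possorted_genome_bam.bam\ttreated\n"
)
samples = read_samples(example)
```

For a file on disk, the same function can be called as `read_samples(open("input_samples.tsv", newline=""))`.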

Dependencies

This pipeline requires:

* cgat-core: https://github.com/cgat-developers/cgat-core
* samtools
* velocyto

Pipeline output

The pipeline returns:

* a loom file with intronic and exonic reads for use in scvelo analysis

Code

cellhub.pipeline_velocyto.checkInputs(outfile): Check that input_samples.tsv exists and that the path given in the file is a valid directory.

cellhub.pipeline_velocyto.genClusterJobs(): Generate cluster jobs for each sample.

cellhub.pipeline_velocyto.sortBam(infile, outfile): Sort bam file by cell barcodes.

cellhub.pipeline_velocyto.runVelocyto(infile, outfile): Run velocyto on barcode-sorted bam file. This task writes a loom file into the pipeline-run directory for each sample.

cellhub.pipeline_velocyto.full(): Run the full pipeline.