pipeline_singleR.py

Overview

This pipeline runs singleR for cell identity prediction. SingleR:

  1. Runs at the cell level (cells are scored independently).

  2. Uses a non-parametric correlation test (i.e. monotonic transformations of the test data have no effect).
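The second point can be illustrated with a minimal sketch. SingleR scores cells with Spearman's rank correlation, which depends only on the ordering of the values, so a monotonic transformation such as log1p leaves the score unchanged. The expression values and helper functions below are made up for illustration and are not part of cellhub or singleR.

```python
import math


def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical raw counts for one cell and a hypothetical reference profile.
raw_counts = [0, 3, 1, 12, 7, 0, 25]
reference = [0.1, 2.0, 0.4, 5.5, 6.1, 0.3, 4.8]

# log1p is strictly increasing, so it preserves the ordering (and the ties).
log_counts = [math.log1p(c) for c in raw_counts]

rho_raw = spearman(raw_counts, reference)
rho_log = spearman(log_counts, reference)
print(rho_raw, rho_log)
```

Both correlations are computed from identical ranks, which is why normalising or log-transforming the counts before scoring is unnecessary.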

Given these facts, in cellhub we run singleR on the raw counts upstream to (a) help with cell QC and (b) save time in the interpretation phase.

This pipeline operates on Ensembl gene identifiers (ensembl_ids).

Usage

See Installation and Usage for general information on how to use CGAT pipelines.

Configuration

The pipeline should be run in the cellhub directory.

To obtain a configuration file run “cellhub singleR config”.

Inputs

  1. Per-sample Matrix Market files (from the cellhub API).

  2. References for singleR obtained via the R Bioconductor ‘celldex’ library. Because downloading the references is very slow, they must be downloaded manually and “stashed” as rds files in an appropriate location using the R/scripts/singleR_stash_references.R script. This location is then specified in the yaml file.

Pipeline output

The pipeline saves the singleR scores and predictions for each of the specified references on the cellhub API.

Code

cellhub.pipeline_singleR.genSingleRjobs()

Generate the singleR jobs.

cellhub.pipeline_singleR.singleR(infile, outfile)

Perform cell identity prediction with singleR.

cellhub.pipeline_singleR.concatenate(infile, outfile)

Concatenate the label predictions across all the samples.
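As a rough sketch of what such a concatenation involves (not the pipeline's actual implementation), per-sample tables can be stacked into one table with a column recording the sample of origin. The column names (barcode, label, sample_id), the sample names and the concatenate_predictions helper are hypothetical.

```python
import csv
import io


def concatenate_predictions(tables):
    """Stack per-sample TSV tables, tagging each row with its sample_id."""
    rows = []
    for sample_id, handle in tables.items():
        for row in csv.DictReader(handle, delimiter="\t"):
            rows.append({"sample_id": sample_id, **row})
    return rows


# In-memory stand-ins for per-sample prediction files (hypothetical content).
tables = {
    "sampleA": io.StringIO("barcode\tlabel\nAAAC\tT cell\nAAAG\tB cell\n"),
    "sampleB": io.StringIO("barcode\tlabel\nAAAC\tMonocyte\n"),
}

combined = concatenate_predictions(tables)
for row in combined:
    print(row["sample_id"], row["barcode"], row["label"], sep="\t")
```

Keeping the sample identifier alongside the barcode matters because the same barcode can occur in more than one sample.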

cellhub.pipeline_singleR.summary(infile, outfile)

Make a summary table that can be included in the cell metadata packages.

cellhub.pipeline_singleR.singleRAPI(infiles, outfile)

Add the singleR results to the cellhub API.