pipeline_adt_norm.py

Overview

This pipeline implements three ADT normalization methods: DSB, median-based, and CLR (centered log-ratio).

Configuration

The pipeline requires a configured pipeline_adt_norm.yml file. Default configuration files can be generated by executing:

python <srcdir>/pipeline_adt_norm.py config

Input files

This pipeline requires the unfiltered gene-expression (GEX) and ADT count matrices and a list of high-quality barcodes most likely representing single cells.

Ideally, run this pipeline after high-quality cells have been selected with pipeline_fetch_cells.py.

The pipeline looks for the unfiltered matrices in the API:

./api/cellranger.multi/ADT/unfiltered//mtx/.gz

./api/cellranger.multi/GEX/unfiltered//mtx/.gz

Dependencies

This pipeline requires:

* cgat-core: https://github.com/cgat-developers/cgat-core
* R dependencies required in the R scripts

Pipeline output

The pipeline returns an adt_norm.dir folder containing one folder per method (adt_dsb.dir, adt_median.dir, and adt_clr.dir). Each method folder holds per-sample folders containing market matrices [features, qc-barcodes] with the normalized values.
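The layout described above can be sketched as follows. This is only an illustration: the method folder names come from the text, while the sample names and the helper `expected_output_dirs` are hypothetical, and the exact nesting below each method folder may differ in practice.

```python
from pathlib import Path

# Method folder names as listed in the text above.
METHOD_DIRS = ("adt_dsb.dir", "adt_median.dir", "adt_clr.dir")


def expected_output_dirs(samples, root="adt_norm.dir"):
    """Return {method_dir: [per-sample Path, ...]} for the described layout.

    Hypothetical helper: it only builds paths and does not touch the disk.
    """
    return {
        method: [Path(root) / method / sample for sample in samples]
        for method in METHOD_DIRS
    }


# Sample names are made up for illustration.
layout = expected_output_dirs(["sample_A", "sample_B"])
for method, dirs in layout.items():
    print(method, [d.as_posix() for d in dirs])
```

A helper like this can be handy for checking that every sample produced output for every method after a run.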

Code

cellhub.pipeline_adt_norm.gexdepth(infile, outfile): This task runs R/adt_calculate_depth_dist.R. It describes the GEX UMI distribution of the background and cell-containing barcodes, which helps to assess the quality of the ADT data and informs the definition of the background barcodes.

cellhub.pipeline_adt_norm.gexdepthAPI(infiles, outfile): Add the GEX UMI depth metrics to the API.

cellhub.pipeline_adt_norm.adtdepth(infile, outfile): This task runs R/adt_calculate_depth_dist.R. It describes the ADT UMI distribution of the background and cell-containing barcodes, which helps to assess the quality of the ADT data and informs the definition of the background barcodes.
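To make the idea behind the two depth tasks concrete, here is a minimal sketch of summarizing UMI depth separately for cell-containing and background barcodes. It does not reproduce R/adt_calculate_depth_dist.R: the function `depth_summary`, the log10(UMI + 1) scale, and the toy barcodes are all assumptions chosen for illustration.

```python
import math
import statistics


def depth_summary(umi_per_barcode, cell_barcodes):
    """Summarise log10(UMI + 1) depth for cell vs background barcodes.

    umi_per_barcode: {barcode: total UMI count} from the unfiltered matrix.
    cell_barcodes:   set of high-quality barcodes; all others are treated
                     as background (an assumption of this sketch).
    """
    cells, background = [], []
    for barcode, umis in umi_per_barcode.items():
        target = cells if barcode in cell_barcodes else background
        target.append(math.log10(umis + 1))
    return {
        name: {
            "n": len(values),
            "median_log10": statistics.median(values) if values else None,
        }
        for name, values in (("cell", cells), ("background", background))
    }


# Toy data, made up for illustration.
umi = {"AAAC": 999, "AAAG": 9999, "AAAT": 9, "AACC": 99, "AACG": 0}
summary = depth_summary(umi, cell_barcodes={"AAAC", "AAAG"})
print(summary)
```

A clear gap between the two medians suggests the background barcodes are well separated from cells; overlapping distributions would be a reason to revisit the thresholds.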

cellhub.pipeline_adt_norm.adtdepthAPI(infiles, outfile): Add the ADT UMI depth metrics to the API.

cellhub.pipeline_adt_norm.adt_plot_norm(infile, outfile): This task runs R/adt_plot_norm.R. It creates a visual report on the split of the dataset into cells and background. If the user provided GEX and ADT UMI thresholds, these are included in the report.

cellhub.pipeline_adt_norm.dsb_norm(infile, outfile): This task runs R/adt_normalize.R. It reads the unfiltered ADT count matrix and calculates a DSB-normalized ADT expression matrix, which is then saved as market matrices per sample.
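The reason DSB needs the unfiltered matrix is that it uses background (empty-droplet) barcodes to estimate ambient signal. The sketch below illustrates only that background-rescaling idea, per protein, on log-transformed counts. It is not the code in R/adt_normalize.R: the function name, the pseudocount of 10, and the toy counts are assumptions, and the additional per-cell denoising step of DSB is omitted.

```python
import math
import statistics


def background_rescale(cell_counts, background_counts, pseudocount=10):
    """Centre and scale log counts by the background distribution per protein.

    cell_counts / background_counts: {protein: [raw counts per barcode]}.
    The pseudocount value is an assumption of this sketch.
    """
    normalized = {}
    for protein, counts in cell_counts.items():
        bg_log = [math.log(c + pseudocount) for c in background_counts[protein]]
        mu = statistics.mean(bg_log)
        sd = statistics.stdev(bg_log)  # sample sd; needs >= 2 background barcodes
        if sd == 0:
            raise ValueError(f"no background variation for {protein}")
        normalized[protein] = [
            (math.log(c + pseudocount) - mu) / sd for c in counts
        ]
    return normalized


# Toy counts, made up for illustration.
background = {"CD3": [0, 10, 20]}
cells = {"CD3": [500, 15]}
dsb_like = background_rescale(cells, background)
print(dsb_like)
```

Values are then read as the number of background standard deviations above the ambient level, so a value near zero indicates a signal indistinguishable from background.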

cellhub.pipeline_adt_norm.dsbAPI(infile, outfile): Register the DSB-normalized ADT mtx files on the API endpoint.

cellhub.pipeline_adt_norm.median_norm(infile, outfile): This task runs R/adt_get_median_normalization.R. It reads the filtered ADT count matrix, calculates the median-based normalized ADT expression matrix, and writes market matrices per sample.
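The text does not spell out the median-based formula, so the following is only one common interpretation, offered as a hedged sketch: each cell's counts are rescaled so that its total ADT depth equals the median depth across cells. The function `median_depth_normalize` and the toy counts are hypothetical, and the R script may compute something different.

```python
import statistics


def median_depth_normalize(counts_by_cell):
    """Rescale each cell so its total equals the median total across cells.

    counts_by_cell: {barcode: {feature: count}}.
    Assumption of this sketch: cells with zero total depth are left at 0.0
    and excluded from the median.
    """
    totals = {bc: sum(feats.values()) for bc, feats in counts_by_cell.items()}
    target = statistics.median(t for t in totals.values() if t > 0)
    return {
        bc: {
            feat: (count * target / totals[bc] if totals[bc] else 0.0)
            for feat, count in feats.items()
        }
        for bc, feats in counts_by_cell.items()
    }


# Toy counts, made up for illustration (cell totals are 10, 20 and 40).
raw = {
    "c1": {"CD3": 4, "CD4": 6},
    "c2": {"CD3": 10, "CD4": 10},
    "c3": {"CD3": 30, "CD4": 10},
}
norm = median_depth_normalize(raw)
print(norm)
```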

cellhub.pipeline_adt_norm.medianAPI(infile, outfile): Register the median-normalized ADT mtx files on the API endpoint.

cellhub.pipeline_adt_norm.clr_norm(infile, outfile): This task runs R/get_median_clr_normalization.R. It reads the filtered ADT count matrix, performs CLR normalization, and writes market matrices per sample.
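As a reminder of what a centered log-ratio transform does, here is a minimal sketch for the ADT counts of a single cell. The pseudocount of 1 and the function name `clr` are assumptions made for illustration. Toolkits differ in the exact variant and in whether the transform is applied across features or across cells, so this should not be read as the formula used by the R script.

```python
import math


def clr(counts, pseudocount=1.0):
    """Centered log-ratio: log(x + pseudocount) minus the mean of those logs.

    counts: raw ADT counts for one cell (one value per feature).
    The pseudocount avoids log(0) and is an assumption of this sketch.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Toy counts, made up for illustration.
values = clr([0, 5, 50])
print(values)
```

By construction the transformed values of a cell sum to zero, so each value expresses a feature's abundance relative to the cell's geometric-mean level.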

cellhub.pipeline_adt_norm.clrAPI(infile, outfile): Register the CLR-normalized ADT mtx files on the API endpoint.

cellhub.pipeline_adt_norm.plot(infile, outfile): Draw the pipeline flowchart.

cellhub.pipeline_adt_norm.full(): Run the full pipeline.