pipeline_fetch_cells.py

Overview

This pipeline fetches a given set of cells from market matrices or loom files into a single market matrix file.

Usage

See Installation and Usage for general information on how to use CGAT pipelines.

Configuration

It is recommended to fetch the cells into a new directory. Fetching multiple datasets per directory is (deliberately) not supported.

The pipeline requires a configured pipeline_fetch_cells.yml file.

Default configuration files can be generated by executing:

python <srcdir>/pipeline_fetch_cells.py config

Inputs

The pipeline fetches cells from a cellhub instance according to the parameters specified in the local pipeline_fetch_cells.yml file.

The location of the cellhub instance must be specified in the yml:

cellhub:
    location: /path/to/cellhub/instance

The specifications of the cells to retrieve must be provided as an SQL statement (query) that will be executed against the “final” table of the cellhub database:

cellhub:
    sql_query: >-
        select * from final
        where pct_mitochondrial < 10
        and ngenes > 200;

The cells will then be automatically retrieved from the API.
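
To preview which cells a query like the one above would select, it can help to run it against a small stand-in table. The following is a minimal sketch only: it assumes an in-memory SQLite database, and the "barcode_id" column and all row values are invented for illustration. Only the table name "final" and the columns pct_mitochondrial and ngenes come from the example query; the real cellhub database may be laid out differently.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the cellhub database.
# The "barcode_id" column and the rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table final (barcode_id text, pct_mitochondrial real, ngenes integer)"
)
conn.executemany(
    "insert into final values (?, ?, ?)",
    [
        ("cell_a", 4.2, 1500),   # passes both filters
        ("cell_b", 12.5, 1800),  # rejected: pct_mitochondrial is not below 10
        ("cell_c", 3.1, 150),    # rejected: ngenes is not above 200
        ("cell_d", 9.9, 201),    # passes both filters
    ],
)

# The same statement as in the yml example above.
sql_query = """
    select * from final
    where pct_mitochondrial < 10
    and ngenes > 200;
"""

# Collect the identifiers of the cells that satisfy the query.
selected = [row[0] for row in conn.execute(sql_query)]
conn.close()

print(selected)
```

Both conditions are strict inequalities, so a cell with exactly 10 percent mitochondrial reads or exactly 200 genes would be excluded.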

Dependencies

This pipeline requires:

Pipeline output

The pipeline outputs a folder containing a single market matrix that contains the requested cells.

cellhub.pipeline_fetch_cells.fetchCells(infile, outfile): Fetch the table of the user’s desired cells from the database effectively, cell-metadata tsv table.

cellhub.pipeline_fetch_cells.fetchCounts(infile, outfile)

Extract the target cells into a single anndata. Note that this currently contains all the modalities.

TODO: support down-sampling