pipeline_fetch_cells.py
Overview
This pipeline fetches a given set of cells from market matrices or loom files into a single market matrix file.
Usage
See Installation and Usage for general information on how to use CGAT pipelines.
Configuration
It is recommended to fetch the cells into a new directory. Fetching multiple datasets into the same directory is (deliberately) not supported.
The pipeline requires a configured pipeline_fetch_cells.yml file.
Default configuration files can be generated by executing:
python <srcdir>/pipeline_fetch_cells.py config
Inputs
The pipeline will fetch cells from a cellhub instance according to the parameters specified in the local pipeline_fetch_cells.yml file.
The location of the cellhub instance must be specified in the yml:
cellhub:
    location: /path/to/cellhub/instance
The specifications of the cells to retrieve must be provided as an SQL statement (query) that will be executed against the “final” table of the cellhub database:
cellhub:
    sql_query: >-
        select * from final
        where pct_mitochondrial < 10
        and ngenes > 200;
The cells will then be automatically retrieved from the API.
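To make the effect of such a query concrete, the following is a minimal sketch rather than the pipeline's own code. It assumes the cellhub database can be queried like an SQLite database and builds a small in-memory stand-in for the "final" table. The barcode_id column and all of the values are hypothetical; only pct_mitochondrial and ngenes come from the query above.

```python
import sqlite3

# Hypothetical stand-in for the cellhub database: an in-memory SQLite
# database holding a tiny "final" table. The barcode_id column and the
# rows below are assumptions made for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE final ("
    "barcode_id TEXT, pct_mitochondrial REAL, ngenes INTEGER)"
)
conn.executemany(
    "INSERT INTO final VALUES (?, ?, ?)",
    [
        ("AAAC-1", 2.5, 1500),  # passes both filters
        ("AAAG-1", 15.0, 900),  # fails: pct_mitochondrial >= 10
        ("AAAT-1", 4.0, 150),   # fails: ngenes <= 200
        ("AACA-1", 9.9, 201),   # passes both filters
    ],
)

# The same statement as in the example configuration above.
sql_query = """
    select * from final
    where pct_mitochondrial < 10
    and ngenes > 200;
"""

# Collect the identifiers of the cells that satisfy the query.
selected = [row[0] for row in conn.execute(sql_query)]
conn.close()

print(selected)
```

Only the two cells that satisfy both conditions are selected. Testing a query on a small table like this can be a cheap way to check the filters before running the pipeline.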
Dependencies
This pipeline requires:
Pipeline output
The pipeline outputs a folder containing a single market matrix that contains the requested cells.
- cellhub.pipeline_fetch_cells.fetchCells(infile, outfile)
Fetch the table of the user's desired cells from the database. The result is, in effect, a cell-metadata TSV table.
- cellhub.pipeline_fetch_cells.GEX(infile, outfile)
Extract the target cells into a single anndata. Note that this currently contains all the modalities.
TODO: support down-sampling
- cellhub.pipeline_fetch_cells.ADT(infile, outfile)
Extract the target cells into a single anndata. Note that this currently contains all the modalities.
TODO: support down-sampling
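As an illustration of the cell-metadata table that fetchCells is described as producing, the sketch below serialises a few selected rows as tab-separated text using only the standard library. It is not the pipeline's implementation: the helper name rows_to_tsv, the column names and the values are all hypothetical.

```python
import csv
import io


def rows_to_tsv(columns, rows):
    """Return a header line plus one line per row as tab-separated text.

    Hypothetical helper for illustration; not part of cellhub.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()


# Assumed columns and values, standing in for the result of the SQL query.
columns = ["barcode_id", "pct_mitochondrial", "ngenes"]
rows = [
    ("AAAC-1", 2.5, 1500),
    ("AACA-1", 9.9, 201),
]

tsv_text = rows_to_tsv(columns, rows)
print(tsv_text, end="")
```

A table of this shape, with one row per requested cell, is what the downstream extraction steps would need in order to know which cells to pull from each input matrix.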