Cellhub provides an end-to-end, scalable workflow for the pre-processing, warehousing and analysis of data from millions of single cells. It aims to bring together best-practice solutions, including for read alignment (Cell Ranger), ambient RNA correction (CellBender), demultiplexing and de-hashing (GMMDemux), cell type prediction (SingleR) and cluster analysis (Scanpy), into a cohesive set of easy-to-use analysis pipelines. It relies on the cgat-core workflow management system to leverage high-performance compute clusters for high-throughput parallel processing. Pipelines communicate through a defined API, which makes the workflow easy to extend or modify. Cells, together with their QC statistics and metadata, are indexed in a central SQLite database from which arbitrary subsets can be fetched for downstream analysis in anndata format. The clustering pipeline supports rapid evaluation of different pre-processing and integration strategies at different clustering resolutions by generating PDF reports and cellxgene objects.
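To illustrate the idea of a central cell index, the sketch below uses Python's built-in `sqlite3` module to store per-cell QC statistics and select a subset of cells by threshold. It is a minimal sketch under stated assumptions: the `cells` table, its columns (`barcode`, `library_id`, `ngenes`, `pct_mito`) and the `fetch_cell_subset` helper are hypothetical and do not reproduce Cellhub's actual schema or fetch tooling, and the example rows are made up for illustration.

```python
import sqlite3

# Hypothetical, simplified schema: one row per cell barcode with a few QC
# statistics. Cellhub's real database schema will differ.
SCHEMA = """
CREATE TABLE cells (
    barcode    TEXT NOT NULL,
    library_id TEXT NOT NULL,
    ngenes     INTEGER,
    pct_mito   REAL,
    PRIMARY KEY (barcode, library_id)
);
"""


def fetch_cell_subset(conn, min_genes, max_pct_mito):
    """Return (barcode, library_id) pairs that pass simple QC thresholds.

    Hypothetical helper for illustration only.
    """
    rows = conn.execute(
        "SELECT barcode, library_id FROM cells "
        "WHERE ngenes >= ? AND pct_mito <= ? "
        "ORDER BY library_id, barcode",
        (min_genes, max_pct_mito),
    )
    return rows.fetchall()


# Build a small in-memory index with made-up example cells.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO cells VALUES (?, ?, ?, ?)",
    [
        ("AAACCTGA-1", "lib1", 1500, 3.2),
        ("AAACCTGC-1", "lib1", 250, 1.0),   # too few genes
        ("AAACGGGT-1", "lib2", 2100, 18.5),  # high mitochondrial fraction
        ("AAAGATGC-1", "lib2", 980, 4.7),
    ],
)

subset = fetch_cell_subset(conn, min_genes=500, max_pct_mito=10.0)
print(subset)  # [('AAACCTGA-1', 'lib1'), ('AAAGATGC-1', 'lib2')]
conn.close()
```

In a full workflow, the selected barcodes would then be used to pull the matching columns of the count matrices into an anndata `AnnData` object; that step is omitted here because it depends on how Cellhub stores its matrices.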

Cellhub was originally developed to support the single-cell component of the University of Oxford’s COMBAT COVID-19 project.

Cellhub is currently alpha software. More detailed examples and tutorials will follow soon.

Indices and tables