cellhub
Cellhub provides an end-to-end, scalable workflow for the pre-processing, warehousing and analysis of data from millions of single cells. It aims to bring together best-practice solutions for read alignment (Cellranger), ambient RNA correction (CellBender), de-multiplexing and de-hashing (GMMDemux), cell type prediction (singleR) and cluster analysis (Scanpy) in a cohesive set of easy-to-use analysis pipelines. It relies on the cgat-core workflow management system to leverage high-performance compute clusters for high-throughput parallel processing. Pipelines communicate with each other via a defined API, which makes the workflow easy to extend or modify. Cells and their associated QC statistics and metadata are indexed in a central SQLite database, from which arbitrary subsets can be fetched for downstream analysis in anndata format. The clustering pipeline allows rapid evaluation of different pre-processing and integration strategies at different clustering resolutions through the generation of PDF reports and cellxgene objects.
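The idea of fetching an arbitrary subset of cells from a central SQLite index can be illustrated with a minimal sketch. This is not cellhub's actual schema or API: the table names (`samples`, `cell_qc`), the column names, the thresholds and the `fetch_cells` helper are all hypothetical and chosen only for illustration. It uses Python's standard `sqlite3` module and an in-memory database.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical two-table layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE samples (library_id TEXT PRIMARY KEY, condition TEXT);
        CREATE TABLE cell_qc (
            barcode TEXT, library_id TEXT, ngenes INTEGER, pct_mito REAL
        );
        """
    )
    conn.executemany(
        "INSERT INTO samples VALUES (?, ?)",
        [("lib1", "control"), ("lib2", "infected")],
    )
    conn.executemany(
        "INSERT INTO cell_qc VALUES (?, ?, ?, ?)",
        [
            ("AAAC-1", "lib1", 1500, 3.2),
            ("AAAG-1", "lib1", 180, 1.0),    # too few genes detected
            ("CCCA-1", "lib2", 2200, 25.0),  # high mitochondrial fraction
            ("CCCT-1", "lib2", 900, 4.5),
        ],
    )
    conn.commit()
    return conn


def fetch_cells(conn, min_genes=200, max_pct_mito=10.0):
    """Return (barcode, library_id, condition) for cells passing the QC filters.

    Hypothetical helper: joins per-cell QC statistics to sample metadata
    and applies the thresholds as bound SQL parameters.
    """
    query = """
        SELECT q.barcode, q.library_id, s.condition
        FROM cell_qc AS q
        JOIN samples AS s ON s.library_id = q.library_id
        WHERE q.ngenes >= ? AND q.pct_mito <= ?
        ORDER BY q.library_id, q.barcode
    """
    return conn.execute(query, (min_genes, max_pct_mito)).fetchall()


conn = build_demo_db()
selected = fetch_cells(conn)
print(selected)
```

In a real workflow the selected barcodes would then be used to pull the matching count matrices into an anndata object; that step is omitted here because it depends on details of the pipeline outputs that this page does not describe.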
Cellhub was originally developed to support the single-cell component of the University of Oxford’s COMBAT COVID-19 project.
Cellhub is currently alpha software. More detailed examples and tutorials will follow soon.
- Workflow Overview
- Installation
- Usage
- Examples
- Pipelines
  - pipeline_adt_norm.py
    - gexdepth()
    - gexdepthAPI()
    - adtdepth()
    - adtdepthAPI()
    - adt_plot_norm()
    - dsb_norm()
    - dsbAPI()
    - median_norm()
    - medianAPI()
    - clr_norm()
    - clrAPI()
    - plot()
    - full()
  - pipeline_ambient_rna.py
    - ambient_rna_per_input()
    - ambient_rna_compare()
    - plot()
    - full()
  - pipeline_annotation.py
    - fetchEnsembl()
    - ensemblAPI()
    - fetchKegg()
    - keggAPI()
  - pipeline_cellbender.py
    - cellbender()
    - h5API()
    - mtx()
    - mtxAPI()
    - full()
    - useCounts()
  - pipeline_celldb.py
    - connect()
    - load_samples()
    - load_gex_qcmetrics()
    - load_gex_scrublet()
    - load_singleR()
    - load_gmm_demux()
    - load_demuxEM()
    - final()
  - pipeline_cellranger_multi.py
    - taskSummary()
    - config()
    - cellrangerMulti()
    - mtxAPI()
    - h5API()
    - postProcessVDJ()
    - vdjAPI()
    - full()
    - useCounts()
  - pipeline_cell_qc.py
    - qcmetrics()
    - qcmetricsAPI()
    - scrublet()
    - scrubletAPI()
    - plot()
    - full()
  - pipeline_cluster.py
    - taskSummary()
    - preflight()
    - metadata()
    - loom()
    - neighbourGraph()
    - scanpyCluster()
    - cluster()
    - compareClusters()
    - clustree()
    - paga()
    - UMAP()
    - plotRdimsFactors()
    - plotRdimsClusters()
    - plotRdimsSingleR()
    - plotSingleR()
    - summariseSingleR()
    - plotGroupNumbers()
    - clusterStats()
    - findMarkers()
    - summariseMarkers()
    - topMarkerHeatmap()
    - dePlots()
    - markerPlots()
    - plotMarkerNumbers()
    - markers()
    - parseGMTs()
    - genesetAnalysis()
    - summariseGenesetAnalysis()
    - genesets()
    - plots()
    - latexVars()
    - summaryReportSource()
    - summaryReport()
    - markerReportSource()
    - markerReport()
    - export()
    - report()
    - cellxgene()
  - pipeline_dehash.py
    - gmmDemux()
    - gmmAPI()
    - hashCountCSV()
    - demuxEM()
    - parseDemuxEM()
    - demuxemAPI()
  - pipeline_emptydrops.py
    - emptyDrops()
    - meanReads()
    - full()
  - pipeline_fetch_cells.py
    - fetchCells()
    - GEX()
    - ADT()
  - pipeline_singleR.py
    - genSingleRjobs()
    - singleR()
    - concatenate()
    - summary()
    - singleRAPI()
  - pipeline_velocyto.py
    - checkInputs()
    - genClusterJobs()
    - sortBam()
    - runVelocyto()
    - full()
- cellhub.tasks
- Contributing