cellhub
Cellhub provides an end-to-end, scalable workflow for the pre-processing, warehousing and analysis of data from millions of single cells. It aims to bring together best-practice solutions for read alignment (Cellranger), ambient RNA correction (CellBender), de-multiplexing and de-hashing (GMMDemux), cell type prediction (singleR) and cluster analysis (Scanpy) in a cohesive set of easy-to-use analysis pipelines. It relies on the cgat-core workflow management system to run high-throughput parallel processing on high-performance compute clusters. Pipelines communicate with each other via a defined API, which makes the workflow easy to extend or modify. Cells and their associated QC statistics and metadata are indexed in a central SQLite database, from which arbitrary subsets can be fetched for downstream analysis in anndata format. The clustering pipeline allows rapid evaluation of different pre-processing and integration strategies at different clustering resolutions by generating PDF reports and cellxgene objects.
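To illustrate the idea of fetching an arbitrary subset of cells from a central SQLite index, here is a minimal sketch using Python's standard `sqlite3` module. The table name `cells`, its columns, the example rows and the helper `fetch_cell_ids` are all hypothetical and chosen for illustration only. They are not cellhub's actual schema or API.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# not cellhub's actual database layout.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE cells (
        barcode_id TEXT PRIMARY KEY,
        library_id TEXT,
        ngenes INTEGER,
        pct_mitochondrial REAL
    )
    """
)

# Made-up example rows standing in for per-cell QC statistics.
conn.executemany(
    "INSERT INTO cells VALUES (?, ?, ?, ?)",
    [
        ("AAAC-1", "lib1", 1200, 3.5),
        ("AAAG-1", "lib1", 150, 2.0),
        ("AACT-1", "lib2", 900, 25.0),
        ("AAGT-1", "lib2", 2100, 8.1),
    ],
)


def fetch_cell_ids(conn, min_genes, max_pct_mito):
    """Return the barcodes of cells that pass simple QC thresholds."""
    query = (
        "SELECT barcode_id FROM cells "
        "WHERE ngenes >= ? AND pct_mitochondrial <= ? "
        "ORDER BY barcode_id"
    )
    return [row[0] for row in conn.execute(query, (min_genes, max_pct_mito))]


selected = fetch_cell_ids(conn, min_genes=500, max_pct_mito=10.0)
print(selected)
```

In a sketch like this, the selected barcodes would then be used to pull the matching count data for downstream analysis. Keeping the selection logic in SQL means a subset can be redefined without re-running any upstream pre-processing.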
Cellhub was originally developed to support the single-cell component of the University of Oxford’s COMBAT COVID-19 project.
Cellhub is currently alpha software. More detailed examples and tutorials will follow soon.
- Workflow Overview
- Installation
- Usage
- Examples
- Pipelines
  - pipeline_adt_norm.py: gexdepth(), gexdepthAPI(), adtdepth(), adtdepthAPI(), adt_plot_norm(), dsb_norm(), dsbAPI(), median_norm(), medianAPI(), clr_norm(), clrAPI(), plot(), full()
  - pipeline_ambient_rna.py: ambient_rna_per_input(), ambient_rna_compare(), plot(), full()
  - pipeline_annotation.py: fetchEnsembl(), ensemblAPI(), fetchKegg(), keggAPI()
  - pipeline_cellbender.py: cellbender(), h5API(), mtx(), mtxAPI(), full(), useCounts()
  - pipeline_celldb.py: connect(), load_samples(), load_gex_qcmetrics(), load_gex_scrublet(), load_singleR(), load_gmm_demux(), load_demuxEM(), load_souporcell(), final()
  - pipeline_cellranger.py: count(), mtxAPI(), h5API(), tcr(), registerTCR(), mergeTCR(), registerMergedTCR(), bcr(), registerBCR(), mergeBCR(), registerMergedBCR(), full(), useCounts()
  - pipeline_cell_qc.py: qcmetrics(), qcmetricsAPI(), scrublet(), scrubletAPI(), plot(), full()
  - pipeline_cluster.py: taskSummary(), preflight(), metadata(), loom(), neighbourGraph(), scanpyCluster(), cluster(), compareClusters(), clustree(), paga(), UMAP(), plotRdimsFactors(), plotRdimsClusters(), plotRdimsSingleR(), plotSingleR(), summariseSingleR(), plotGroupNumbers(), clusterStats(), findMarkers(), summariseMarkers(), topMarkerHeatmap(), dePlots(), markerPlots(), plotMarkerNumbers(), markers(), parseGMTs(), genesetAnalysis(), summariseGenesetAnalysis(), genesets(), plots(), latexVars(), summaryReportSource(), summaryReport(), markerReportSource(), markerReport(), export(), report(), cellxgene()
  - pipeline_dehash.py: gmmDemux(), gmmAPI(), hashCountCSV(), demuxEM(), parseDemuxEM(), demuxemAPI()
  - pipeline_emptydrops.py: emptyDrops(), meanReads(), full()
  - pipeline_fetch_cells.py: fetchCells(), fetchCounts()
  - pipeline_singleR.py: genSingleRjobs(), singleR(), concatenate(), summary(), singleRAPI()
  - pipeline_velocyto.py: checkInputs(), genClusterJobs(), sortBam(), runVelocyto(), full()
- cellhub.tasks
- Contributing