pipeline_annotation.py
Overview
This pipeline retrieves gene annotations from Ensembl and pathway annotations from KEGG.
Usage
The annotation pipeline should be run in the cellhub directory.
Configuration
The pipeline requires a configured pipeline_cluster.yml file.
Default configuration files can be generated by executing:
python <srcdir>/pipeline_annotation.py config
The Ensembl version specified in the YAML file should match the version used to build the reference transcriptome for the mapping algorithm (e.g. Cell Ranger).
Inputs
This pipeline has no inputs.
Dependencies
This pipeline requires:
Pipeline output
The pipeline produces the following outputs:
api/annotation/ensembl/ensembl.to.entrez.tsv.gz
A mapping of ensembl_id to gene_name and entrez_id. Used by gsfisher for pathway analysis.
api/annotation/ensembl/ensembl.gene_name.map.tsv.gz
A unique mapping of ensembl_id -> gene_name. Missing gene names are replaced with ensembl_ids. The gene names have been made unique.
api/annotation/kegg/kegg_pathways.rds
KEGG pathways in RDS format for gsfisher.
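The unique ensembl_id -> gene_name mapping described above can be illustrated with a minimal sketch. The function name make_unique_gene_names and the ".N" suffix scheme are assumptions made for illustration; the source does not state how the pipeline actually makes duplicate names unique.

```python
def make_unique_gene_names(records):
    """Map each ensembl_id to a unique gene name.

    `records` is an iterable of (ensembl_id, gene_name) pairs, where
    gene_name may be None or an empty string when it is missing.
    """
    used = set()
    mapping = {}
    for ensembl_id, gene_name in records:
        # Missing gene names are replaced with the ensembl_id.
        base = gene_name if gene_name else ensembl_id
        candidate = base
        suffix = 0
        # Assumed scheme: append ".1", ".2", ... until the name is unused.
        while candidate in used:
            suffix += 1
            candidate = f"{base}.{suffix}"
        used.add(candidate)
        mapping[ensembl_id] = candidate
    return mapping


# Placeholder identifiers, not real Ensembl IDs.
example = [("ENSG_A", "TP53"), ("ENSG_B", ""), ("ENSG_C", "TP53")]
print(make_unique_gene_names(example))
# {'ENSG_A': 'TP53', 'ENSG_B': 'ENSG_B', 'ENSG_C': 'TP53.1'}
```

Checking candidates against the set of names already issued, rather than only counting duplicates, avoids a collision when a suffixed name such as "TP53.1" is itself present in the input.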
- cellhub.pipeline_annotation.fetchEnsembl(infile, outfile)
Fetch the Ensembl annotations from BioMart. This task requires internet access.
- cellhub.pipeline_annotation.ensemblAPI(infile, outfile)
Add the Ensembl gene annotation results to the cellhub API.
- cellhub.pipeline_annotation.fetchKegg(infile, outfile)
Fetch the KEGG pathway annotations. This task requires internet access.
- cellhub.pipeline_annotation.keggAPI(infile, outfile)
Add the KEGG pathways to the cellhub API.
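Once the Ensembl results have been added to the API, downstream code can read api/annotation/ensembl/ensembl.to.entrez.tsv.gz directly. The sketch below uses only the Python standard library. It assumes the file has a header row with columns named ensembl_id, gene_name and entrez_id (taken from the output description, not verified against the file), and the helper name load_ensembl_to_entrez is hypothetical.

```python
import csv
import gzip


def load_ensembl_to_entrez(path):
    """Return a dict mapping each ensembl_id to a list of its entrez_ids.

    Rows with an empty entrez_id are skipped. Column names are assumed.
    """
    mapping = {}
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        for row in reader:
            entrez_id = row["entrez_id"]
            if entrez_id:
                # A list is used in case one Ensembl gene has several Entrez IDs.
                mapping.setdefault(row["ensembl_id"], []).append(entrez_id)
    return mapping
```

For example, load_ensembl_to_entrez("api/annotation/ensembl/ensembl.to.entrez.tsv.gz") would be called from the cellhub directory after the pipeline has completed.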