setup.py
A parent class to help set up pipeline tasks. It can be extended to meet the needs of the different pipelines. The class is used to obtain a task object that:
defines job resource requirements
provides access to variables (by name or via a .var dictionary)
creates an outfolder based on the outfile name
- class cellhub.tasks.setup.setup(infile, outfile, PARAMS, memory='4G', cpu=1, make_outdir=True, expose_var=True)
Bases: object
A class for routine setup of pipeline tasks.
- Parameters:
infile – The task infile path or None
outfile – The task outfile path (typically ends with “.sentinel”)
memory – The total memory needed for execution of the task. If no unit is given, gigabytes are assumed. Recognised units are “M” for megabytes and “G” for gigabytes. 4 gigabytes can be requested by passing “4”, “4GB” or “4096M”. Default = “4G”.
cpu – The number of cpu cores required (used to populate job_threads)
make_outdir – True|False. Whether the outfolder for the outfile should be created. Default = True.
expose_var – True|False. Whether the self.var dictionary should be created from self.__dict__. Default = True.
- job_threads
The number of threads that will be requested
- job_memory
The amount of memory that will be requested per thread
- resources
A dictionary with keys “job_threads” and “job_memory” for populating the P.run() kwargs, e.g.
P.run(statement, **t.resources)
- outname
The os.path.basename of outfile
- outdir
The os.path.dirname of outfile
- indir
If an infile path is given, the os.path.dirname of the infile.
- inname
If an infile path is given, the os.path.basename of the infile.
- log_file
If the outfile path ends with “.sentinel”
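The path attributes above (outname, outdir, inname, indir) are described in terms of os.path.basename and os.path.dirname. The following is a minimal sketch of those derivations using the standard library only; the file paths are hypothetical and chosen purely for illustration.

```python
import os

# Hypothetical task paths, for illustration only
infile = "data/sample_1/counts.tsv"
outfile = "results.dir/sample_1/cluster.sentinel"

# The basename is the final path component; the dirname is everything before it
outname = os.path.basename(outfile)
outdir = os.path.dirname(outfile)
inname = os.path.basename(infile)
indir = os.path.dirname(infile)

print(outname, outdir, inname, indir)
```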
- parse_mem(memory)
Return an integer that represents the amount of memory needed by the task in gigabytes.
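As an illustration of the conversion described here, the sketch below shows one way the accepted memory strings could be turned into whole gigabytes. The function name parse_mem_sketch is hypothetical, and rounding megabytes up to the next gigabyte is an assumption; this is not the cellhub implementation.

```python
import math
import re


def parse_mem_sketch(memory):
    """Return the requested memory as a whole number of gigabytes.

    Illustrative stand-in only, not the cellhub implementation.
    """
    # Accept digits followed by an optional "M"/"G" unit and an optional "B"
    match = re.fullmatch(r"\s*(\d+)\s*(?:([MG])B?)?\s*", str(memory).upper())
    if match is None:
        raise ValueError(f"Unrecognised memory specification: {memory!r}")

    value = int(match.group(1))
    unit = match.group(2)  # None when no unit was given

    if unit == "M":
        # Assumption: megabytes are rounded up to the next whole gigabyte
        return math.ceil(value / 1024)

    # No unit or "G": the value is already in gigabytes
    return value


print(parse_mem_sketch("4"), parse_mem_sketch("4GB"), parse_mem_sketch("4096M"))
```

With this sketch, the three spellings mentioned in the parameter description all resolve to the same value.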
- set_resources(PARAMS, memory='4G', cpu=1)
Calculate the resource requirements and return a dictionary that can be used to update the local variables.
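The sketch below illustrates how a resources dictionary with the documented keys might be built and then unpacked into a run call. It assumes that the total memory is split evenly across threads and rounded up to whole gigabytes, and that job_memory is expressed as a string such as "4G". The role of PARAMS is not described here, so it is left out. Both set_resources_sketch and run are hypothetical stand-ins, not the cellhub or P.run() implementations.

```python
import math


def set_resources_sketch(total_gb, cpu=1):
    """Build a dictionary with the documented "job_threads" and "job_memory" keys."""
    # Assumption: total memory is divided evenly between threads, rounded up
    per_thread_gb = math.ceil(total_gb / cpu)
    return {"job_threads": cpu, "job_memory": f"{per_thread_gb}G"}


def run(statement, job_threads=1, job_memory="1G"):
    """Hypothetical stand-in for P.run(); it only reports what was requested."""
    return f"{statement} [threads={job_threads}, memory per thread={job_memory}]"


resources = set_resources_sketch(total_gb=16, cpu=4)

# The dictionary is unpacked into keyword arguments, as in P.run(statement, **t.resources)
message = run("echo hello", **resources)
print(message)
```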