setup.py

A parent class to help set up pipeline tasks. It can be extended to meet the needs of the different pipelines. The class is used to obtain a task object that:

  • defines job resource requirements

  • provides access to variables (by name or via a .var dictionary)

  • creates an outfolder based on the outfile name

class cellhub.tasks.setup.setup(infile, outfile, PARAMS, memory='4G', cpu=1, make_outdir=True, expose_var=True)

Bases: object

A class for routine setup of pipeline tasks.

Parameters:
  • infile – The task infile path or None

  • outfile – The task outfile path (typically ends with “.sentinel”)

  • PARAMS – The pipeline parameter dictionary.

  • memory – The total memory needed for execution of the task. If no unit is given, gigabytes are assumed. Recognised units are “M” for megabytes and “G” for gigabytes. 4 gigabytes can be requested by passing “4”, “4GB” or “4096M”. Default = “4G”.

  • cpu – The number of cpu cores required (used to populate job_threads)

  • make_outdir – True|False. Should the output folder be created, based on the outfile name? Default = True.

  • expose_var – True|False. Should the self.var dictionary be created from self.__dict__? Default = True.

job_threads

The number of threads that will be requested

job_memory

The amount of memory that will be requested per thread

resources

A dictionary with keys “job_threads” and “job_memory” for populating the P.run() kwargs, e.g. P.run(statement, **t.resources)
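
The sketch below shows how a dictionary with these two keys unpacks into keyword arguments. It is an illustration only, not cellhub's code: resources_sketch and run_sketch are hypothetical names, run_sketch is a stand-in for P.run(), and dividing the total memory evenly between threads (rounded up, written as e.g. "2G") is an assumption.

```python
import math


def resources_sketch(total_memory_gb, cpu=1):
    """Build a dictionary shaped like the ``resources`` attribute.

    Assumption: the per-thread request is the total memory divided by
    the number of threads, rounded up and written as e.g. "2G".
    """
    per_thread_gb = math.ceil(total_memory_gb / cpu)
    return {"job_threads": cpu, "job_memory": f"{per_thread_gb}G"}


def run_sketch(statement, job_threads=1, job_memory="4G"):
    """Hypothetical stand-in for P.run(); it only reports the request."""
    return f"{statement} [threads={job_threads}, memory per thread={job_memory}]"


# The two keys unpack directly into the keyword arguments of the runner.
resources = resources_sketch(total_memory_gb=8, cpu=4)
print(run_sketch("echo hello", **resources))
```

Keeping both values in one dictionary means a task can pass its resource request with a single ** expansion instead of repeating the two keyword arguments at every call.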

outname

The os.path.basename of outfile

outdir

The os.path.dirname of outfile

indir

If an infile path is given, the os.path.dirname of the infile.

inname

If an infile path is given, the os.path.basename of the infile.
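
As a brief illustration of how the four path attributes relate to the infile and outfile paths, the sketch below applies os.path.basename and os.path.dirname directly. The example paths are hypothetical; only the ".sentinel" suffix comes from the description above.

```python
import os

# Hypothetical example paths, chosen only for illustration.
infile = "qc.dir/sample_1/counts.tsv"
outfile = "cluster.dir/sample_1/cluster.sentinel"

# Equivalent of the outname and outdir attributes.
outname = os.path.basename(outfile)
outdir = os.path.dirname(outfile)

# Equivalent of the inname and indir attributes (only when an infile is given).
inname = os.path.basename(infile) if infile is not None else None
indir = os.path.dirname(infile) if infile is not None else None

print(outdir, outname)
print(indir, inname)
```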

log_file

If the outfile path ends with “.sentinel”, the path of the corresponding log file.

parse_mem(memory)

Return an integer that represents the amount of memory needed by the task in gigabytes.
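
To illustrate the conversions described for the memory parameter, here is a minimal sketch of such a parser. It is not cellhub's implementation: the name parse_mem_sketch, the acceptance of "MB" and "GB" suffixes, and the rounding up of fractional gigabytes are all assumptions made for this example.

```python
import math
import re


def parse_mem_sketch(memory):
    """Return the requested memory as a whole number of gigabytes.

    Hypothetical helper: accepts values such as "4", "4G", "4GB" or "4096M".
    """
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]*)\s*", str(memory))
    if match is None:
        raise ValueError(f"cannot parse memory request: {memory!r}")

    value = float(match.group(1))
    unit = match.group(2).upper()

    if unit in ("", "G", "GB"):
        gigabytes = value            # no unit means gigabytes
    elif unit in ("M", "MB"):
        gigabytes = value / 1024     # 1024 megabytes per gigabyte
    else:
        raise ValueError(f"unrecognised memory unit: {unit!r}")

    # Assumption: round fractional requests up to a whole gigabyte.
    return math.ceil(gigabytes)


for request in ("4", "4GB", "4096M"):
    print(request, "->", parse_mem_sketch(request))
```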

set_resources(PARAMS, memory='4G', cpu=1)

Calculate the resource requirements and return a dictionary that can be used to update the local variables.