coalispr.bedgraph_analyze.collect_bedgraphs

Module for collecting data and configured experiment information.

Attributes

Functions

label_frame()

Dataframe linking files by a filename-derived key to a short name.

get_categories_dict()

Returns dict for categories with SHORT names as keys.

checkSRCDIR(src_dir, tag)

Returns path to main folder with folders containing data files.

get_experiments([category, method, fraction, plusdiscards])

Returns list of short names based on properties of experiments.

collect_bedgraphs(tag[, src_dir, ndirlevels, category])

Find file paths to bedgraphs and return them as two dictionaries (PLUS and MINUS strand).

collect_references()

Find file paths to bedgraphs for reference data and return them as two dictionaries (PLUS and MINUS strand).

Module Contents

coalispr.bedgraph_analyze.collect_bedgraphs.logger
coalispr.bedgraph_analyze.collect_bedgraphs.label_frame()

Dataframe linking files by a filename-derived key to a short name.

This is the starting dataframe. It defines abbreviated names for display (SHORT), their CATEGORY, and so on, and is built from a tab-separated text file, EXPFILE, that contains all details. The function tries to correct errors that easily occur in EXPFILE, such as a value in a group column that consists only of spaces.

Returns:

Dataframe of EXPFILE with SHORT as index.

Return type:

pandas.DataFrame
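
The clean-up described above can be illustrated with a minimal sketch. It is not the coalispr implementation: it uses the standard-library `csv` module instead of pandas, the helper name `read_expfile` is hypothetical, and the sample rows and category values are invented for illustration. Only the column names SHORT, CATEGORY and FILEKEY, and the idea of a group column, come from this page.

```python
import csv
import io

# Hypothetical EXPFILE content (tab-separated); the values are made up.
# The GROUP value of the first sample is a lone space, the kind of
# error the description says is corrected.
EXPFILE_TEXT = (
    "FILEKEY\tSHORT\tCATEGORY\tGROUP\n"
    "A1_sample\ta1\tS\t \n"
    "wt_1\twt1\tU\tgrp1\n"
)


def read_expfile(text):
    """Return {SHORT: row}; whitespace-only values become empty strings."""
    rows = {}
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        # Strip surrounding whitespace so that " " turns into "".
        cleaned = {key: value.strip() for key, value in row.items()}
        rows[cleaned["SHORT"]] = cleaned
    return rows


frame = read_expfile(EXPFILE_TEXT)
print(frame["a1"]["GROUP"] == "")  # the lone space has been removed
```

Keying the result on SHORT mirrors the documented return value, a table indexed by SHORT.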

coalispr.bedgraph_analyze.collect_bedgraphs.get_categories_dict()

Returns dict for categories with SHORT names as keys.
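
As a rough illustration of the shape of such a dict, assuming it maps each SHORT name to its CATEGORY (the rows and category values below are hypothetical, not taken from coalispr):

```python
# Hypothetical rows standing in for the EXPFILE-derived dataframe.
rows = [
    {"SHORT": "a1", "CATEGORY": "S"},
    {"SHORT": "wt1", "CATEGORY": "U"},
    {"SHORT": "wt2", "CATEGORY": "U"},
]

# SHORT names as keys, categories as values.
categories = {row["SHORT"]: row["CATEGORY"] for row in rows}

# The inverse view (category -> SHORT names) is easy to derive when needed.
by_category = {}
for short, category in categories.items():
    by_category.setdefault(category, []).append(short)
```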

coalispr.bedgraph_analyze.collect_bedgraphs.checkSRCDIR(src_dir, tag)

Returns path to main folder with folders containing data files.

Parameters:
  • src_dir (str) – Folder name (SRCFLDR) to create a Path for.

  • tag (str) – Flag indicating the kind of data (TAGUNCOLL or TAGCOLL), required to find data files; can be unset (None) for references.
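
A minimal sketch of this kind of check follows. It assumes that the tag is appended to the folder name and that a missing folder raises an error. Both are assumptions, as are the helper name `check_src_dir`, its extra `base` argument and the folder names used; the real `checkSRCDIR` may behave differently.

```python
import tempfile
from pathlib import Path


def check_src_dir(base, src_dir, tag=None):
    """Hypothetical stand-in: return the existing folder base/src_dir[+tag]."""
    # Assumption: the tag is appended to the folder name; None means no tag.
    name = src_dir if tag is None else f"{src_dir}{tag}"
    path = Path(base) / name
    if not path.is_dir():
        raise FileNotFoundError(f"No data folder at {path}")
    return path


with tempfile.TemporaryDirectory() as base:
    # Made-up folder name and tags, for illustration only.
    (Path(base) / "bedgraphs_collapsed").mkdir()
    found_name = check_src_dir(base, "bedgraphs", "_collapsed").name
    try:
        check_src_dir(base, "bedgraphs", "_uncollapsed")
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True
```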

coalispr.bedgraph_analyze.collect_bedgraphs.get_experiments(category=None, method=None, fraction=None, plusdiscards=True)

Returns list of short names based on properties of experiments.

Parameters:
  • category (str or list) – Name or list of category item(s) as present in EXPFILE

  • method (str or list) – Name or list of method(s) as given in EXPFILE

  • fraction (str or list) – Name or list of fraction(s) as given in EXPFILE

  • plusdiscards (bool) – Flag to indicate whether to include samples marked as a discard.

Returns:

A list of SHORT names representing the samples/experiments requested.

Return type:

list
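
The selection logic can be sketched as follows, using plain dictionaries in place of the dataframe. The column names METHOD, FRACTION and DISCARD, the helper names and the sample values are hypothetical. Treating an unset argument as "no restriction" is also an assumption about the behaviour, not something confirmed by this page.

```python
def as_set(value):
    """Normalise None, a single name, or a list of names to a set (or None)."""
    if value is None:
        return None
    if isinstance(value, str):
        return {value}
    return set(value)


def select_experiments(rows, category=None, method=None, fraction=None,
                       plusdiscards=True):
    """Return SHORT names of rows matching every filter that is set."""
    wanted = {
        "CATEGORY": as_set(category),
        "METHOD": as_set(method),
        "FRACTION": as_set(fraction),
    }
    selected = []
    for row in rows:
        # Skip discarded samples unless they were explicitly requested.
        if not plusdiscards and row.get("DISCARD"):
            continue
        if all(allowed is None or row[column] in allowed
               for column, allowed in wanted.items()):
            selected.append(row["SHORT"])
    return selected


# Hypothetical experiment records.
rows = [
    {"SHORT": "a1", "CATEGORY": "S", "METHOD": "rip", "FRACTION": "ip", "DISCARD": False},
    {"SHORT": "a2", "CATEGORY": "S", "METHOD": "rip", "FRACTION": "ip", "DISCARD": True},
    {"SHORT": "wt1", "CATEGORY": "U", "METHOD": "total", "FRACTION": "input", "DISCARD": False},
]

kept = select_experiments(rows, category="S", plusdiscards=False)
both = select_experiments(rows, category=["S", "U"])
```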

coalispr.bedgraph_analyze.collect_bedgraphs.collect_bedgraphs(tag, src_dir=SRCFLDR, ndirlevels=SRCNDIRLEVEL, category=None)

Find file paths to bedgraphs and return them as two dictionaries (PLUS and MINUS strand).

Parameters:
  • tag (str (default: TAGUNCOLL)) – Flag indicating the kind of aligned reads, TAGUNCOLL or TAGCOLL.

  • src_dir (str (default: SRCFLDR)) – File path, combined with TAGCOLL or TAGUNCOLL, as a string to the main folder with folders containing data files.

  • ndirlevels (int (default: SRCNDIRLEVEL)) – Number of folders in between SRCFLDR and data files.

  • category (str or list) – Name or list of category item(s) as present in EXPFILE.

Returns:

A tuple of two dictionaries, one for PLUS-strand and one for MINUS-strand data, each mapping FILEKEY items to the paths of the separate bedgraph files.

Return type:

dict, dict
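
The idea of searching a fixed number of folder levels below the source folder and splitting the hits by strand can be sketched with `pathlib`. This is an illustration under stated assumptions, not the coalispr code: the helper name `find_bedgraphs`, the `.bedgraph` extension and the `<key>_plus` / `<key>_minus` file-naming scheme are all hypothetical.

```python
import tempfile
from pathlib import Path


def find_bedgraphs(root, ndirlevels=1):
    """Return (plus, minus) dicts mapping a filename-derived key to a path."""
    # One "*" per intermediate folder level, then the file pattern.
    pattern = "/".join(["*"] * ndirlevels + ["*.bedgraph"])
    plus, minus = {}, {}
    for path in sorted(Path(root).glob(pattern)):
        # Assumed naming: "<key>_plus.bedgraph" or "<key>_minus.bedgraph".
        key, _, strand = path.stem.rpartition("_")
        if strand == "plus":
            plus[key] = path
        elif strand == "minus":
            minus[key] = path
    return plus, minus


with tempfile.TemporaryDirectory() as tmp:
    # Build a small, made-up folder tree: one folder per sample.
    for key in ("wt_1", "mut_1"):
        folder = Path(tmp) / key
        folder.mkdir()
        for strand in ("plus", "minus"):
            (folder / f"{key}_{strand}.bedgraph").touch()

    plus, minus = find_bedgraphs(tmp, ndirlevels=1)
    plus_keys = sorted(plus)
    minus_names = sorted(path.name for path in minus.values())
```

Returning two dictionaries with the same keys makes it straightforward to look up both strands of one sample by its key.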

coalispr.bedgraph_analyze.collect_bedgraphs.collect_references()

Find file paths to bedgraphs for reference data and return them as two dictionaries (PLUS and MINUS strand).

Returns:

A tuple of two dictionaries, one for PLUS-strand and one for MINUS-strand reference data, each mapping FILEKEY items to the paths of the separate bedgraph files.

Return type:

dict, dict