coalispr.bedgraph_analyze.collect_bedgraphs

Module for collecting data and configured experiment information.

Attributes

Functions

checkset(listofexps[, omit])

Ensure that all input is unique, to prevent duplication.

label_frame()

Dataframe linking files by a filename-derived key to a short name.

get_categories_dict()

Returns dict for categories with SHORT names as keys.

checkSRCDIR(src_dir, tag)

Returns path to main folder with folders containing data files.

get_experiments([category, method, fraction, plusdiscards])

Returns list of short names based on properties of experiments.

collect_bedgraphs(tag[, src_dir, ndirlevels, category])

Find file paths to bedgraphs and return them as dictionaries.

collect_references()

Find file paths to bedgraphs for reference data and return them as dictionaries.

Module Contents

coalispr.bedgraph_analyze.collect_bedgraphs.logger
coalispr.bedgraph_analyze.collect_bedgraphs.checkset(listofexps, omit=None)

Ensure that all input is unique, to prevent duplication.

Parameters:

listofexps (list) – A list of experimental samples.

Returns:

A non-redundant list of experimental samples.

Return type:

list
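As a rough illustration of the idea, the sketch below removes duplicates while keeping the first-seen order. The name `unique_in_order` is hypothetical, and treating `omit` as a collection of entries to leave out is an assumption made for this example; neither is taken from the coalispr source.

```python
def unique_in_order(listofexps, omit=None):
    """Return each item of listofexps once, in first-seen order.

    Assumption: `omit` is a collection of entries to leave out.
    """
    omitted = set(omit or [])
    seen = set()
    result = []
    for exp in listofexps:
        # Skip entries already collected or explicitly omitted.
        if exp in seen or exp in omitted:
            continue
        seen.add(exp)
        result.append(exp)
    return result


# Hypothetical sample names, for illustration only.
samples = ["wt_1", "mut_1", "wt_1", "mut_2"]
deduplicated = unique_in_order(samples, omit=["mut_2"])
print(deduplicated)
```

A list is used rather than a plain `set` so that the order of the samples, which often matters for display, is preserved.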

coalispr.bedgraph_analyze.collect_bedgraphs.label_frame()

Dataframe linking files by a filename-derived key to a short name.

This is the starting dataframe. It defines abbreviated names for display (SHORT), their CATEGORY, and so on, and is built from EXPFILE, a tab-separated text file containing all details. The function tries to correct errors that easily occur in EXPFILE, such as a value in a group column that consists only of spaces.

Returns:

Dataframe of EXPFILE with SHORT as index.

Return type:

pandas.DataFrame
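A minimal sketch of this kind of table loading, assuming a tab-separated file with a SHORT column, is shown below. The function name `read_experiment_table`, the sample rows, and the GROUP column are hypothetical; the actual EXPFILE layout and the corrections coalispr applies may differ.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical EXPFILE content; the second row has a lone space as GROUP value.
EXPFILE_TEXT = (
    "FILEKEY\tSHORT\tCATEGORY\tGROUP\n"
    "sample_A_2020\ta1\tmutant\tg1\n"
    "sample_B_2020\tb1\tunspecific\t \n"
)


def read_experiment_table(handle):
    """Read a tabbed experiment table and index it by SHORT."""
    frame = pd.read_csv(handle, sep="\t", dtype=str)
    # Treat empty or whitespace-only cells as missing values.
    frame = frame.replace(r"^\s*$", np.nan, regex=True)
    return frame.set_index("SHORT")


frame = read_experiment_table(io.StringIO(EXPFILE_TEXT))
print(frame)
```

Converting whitespace-only cells to missing values up front means later grouping steps do not create a spurious group named " ".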

coalispr.bedgraph_analyze.collect_bedgraphs.get_categories_dict()

Returns dict for categories with SHORT names as keys.

coalispr.bedgraph_analyze.collect_bedgraphs.checkSRCDIR(src_dir, tag)

Returns path to main folder with folders containing data files.

coalispr.bedgraph_analyze.collect_bedgraphs.get_experiments(category=None, method=None, fraction=None, plusdiscards=True)

Returns list of short names based on properties of experiments.

Parameters:
  • category (str or list) – Name or list of category item(s) as present in EXPFILE.

  • method (str or list) – Name or list of method(s) as given in EXPFILE.

  • fraction (str or list) – Name or list of fraction(s) as given in EXPFILE.

  • plusdiscards (bool) – Flag to indicate whether to include samples marked as a discard.

Returns:

A list of SHORT names representing the samples/experiments requested.

Return type:

list
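The selection logic described above can be sketched with plain Python records. This is an illustration under stated assumptions, not the coalispr implementation: the function `select_short_names`, the record layout, and the METHOD, FRACTION and DISCARD keys are hypothetical.

```python
def _as_set(value):
    """Accept one string or a list of strings; None means no filter."""
    if value is None:
        return None
    return {value} if isinstance(value, str) else set(value)


def select_short_names(records, category=None, method=None, fraction=None,
                       plusdiscards=True):
    """Return SHORT names of records matching all given properties."""
    wanted = {
        "CATEGORY": _as_set(category),
        "METHOD": _as_set(method),
        "FRACTION": _as_set(fraction),
    }
    selected = []
    for rec in records:
        # Leave out samples marked as discard when plusdiscards is False.
        if not plusdiscards and rec.get("DISCARD"):
            continue
        if all(allowed is None or rec.get(col) in allowed
               for col, allowed in wanted.items()):
            selected.append(rec["SHORT"])
    return selected


# Hypothetical experiment records, for illustration only.
records = [
    {"SHORT": "a1", "CATEGORY": "mutant", "METHOD": "rip", "DISCARD": False},
    {"SHORT": "a2", "CATEGORY": "mutant", "METHOD": "total", "DISCARD": True},
    {"SHORT": "b1", "CATEGORY": "unspecific", "METHOD": "rip", "DISCARD": False},
]
mutants = select_short_names(records, category="mutant")
kept_rip = select_short_names(records, method=["rip"], plusdiscards=False)
print(mutants, kept_rip)
```

Normalising a single string to a one-element set lets each parameter accept either a name or a list of names, matching the `str or list` types documented above.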

coalispr.bedgraph_analyze.collect_bedgraphs.collect_bedgraphs(tag, src_dir=SRCDIR, ndirlevels=SRCNDIRLEVEL, category=None)

Find file paths to bedgraphs and return them as dictionaries.

Parameters:
  • tag (str (default: TAG))

  • src_dir (str (default: SRCDIR)) – File path as string to main folder with folders containing data files.

  • ndirlevels (int (default: SRCNDIRLEVEL)) – Number of folders in between SRCDIR and data files.

  • category (str or list) – Name or list of category item(s) as present in EXPFILE.

Returns:

A tuple of dictionaries with FILEKEY items and paths to separate bedgraph files for PLUS- and MINUS-strand data.

Return type:

dict, dict
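To make the role of `ndirlevels` concrete, here is a hedged sketch that searches a fixed number of folder levels below a source directory and splits the hits by strand. The function `find_stranded_bedgraphs`, the file-naming scheme (`<key>_<tag>_<strand>.bedgraph`), and the way the key is derived from the file name are all assumptions for this example; coalispr's own conventions are set in its configuration.

```python
import tempfile
from pathlib import Path


def find_stranded_bedgraphs(src_dir, tag, ndirlevels=1):
    """Return (plus, minus) dicts mapping a file key to a bedgraph path."""
    # One "*/" per folder level between src_dir and the data files.
    pattern = "*/" * ndirlevels + f"*{tag}*.bedgraph"
    plus, minus = {}, {}
    for path in sorted(Path(src_dir).glob(pattern)):
        # Assumption: the key is the part of the file name before the first "_".
        key = path.name.split("_")[0]
        if "plus" in path.name:
            plus[key] = path
        elif "minus" in path.name:
            minus[key] = path
    return plus, minus


# Build a throwaway folder tree with hypothetical file names to try it out.
with tempfile.TemporaryDirectory() as tmp:
    for sample in ("s1", "s2"):
        folder = Path(tmp) / sample
        folder.mkdir()
        for strand in ("plus", "minus"):
            (folder / f"{sample}_uniq_{strand}.bedgraph").touch()
    plus_paths, minus_paths = find_stranded_bedgraphs(tmp, tag="uniq", ndirlevels=1)
    found_keys = sorted(plus_paths)
    print(found_keys, sorted(minus_paths))
```

Returning two dictionaries with matching keys makes it straightforward to pair the PLUS and MINUS files of one sample in later steps.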

coalispr.bedgraph_analyze.collect_bedgraphs.collect_references()

Find file paths to bedgraphs for reference data and return them as dictionaries.

Parameters:
  • tag (str (default: TAG))

  • refsdir (str (default: REFDIR)) – File path as string to main folder with folders containing reference data files.

  • ndirlevels (int (default: REFNDIRLEVEL)) – Number of folders in between REFDIR and data files.

Returns:

A tuple of dictionaries with FILEKEY items and paths to separate bedgraph files for the reference with PLUS- and MINUS-strand data.

Return type:

dict, dict