coalispr.bedgraph_analyze.compare

Module with functions used to compare bedgraph-data.

Attributes

Functions

get_indexes(df[, keep])

Provide indexes for specific and unspecific data sets.

specific(chrnam, tag, setlist[, maincut, keep])

Get merged dataframes with specific reads for a chromosome.

unspecific(chrnam, tag, setlist[, maincut, keep])

Get merged dataframes with unspecific reads for a chromosome.

gather_all_specific_regions(tag[, dowhat, cutoff, ...])

Get specific regions for a dataset.

gather_all_unspecific_regions(tag[, dowhat, cutoff, ...])

Get unspecific regions.

test_intervals([logs10, gaps, thresh])

Try out various settings for UNSPECLOG10, USEGAPS and LOG2BG.

Module Contents

coalispr.bedgraph_analyze.compare.encoding = 'utf-8'
coalispr.bedgraph_analyze.compare.logger
coalispr.bedgraph_analyze.compare.get_indexes(df, keep='exps')

Provide indexes for specific and unspecific data sets.

Parameters:
  • df (pandas.DataFrame) – The pandas dataframe from which read-indexes are retrieved.

  • keep (str (default: 'exps')) – Indicates the group of reads (experiments or references) for which indexes need to be returned.

Returns:

Tuple of pandas indexes for the specific and unspecific reads, respectively, present in the input dataframe.

Return type:

pandas.Index, pandas.Index

coalispr.bedgraph_analyze.compare.specific(chrnam, tag, setlist, maincut=UNSPECLOG10, keep='exps')

Get merged dataframes with specific reads for a chromosome.

Data are returned for both strands of the chromosome, for a given list of samples.

Parameters:
  • chrnam (str) – Name of chromosome to return data for.

  • tag (str) – Type of reads, TAG (default) or TAGCOLL (‘collapsed’) or TAGUNCOLL (‘uncollapsed’).

  • setlist (list) – Group of samples for which data is returned.

  • maincut (float) – Exponent for log10-difference between specific and non-specific reads.

Returns:

List with two dataframes of specific reads from merged samples, for the PLUS and MINUS strands respectively.

Return type:

list of pd.DataFrames

coalispr.bedgraph_analyze.compare.unspecific(chrnam, tag, setlist, maincut=UNSPECLOG10, keep='exps')

Get merged dataframes with unspecific reads for a chromosome.

Data are returned for both strands of the chromosome, for a given list of samples.

Parameters:
  • chrnam (str) – Name of chromosome to return data for.

  • tag (str) – Type of reads, TAG (default) or TAGCOLL (‘collapsed’) or TAGUNCOLL (‘uncollapsed’).

  • setlist (list) – Group of samples for which data is returned.

  • maincut (float) – Exponent for log10-difference between specific and non-specific reads.

Returns:

List with two dataframes of unspecific reads from merged samples, for the PLUS and MINUS strands respectively.

Return type:

list of pd.DataFrames

coalispr.bedgraph_analyze.compare.gather_all_specific_regions(tag, dowhat='tsv', cutoff=LOG2BG, maincut=UNSPECLOG10, gaps=USEGAPS, plusdiscards=False)

Get specific regions for a dataset.

The output consists of text files with tab-separated values (TSV) that define regions with specified reads; these files help to speed up counting.

Parameters:
  • tag (str) – Type of reads, TAG (default) or TAGCOLL (‘collapsed’) or TAGUNCOLL (‘uncollapsed’).

  • dowhat (str) – How to output the data; ‘tsv’: write tab-separated values to a text file; ‘test’: print the total number of regions found to test_intervals.tsv.

  • cutoff (float) – Threshold (2^cutoff) for read-signals above which reads are considered (default: LOG2BG).

  • maincut (float) – Threshold (10^maincut) for difference between read-signals from aligned-reads in wild type or mutant samples and those of unspecific (negative control) samples above which reads are considered ‘specific’. (default: UNSPECLOG10).

  • gaps (int) – Length of tolerated sections without reads separating peak-regions that form a contiguous segment of specified reads (default: USEGAPS).

  • plusdiscards (bool) – If False, use all experimental files but omit those in CAT_D, the discards.
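
One way to read the cutoff and maincut thresholds: a signal is only considered when it exceeds 2^cutoff, and it counts as specific when it exceeds the negative-control signal by more than a factor of 10^maincut. The sketch below illustrates that reading in plain Python. It is not coalispr's implementation; the helper name `classify_signal` and the numeric defaults standing in for LOG2BG and UNSPECLOG10 are assumptions made for illustration.

```python
def classify_signal(signal, control, cutoff=4.0, maincut=0.8):
    """Classify one read-signal; hypothetical helper, not part of coalispr.

    cutoff and maincut stand in for LOG2BG and UNSPECLOG10; the default
    values used here are illustrative, not the package's settings.
    """
    # Signals at or below 2**cutoff are treated as background
    if signal <= 2 ** cutoff:
        return "background"
    # No control signal at all, or a ratio above 10**maincut: specific
    if control <= 0 or signal / control > 10 ** maincut:
        return "specific"
    return "unspecific"


print(classify_signal(10, 0))     # 10 is below 2**4 = 16
print(classify_signal(200, 10))   # ratio 20 exceeds 10**0.8
print(classify_signal(200, 100))  # ratio 2 does not exceed 10**0.8
```

Raising maincut by 1 demands a tenfold larger difference from the control, which is why the setting is given as a log10 exponent.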

coalispr.bedgraph_analyze.compare.gather_all_unspecific_regions(tag, dowhat='tsv', cutoff=LOG2BG, maincut=UNSPECLOG10, gaps=UNSPCGAPS, plusdiscards=False)

Get unspecific regions.

The output consists of text files with tab-separated values (TSV) that define regions with unspecific reads for samples prepared with the given method.

Parameters:
  • tag (str) – Type of reads, TAG (default) or TAGCOLL (‘collapsed’) or TAGUNCOLL (‘uncollapsed’).

  • dowhat (str) – How to output the data; ‘tsv’: write tab-separated values to a text file; ‘test’: print the total number of regions found to test_intervals.tsv.

  • cutoff (float) – Threshold (2^cutoff) for read-signals above which reads are considered (default: LOG2BG).

  • maincut (float) – Threshold (10^maincut) for difference between read-signals from aligned-reads in wild type or mutant samples and those of unspecific (negative control) samples above which reads are considered ‘specific’. (default: UNSPECLOG10).

  • gaps (int) – Length of tolerated sections without reads separating peak-regions that form a contiguous segment of specified reads; for UNSPECIFIC reads the gap is set to UNSPCGAPS (best as low as BINSTEP to keep peaks tight). (default: UNSPCGAPS).

  • plusdiscards (bool) – If ‘False’ use all experimental files but omit CAT_D, the discards.
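
The effect of the gaps setting can be pictured as interval merging: peak-regions separated by no more than the tolerated gap are joined into one contiguous segment, so a small gap keeps peaks tight. The following is a minimal sketch of that idea, not the package's own code; `merge_with_gaps` and the example coordinates are hypothetical.

```python
def merge_with_gaps(intervals, gap):
    """Merge (start, end) intervals separated by at most `gap` positions.

    Hypothetical helper for illustration; not part of coalispr.
    """
    merged = []
    for start, end in sorted(intervals):
        if merged and start - merged[-1][1] <= gap:
            # Close enough to the previous region: extend that region
            merged[-1][1] = max(merged[-1][1], end)
        else:
            # Too far away: start a new region
            merged.append([start, end])
    return [tuple(region) for region in merged]


peaks = [(100, 150), (160, 200), (400, 450)]
print(merge_with_gaps(peaks, gap=50))  # the first two peaks are joined
print(merge_with_gaps(peaks, gap=5))   # all three peaks stay separate
```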

coalispr.bedgraph_analyze.compare.test_intervals(logs10=None, gaps=None, thresh=None)

Try out various settings for UNSPECLOG10, USEGAPS and LOG2BG.

The output consists of text files with tab-separated values (TSV) for TAG, ‘KIND’, UNSPECLOG10, LOG2BG, USEGAPS and TRSHLD, in relation to the number of independent regions (REGS) of specified reads that are picked up with combinations of these settings. Produces input for show_regions_vs_settings.

Parameters:
  • logs10 (list) – List of possible settings for UNSPECLOG10, set to UNSPECTST.

  • gaps (list) – List of possible settings for USEGAPS, set to UGAPSTST.

  • thresh (tuple) – Defines the range for checking LOG2BG values: (start, end, step), set to LOG2BGTST.
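
The three parameters together span a grid of settings to try. As a rough sketch of how such a grid could be enumerated (the helper `setting_combinations` and all values are hypothetical, and integer steps are assumed for the thresh range):

```python
from itertools import product


def setting_combinations(logs10, gaps, thresh):
    """List (log10, gap, log2bg) combinations; hypothetical helper."""
    start, end, step = thresh
    # Assumption: the end value is excluded, as with Python's range();
    # whether coalispr includes it is not stated in this reference.
    log2bg_values = range(start, end, step)
    return list(product(logs10, gaps, log2bg_values))


combos = setting_combinations(logs10=[0.6, 0.8], gaps=[50, 100], thresh=(4, 10, 2))
print(len(combos))  # 2 * 2 * 3 = 12 combinations
print(combos[0])
```

The number of combinations grows multiplicatively, so short lists and a coarse step keep a test run manageable.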