coalispr.count_analyze.load_countfile

Module to obtain dataframes from count files.

Attributes

Functions

load_count_table(name[, use, bam, segments, overmax, ...])

Retrieve count tables with given keywords in the folder/filename.

load_lengths_table_perc(name, use, debug)

Retrieve lengthcounts from coalispr.bedgraph_analyze.process_bamdata

load_bin_table_perc(name, use, debug)

Retrieve bin-counts from coalispr.bedgraph_analyze.process_bamdata

get_freq_frame(name, use, debug[, idx, allc])

Get frame with distribution for index idx and column allc.

split_readlengths_index(df)

get_readlengths_frame(name, use, debug)

Add columns PLOTSTRT, PLOTLEN to frame with read lengths.

Module Contents

coalispr.count_analyze.load_countfile.logger
coalispr.count_analyze.load_countfile.load_count_table(name, use=SPECIFIC, bam=TAGCOLL, segments=TAGUNCOLL, overmax=LOG2BG, maincut=UNSPECLOG10, usegaps=USEGAPS, index_col=0, debug=0)

Retrieve count tables with given keywords in the folder/filename.

The count files come from coalispr.bedgraph_analyze.process_bamdata.

Parameters:
  • name (str) – Name of particular count file to retrieve.

  • use (str (default: SPECIFIC)) – What type of counted reads to use, i.e. SPECIFIC or UNSPECIFIC.

  • bam (str (default: TAGCOLL)) – Flag indicating the kind of aligned reads, TAGCOLL or TAGUNCOLL, used to obtain bam alignments.

  • segments (str (default: TAGUNCOLL)) – Flag indicating the kind of aligned reads, TAGCOLL or TAGUNCOLL, used to obtain segment definitions.

  • overmax (int (default: LOG2BG)) – Exponent to set threshold above which read signals are considered; part of folder name with stored count files.

  • maincut (float (default: UNSPECLOG10)) – Exponent to set difference between SPECIFIC and UNSPECIFIC reads; part of folder name with stored count files.

  • usegaps (int (default: USEGAPS)) – Region tolerated between peaks of mapped reads to form a contiguous segment; part of folder name with stored count files.

  • index_col (int (default: 0)) – Number of index-column in table loaded from csv file.

  • debug (int) – Level to obtain relevant function calling this support function.

Return type:

pandas.DataFrame
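The effect of index_col can be sketched with plain pandas. This is only an illustration, not the function's implementation: the tab separator, the column names and the values below are assumptions made up for the example.

```python
import io

import pandas as pd

# Hypothetical count data; sample names, segment labels and values are
# invented for illustration and do not come from coalispr.
raw = "segment\tsample_a\tsample_b\nseg1\t10\t30\nseg2\t5\t0\n"

# index_col=0 turns the first column into the row index rather than
# keeping it as a data column. The tab separator is an assumption.
counts = pd.read_csv(io.StringIO(raw), sep="\t", index_col=0)

print(counts.loc["seg1", "sample_b"])
```

With index_col=0 the rows can be addressed by their label in the first column, as in the lookup above.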

coalispr.count_analyze.load_countfile.load_lengths_table_perc(name, use, debug)

Retrieve lengthcounts from coalispr.bedgraph_analyze.process_bamdata as percentages for given kind of reads.

Parameters:
  • name (str) – Name of particular count file to retrieve.

  • use (str) – What type of counted reads to use, i.e. SPECIFIC or UNSPECIFIC.

  • debug (int) – Level to obtain relevant function calling this support function.

Returns:

Dataframe with percentaged counts for read-lengths.

Return type:

pandas.DataFrame
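As a rough sketch of what "percentaged counts" could look like, the snippet below converts raw counts per read length into percentages. It assumes that percentages are taken per column (per sample), which this page does not state. The frame, its column names and its values are hypothetical.

```python
import pandas as pd

# Hypothetical read-length counts: rows are read lengths, columns are samples.
lengths = pd.DataFrame(
    {"sample_a": [10, 30, 60], "sample_b": [5, 5, 10]},
    index=pd.Index([20, 21, 22], name="length"),
)

# Assumption: each column is scaled so that its values sum to 100.
# Dividing by the Series of column totals aligns on the column labels.
perc = 100 * lengths / lengths.sum(axis=0)

print(perc)
```

Under this assumption every column of the result sums to 100, so samples with different sequencing depths can be compared directly.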

coalispr.count_analyze.load_countfile.load_bin_table_perc(name, use, debug)

Retrieve bin-counts from coalispr.bedgraph_analyze.process_bamdata as percentages for given kind of reads.

coalispr.count_analyze.load_countfile.get_freq_frame(name, use, debug, idx=PLOTINTRLEN, allc=PLOTPERC)

Get frame with distribution for index idx and column allc.

Parameters:
  • name (str) – Name of particular count file to retrieve.

  • use (str) – What type of counted reads to use, i.e. SPECIFIC or UNSPECIFIC.

  • debug (int) – Level to obtain relevant function calling this support function.

  • idx (str) – Name of index column, PLOTINTRLEN for lengths (introns) or PLOTFREQ for number of hits (multimappers).

  • allc (str) – Name of column with values to be shown, PLOTINTR for lengths or PLOTMMAP for multimappers.

Returns:

Dataframe with frequencies for lengths or multimappers.

Return type:

pandas.DataFrame

coalispr.count_analyze.load_countfile.split_readlengths_index(df)
coalispr.count_analyze.load_countfile.get_readlengths_frame(name, use, debug)

Add columns PLOTSTRT, PLOTLEN to frame with read lengths.

Parameters:
  • name (str) – Name of file to load.

  • use (str) – Which kind of reads to use, SPECIFIC or UNSPECIFIC.

  • debug (int) – For debugging: number of call levels separating the calling function from this one.