coalispr.resources.utilities

Attributes

Functions

bins_sum(df[, level])

Sum bin-values to get bin totals.

check_newpath(p)

checkset(listofexps[, omit])

In order to prevent duplication make sure that all input will be unique.

chrom_region(chrnam, region)

Create label for chromosome region.

chrxtra()

Check for the presence of extra DNA and annotations.

clean_dict(adict)

Remove empty items from a dictionary.

doneon()

Return date of function called (for saving files).

drop_all_zero_rows(df)

Return a dataframe without rows with only 0 values.

find_counts_path(subfolders, filenam, msg)

Return path to folder with stored count files.

get_count_folder(kind)

Return folder name for stored count files

get_segment_folder(kind[, log2bg, maincut, gaps])

Return folder name for stored segments files

get_skip()

Provide a value of fragment size skipped during counting, which depends on BINSTEP and MIRNAPKBUF.

get_string_info(df)

Return a string for logging info, bypassing standard out.

get_suffix_counts_path(subfolders[, suffix, create])

Return path to folder for storing count files.

get_suffix_store_path(subfolders[, suffix, create])

Return path to folder for storing (binary) data files.

get_suffix_path(parent, subfolders[, suffix, create])

Return path to folder for saving/finding processed files.

get_ylabel(label[, strand, spaces])

Return formatted label for y-axis of count plots.

include_test()

Include test command for profiling functions

is_all_zero(df)

Is this a dataframe with only 0 values?

is_backedup(pathtofile[, moveit])

Create a backup of file or directory or link, return success.

joinall(labels[, conn])

Return string of words from list or dict of labels.

joiner([symb])

Quote list-items when joined to string.

merg(df1, df2)

Merge bedgraphs for each chr on intervals with hits.

merg_frames(df_list)

Merge a list of dataframes by column-wise (axis=1) concatenation.

multi_process(func, keys, init_shared, shared[, ...])

Generalized multi_processor function for counting reads and storing data.

percentaged(df)

Turn dataframe values into percentages of column totals.

read_bny(filenam, **kwargs)

Reader of tabbed binary files, returns dataframe.

read_suffix(suffix, filenam, **kwargs)

Select and read file based on suffix

read_tsv(filenam, **kwargs)

Reader of tabbed csv files, returns dataframe.

remove_odds(termodds)

Prevent spaces or odd symbols in filename string.

removetree(directory)

Remove temporary directory

replace_dot(termdot)

Remove dots from file name and return lower-case version.

rpm_frame(frame, tots)

Return frame normalized to an externally determined total number of reads, as ‘reads per million’ (RPM).

thisfunc([n])

Return name of current or calling function for logging.

timer(func)

Print the time needed to run the decorated function.

usage(dfs)

Return memory usage for a list of dataframes.

write_bny(df, filenam, **kwargs)

Writer of binary files from dataframe.

write_suffix(suffix, df, filenam, **kwargs)

Write file from dataframe according to suffix; either write binary (BNY) parquet or tabbed csv (TSV) files.

write_tsv(df, filenam, **kwargs)

Writer of tabbed csv files from dataframe.

Module Contents

coalispr.resources.utilities.logger
coalispr.resources.utilities.TEST = False
coalispr.resources.utilities.bins_sum(df, level=BINN)

Sum bin-values to get bin totals.

Parameters:
  • df (pandas.DataFrame) – Dataframe with bins that constitute a level to be converted.

  • level (str) – Column header with indices to be grouped (default=**BINN**).

Return type:

pandas.DataFrame
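
As an illustration of summing bin values into totals, here is a minimal sketch that assumes a plain pandas groupby on the grouping column. The name `bins_sum_sketch`, the column name "bin" and the sample columns are hypothetical; the actual value of BINN comes from the coalispr configuration and the real implementation may differ.

```python
import pandas as pd

# Hypothetical stand-in for the BINN constant.
BINN = "bin"


def bins_sum_sketch(df, level=BINN):
    """Sum the values of all rows that share the same label in `level`."""
    # groupby accepts either a column name or the name of an index level.
    return df.groupby(level).sum()


counts = pd.DataFrame(
    {"bin": [0, 0, 50, 50], "sample_a": [1, 2, 3, 4], "sample_b": [0, 5, 0, 1]}
)
totals = bins_sum_sketch(counts)
```

After the call, `totals` has one row per distinct bin label, holding the column-wise sums of the rows in that bin.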

coalispr.resources.utilities.check_newpath(p)
coalispr.resources.utilities.checkset(listofexps, omit=None)

In order to prevent duplication make sure that all input will be unique.

Parameters:

listofexps (list) – A list of experimental samples.

Returns:

A non-redundant list of experimental samples.

Return type:

list
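
One common way to obtain a non-redundant list while keeping the original order is sketched below. This is an assumption about the approach, not the module's code; `checkset_sketch` is a hypothetical name and the undocumented `omit` argument is left out.

```python
def checkset_sketch(listofexps):
    """Return the samples with duplicates removed, first occurrence kept."""
    # dict keys are unique and preserve insertion order (Python 3.7+).
    return list(dict.fromkeys(listofexps))


unique = checkset_sketch(["wt_1", "mut_1", "wt_1", "wt_2", "mut_1"])
```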

coalispr.resources.utilities.chrom_region(chrnam, region)

Create label for chromosome region.

Parameters:
  • chrnam (str) – Chromosome name.

  • region (tuple) – Tuple of coordinates.

coalispr.resources.utilities.chrxtra()

Check for the presence of extra DNA and annotations.

coalispr.resources.utilities.clean_dict(adict)

Remove empty items from a dictionary.

Parameters:

adict (dict) – Dictionary
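
A minimal sketch of removing empty items, assuming that "empty" means None or an empty string or container. The source does not define this, and `clean_dict_sketch` is a hypothetical name.

```python
def clean_dict_sketch(adict):
    """Return a copy of `adict` without items whose value is empty."""
    # Assumption: None, "", [], {} and () count as empty; 0 is kept.
    empties = (None, "", [], {}, ())
    return {key: val for key, val in adict.items() if val not in empties}


cleaned = clean_dict_sketch({"a": [1], "b": [], "c": None, "d": 0, "e": ""})
```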

coalispr.resources.utilities.doneon()

Return date of function called (for saving files).

coalispr.resources.utilities.drop_all_zero_rows(df)

Return a dataframe without rows with only 0 values. With df.dropna(how='all'), any 0 value prevents a useless row from being dropped. To circumvent this, apply df.fillna(0) and then drop the all-zero rows.

Parameters:

df (pandas.DataFrame) – Dataframe to remove all-zero rows from.

Returns:

Dataframe without all-zero rows.

Return type:

pandas.DataFrame
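
The fill-then-drop idea described above can be sketched as follows. The function name `drop_all_zero_rows_sketch` and the example data are hypothetical, and the real implementation may differ in detail.

```python
import numpy as np
import pandas as pd


def drop_all_zero_rows_sketch(df):
    """Replace NaN by 0, then keep only rows with at least one non-zero value."""
    filled = df.fillna(0)
    return filled.loc[(filled != 0).any(axis=1)]


frame = pd.DataFrame(
    {"a": [0, 1, np.nan], "b": [0, np.nan, np.nan]}, index=[10, 20, 30]
)
kept = drop_all_zero_rows_sketch(frame)
```

Rows 10 (all zeros) and 30 (all NaN, hence zeros after filling) are dropped; row 20 is kept with its NaN replaced by 0.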

coalispr.resources.utilities.find_counts_path(subfolders, filenam, msg)

Return path to folder with stored count files.

Parameters:
  • subfolders (list) – Folders with file to find.

  • filenam (str) – File name with suffix, with saved counts.

  • msg (str) – Error message when file is not found.

Returns:

path – Path to folder with files with suffixes.

Return type:

Path

coalispr.resources.utilities.get_count_folder(kind)

Return folder name for stored count files

Parameters:

kind (str) – Kind of reads: UNSPECIFIC or SPECIFIC

Notes

TAGBAM: str (bam)

Flag to indicate sort of aligned-reads, TAGCOLL or TAGUNCOLL, used to obtain bam-alignments.

TAGSEG: str (segments)

Flag to indicate sort of aligned-reads, TAGCOLL or TAGUNCOLL, used to obtain segment definitions.

LOG2BG: int (over)

Exponent to set threshold above which read signals are considered; part of folder name with stored count files.

UNSPECLOG10: float (unspec)

Exponent to set difference between SPECIFIC and UNSPECIFIC reads; part of folder name with stored count files.

gaps: int

Region tolerated between peaks of mapped reads to form a contiguous segment, USEGAPS or UNSPCGAPS.

coalispr.resources.utilities.get_segment_folder(kind, log2bg=LOG2BG, maincut=UNSPECLOG10, gaps=None)

Return folder name for stored segments files

Parameters:
  • kind (str) – Kind of reads: UNSPECIFIC or SPECIFIC

  • kwargs (dict) – Collection to describe following parameters:

  • log2bg (int (default LOG2BG, over)) – Exponent to set threshold above which read signals are considered; part of folder name with stored count files.

  • maincut (float (default UNSPECLOG10, unspec)) – Exponent to set difference between SPECIFIC and UNSPECIFIC reads; part of folder name with stored count files.

  • gaps (int) – Region tolerated between peaks of mapped reads to form a contiguous segment, USEGAPS or UNSPCGAPS.

coalispr.resources.utilities.get_skip()

Provide a value of fragment size skipped during counting, which depends on BINSTEP and MIRNAPKBUF.

Returns:

A value representing an extra margin to expand read segment with a single peak beyond 0.

Return type:

int

coalispr.resources.utilities.get_string_info(df)

Return a string for logging info, bypassing standard out.

Parameters:

df (pandas.DataFrame) – Dataframe to get info for.
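
`pandas.DataFrame.info` prints to standard out by default, but accepts a `buf` argument. A minimal sketch of capturing its output as a string for logging, under the assumption that this is the mechanism meant here (`get_string_info_sketch` is a hypothetical name):

```python
import io

import pandas as pd


def get_string_info_sketch(df):
    """Return the text of df.info() instead of printing it."""
    buffer = io.StringIO()
    df.info(buf=buffer)
    return buffer.getvalue()


info_text = get_string_info_sketch(pd.DataFrame({"sample_a": [1, 2]}))
```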

coalispr.resources.utilities.get_suffix_counts_path(subfolders, suffix=BNY, create=True)

Return path to folder for storing count files.

coalispr.resources.utilities.get_suffix_store_path(subfolders, suffix=BNY, create=True)

Return path to folder for storing (binary) data files.

coalispr.resources.utilities.get_suffix_path(parent, subfolders, suffix=BNY, create=True)

Return path to folder for saving/finding processed files.

Parameters:
  • parent – Parental folder.

  • subfolders (list) – Subfolders where saved files go.

  • suffix (str) – Denotes kind of binary file to save.

  • create (bool) – Create folders for storage, not when files are retrieved.

coalispr.resources.utilities.get_ylabel(label, strand=COMBI, spaces=0)

Return formatted label for y-axis of count plots.

Parameters:
  • label (str) – Read kind name to retrieve a label for configured in CNTLABELS.

  • strand (str) – One of COMBI, MUNR or CORB to indicate strand counted reads map to.

  • spaces (int) – Number of spaces to start second line with.

coalispr.resources.utilities.include_test()

Include test command for profiling functions

coalispr.resources.utilities.is_all_zero(df)

Is this a dataframe with only 0 values?

Parameters:

df (pandas.DataFrame) – Dataframe to get info for.

Returns:

Flag to indicate whether all values are 0.

Return type:

bool
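
A minimal sketch of such a check with pandas and NumPy; `is_all_zero_sketch` is a hypothetical name and NaN handling in the real function is not documented.

```python
import pandas as pd


def is_all_zero_sketch(df):
    """Return True when every value in the dataframe equals 0."""
    return bool((df == 0).to_numpy().all())


zeros = is_all_zero_sketch(pd.DataFrame({"a": [0, 0], "b": [0, 0]}))
mixed = is_all_zero_sketch(pd.DataFrame({"a": [0, 2], "b": [0, 0]}))
```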

coalispr.resources.utilities.is_backedup(pathtofile, moveit=True)

Create a backup of file or directory or link, return success.

Parameters:
  • pathtofile (Path) – Path to object to be overwritten/replaced with a new version with the same name.

  • moveit (bool) – Flag to indicate to rename and move file (default) or copy it.

Returns:

Flag to indicate backup process went through.

Return type:

bool
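
The move-or-copy behaviour can be sketched with the standard library as below. The ".bak" naming, the function name `is_backedup_sketch` and the error handling are assumptions for illustration; how coalispr names or places its backups is not described here, and links are not treated separately in this sketch.

```python
import shutil
import tempfile
from pathlib import Path


def is_backedup_sketch(pathtofile, moveit=True):
    """Back up a file or directory next to the original; return success."""
    source = Path(pathtofile)
    if not source.exists():
        return False
    # Hypothetical naming scheme: append ".bak" to the original name.
    backup = source.with_name(source.name + ".bak")
    try:
        if moveit:
            source.replace(backup)          # rename/move the original
        elif source.is_dir():
            shutil.copytree(source, backup)
        else:
            shutil.copy2(source, backup)    # copy, keeping metadata
    except OSError:
        return False
    return True


workdir = Path(tempfile.mkdtemp())
first = workdir / "counts.tsv"
first.write_text("a\tb\n")
second = workdir / "segments.tsv"
second.write_text("c\td\n")
copied = is_backedup_sketch(first, moveit=False)
moved = is_backedup_sketch(second, moveit=True)
```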

coalispr.resources.utilities.joinall(labels, conn="', '")

Return string of words from list or dict of labels.

Parameters:
  • labels (list or dict) – List/Dictionary of lists of words to be joined.

  • conn (str) – Connector linking the words from labels.
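
To show why the default connector is `"', '"`, here is a minimal sketch; the flattening of dictionary values is an assumption and `joinall_sketch` is a hypothetical name. The opening and closing quotes are added by the caller's format string.

```python
def joinall_sketch(labels, conn="', '"):
    """Join words from a list, or from the lists stored in a dict."""
    if isinstance(labels, dict):
        words = [word for group in labels.values() for word in group]
    else:
        words = list(labels)
    return conn.join(words)


joined = joinall_sketch({"x": ["a", "b"], "y": ["c"]})
quoted = f"'{joinall_sketch(['a', 'b', 'c'])}'"
```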

coalispr.resources.utilities.joiner(symb=None)

Quote list-items when joined to string. Add the starting and ending ' in the calling format function (there is no control over the choice of ' or " when using ‘repr’ by including !r).

coalispr.resources.utilities.merg(df1, df2)

Merge bedgraphs for each chr on intervals with hits.

All rows/columns need to be combined; this creates duplicate columns with adapted names when non-unique columns are merged. Index of resulting dataframe is ordered.

Parameters:
  • df1 (pandas.DataFrame) – Dataframes to merge

  • df2 (pandas.DataFrame) – Dataframes to merge

Returns:

Merged dataframe.

Return type:

pandas.DataFrame

coalispr.resources.utilities.merg_frames(df_list)

Merge a list of dataframes by column-wise (axis=1) concatenation.

All indexes need to be kept (they can vary between merged dataframes). All rows/columns need to be combined; this creates duplicate columns with adapted names when non-unique columns are merged. Concatenating columns (bedgraph data) on the index, while each dataset comes with a different index, results in a dataframe with an unordered index that reflects the order of addition and needs to be sorted.

Parameters:

df_list (list) – List of pandas.DataFrames to merge.

Returns:

Merged dataframe.

Return type:

pandas.DataFrame
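
Column-wise concatenation followed by sorting of the index can be sketched as follows. This only illustrates the concatenate-then-sort step with uniquely named columns; `merg_frames_sketch` is a hypothetical name and the handling of duplicate column names is not reproduced.

```python
import pandas as pd


def merg_frames_sketch(df_list):
    """Concatenate dataframes side by side and order the combined index."""
    return pd.concat(df_list, axis=1).sort_index()


left = pd.DataFrame({"a": [1, 3]}, index=[100, 300])
right = pd.DataFrame({"b": [2, 1]}, index=[200, 100])
merged = merg_frames_sketch([left, right])
```

Positions present in only one input hold NaN in the columns of the other.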

coalispr.resources.utilities.multi_process(func, keys, init_shared, shared, sequential=False)

Generalized multi_processor function for counting reads and storing data.

Parameters:
  • func (function) – Name of function to run in separate process.

  • keys (list) – List of keys - SHORT names, each leading to one alignment file to be counted in a separate process.

  • init_shared (function) – initializer function for each process to set global/shared variables

  • shared (object) – Object defining globals to be shared between all spawned processes.

  • sequential (bool) – Flag to bypass multiprocessing if memory demand becomes too high.

Returns:

collected_objects – Collection of objects generated by fusing outcomes of each process.

Return type:

list

coalispr.resources.utilities.percentaged(df)

Turn dataframe values into percentages of column totals.

Parameters:

df (pandas.DataFrame) – Dataframe with raw counts

Return type:

pandas.DataFrame
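
A minimal sketch of converting raw counts to percentages of column totals, assuming plain pandas arithmetic (`percentaged_sketch` is a hypothetical name):

```python
import pandas as pd


def percentaged_sketch(df):
    """Express each value as a percentage of its column total."""
    # df.sum() gives one total per column; division aligns on column names.
    return 100 * df / df.sum()


raw = pd.DataFrame({"a": [1, 3], "b": [5, 5]})
pct = percentaged_sketch(raw)
```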

coalispr.resources.utilities.read_bny(filenam, **kwargs)

Reader of tabbed binary files, returns dataframe.

Parameters:
  • filenam (Path or str) – Input binary file

  • kwargs (dict) – Additional parameters

coalispr.resources.utilities.read_suffix(suffix, filenam, **kwargs)

Select and read file based on suffix

coalispr.resources.utilities.read_tsv(filenam, **kwargs)

Reader of tabbed csv files, returns dataframe.

Parameters:
  • filenam (Path or str) – Input text file

  • kwargs (dict) – Additional parameters fitting pd.read_csv; omit comment and sep.

coalispr.resources.utilities.remove_odds(termodds)

Prevent spaces or odd symbols in filename string.

Parameters:

termodds (str) – String with possibly symbols or spaces in filename.

Returns:

Lower case name without odds; not to be confused with extension.

Return type:

str

coalispr.resources.utilities.removetree(directory)

Remove temporary directory.

https://docs.python.org/3/library/shutil.html#shutil-rmtree-example

coalispr.resources.utilities.replace_dot(termdot)

Remove dots from file name and return lower-case version.

Parameters:

termdot (str) – String with possibly dots (‘.’) in filename (excluding extension).

Returns:

Lowercase string without dot(s); not to be confused with extension.

Return type:

str
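
A minimal sketch of this clean-up with plain string methods; `replace_dot_sketch` is a hypothetical name, and whether dots are removed or substituted by another character in the real function is not confirmed here.

```python
def replace_dot_sketch(termdot):
    """Drop dots from a name (without extension) and lower-case it."""
    return termdot.replace(".", "").lower()


cleaned_name = replace_dot_sketch("Sample.1.A")
```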

coalispr.resources.utilities.rpm_frame(frame, tots)

Return frame normalized to an externally determined total number of reads, as ‘reads per million’ (RPM).

Parameters:
  • frame (pd.DataFrame) – Dataframe with raw read numbers to normalize.

  • tots (int) – Total number of reads to normalize to.
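
Reads-per-million normalization scales each count by one million over the total number of reads. A minimal sketch, assuming `tots` is a single integer as documented (`rpm_frame_sketch` is a hypothetical name):

```python
import pandas as pd


def rpm_frame_sketch(frame, tots):
    """Scale raw read counts to reads per million of `tots`."""
    return frame * 1_000_000 / tots


raw_reads = pd.DataFrame({"sample_a": [5, 10]})
rpm = rpm_frame_sketch(raw_reads, tots=2_000_000)
```

With 2,000,000 total reads, a raw count of 5 corresponds to 2.5 reads per million.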

coalispr.resources.utilities.thisfunc(n=0)

Return name of current or calling function for logging.

from: https://stackoverflow.com/questions/5067604/determine-function-name-from-within-that-function-without-using-traceback

https://docs.quantifiedcode.com/python-anti-patterns/correctness/assigning_a_lambda_to_a_variable.html

https://docs.quantifiedcode.com/python-anti-patterns/correctness/accessing_a_protected_member_from_outside_the_class.html

Parameters:

n (int) – For current func name, specify 0 or no argument. For name of caller of current func, specify 1. For name of caller of caller of current func, specify 2. etc.

Returns:

Name of function containing call.

Return type:

str
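
The meaning of `n` can be illustrated with the standard `inspect` module. This is a sketch of the idea and not necessarily the mechanism used in the module; `thisfunc_sketch`, `inner` and `outer` are hypothetical names.

```python
import inspect


def thisfunc_sketch(n=0):
    """Return the name of the function `n` levels above the direct caller."""
    # stack()[0] is thisfunc_sketch itself, so the direct caller is at index 1.
    return inspect.stack()[n + 1].function


def inner():
    # n=0: name of the function containing the call; n=1: name of its caller.
    return thisfunc_sketch(), thisfunc_sketch(1)


def outer():
    return inner()


names = outer()
```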

coalispr.resources.utilities.timer(func)

Print the time needed to run the decorated function.

from: https://realpython.com/primer-on-python-decorators/
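
A timing decorator of this kind typically looks like the sketch below, which follows the general decorator pattern and is not copied from the module; `timer_sketch` and `total` are hypothetical names and the printed message format is an assumption.

```python
import functools
import time


def timer_sketch(func):
    """Print the run time of the decorated function."""
    @functools.wraps(func)  # keep the name and docstring of the wrapped function
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"Finished {func.__name__!r} in {elapsed:.4f} s")
        return result
    return wrapper


@timer_sketch
def total(n):
    return sum(range(n))


value = total(10)
```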

coalispr.resources.utilities.usage(dfs)

Return memory usage for a list of dataframes.

coalispr.resources.utilities.write_bny(df, filenam, **kwargs)

Writer of binary files from dataframe.

Parameters:
  • df (pd.DataFrame) – Dataframe to save.

  • filenam (Path or str) – Output binary file.

  • kwargs (dict) – Additional parameters fitting pd.to_<format>.

coalispr.resources.utilities.write_suffix(suffix, df, filenam, **kwargs)

Write file from dataframe according to suffix; either write binary (BNY) parquet or tabbed csv (TSV) files.

Parameters:
  • suffix (str) – BNY for binary or TSV for csv.

  • df (pd.DataFrame) – Dataframe to save

  • filenam (Path or str) – Output path for tabbed text file

  • kwargs (dict) – Additional parameters, apart from ‘comment’, ‘sep’, ‘quoting’, ‘quotechar’, or ‘escapechar’, fitting pd.to_csv.

coalispr.resources.utilities.write_tsv(df, filenam, **kwargs)

Writer of tabbed csv files from dataframe.

Parameters:
  • df (pd.DataFrame) – Dataframe to save

  • filenam (Path or str) – Output path for tabbed text file

  • kwargs (dict) – Additional parameters, apart from ‘comment’, ‘sep’, ‘quoting’, ‘quotechar’, or ‘escapechar’, fitting pd.to_csv.
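
A round trip through a tab-separated file can be sketched with pandas as below. The function names `write_tsv_sketch` and `read_tsv_sketch`, and the "#" comment character, are assumptions for illustration; the real functions fix some of these options internally, which is why they must not be passed via kwargs.

```python
import tempfile
from pathlib import Path

import pandas as pd


def write_tsv_sketch(df, filenam, **kwargs):
    """Write a dataframe as tab-separated text."""
    df.to_csv(filenam, sep="\t", **kwargs)


def read_tsv_sketch(filenam, **kwargs):
    """Read tab-separated text into a dataframe, skipping '#' comments."""
    return pd.read_csv(filenam, sep="\t", comment="#", **kwargs)


table = pd.DataFrame({"sample_a": [1, 2], "sample_b": [3, 4]}, index=[10, 20])
with tempfile.TemporaryDirectory() as tmpdir:
    path = Path(tmpdir) / "counts.tsv"
    write_tsv_sketch(table, path)
    restored = read_tsv_sketch(path, index_col=0)
```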