coalispr.bedgraph_analyze.store¶

Module for dealing with file storage and retrieval.

Attributes¶

logger

Functions¶

`config_from_newdirs`(exp, path)	Create storage folders during the initialization step `coalispr init`.
`get_unselected_folderpath`([kind, tag])	Return path to folder with written bam files for unselected reads.
`store_chromosome_data`(name, plusdata, minusdata, tag)	Pickle binned bedgraph dataframes for easy access.
`store_processed_chrindexes`(name, chrnam, plusdata, ...)	Pickle chromosomal indexfile tuples for easy access.
`store_chromosome_data_as_tsv`(name, plusdata, ...)	Store bedgraph or regions dataframes as tsv.
`save_average_table`(df, name, use, samples[, bam, ...])	Save averaged count tables with given keywords in the folder/filename.
`retrieve_merged`([tag])	Retrieve the merged experimental data.
`retrieve_processed_files`(name, tag[, notag])	Retrieve experimental data from pickle files.
`retrieve_merged_unselected`()	Returns merged unselected data, organized per binned chromosome.
`get_inputtotals`([kind])	Returns total input counts from saved files.
`backup_pickle_to_tsv`([data_only, merged_only])	Convert binary bedgraph-data to text; skip indexes-only files.
`pickle_from_backup_tsv`()	Restore binary bedgraph-data from text files. This will replace original STOREPICKLE contents.
`rename_in_data_tsv`(names_old_new)	Replace sample names in data files that were pickled.
`rename_unselected`(names_old_new[, tag])	Replace sample names in file names.
`rename_in_count_tsv`(names_old_new)	Replace sample names in count files.
`rename_in_expfile`(names_old_new)	Replace sample names in experiment file EXPFILE.
`print_memory_usage_merged`()	Show pandas memory usage of merged data frames.

Module Contents¶

coalispr.bedgraph_analyze.store.logger¶

coalispr.bedgraph_analyze.store.config_from_newdirs(exp, path)¶

Create storage folders during the initialization step coalispr init.

Returns: Paths to storage folders linked to new experiment EXP.
Return type: Path

coalispr.bedgraph_analyze.store.get_unselected_folderpath(kind=UNSPECIFIC, tag=TAGCOLL)¶

Return path to folder with written bam files for unselected reads.

Parameters:

kind (str (default: UNSPECIFIC)) – Flag to indicate the kind of reads to analyze, either specific or unspecific (default); used to retrieve reads that adhere to the characteristics, when known, of specific RNAs.
tag (str (default: TAGCOLL)) – Flag TAG to indicate kind of aligned-reads, TAGUNCOLL or TAGCOLL. Comes from counted bam files, which for efficiency reasons are expected to contain collapsed reads (TAGCOLL).

coalispr.bedgraph_analyze.store.store_chromosome_data(name, plusdata, minusdata, tag, notag=False, otherpath=None)¶

Pickle binned bedgraph dataframes for easy access.

Parameters:

name (str) – Name for file to be stored.
plusdata (object) – Data for plus strand.
minusdata (object) – Data for minus strand.
tag (str) – Flag TAG to indicate kind of aligned-reads, TAGUNCOLL or TAGCOLL.
notag (bool) – Flag to indicate whether ‘tag’ needs an argument.
otherpath (Path) – Path to storage location if different from default _get_storepath().

Returns:

Prints message upon completion of function

Return type:

None
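
As an illustration of the pickling step described above, the following is a minimal sketch using only the standard library. It is not the coalispr implementation: the helper `store_strands`, the `name_tag.pkl` file naming, and the plain dicts standing in for binned bedgraph dataframes are all assumptions made for the example. Unlike the documented function, the sketch returns the written path so the result can be checked.

```python
import pickle
import tempfile
from pathlib import Path


def store_strands(name, plusdata, minusdata, tag, storepath):
    """Pickle plus- and minus-strand data together in one file.

    Hypothetical helper; the file naming scheme is an assumption.
    """
    storepath = Path(storepath)
    storepath.mkdir(parents=True, exist_ok=True)
    target = storepath / f"{name}_{tag}.pkl"
    with target.open("wb") as handle:
        pickle.dump((plusdata, minusdata), handle, protocol=pickle.HIGHEST_PROTOCOL)
    print(f"Stored '{target.name}'")
    return target


# Stand-in data: chromosome name -> {sample: binned values}.
plus = {"chr1": {"sampleA": [0.0, 2.5, 1.0]}}
minus = {"chr1": {"sampleA": [0.5, 0.0, 3.0]}}

workdir = tempfile.mkdtemp()
stored = store_strands("merged", plus, minus, "collapsed", workdir)

# Read the file back to confirm both strands survive the round trip.
with stored.open("rb") as handle:
    restored_plus, restored_minus = pickle.load(handle)
```

Storing both strands as one tuple keeps them paired on disk, so a later retrieval can unpack them in a single call.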

coalispr.bedgraph_analyze.store.store_processed_chrindexes(name, chrnam, plusdata, minusdata, tag)¶

Pickle chromosomal indexfile tuples for easy access.

Parameters:

name (str) – Name for file to be stored.
chrnam (str) – Name of chromosome for which data are stored.
plusdata (object) – Data for plus strand of chromosome chrnam.
minusdata (object) – Data for minus strand of chromosome chrnam.
tag (str) – Flag TAG to indicate kind of aligned-reads, TAGUNCOLL or TAGCOLL.

Returns:

Prints message upon completion of function.

Return type:

None

coalispr.bedgraph_analyze.store.store_chromosome_data_as_tsv(name, plusdata, minusdata, tag)¶

Store bedgraph or regions dataframes as tsv.

Parameters:

name (str) – Name for file to be stored.
plusdata (object) – Data for plus strand.
minusdata (object) – Data for minus strand.
tag (str) – Flag TAG to indicate kind of aligned-reads, TAGUNCOLL or TAGCOLL.

Returns:

Prints message upon completion of function.

Return type:

None

coalispr.bedgraph_analyze.store.save_average_table(df, name, use, samples, bam=TAGCOLL, segments=TAGUNCOLL, overmax=LOG2BG, maincut=UNSPECLOG10, usegaps=USEGAPS)¶

Save averaged count tables with given keywords in the folder/filename.

Parameters:

name (str) – Name of filename for output table to save; is equal to figure name.
use (str (default: SPECIFIC)) – What type of counted reads to use, i.e. SPECIFIC or UNSPECIFIC.
samples (list) – List of library samples used for averaging dataframe.
bam (str (default: TAGCOLL)) – Flag to indicate sort of aligned-reads, TAGCOLL or TAGUNCOLL, used to obtain bam-alignments.
segments (str (default: TAGUNCOLL)) – Flag to indicate sort of aligned-reads, TAGCOLL or TAGUNCOLL, used to obtain segment definitions.
overmax (int (default: LOG2BG)) – Exponent to set threshold above which read signals are considered; part of folder name with stored count files.
maincut (float (default: UNSPECLOG10)) – Exponent to set difference between SPECIFIC and UNSPECIFIC reads; part of folder name with stored count files.
usegaps (int (default: USEGAPS)) – Region tolerated between peaks of mapped reads to form a contiguous segment; part of folder name with stored count files.

coalispr.bedgraph_analyze.store.retrieve_merged(tag=TAG)¶

Retrieve the merged experimental data.

Parameters: tag (str) – Flag TAG to indicate kind of aligned-reads, TAGUNCOLL or TAGCOLL.
Returns: A tuple of two dicts, one for each strand; each dict holds one pandas dataframe per chromosome, with columns of bedgraph values summed per BINSET for each sample.
Return type: dict of pandas.DataFrames, dict of pandas.DataFrames

coalispr.bedgraph_analyze.store.retrieve_processed_files(name, tag, notag=False)¶

Retrieve experimental data from pickle files.

Defines internal class FileTooShortWarning(Exception)

Parameters:

name (str) – Name of the file to retrieve.
tag (str) – Flag TAG to indicate kind of aligned-reads, TAGUNCOLL or TAGCOLL.
notag (bool) – Flag to indicate whether ‘tag’ needs an argument.

Raises:

FileTooShortWarning – Raised when stored file has no significant size; previous file writing was inadequate.

Returns:

A tuple of dicts, one for each strand, with data structures, one for each chromosome.

Return type:

dict, dict
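
The size check described under "Raises" can be pictured with the sketch below. It is an assumption-laden illustration, not the library's code: the helper `load_strands`, the `MIN_BYTES` threshold, and the stand-in data are hypothetical, and the exception is redefined locally under the documented name only to keep the example self-contained.

```python
import pickle
import tempfile
from pathlib import Path

MIN_BYTES = 10  # hypothetical threshold for a "significant" file size


class FileTooShortWarning(Exception):
    """Raised when a stored file is too small to hold real data."""


def load_strands(path, min_bytes=MIN_BYTES):
    """Return (plusdata, minusdata) from a pickle file, refusing tiny files."""
    path = Path(path)
    size = path.stat().st_size
    if size < min_bytes:
        raise FileTooShortWarning(f"'{path.name}' holds only {size} bytes")
    with path.open("rb") as handle:
        plusdata, minusdata = pickle.load(handle)
    return plusdata, minusdata


workdir = Path(tempfile.mkdtemp())

# A well-formed file with one dict per strand.
good = workdir / "good.pkl"
with good.open("wb") as handle:
    pickle.dump(({"chr1": [1, 2, 3]}, {"chr1": [4, 5, 6]}), handle)

# An empty file, as left behind by an interrupted write.
empty = workdir / "empty.pkl"
empty.touch()

plus, minus = load_strands(good)

try:
    load_strands(empty)
    failed = False
except FileTooShortWarning:
    failed = True
```

Checking the size before unpickling turns a truncated file into a clear, catchable error rather than an opaque unpickling failure.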

coalispr.bedgraph_analyze.store.retrieve_merged_unselected()¶

Returns merged unselected data, organized per binned chromosome.

coalispr.bedgraph_analyze.store.get_inputtotals(kind=TAGUNCOLL)¶

Returns total input counts from saved files.

Parameters: kind (str) – Type of reads, TAGUNCOLL or TAGCOLL.

coalispr.bedgraph_analyze.store.backup_pickle_to_tsv(data_only=False, merged_only=False)¶

Convert binary bedgraph-data to text; skip indexes-only files.

Returns: Prints message upon completion of function.
Return type: None

coalispr.bedgraph_analyze.store.pickle_from_backup_tsv()¶

Restore binary bedgraph-data from text files. This will replace original STOREPICKLE contents.

Returns: Prints message upon completion of function.
Return type: None
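
The backup and restore steps documented for `backup_pickle_to_tsv` and `pickle_from_backup_tsv` amount to a round trip between a binary file and a text file. The sketch below shows one way such a round trip could look with the standard library. The helpers `pickle_to_tsv` and `tsv_to_pickle`, the file names, and the table layout (a dict with "columns" and "rows") are assumptions for illustration; the real functions work on the stored bedgraph data.

```python
import csv
import pickle
import tempfile
from pathlib import Path


def pickle_to_tsv(pickle_path):
    """Write the table stored in pickle_path as a TSV file next to it."""
    pickle_path = Path(pickle_path)
    with pickle_path.open("rb") as handle:
        table = pickle.load(handle)  # assumed layout: {"columns": [...], "rows": [...]}
    tsv_path = pickle_path.with_suffix(".tsv")
    with tsv_path.open("w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(table["columns"])
        writer.writerows(table["rows"])
    return tsv_path


def tsv_to_pickle(tsv_path):
    """Rebuild the pickle from its TSV backup, replacing any existing pickle."""
    tsv_path = Path(tsv_path)
    with tsv_path.open(newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        columns = next(reader)
        # Text loses type information: restore the bin start as int, values as float.
        rows = [[int(row[0])] + [float(v) for v in row[1:]] for row in reader]
    pickle_path = tsv_path.with_suffix(".pkl")
    with pickle_path.open("wb") as handle:
        pickle.dump({"columns": columns, "rows": rows}, handle)
    return pickle_path


original = {
    "columns": ["bin", "sampleA", "sampleB"],
    "rows": [[0, 1.5, 0.0], [50, 2.0, 4.25]],
}

workdir = Path(tempfile.mkdtemp())
source = workdir / "chr1_plus.pkl"
with source.open("wb") as handle:
    pickle.dump(original, handle)

backup = pickle_to_tsv(source)             # binary -> text
source.unlink()                            # simulate losing the binary file
restored_path = tsv_to_pickle(backup)      # text -> binary

with restored_path.open("rb") as handle:
    restored = pickle.load(handle)
```

Because the restore writes to the same path as the original pickle, it overwrites whatever is there, which mirrors the documented warning that restoring replaces the original STOREPICKLE contents.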

coalispr.bedgraph_analyze.store.rename_in_data_tsv(names_old_new)¶

Replace sample names in data files that were pickled.

Parameters: names_old_new (list of tuples) – [(old_name1, new_name1), (old_name2, new_name2)]
Returns: Boolean expressing completion.
Return type: bool
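
To make the `names_old_new` argument concrete, here is a small sketch that applies such a list of (old, new) tuples to the header row of a TSV file. The helper `rename_header`, the file name, and the sample names are hypothetical, and only the header is touched; how coalispr itself locates and rewrites its data files is not shown here.

```python
import csv
import tempfile
from pathlib import Path


def rename_header(tsv_path, names_old_new):
    """Replace sample names in the header row of a TSV file.

    Returns True when the file was rewritten, False when it was empty.
    """
    mapping = dict(names_old_new)
    tsv_path = Path(tsv_path)
    with tsv_path.open(newline="") as handle:
        rows = list(csv.reader(handle, delimiter="\t"))
    if not rows:
        return False
    # Only exact column-name matches are replaced; other columns stay as they are.
    rows[0] = [mapping.get(column, column) for column in rows[0]]
    with tsv_path.open("w", newline="") as handle:
        csv.writer(handle, delimiter="\t").writerows(rows)
    return True


workdir = Path(tempfile.mkdtemp())
data_file = workdir / "chr1_plus.tsv"
data_file.write_text("bin\twt1\tmut1\n0\t1.5\t0.0\n")

done = rename_header(data_file, [("wt1", "wildtype_1")])

with data_file.open(newline="") as handle:
    rewritten = list(csv.reader(handle, delimiter="\t"))
```

Matching whole column names through a dict lookup avoids accidentally renaming a sample whose name merely contains the old name as a substring.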

coalispr.bedgraph_analyze.store.rename_unselected(names_old_new, tag=None)¶

Replace sample names in file names.

Parameters:

names_old_new (list of tuples) – [(old_name1, new_name1), (old_name2, new_name2)]
tag (str) – Flag TAG to indicate kind of aligned-reads, TAGUNCOLL or TAGCOLL. Comes from counted bam files, which for efficiency reasons are expected to contain collapsed reads (TAGCOLL).

Returns:

Boolean expressing completion.

Return type:

bool
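
Renaming samples in file names, as opposed to file contents, could be sketched as follows. The helper `rename_files`, the folder layout, and the `.bam` file names are assumptions for the example, and empty files stand in for real bam files.

```python
import tempfile
from pathlib import Path


def rename_files(folder, names_old_new):
    """Rename files in folder whose names contain an old sample name.

    Returns the new file names, in the order the renames were made.
    """
    folder = Path(folder)
    renamed = []
    for old, new in names_old_new:
        # Snapshot the listing first so renames do not disturb the iteration.
        for path in sorted(folder.iterdir()):
            if path.is_file() and old in path.name:
                target = path.with_name(path.name.replace(old, new))
                path.rename(target)
                renamed.append(target.name)
    return renamed


workdir = Path(tempfile.mkdtemp())
for filename in ("wt1_unselected.bam", "mut1_unselected.bam"):
    (workdir / filename).touch()  # empty stand-ins for bam files

renamed = rename_files(workdir, [("wt1", "wildtype_1")])
remaining = sorted(path.name for path in workdir.iterdir())
```

Note that this sketch matches on substrings, so an old name such as `wt1` would also match a file for sample `wt10`; a stricter match on a delimiter-bounded name would be needed where sample names overlap.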

coalispr.bedgraph_analyze.store.rename_in_count_tsv(names_old_new)¶

Replace sample names in count files.

Parameters: names_old_new (list of tuples) – [(old_name1, new_name1), (old_name2, new_name2)]
Returns: Boolean expressing completion.
Return type: bool

coalispr.bedgraph_analyze.store.rename_in_expfile(names_old_new)¶

Replace sample names in experiment file EXPFILE.

Parameters: names_old_new (list of tuples) – [(old_name1, new_name1), (old_name2, new_name2)]
Returns: Boolean expressing completion.
Return type: bool

coalispr.bedgraph_analyze.store.print_memory_usage_merged()¶

Show pandas memory usage of merged data frames.

Returns: Floats describing memory usage (in MB) of reference, TAGCOLL or TAGUNCOLL datasets in pandas.
Return type: floats