coalispr.bedgraph_analyze.store¶
Module for dealing with file storage and retrieval.
Attributes¶
Functions¶
config_from_newdirs | Create storage folders during the initialization step.
get_unselected_folderpath | Return path to folder with written bam files for unselected reads.
store_chromosome_data | Pickle binned bedgraph dataframes for easy access.
store_processed_chrindexes | Pickle chromosomal indexfile tuples for easy access.
store_chromosome_data_as_tsv | Store bedgraph or regions dataframes as tsv.
save_average_table | Save averaged count tables with given keywords in the folder/filename.
retrieve_merged | Retrieve the merged experimental data.
retrieve_processed_files | Retrieve experimental data from pickle files.
retrieve_merged_unselected | Returns merged unselected data, organized per binned chromosome.
get_inputtotals | Returns total input counts from saved files.
backup_pickle_to_tsv | Convert binary bedgraph-data to text; skip indexes-only files.
pickle_from_backup_tsv | Restore binary bedgraph-data from text files. This will replace original STOREPICKLE contents.
rename_in_data_tsv | Replace sample names in data files that were pickled.
rename_unselected | Replace sample in file names.
rename_in_count_tsv | Replace sample names in count files.
rename_in_expfile | Replace sample names in experiment file EXPFILE.
print_memory_usage_merged | Show pandas memory usage of merged data frames.
Module Contents¶
- coalispr.bedgraph_analyze.store.logger¶
- coalispr.bedgraph_analyze.store.config_from_newdirs(exp, path)¶
Create storage folders during the initialization step, coalispr init.
- Returns:
Paths to storage folders linked to new experiment EXP.
- Return type:
Path
- coalispr.bedgraph_analyze.store.get_unselected_folderpath(kind=UNSPECIFIC, tag=TAGCOLL)¶
Return path to folder with written bam files for unselected reads.
- Parameters:
kind (str (default: UNSPECIFIC)) – Kind of reads to analyze, either specific or unspecific (default); used to retrieve reads that adhere to the characteristics, when known, of specific RNAs.
tag (str (default: TAGCOLL)) – Flag TAG to indicate the kind of aligned reads, TAGUNCOLL or TAGCOLL. The tag comes from the counted bam files, which, for efficiency reasons, can be expected to contain collapsed reads (TAGCOLL).
- coalispr.bedgraph_analyze.store.store_chromosome_data(name, plusdata, minusdata, tag, notag=False, otherpath=None)¶
Pickle binned bedgraph dataframes for easy access.
- Parameters:
name (str) – Name for file to be stored.
plusdata (object) – Data for plus strand.
minusdata (object) – Data for minus strand.
tag (str) – Flag TAG to indicate kind of aligned-reads, TAGUNCOLL or TAGCOLL.
notag (bool) – Flag to indicate whether ‘tag’ needs an argument.
otherpath (Path) – Path to storage location if different from the default, _get_storepath().
- Returns:
Prints message upon completion of function.
- Return type:
None
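A minimal sketch of the pickling idea, written without coalispr itself: store_strand_pair, the file-name layout (name, optional tag, ".pkl") and the plain dicts standing in for per-chromosome dataframes are all assumptions made for illustration, not the module's actual implementation.

```python
import pickle
import tempfile
from pathlib import Path


def store_strand_pair(folder, name, plusdata, minusdata, tag=None):
    """Pickle plus- and minus-strand data as one tuple; return the file path.

    Hypothetical helper, not the coalispr function. The file-name layout
    used here is an assumption made for this sketch.
    """
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    stem = name if tag is None else f"{name}_{tag}"
    target = folder / f"{stem}.pkl"
    with open(target, "wb") as handle:
        pickle.dump((plusdata, minusdata), handle,
                    protocol=pickle.HIGHEST_PROTOCOL)
    return target


# Plain dicts stand in for the per-chromosome dataframes.
plus = {"chr1": [0, 3, 5], "chr2": [1, 0, 0]}
minus = {"chr1": [2, 0, 0], "chr2": [0, 4, 1]}

with tempfile.TemporaryDirectory() as tmp:
    path = store_strand_pair(tmp, "merged", plus, minus, tag="collapsed")
    stored_name = path.name
    # Read the file back to confirm both strands survive the round trip.
    with open(path, "rb") as handle:
        restored_plus, restored_minus = pickle.load(handle)
```

Keeping both strands in one file means a single read restores a consistent pair; whether the module does this or writes one file per strand is not stated in this reference.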
- coalispr.bedgraph_analyze.store.store_processed_chrindexes(name, chrnam, plusdata, minusdata, tag)¶
Pickle chromosomal indexfile tuples for easy access.
- Parameters:
name (str) – Name for file to be stored.
chrnam (str) – Name of chromosome for which data are stored.
plusdata (object) – Data for plus strand of chromosome chrnam.
minusdata (object) – Data for minus strand of chromosome chrnam.
tag (str) – Flag TAG to indicate kind of aligned-reads, TAGUNCOLL or TAGCOLL.
- Returns:
Prints message upon completion of function.
- Return type:
None
- coalispr.bedgraph_analyze.store.store_chromosome_data_as_tsv(name, plusdata, minusdata, tag)¶
Store bedgraph or regions dataframes as tsv.
- Parameters:
name (str) – Name for file to be stored.
plusdata (object) – Data for plus strand.
minusdata (object) – Data for minus strand.
tag (str) – Flag TAG to indicate kind of aligned-reads, TAGUNCOLL or TAGCOLL.
- Returns:
Prints message upon completion of function.
- Return type:
None
- coalispr.bedgraph_analyze.store.save_average_table(df, name, use, samples, bam=TAGCOLL, segments=TAGUNCOLL, overmax=LOG2BG, maincut=UNSPECLOG10, usegaps=USEGAPS)¶
Save averaged count tables with given keywords in the folder/filename.
- Parameters:
name (str) – Filename for the output table to save; equal to the figure name.
use (str (default: SPECIFIC)) – What type of counted reads to use, i.e. SPECIFIC or UNSPECIFIC.
samples (list) – List of library samples used for averaging dataframe.
bam (str (default: TAGCOLL)) – Flag to indicate sort of aligned-reads, TAGCOLL or TAGUNCOLL, used to obtain bam-alignments.
segments (str (default: TAGUNCOLL)) – Flag to indicate sort of aligned-reads, TAGCOLL or TAGUNCOLL, used to obtain segment definitions.
overmax (int (default: LOG2BG)) – Exponent to set threshold above which read signals are considered; part of folder name with stored count files.
maincut (float (default: UNSPECLOG10)) – Exponent to set difference between SPECIFIC and UNSPECIFIC reads; part of folder name with stored count files.
usegaps (int (default: USEGAPS)) – Region tolerated between peaks of mapped reads to form a contiguous segment; part of folder name with stored count files.
- coalispr.bedgraph_analyze.store.retrieve_merged(tag=TAG)¶
Retrieve the merged experimental data.
- Parameters:
tag (str) – Flag TAG to indicate kind of aligned-reads, TAGUNCOLL or TAGCOLL.
- Returns:
A tuple of dicts, one for each strand, with pandas dataframes, one for each chromosome, with columns of bedgraph values summed per BINSET for each sample.
- Return type:
dict of pandas.DataFrames, dict of pandas.DataFrames
- coalispr.bedgraph_analyze.store.retrieve_processed_files(name, tag, notag=False)¶
Retrieve experimental data from pickle files.
Defines internal class FileTooShortWarning(Exception).
- Parameters:
name (str) – Name for file.
tag (str) – Flag TAG to indicate kind of aligned-reads, TAGUNCOLL or TAGCOLL.
notag (bool) – Flag to indicate whether ‘tag’ needs an argument.
- Raises:
FileTooShortWarning – Raised when the stored file has no significant size, which indicates that the previous write was inadequate.
- Returns:
A tuple of dicts, one for each strand, with data structures, one for each chromosome.
- Return type:
dict, dict
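The size check described under Raises can be sketched as follows. This is an illustration only: load_strand_pair, the MIN_BYTES threshold and the locally defined FileTooShortWarning are assumptions for the example and do not reproduce the module's internal class or its actual threshold.

```python
import pickle
import tempfile
from pathlib import Path


class FileTooShortWarning(Exception):
    """Raised when a stored file is too small to hold usable data."""


MIN_BYTES = 10  # assumed threshold, chosen for this sketch only


def load_strand_pair(path, min_bytes=MIN_BYTES):
    """Return (plusdata, minusdata) from a pickle, rejecting tiny files."""
    path = Path(path)
    if path.stat().st_size < min_bytes:
        raise FileTooShortWarning(f"{path.name} is too small; rewrite it")
    with open(path, "rb") as handle:
        plusdata, minusdata = pickle.load(handle)
    return plusdata, minusdata


with tempfile.TemporaryDirectory() as tmp:
    # A valid file: one dict per strand, keyed by chromosome.
    good = Path(tmp) / "good.pkl"
    with open(good, "wb") as handle:
        pickle.dump(({"chr1": [1, 2]}, {"chr1": [0, 3]}), handle)
    loaded_plus, loaded_minus = load_strand_pair(good)

    # An empty file, as left behind by an interrupted write.
    empty = Path(tmp) / "empty.pkl"
    empty.touch()
    try:
        load_strand_pair(empty)
        rejected = False
    except FileTooShortWarning:
        rejected = True
```

Checking the size before unpickling turns an obscure unpickling failure into an explicit signal that the file should be regenerated.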
- coalispr.bedgraph_analyze.store.retrieve_merged_unselected()¶
Returns merged unselected data, organized per binned chromosome.
- coalispr.bedgraph_analyze.store.get_inputtotals(kind=TAGUNCOLL)¶
Returns total input counts from saved files.
- Parameters:
kind (str) – Type of reads, TAGUNCOLL or TAGCOLL.
- coalispr.bedgraph_analyze.store.backup_pickle_to_tsv(data_only=False, merged_only=False)¶
Convert binary bedgraph-data to text; skip indexes-only files.
- Returns:
Prints message upon completion of function.
- Return type:
None
- coalispr.bedgraph_analyze.store.pickle_from_backup_tsv()¶
Restore binary bedgraph-data from text files. This will replace original STOREPICKLE contents.
- Returns:
Prints message upon completion of function.
- Return type:
None
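The backup and restore pair amounts to a round trip between a binary pickle and tab-separated text. The sketch below shows that round trip with the standard library only; pickle_to_tsv, tsv_to_pickle and the list-of-row-dicts table are hypothetical stand-ins for the module's dataframe-based functions, and all values are assumed to be integers.

```python
import csv
import pickle
import tempfile
from pathlib import Path


def pickle_to_tsv(pickle_path, tsv_path):
    """Write a pickled table (list of row dicts) as tab-separated text."""
    with open(pickle_path, "rb") as handle:
        rows = pickle.load(handle)
    with open(tsv_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0]),
                                delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)


def tsv_to_pickle(tsv_path, pickle_path):
    """Rebuild the pickle from the text backup, overwriting the old file."""
    with open(tsv_path, newline="") as handle:
        rows = [
            {key: int(value) for key, value in row.items()}
            for row in csv.DictReader(handle, delimiter="\t")
        ]
    with open(pickle_path, "wb") as handle:
        pickle.dump(rows, handle)


table = [
    {"bin": 0, "sample_a": 4, "sample_b": 0},
    {"bin": 50, "sample_a": 1, "sample_b": 7},
]

with tempfile.TemporaryDirectory() as tmp:
    pkl = Path(tmp) / "data.pkl"
    tsv = Path(tmp) / "data.tsv"
    with open(pkl, "wb") as handle:
        pickle.dump(table, handle)

    pickle_to_tsv(pkl, tsv)      # backup: binary to text
    header_line = tsv.read_text().splitlines()[0]
    tsv_to_pickle(tsv, pkl)      # restore: text to binary, replaces pkl

    with open(pkl, "rb") as handle:
        restored = pickle.load(handle)
```

A text backup stays readable across library versions, which is the usual reason to keep one next to pickled data; the restore step overwrites the binary file, matching the warning above about replacing the original contents.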
- coalispr.bedgraph_analyze.store.rename_in_data_tsv(names_old_new)¶
Replace sample names in data files that were pickled.
- Parameters:
names_old_new (list of tuples) – [(old_name1, new_name1), (old_name2, new_name2)]
- Return type:
boolean expressing completion.
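How a list of (old, new) tuples can drive a rename in a tab-separated data file is sketched below. rename_header_columns and the sample names are hypothetical; the sketch assumes sample names appear as column headers on the first line, which this reference does not state.

```python
import tempfile
from pathlib import Path


def rename_header_columns(tsv_path, names_old_new):
    """Replace sample names in the header line of a tab-separated file.

    Returns True when the file was rewritten, False when nothing matched.
    """
    mapping = dict(names_old_new)
    path = Path(tsv_path)
    lines = path.read_text().splitlines()
    if not lines:
        return False
    header = lines[0].split("\t")
    # Match whole column names so 'wt_1' does not touch 'wt_10'.
    renamed = [mapping.get(column, column) for column in header]
    if renamed == header:
        return False
    lines[0] = "\t".join(renamed)
    path.write_text("\n".join(lines) + "\n")
    return True


with tempfile.TemporaryDirectory() as tmp:
    tsv = Path(tmp) / "counts.tsv"
    tsv.write_text("bin\twt_1\twt_10\n0\t4\t2\n")

    changed = rename_header_columns(tsv, [("wt_1", "ctrl_1")])
    new_header = tsv.read_text().splitlines()[0]
    # A second call finds no old names left and reports no change.
    unchanged = rename_header_columns(tsv, [("wt_1", "ctrl_1")])
```

Matching complete column names rather than substrings avoids accidental edits when one sample name is a prefix of another.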
- coalispr.bedgraph_analyze.store.rename_unselected(names_old_new, tag=None)¶
Replace sample names in file names.
- Parameters:
names_old_new (list of tuples) – [(old_name1, new_name1), (old_name2, new_name2)]
tag (str) – Flag TAG to indicate the kind of aligned reads, TAGUNCOLL or TAGCOLL. The tag comes from the counted bam files, which, for efficiency reasons, can be expected to contain collapsed reads (TAGCOLL).
- Return type:
boolean expressing completion.
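Renaming on disk follows the same (old, new) pattern. The sketch below assumes files are named with the sample name followed by an underscore and a suffix; that layout, the helper rename_sample_files and the example file names are illustrative assumptions, not the module's documented behaviour.

```python
import tempfile
from pathlib import Path


def rename_sample_files(folder, names_old_new):
    """Rename files whose name starts with an old sample name plus '_'.

    Returns the list of new file names, in the order they were renamed.
    """
    renamed = []
    for old, new in names_old_new:
        for path in sorted(Path(folder).glob(f"{old}_*")):
            # Keep everything after the old sample name unchanged.
            target = path.with_name(new + path.name[len(old):])
            path.rename(target)
            renamed.append(target.name)
    return renamed


with tempfile.TemporaryDirectory() as tmp:
    for filename in ("wt_1_collapsed.bam", "wt_10_collapsed.bam"):
        (Path(tmp) / filename).touch()

    renamed_files = rename_sample_files(tmp, [("wt_1", "ctrl_1")])
    remaining = sorted(path.name for path in Path(tmp).iterdir())
```

Anchoring the match on the trailing underscore keeps wt_10 untouched when only wt_1 is renamed. Pairs are applied in sequence, so a new name that equals a later old name would be renamed again.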
- coalispr.bedgraph_analyze.store.rename_in_count_tsv(names_old_new)¶
Replace sample names in count files.
- Parameters:
names_old_new (list of tuples) – [(old_name1, new_name1), (old_name2, new_name2)]
- Return type:
boolean expressing completion.
- coalispr.bedgraph_analyze.store.rename_in_expfile(names_old_new)¶
Replace sample names in experiment file EXPFILE.
- Parameters:
names_old_new (list of tuples) – [(old_name1, new_name1), (old_name2, new_name2)]
- Return type:
boolean expressing completion.
- coalispr.bedgraph_analyze.store.print_memory_usage_merged()¶
Show pandas memory usage of merged data frames.
- Returns:
Floats describing (in MBs) memory usage of reference, TAGCOLL or TAGUNCOLL datasets in Pandas.
- Return type:
floats