coalispr.bedgraph_analyze.store
Module for dealing with file storage and retrieval to and from BNY.
to do: Change saving format to TSV using command parameter.
Attributes
| logger |
Functions
| config_from_newdirs | Create storage folders during the initialization step. |
| get_unselected_folderpath | Return path to folder with written bam files for unselected reads that have been retrieved during counting of UNSPECIFIC reads. |
| store_chromosome_data | Binarize binned bedgraph dataframes for easy access. |
| store_processed_chrindexes | Binarize chromosomal indexfile tuples for easy access. |
| save_average_table | Save averaged count tables with given keywords in the folder/filename. |
| retrieve_merged | Retrieve the merged experimental data. |
| retrieve_processed_files | Retrieve experimental data from binary files. |
| retrieve_merged_unselected | Returns merged unselected data, organized per binned chromosome. |
| get_inputtotals | Returns total input counts for 'kind' of sequence from saved files. |
| has_been_run | Check folders for possibility to continue with options. |
| backup_binary_to_tsv | Convert binary bedgraph-data to text; skip indexes-only files. |
| binary_from_backup_tsv | Restore binary bedgraph-data from text files. This will replace original STOREBINARY contents. |
| rename_in_data_tsv | Replace sample names in data files. |
| rename_unselected | Replace sample names in file names. |
| rename_in_count_tsv | Replace sample names in count files. |
| rename_in_expfile | Replace sample names in experiment file EXPFILE. |
| print_memory_usage_merged | Show pandas memory usage of merged data frames. |
Module Contents
- coalispr.bedgraph_analyze.store.logger
- coalispr.bedgraph_analyze.store.config_from_newdirs(exp, path)
Create storage folders during the initialization step coalispr init.
- Returns:
Paths to storage folders linked to new experiment EXP.
- Return type:
Path
- coalispr.bedgraph_analyze.store.get_unselected_folderpath()
Return path to folder with written bam files for unselected reads that have been retrieved during counting of UNSPECIFIC reads.
- coalispr.bedgraph_analyze.store.store_chromosome_data(name, plusdata, minusdata, tag, notag=False, suffix=BNY, otherpath=None)
Binarize binned bedgraph dataframes for easy access.
- Parameters:
name (str) – Name for file to be stored.
plusdata (object) – Data for plus strand.
minusdata (object) – Data for minus strand.
tag (str) – Flag TAG to indicate kind of aligned-reads, TAGUNCOLL or TAGCOLL.
notag (bool) – Flag to indicate whether ‘tag’ needs an argument.
suffix (str) – Suffix indicating file format and used to select function name for storing dataframes by pandas.
otherpath (Path) – Path to storage location if different from default _get_store_path().
- Returns:
Prints message upon completion of function
- Return type:
None
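The suffix parameter is described as selecting the pandas function used for storing. A minimal sketch of that kind of name-based dispatch follows. It assumes a stand-in class instead of a pandas DataFrame so that it runs with the standard library only; FakeFrame, store_strands and the file naming scheme are hypothetical and do not reproduce coalispr's implementation.

```python
import pickle
import tempfile
from pathlib import Path


class FakeFrame:
    """Hypothetical stand-in for a pandas DataFrame with a to_pickle writer."""

    def __init__(self, rows):
        self.rows = rows

    def to_pickle(self, path):
        # Serialize the rows to a binary file, as a pandas writer would.
        with open(path, "wb") as handle:
            pickle.dump(self.rows, handle)


def store_strands(name, plusdata, minusdata, tag, suffix="pickle", folder="."):
    """Write one file per strand; the writer is looked up from the suffix."""
    written = []
    for strand, data in (("plus", plusdata), ("minus", minusdata)):
        # Assumed naming scheme: <name>_<tag>_<strand>.<suffix>
        target = Path(folder) / f"{name}_{tag}_{strand}.{suffix}"
        writer = getattr(data, f"to_{suffix}")  # e.g. data.to_pickle
        writer(target)
        written.append(target)
    return written


with tempfile.TemporaryDirectory() as tmp:
    paths = store_strands(
        "merged",
        FakeFrame({"chr1": [0, 3, 5]}),
        FakeFrame({"chr1": [1, 0, 2]}),
        tag="uncollapsed",
        folder=tmp,
    )
    stored_names = [p.name for p in paths]
    with open(paths[0], "rb") as handle:
        restored_plus = pickle.load(handle)
```

Looking the writer up by name means a different suffix only needs a matching to_<suffix> method on the data object, with no change to the storing function.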
- coalispr.bedgraph_analyze.store.store_processed_chrindexes(name, chrnam, plusdata, minusdata, tag)
Binarize chromosomal indexfile tuples for easy access.
- Parameters:
name (str) – Name for file to be stored.
chrnam (str) – Name of chromosome for which data are stored.
plusdata (object) – Data for plus strand of chromosome chrnam.
minusdata (object) – Data for minus strand of chromosome chrnam.
tag (str) – Flag to indicate kind of aligned-reads, TAGUNCOLL or TAGCOLL.
- Returns:
Prints message upon completion of function.
- Return type:
None
- coalispr.bedgraph_analyze.store.save_average_table(df, name, kind, samples)
Save averaged count tables with given keywords in the folder/filename.
- Parameters:
name (str) – Filename for the output table to save; equal to the figure name.
kind (str (default: SPECIFIC)) – What type of counted reads to use, i.e. SPECIFIC or UNSPECIFIC.
samples (list) – List of library samples used for averaging dataframe.
- coalispr.bedgraph_analyze.store.retrieve_merged(tag=TAG)
Retrieve the merged experimental data.
- Parameters:
tag (str) – Flag TAG to indicate kind of aligned-reads, TAGUNCOLL or TAGCOLL.
- Returns:
A tuple of dicts, one for each strand, with pandas dataframes, one for each chromosome, with columns of bedgraph values summed per BINSET for each sample.
- Return type:
dict of pandas.DataFrames, dict of pandas.DataFrames
- coalispr.bedgraph_analyze.store.retrieve_processed_files(name, tag, notag=False)
Retrieve experimental data from binary files.
Defines internal class FileTooShortWarning(Exception).
- Parameters:
name (str) – Name for file
tag (str) – Flag TAG to indicate kind of aligned-reads, TAGUNCOLL or TAGCOLL.
notag (bool) – Flag to indicate whether ‘tag’ needs an argument.
- Raises:
FileTooShortWarning – Raised when stored file has no significant size; previous file writing was inadequate.
- Returns:
A tuple of dicts, one for each strand, with data structures, one for each chromosome.
- Return type:
dict, dict
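The FileTooShortWarning check can be pictured as a size test before unpickling. The sketch below is an illustration under stated assumptions: the MIN_BYTES threshold and the load_checked helper are hypothetical, the exception is defined at module level here rather than inside the function, and plain pickle stands in for the binary format.

```python
import pickle
import tempfile
from pathlib import Path

MIN_BYTES = 16  # hypothetical threshold; the real cut-off is not documented here


class FileTooShortWarning(Exception):
    """Raised when a stored file is too small to hold usable data."""


def load_checked(path):
    """Unpickle a file, refusing files below the assumed size threshold."""
    path = Path(path)
    size = path.stat().st_size
    if size < MIN_BYTES:
        raise FileTooShortWarning(f"{path.name} holds only {size} bytes")
    with open(path, "rb") as handle:
        return pickle.load(handle)


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "plus.pkl"
    with open(good, "wb") as handle:
        pickle.dump({"chr1": list(range(50))}, handle)
    empty = Path(tmp) / "minus.pkl"
    empty.touch()  # simulates an interrupted write

    loaded = load_checked(good)
    try:
        load_checked(empty)
        short_file_detected = False
    except FileTooShortWarning:
        short_file_detected = True
```

Checking the size first gives a specific error for an inadequate earlier write, instead of a generic unpickling failure.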
- coalispr.bedgraph_analyze.store.retrieve_merged_unselected()
Returns merged unselected data, organized per binned chromosome.
- coalispr.bedgraph_analyze.store.get_inputtotals(kind=TAGUNCOLL)
Returns total input counts for ‘kind’ of sequence from saved files.
- Parameters:
kind (str) – Type of reads, TAGUNCOLL (all reads) or TAGCOLL (unique cDNAs).
- coalispr.bedgraph_analyze.store.has_been_run(options, backup=False)
Check folders for possibility to continue with options.
- Parameters:
options (list) – List of keys to find folders to check
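One way to read this check is as a test that every folder linked to an option key exists and holds output. The sketch below assumes exactly that; the folders_ready helper, the option keys and the "non-empty" criterion are hypothetical, since the source does not state which conditions the function tests or what the backup flag changes.

```python
import tempfile
from pathlib import Path


def folders_ready(options, folders):
    """Return True only if every option maps to an existing, non-empty folder."""
    for key in options:
        folder = folders.get(key)
        if folder is None or not folder.is_dir():
            return False
        if not any(folder.iterdir()):
            return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    # Hypothetical option keys and folder names, for illustration only.
    folders = {"merged": base / "merged", "counts": base / "counts"}
    folders["merged"].mkdir()
    (folders["merged"] / "data.bin").write_bytes(b"\x00")
    folders["counts"].mkdir()  # exists but is empty

    merged_ok = folders_ready(["merged"], folders)
    both_ok = folders_ready(["merged", "counts"], folders)
```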
- coalispr.bedgraph_analyze.store.backup_binary_to_tsv(data_only=False, merged_only=False)
Convert binary bedgraph-data to text; skip indexes-only files.
- Returns:
Prints message upon completion of function
- Return type:
None
- coalispr.bedgraph_analyze.store.binary_from_backup_tsv()
Restore binary bedgraph-data from text files. This will replace original STOREBINARY contents.
- Returns:
Prints message upon completion of function
- Return type:
None
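Backing up to text and restoring from it form a round trip. The sketch below illustrates the idea with the standard library only, assuming a pickled dict of equal-length value lists per sample in place of a pandas dataframe; binary_to_tsv, tsv_to_binary and the sample names are hypothetical.

```python
import csv
import pickle
import tempfile
from pathlib import Path


def binary_to_tsv(binary_path, tsv_path):
    """Write a pickled {sample: [values]} table as tab-separated text."""
    with open(binary_path, "rb") as handle:
        table = pickle.load(handle)
    samples = list(table)
    with open(tsv_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(samples)
        writer.writerows(zip(*(table[s] for s in samples)))


def tsv_to_binary(tsv_path, binary_path):
    """Rebuild the pickled table from its text backup, overwriting the target."""
    with open(tsv_path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        samples = next(reader)
        columns = [[] for _ in samples]
        for row in reader:
            for column, value in zip(columns, row):
                column.append(float(value))
    with open(binary_path, "wb") as handle:
        pickle.dump(dict(zip(samples, columns)), handle)


with tempfile.TemporaryDirectory() as tmp:
    binary, text = Path(tmp) / "plus.pkl", Path(tmp) / "plus.tsv"
    original = {"wt_1": [0.0, 2.5, 1.0], "mut_1": [3.0, 0.0, 4.5]}
    with open(binary, "wb") as handle:
        pickle.dump(original, handle)

    binary_to_tsv(binary, text)
    binary.unlink()  # pretend the binary store was lost
    tsv_to_binary(text, binary)
    with open(binary, "rb") as handle:
        restored = pickle.load(handle)
```

As in the documented function, restoring overwrites whatever binary file sits at the target path, so the text backup should be current before it is used.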
- coalispr.bedgraph_analyze.store.rename_in_data_tsv(names_old_new)
Replace sample names in data files.
- Parameters:
names_old_new (list of tuples) – [(old_name1, new_name1), (old_name2, new_name2),]
- Return type:
boolean expressing completion.
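The names_old_new argument is a list of (old, new) pairs. The sketch below shows how such pairs could be applied to the header of a tab-separated table; the rename_header helper and the sample names are hypothetical and do not show how coalispr performs the replacement.

```python
def rename_header(tsv_text, names_old_new):
    """Replace whole column names in the first line of tab-separated text."""
    mapping = dict(names_old_new)
    header, _, body = tsv_text.partition("\n")
    columns = [mapping.get(col, col) for col in header.split("\t")]
    return "\t".join(columns) + "\n" + body


table = "bin\twt_1\twt_10\n0\t4\t7\n50\t1\t0\n"
renamed = rename_header(table, [("wt_1", "wildtype_1")])
```

Matching whole column names rather than substrings keeps wt_10 untouched when wt_1 is renamed.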
- coalispr.bedgraph_analyze.store.rename_unselected(names_old_new)
Replace sample names in file names.
- Parameters:
names_old_new (list of tuples) – [(old_name1, new_name1), (old_name2, new_name2),]
- Return type:
boolean expressing completion.
- coalispr.bedgraph_analyze.store.rename_in_count_tsv(names_old_new)
Replace sample names in count files.
- Parameters:
names_old_new (list of tuples) – [(old_name1, new_name1), (old_name2, new_name2),]
- Returns:
Flag expressing completion.
- Return type:
boolean
- coalispr.bedgraph_analyze.store.rename_in_expfile(names_old_new)
Replace sample names in experiment file EXPFILE.
- Parameters:
names_old_new (list of tuples) – [(old_name1, new_name1), (old_name2, new_name2),]
- Return type:
boolean expressing completion.
- coalispr.bedgraph_analyze.store.print_memory_usage_merged()
Show pandas memory usage of merged data frames.
- Returns:
Floats describing (in MBs) memory usage of reference, TAGCOLL or TAGUNCOLL datasets in Pandas.
- Return type:
floats