coalispr.bedgraph_analyze.store

Module for dealing with file storage and retrieval.

Attributes

logger

Functions

config_from_newdirs(exp, path)

Create storage folders during the initialization step coalispr init.

get_unselected_folderpath([kind, tag])

Return path to folder with written bam files for unselected reads.

store_chromosome_data(name, plusdata, minusdata, tag)

Pickle binned bedgraph dataframes for easy access.

store_processed_chrindexes(name, chrnam, plusdata, ...)

Pickle chromosomal indexfile tuples for easy access.

store_chromosome_data_as_tsv(name, plusdata, ...)

Store bedgraph or regions dataframes as tsv.

save_average_table(df, name, use, samples[, bam, ...])

Save averaged count tables with given keywords in the folder/filename.

retrieve_merged([tag])

Retrieve the merged experimental data.

retrieve_processed_files(name, tag[, notag])

Retrieve experimental data from pickle files.

retrieve_merged_unselected()

Returns merged unselected data, organized per binned chromosome.

get_inputtotals([kind])

Returns total input counts from saved files.

backup_pickle_to_tsv([data_only, merged_only])

Convert binary bedgraph-data to text; skip indexes-only files.

pickle_from_backup_tsv()

Restore binary bedgraph-data from text files. This will replace original STOREPICKLE contents.

rename_in_data_tsv(names_old_new)

Replace sample names in data files that were pickled.

rename_unselected(names_old_new[, tag])

Replace sample names in file names.

rename_in_count_tsv(names_old_new)

Replace sample names in count files.

rename_in_expfile(names_old_new)

Replace sample names in experiment file EXPFILE.

print_memory_usage_merged()

Show pandas memory usage of merged data frames.

Module Contents

coalispr.bedgraph_analyze.store.logger

coalispr.bedgraph_analyze.store.config_from_newdirs(exp, path)

Create storage folders during the initialization step coalispr init.

Returns:

Paths to storage folders linked to new experiment EXP.

Return type:

Path
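As an illustration only, the folder-creation step can be sketched with `pathlib`. The helper `make_storage_dirs` and the subfolder names in `STORAGE_SUBDIRS` are hypothetical. The folders coalispr really creates are set by its own configuration.

```python
import tempfile
from pathlib import Path

# Hypothetical subfolder names; coalispr's real layout comes from its settings.
STORAGE_SUBDIRS = ("pickled", "tsv", "counts", "unselected")


def make_storage_dirs(exp: str, path: Path) -> Path:
    """Hypothetical helper: create storage subfolders for EXP under PATH."""
    expdir = Path(path) / exp
    for sub in STORAGE_SUBDIRS:
        # parents=True creates missing parents; exist_ok=True makes re-runs safe
        (expdir / sub).mkdir(parents=True, exist_ok=True)
    return expdir


with tempfile.TemporaryDirectory() as tmp:
    expdir = make_storage_dirs("myexp", Path(tmp))
    print(sorted(p.name for p in expdir.iterdir()))
    # prints ['counts', 'pickled', 'tsv', 'unselected']
```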

coalispr.bedgraph_analyze.store.get_unselected_folderpath(kind=UNSPECIFIC, tag=TAGCOLL)

Return path to folder with written bam files for unselected reads.

Parameters:
  • kind (str (default: UNSPECIFIC)) – Flag indicating the kind of reads to analyze, either specific or unspecific (default), for retrieving reads that adhere to the characteristics, when known, of specific RNAs.

  • tag (str (default: TAGCOLL)) – Flag TAG indicating the kind of aligned reads, TAGUNCOLL or TAGCOLL. The value comes from the counted bam files, which, for efficiency, are expected to contain collapsed reads (TAGCOLL).
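A sketch of how a folder path might be derived from the `kind` and `tag` flags. The helper `unselected_folderpath`, the string values given to the flags and the `unselected_<kind>_<tag>` naming are all assumptions, not coalispr's actual layout.

```python
from pathlib import Path

# Hypothetical stand-ins for coalispr's SPECIFIC/UNSPECIFIC and TAGCOLL/TAGUNCOLL.
SPECIFIC, UNSPECIFIC = "specific", "unspecific"
TAGCOLL, TAGUNCOLL = "collapsed", "uncollapsed"


def unselected_folderpath(base, kind=UNSPECIFIC, tag=TAGCOLL):
    """Hypothetical helper: folder path for bam files with unselected reads."""
    # Validate flags early so a typo cannot silently produce a new folder name
    if kind not in (SPECIFIC, UNSPECIFIC):
        raise ValueError(f"unknown kind: {kind!r}")
    if tag not in (TAGCOLL, TAGUNCOLL):
        raise ValueError(f"unknown tag: {tag!r}")
    return Path(base) / f"unselected_{kind}_{tag}"


print(unselected_folderpath("/data"))
# on POSIX prints /data/unselected_unspecific_collapsed
```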

coalispr.bedgraph_analyze.store.store_chromosome_data(name, plusdata, minusdata, tag, notag=False, otherpath=None)

Pickle binned bedgraph dataframes for easy access.

Parameters:
  • name (str) – Name for file to be stored.

  • plusdata (object) – Data for plus strand.

  • minusdata (object) – Data for minus strand.

  • tag (str) – Flag TAG to indicate kind of aligned-reads, TAGUNCOLL or TAGCOLL.

  • notag (bool (default: False)) – Flag indicating whether ‘tag’ needs an argument.

  • otherpath (Path (default: None)) – Path to the storage location, if different from the default _get_storepath().

Returns:

Prints a message upon completion of the function.

Return type:

None
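The pickling step can be sketched as below, assuming each strand's data is any picklable object (for example a dict of per-chromosome DataFrames). The helpers `store_strand_data` and `load_strand_data` and the file naming are hypothetical; unlike the documented function, the sketch returns the written path so the round trip can be checked.

```python
import pickle
import tempfile
from pathlib import Path


def store_strand_data(storedir, name, plusdata, minusdata, tag, notag=False):
    """Hypothetical helper: pickle plus- and minus-strand data in one file."""
    # Hypothetical naming: the tag is part of the file name unless notag is set
    fname = f"{name}.pkl" if notag else f"{name}_{tag}.pkl"
    outfile = Path(storedir) / fname
    with open(outfile, "wb") as fh:
        # A (plus, minus) tuple keeps both strands together on reload
        pickle.dump((plusdata, minusdata), fh, protocol=pickle.HIGHEST_PROTOCOL)
    print(f"Stored {name} in {outfile}")
    return outfile


def load_strand_data(infile):
    """Hypothetical helper: return the stored (plusdata, minusdata) tuple."""
    with open(infile, "rb") as fh:
        return pickle.load(fh)


with tempfile.TemporaryDirectory() as tmp:
    path = store_strand_data(tmp, "sample1", {"chr1": [0, 3]}, {"chr1": [1]}, "collapsed")
    plus, minus = load_strand_data(path)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.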

coalispr.bedgraph_analyze.store.store_processed_chrindexes(name, chrnam, plusdata, minusdata, tag)

Pickle chromosomal indexfile tuples for easy access.

Parameters:
  • name (str) – Name for file to be stored.

  • chrnam (str) – Name of chromosome for which data are stored.

  • plusdata (object) – Data for plus strand of chromosome chrnam.

  • minusdata (object) – Data for minus strand of chromosome chrnam.

  • tag (str) – Flag TAG to indicate kind of aligned-reads, TAGUNCOLL or TAGCOLL.

Returns:

Prints a message upon completion of the function.

Return type:

None

coalispr.bedgraph_analyze.store.store_chromosome_data_as_tsv(name, plusdata, minusdata, tag)

Store bedgraph or regions dataframes as tsv.

Parameters:
  • name (str) – Name for file to be stored.

  • plusdata (object) – Data for plus strand.

  • minusdata (object) – Data for minus strand.

  • tag (str) – Flag TAG to indicate kind of aligned-reads, TAGUNCOLL or TAGCOLL.

Returns:

Prints a message upon completion of the function.

Return type:

None
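Writing strand data as tab-separated text can be sketched with the standard `csv` module. With pandas DataFrames the usual call is `DataFrame.to_csv(path, sep="\t")`; here plain `{chromosome: [(start, value), ...]}` dicts keep the example dependency-free. The helper `strands_to_tsv`, the three-column layout and the file names are hypothetical.

```python
import csv
from pathlib import Path


def strands_to_tsv(outdir, name, plusdata, minusdata, tag):
    """Hypothetical helper: write {chromosome: [(start, value), ...]} per strand."""
    written = []
    for strand, data in (("plus", plusdata), ("minus", minusdata)):
        outfile = Path(outdir) / f"{name}_{tag}_{strand}.tsv"
        with open(outfile, "w", newline="") as fh:
            writer = csv.writer(fh, delimiter="\t")
            writer.writerow(["chromosome", "start", "value"])  # header line
            for chrom, rows in data.items():
                for start, value in rows:
                    writer.writerow([chrom, start, value])
        written.append(outfile)
    return written
```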

coalispr.bedgraph_analyze.store.save_average_table(df, name, use, samples, bam=TAGCOLL, segments=TAGUNCOLL, overmax=LOG2BG, maincut=UNSPECLOG10, usegaps=USEGAPS)

Save averaged count tables with given keywords in the folder/filename.

Parameters:
  • df (pandas.DataFrame) – Averaged count table to be saved.

  • name (str) – Filename for the output table to save; equal to the figure name.

  • use (str) – Type of counted reads to use, SPECIFIC or UNSPECIFIC.

  • samples (list) – List of library samples used for averaging dataframe.

  • bam (str (default: TAGCOLL)) – Flag indicating the kind of aligned reads, TAGCOLL or TAGUNCOLL, used to obtain the bam alignments.

  • segments (str (default: TAGUNCOLL)) – Flag indicating the kind of aligned reads, TAGCOLL or TAGUNCOLL, used to obtain the segment definitions.

  • overmax (int (default: LOG2BG)) – Exponent setting the threshold above which read signals are considered; part of the name of the folder with stored count files.

  • maincut (float (default: UNSPECLOG10)) – Exponent setting the difference between SPECIFIC and UNSPECIFIC reads; part of the name of the folder with stored count files.

  • usegaps (int (default: USEGAPS)) – Gap tolerated between peaks of mapped reads for these to form one contiguous segment; part of the name of the folder with stored count files.
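The keywords that go into the folder and file name could be combined as in this sketch. The helper `average_table_path` and the exact naming pattern are assumptions; only the idea of encoding the settings in the path comes from the description above.

```python
from pathlib import Path


def average_table_path(base, name, use, bam, segments, overmax, maincut, usegaps):
    """Hypothetical helper: output path that records the settings used."""
    # Read-type flags name the first folder level ...
    settings_dir = Path(base) / f"{use}_{bam}_{segments}"
    # ... and the numeric thresholds name the second, so runs with other
    # settings do not overwrite each other's tables
    params_dir = settings_dir / f"overmax{overmax}_maincut{maincut}_usegaps{usegaps}"
    return params_dir / f"{name}.tsv"


path = average_table_path("counts", "fig1", "specific", "collapsed", "uncollapsed", 4, 0.95, 50)
print(path.as_posix())
# prints counts/specific_collapsed_uncollapsed/overmax4_maincut0.95_usegaps50/fig1.tsv
```

To save a DataFrame there, one would create the folder with `path.parent.mkdir(parents=True, exist_ok=True)` and then call `df.to_csv(path, sep="\t")`.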

coalispr.bedgraph_analyze.store.retrieve_merged(tag=TAG)

Retrieve the merged experimental data.

Parameters:

tag (str (default: TAG)) – Flag TAG indicating the kind of aligned reads, TAGUNCOLL or TAGCOLL.

Returns:

A tuple of two dicts, one for each strand. Each dict holds one pandas DataFrame per chromosome, with, for each sample, a column of bedgraph values summed per BINSET.

Return type:

tuple of (dict of pandas.DataFrame, dict of pandas.DataFrame)
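To make "bedgraph values summed per BINSET" concrete, the sketch below sums bedgraph signal into fixed-size bins, weighting each value by the number of bases it covers within a bin. Treating BINSET as a bin width and using overlap weighting are assumptions; coalispr's own binning may differ. The helper `bin_bedgraph` is hypothetical.

```python
from collections import defaultdict


def bin_bedgraph(intervals, binsize):
    """Hypothetical helper: sum per-base bedgraph signal into fixed-size bins.

    intervals: (start, end, value) tuples, 0-based half-open as in bedGraph.
    Returns {bin_start: summed_signal}.
    """
    bins = defaultdict(float)
    for start, end, value in intervals:
        pos = start
        while pos < end:
            bin_start = (pos // binsize) * binsize
            stop = min(end, bin_start + binsize)
            # The value holds for every base in [pos, stop), so weight by length
            bins[bin_start] += value * (stop - pos)
            pos = stop
    return dict(bins)


print(bin_bedgraph([(0, 15, 2.0), (15, 30, 1.0)], binsize=10))
# prints {0: 20.0, 10: 15.0, 20: 10.0}
```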

coalispr.bedgraph_analyze.store.retrieve_processed_files(name, tag, notag=False)

Retrieve experimental data from pickle files.

Defines the internal class FileTooShortWarning(Exception).

Parameters:
  • name (str) – Name of the file.

  • tag (str) – Flag TAG indicating the kind of aligned reads, TAGUNCOLL or TAGCOLL.

  • notag (bool (default: False)) – Flag indicating whether ‘tag’ needs an argument.

Raises:

FileTooShortWarning – Raised when the stored file has no significant size, which indicates that writing it previously failed.

Returns:

A tuple of dicts, one for each strand, with data structures, one for each chromosome.

Return type:

tuple of (dict, dict)
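The size check behind FileTooShortWarning can be sketched as follows. The threshold `MIN_BYTES` and the helper `load_checked_pickle` are hypothetical; only the exception name comes from the documentation above.

```python
import pickle
from pathlib import Path


class FileTooShortWarning(Exception):
    """Stored file is too small to hold meaningful data."""


MIN_BYTES = 100  # hypothetical threshold for a 'significant' file size


def load_checked_pickle(infile, min_bytes=MIN_BYTES):
    """Hypothetical helper: unpickle INFILE, refusing suspiciously small files."""
    infile = Path(infile)
    size = infile.stat().st_size
    if size < min_bytes:
        # A near-empty file points to an interrupted or failed earlier write
        raise FileTooShortWarning(f"{infile} is only {size} bytes; store it again.")
    with open(infile, "rb") as fh:
        return pickle.load(fh)
```

Checking the size first gives a clear error message instead of the bare `EOFError` that `pickle.load` raises on an empty file.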

coalispr.bedgraph_analyze.store.retrieve_merged_unselected()

Returns merged unselected data, organized per binned chromosome.

coalispr.bedgraph_analyze.store.get_inputtotals(kind=TAGUNCOLL)

Returns total input counts from saved files.

Parameters:

kind (str (default: TAGUNCOLL)) – Type of reads, TAGUNCOLL or TAGCOLL.

coalispr.bedgraph_analyze.store.backup_pickle_to_tsv(data_only=False, merged_only=False)

Convert binary bedgraph-data to text; skip indexes-only files.

Returns:

Prints a message upon completion of the function.

Return type:

None

coalispr.bedgraph_analyze.store.pickle_from_backup_tsv()

Restore binary bedgraph-data from text files. This will replace original STOREPICKLE contents.

Returns:

Prints a message upon completion of the function.

Return type:

None
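Restoring pickled data from a text backup needs explicit type conversion, because a TSV file stores every field as text. The sketch assumes a three-column layout (`chromosome`, `start`, `value`); the helpers `tsv_to_strand_table` and `restore_pickle` are hypothetical. With pandas, `pandas.read_csv(path, sep="\t")` would do the parsing and infer the dtypes instead.

```python
import csv
import pickle
from collections import defaultdict


def tsv_to_strand_table(infile):
    """Hypothetical helper: rebuild {chromosome: [(start, value), ...]} from TSV."""
    data = defaultdict(list)
    with open(infile, newline="") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            # Text carries no dtypes: convert each field back explicitly
            data[row["chromosome"]].append((int(row["start"]), float(row["value"])))
    return dict(data)


def restore_pickle(tsvfile, picklefile):
    """Hypothetical helper: overwrite PICKLEFILE with data parsed from TSVFILE."""
    with open(picklefile, "wb") as fh:
        pickle.dump(tsv_to_strand_table(tsvfile), fh)
```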

coalispr.bedgraph_analyze.store.rename_in_data_tsv(names_old_new)

Replace sample names in data files that were pickled.

Parameters:

names_old_new (list of tuples) – Pairs of old and new sample names: [(old_name1, new_name1), (old_name2, new_name2), ...].

Return type:

bool, expressing completion.
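A sketch of renaming samples in the header of a tab-separated data file, using the `names_old_new` list of tuples described above. The helper `rename_samples_in_tsv` is hypothetical, and it assumes sample names appear as column headers.

```python
import csv
from pathlib import Path


def rename_samples_in_tsv(tsvfile, names_old_new):
    """Hypothetical helper: replace sample names in the header of a TSV file."""
    mapping = dict(names_old_new)
    tsvfile = Path(tsvfile)
    with open(tsvfile, newline="") as fh:
        rows = list(csv.reader(fh, delimiter="\t"))
    if not rows:
        return False
    # Exact matches only: renaming 'A1' leaves a column named 'A10' untouched
    rows[0] = [mapping.get(col, col) for col in rows[0]]
    with open(tsvfile, "w", newline="") as fh:
        csv.writer(fh, delimiter="\t").writerows(rows)
    return True
```

Matching whole column names, rather than doing substring replacement, prevents renaming `A1` from also altering a column called `A10`.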

coalispr.bedgraph_analyze.store.rename_unselected(names_old_new, tag=None)

Replace sample names in file names.

Parameters:
  • names_old_new (list of tuples) – Pairs of old and new sample names: [(old_name1, new_name1), (old_name2, new_name2), ...].

  • tag (str (default: None)) – Flag TAG indicating the kind of aligned reads, TAGUNCOLL or TAGCOLL. The value comes from the counted bam files, which, for efficiency, are expected to contain collapsed reads (TAGCOLL).

Return type:

bool, expressing completion.
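Renaming files per sample could look like this sketch, which assumes file names of the form `<sample>_<rest>.bam`. The helper `rename_sample_files` and this naming convention are hypothetical, and the `tag` argument of the documented function is left out.

```python
from pathlib import Path


def rename_sample_files(folder, names_old_new, suffix=".bam"):
    """Hypothetical helper: rename '<old>_<rest><suffix>' to '<new>_<rest><suffix>'."""
    for old, new in names_old_new:
        # The '_' after the name keeps 'A1' from matching 'A10_...'
        for path in sorted(Path(folder).glob(f"{old}_*{suffix}")):
            path.rename(path.with_name(new + path.name[len(old):]))
    return True
```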

coalispr.bedgraph_analyze.store.rename_in_count_tsv(names_old_new)

Replace sample names in count files.

Parameters:

names_old_new (list of tuples) – Pairs of old and new sample names: [(old_name1, new_name1), (old_name2, new_name2), ...].

Return type:

bool, expressing completion.

coalispr.bedgraph_analyze.store.rename_in_expfile(names_old_new)

Replace sample names in experiment file EXPFILE.

Parameters:

names_old_new (list of tuples) – Pairs of old and new sample names: [(old_name1, new_name1), (old_name2, new_name2), ...].

Return type:

bool, expressing completion.

coalispr.bedgraph_analyze.store.print_memory_usage_merged()

Show pandas memory usage of merged data frames.

Returns:

Floats giving the pandas memory usage (in MB) of the merged reference, TAGCOLL or TAGUNCOLL datasets.

Return type:

floats
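A sketch of measuring pandas memory usage for two strand dicts of DataFrames with `DataFrame.memory_usage(deep=True)`. The helper `merged_memory_mb` is hypothetical, and it counts a MB as 1024**2 bytes.

```python
import pandas as pd


def merged_memory_mb(plusdata, minusdata):
    """Hypothetical helper: total deep memory usage, in MB, of two DataFrame dicts."""
    frames = list(plusdata.values()) + list(minusdata.values())
    # deep=True also counts the Python objects behind object-dtype columns;
    # memory_usage includes the index by default
    nbytes = sum(int(df.memory_usage(deep=True).sum()) for df in frames)
    return nbytes / 1024**2


plus = {"chr1": pd.DataFrame({"sample1": [0.0] * (128 * 1024)})}
minus = {"chr1": pd.DataFrame({"sample1": [0.0] * (128 * 1024)})}
print(f"{merged_memory_mb(plus, minus):.2f} MB")
```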