coalispr.bedgraph_analyze.store

Module for dealing with file storage and retrieval to and from BNY.

to do: Change saving format to TSV using command parameter.

Attributes

Functions

config_from_newdirs(exp, path)

Create storage folders during the initialization step coalispr init.

get_unselected_folderpath()

Return path to folder with written bam files for unselected reads that have been retrieved during counting of UNSPECIFIC reads.

store_chromosome_data(name, plusdata, minusdata, tag)

Binarize binned bedgraph dataframes for easy access.

store_processed_chrindexes(name, chrnam, plusdata, ...)

Binarize chromosomal indexfile tuples for easy access.

save_average_table(df, name, kind, samples)

Save averaged count tables with given keywords in the folder/filename.

retrieve_merged([tag])

Retrieve the merged experimental data.

retrieve_processed_files(name, tag[, notag])

Retrieve experimental data from binary files.

retrieve_merged_unselected()

Returns merged unselected data, organized per binned chromosome.

get_inputtotals([kind])

Returns total input counts for 'kind' of sequence from saved files.

has_been_run(options[, backup])

Check folders for possibility to continue with options.

backup_binary_to_tsv([data_only, merged_only])

Convert binary bedgraph-data to text; skip indexes-only files.

binary_from_backup_tsv()

Restore binary bedgraph-data from text files. This will replace original STOREBINARY contents.

rename_in_data_tsv(names_old_new)

Replace sample names in data files.

rename_unselected(names_old_new)

Replace sample names in file names.

rename_in_count_tsv(names_old_new)

Replace sample names in count files.

rename_in_expfile(names_old_new)

Replace sample names in experiment file EXPFILE.

print_memory_usage_merged()

Show pandas memory usage of merged data frames.

Module Contents

coalispr.bedgraph_analyze.store.logger
coalispr.bedgraph_analyze.store.config_from_newdirs(exp, path)

Create storage folders during the initialization step coalispr init.

Returns:

Paths to storage folders linked to new experiment EXP.

Return type:

Path

coalispr.bedgraph_analyze.store.get_unselected_folderpath()

Return path to folder with written bam files for unselected reads that have been retrieved during counting of UNSPECIFIC reads.

coalispr.bedgraph_analyze.store.store_chromosome_data(name, plusdata, minusdata, tag, notag=False, suffix=BNY, otherpath=None)

Binarize binned bedgraph dataframes for easy access.

Parameters:
  • name (str) – Name for file to be stored.

  • plusdata (object) – Data for plus strand.

  • minusdata (object) – Data for minus strand.

  • tag (str) – Flag TAG to indicate kind of aligned-reads, TAGUNCOLL or TAGCOLL.

  • notag (bool) – Flag to indicate whether ‘tag’ needs an argument.

  • suffix (str) – Suffix indicating file format and used to select function name for storing dataframes by pandas.

  • otherpath (Path) – Path to storage location if different from default _get_store_path().

Returns:

Prints message upon completion of function

Return type:

None
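The suffix-based choice of a storage routine described above can be sketched with the standard library alone. This is a minimal illustration, not coalispr's implementation: the writer functions, the `WRITERS` table, the file-name pattern and the `.pkl`/`.tsv` suffixes are all assumptions (the actual value of BNY is not stated here), and plain dicts stand in for the pandas dataframes.

```python
import pickle
import tempfile
from pathlib import Path


# Hypothetical writers keyed by suffix; these are not coalispr functions.
def _write_pickle(data, path):
    with open(path, "wb") as handle:
        pickle.dump(data, handle)


def _write_tsv(data, path):
    # data: dict mapping chromosome name -> list of binned values
    with open(path, "w", encoding="utf-8") as handle:
        for chrom, values in data.items():
            handle.write("\t".join([chrom, *map(str, values)]) + "\n")


WRITERS = {".pkl": _write_pickle, ".tsv": _write_tsv}


def store_strands(name, plusdata, minusdata, folder, suffix=".pkl"):
    """Write one file per strand and return both paths."""
    writer = WRITERS[suffix]  # KeyError for an unsupported suffix
    paths = []
    for strand, data in (("plus", plusdata), ("minus", minusdata)):
        path = Path(folder) / f"{name}_{strand}{suffix}"
        writer(data, path)
        paths.append(path)
    return tuple(paths)


with tempfile.TemporaryDirectory() as tmp:
    plus_path, minus_path = store_strands(
        "sample1", {"chr1": [0, 3, 5]}, {"chr1": [1, 0, 2]}, tmp
    )
    with open(plus_path, "rb") as handle:
        restored = pickle.load(handle)
```

Keeping the writers in a lookup table means a further format only needs one new entry, which is one way the TSV option mentioned in the to-do note could be accommodated.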

coalispr.bedgraph_analyze.store.store_processed_chrindexes(name, chrnam, plusdata, minusdata, tag)

Binarize chromosomal indexfile tuples for easy access.

Parameters:
  • name (str) – Name for file to be stored.

  • chrnam (str) – Name of chromosome for which data are stored.

  • plusdata (object) – Data for plus strand of chromosome chrnam.

  • minusdata (object) – Data for minus strand of chromosome chrnam.

  • tag (str) – Flag to indicate kind of aligned-reads, TAGUNCOLL or TAGCOLL.

Returns:

Prints message upon completion of function.

Return type:

None

coalispr.bedgraph_analyze.store.save_average_table(df, name, kind, samples)

Save averaged count tables with given keywords in the folder/filename.

Parameters:
  • name (str) – Name of filename for output table to save; is equal to figure name.

  • kind (str (default: SPECIFIC)) – What type of counted reads to use, i.e. SPECIFIC or UNSPECIFIC.

  • samples (list) – List of library samples used for averaging dataframe.

coalispr.bedgraph_analyze.store.retrieve_merged(tag=TAG)

Retrieve the merged experimental data.

Parameters:

tag (str) – Flag TAG to indicate kind of aligned-reads, TAGUNCOLL or TAGCOLL.

Returns:

A tuple of dicts, one for each strand, with pandas dataframes, one for each chromosome, with columns of bedgraph values summed per BINSET for each sample.

Return type:

dict of pandas.DataFrames, dict of pandas.DataFrames

coalispr.bedgraph_analyze.store.retrieve_processed_files(name, tag, notag=False)

Retrieve experimental data from binary files.

Defines internal class FileTooShortWarning(Exception)

Parameters:
  • name (str) – Name for file

  • tag (str) – Flag TAG to indicate kind of aligned-reads, TAGUNCOLL or TAGCOLL.

  • notag (bool) – Flag to indicate whether ‘tag’ needs an argument.

Raises:

FileTooShortWarning – Raised when stored file has no significant size; previous file writing was inadequate.

Returns:

A tuple of dicts, one for each strand, with data structures, one for each chromosome.

Return type:

dict, dict
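The size check behind FileTooShortWarning could look roughly like the sketch below. The threshold `MIN_BYTES`, the helper `load_checked` and the use of `pickle` are assumptions for illustration; the source only states that the exception is raised when a stored file has no significant size.

```python
import pickle
import tempfile
from pathlib import Path


class FileTooShortWarning(Exception):
    """Raised when a stored file is too small to hold usable data."""


MIN_BYTES = 10  # hypothetical threshold, not coalispr's value


def load_checked(path):
    """Load a pickled object, refusing files below MIN_BYTES."""
    path = Path(path)
    size = path.stat().st_size
    if size < MIN_BYTES:
        raise FileTooShortWarning(f"{path.name} holds only {size} bytes")
    with open(path, "rb") as handle:
        return pickle.load(handle)


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.pkl"
    good.write_bytes(pickle.dumps({"chr1": [0, 3, 5]}))
    empty = Path(tmp) / "empty.pkl"
    empty.touch()  # simulates an interrupted earlier write

    loaded = load_checked(good)
    try:
        load_checked(empty)
        too_short = False
    except FileTooShortWarning:
        too_short = True
```

Checking the size before unpickling gives a clearer message than the `EOFError` that an empty file would otherwise trigger.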

coalispr.bedgraph_analyze.store.retrieve_merged_unselected()

Returns merged unselected data, organized per binned chromosome.

coalispr.bedgraph_analyze.store.get_inputtotals(kind=TAGUNCOLL)

Returns total input counts for ‘kind’ of sequence from saved files.

Parameters:

kind (str) – Type of reads, TAGUNCOLL (all reads) or TAGCOLL (unique cDNAs).

coalispr.bedgraph_analyze.store.has_been_run(options, backup=False)

Check folders for possibility to continue with options.

Parameters:

options (list) – List of keys to find folders to check
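A folder check of this kind might be sketched as follows. The helper `folders_ready` and the key-to-folder mapping are hypothetical, and treating "ready" as "exists and is not empty" is an assumption; the source does not spell out the criteria or what `backup` changes.

```python
import tempfile
from pathlib import Path


def folders_ready(options, folders):
    """Return True when every option maps to an existing, non-empty folder."""
    for key in options:
        folder = folders.get(key)
        if folder is None or not Path(folder).is_dir():
            return False
        if not any(Path(folder).iterdir()):
            return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "data"
    data_dir.mkdir()
    (data_dir / "merged.pkl").write_bytes(b"placeholder")
    counts_dir = Path(tmp) / "counts"
    counts_dir.mkdir()  # exists but holds no files yet

    folders = {"data": data_dir, "counts": counts_dir}
    ready_data = folders_ready(["data"], folders)
    ready_both = folders_ready(["data", "counts"], folders)
```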

coalispr.bedgraph_analyze.store.backup_binary_to_tsv(data_only=False, merged_only=False)

Convert binary bedgraph-data to text; skip indexes-only files.

Returns:

Prints message upon completion of function

Return type:

None

coalispr.bedgraph_analyze.store.binary_from_backup_tsv()

Restore binary bedgraph-data from text files. This will replace original STOREBINARY contents.

Returns:

Prints message upon completion of function

Return type:

None
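The round trip performed by backup_binary_to_tsv and binary_from_backup_tsv can be pictured with a small standard-library sketch. The helpers `binary_to_tsv` and `tsv_to_binary`, the pickled dict of per-sample value lists and the file names are assumptions; the real functions operate on the stored bedgraph data and their on-disk layout is not described here.

```python
import csv
import pickle
import tempfile
from pathlib import Path


def binary_to_tsv(binary_path, tsv_path):
    """Write a pickled {sample: [values]} table as tab-separated text."""
    with open(binary_path, "rb") as handle:
        table = pickle.load(handle)
    samples = list(table)
    with open(tsv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(samples)
        writer.writerows(zip(*(table[s] for s in samples)))


def tsv_to_binary(tsv_path, binary_path):
    """Rebuild the pickled table from its text backup."""
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        rows = list(csv.reader(handle, delimiter="\t"))
    header, body = rows[0], rows[1:]
    table = {
        name: [float(row[i]) for row in body] for i, name in enumerate(header)
    }
    with open(binary_path, "wb") as handle:
        pickle.dump(table, handle)


original = {"a1": [0.0, 2.5], "a2": [1.0, 4.0]}
with tempfile.TemporaryDirectory() as tmp:
    binary = Path(tmp) / "merged.pkl"
    backup = Path(tmp) / "merged.tsv"
    binary.write_bytes(pickle.dumps(original))

    binary_to_tsv(binary, backup)
    binary.unlink()  # pretend the binary store was lost
    tsv_to_binary(backup, binary)

    with open(binary, "rb") as handle:
        restored_table = pickle.load(handle)
```

As in the documented function, restoring overwrites whatever binary file is at the target path, so a restore is best run only when the text backup is known to be current.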

coalispr.bedgraph_analyze.store.rename_in_data_tsv(names_old_new)

Replace sample names in data files.

Parameters:

names_old_new (list of tuples) – [(old_name1, new_name1), (old_name2, new_name2), ...]

Returns:

Flag expressing completion.

Return type:

boolean
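How a `names_old_new` list can be applied to a tab-separated data file is sketched below. The helper `rename_in_header` and the column layout are assumptions, and restricting the rename to exact matches in the header row is a design choice of this sketch, not a documented behaviour of coalispr.

```python
import tempfile
from pathlib import Path


def rename_in_header(tsv_path, names_old_new):
    """Rename matching column names in the first line of a TSV file."""
    mapping = dict(names_old_new)
    path = Path(tsv_path)
    lines = path.read_text(encoding="utf-8").splitlines()
    if not lines:
        return False
    # Exact matches only, so "wt_1" does not alter a column named "wt_10".
    header = [mapping.get(col, col) for col in lines[0].split("\t")]
    lines[0] = "\t".join(header)
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return True


with tempfile.TemporaryDirectory() as tmp:
    table = Path(tmp) / "data.tsv"
    table.write_text("bin\twt_1\twt_10\n0\t3\t5\n", encoding="utf-8")

    done = rename_in_header(table, [("wt_1", "WT_a")])
    new_lines = table.read_text(encoding="utf-8").splitlines()
```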

coalispr.bedgraph_analyze.store.rename_unselected(names_old_new)

Replace sample names in file names.

Parameters:

names_old_new (list of tuples) – [(old_name1, new_name1), (old_name2, new_name2), ...]

Returns:

Flag expressing completion.

Return type:

boolean
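Renaming files on disk from the same list of pairs might look like this sketch. The helper `rename_files` and the `*_unselected.bam` naming pattern are hypothetical; the actual file layout in the unselected folder is not documented here.

```python
import tempfile
from pathlib import Path


def rename_files(folder, names_old_new):
    """Rename files whose names contain an old sample name."""
    renamed = []
    for old, new in names_old_new:
        for path in sorted(Path(folder).glob(f"*{old}*")):
            target = path.with_name(path.name.replace(old, new))
            path.rename(target)
            renamed.append(target.name)
    return renamed


with tempfile.TemporaryDirectory() as tmp:
    for sample in ("wt_1", "mut_1"):
        (Path(tmp) / f"{sample}_unselected.bam").touch()

    renamed = rename_files(tmp, [("wt_1", "WT_a")])
    remaining = sorted(p.name for p in Path(tmp).iterdir())
```

Because this sketch matches on substrings, a sample name that is a prefix of another (for example `wt_1` and `wt_10`) would need a stricter pattern.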

coalispr.bedgraph_analyze.store.rename_in_count_tsv(names_old_new)

Replace sample names in count files.

Parameters:

names_old_new (list of tuples) – [(old_name1, new_name1), (old_name2, new_name2), ...]

Returns:

Flag expressing completion.

Return type:

boolean

coalispr.bedgraph_analyze.store.rename_in_expfile(names_old_new)

Replace sample names in experiment file EXPFILE.

Parameters:

names_old_new (list of tuples) – [(old_name1, new_name1), (old_name2, new_name2), ...]

Returns:

Flag expressing completion.

Return type:

boolean

coalispr.bedgraph_analyze.store.print_memory_usage_merged()

Show pandas memory usage of merged data frames.

Returns:

Floats describing (in MBs) memory usage of reference, TAGCOLL or TAGUNCOLL datasets in Pandas.

Return type:

floats