Coalispr introduction

Coalispr (COunt ALIgned SPecified Reads) is a Python tool to clean up (small) RNA sequencing results. It can visualize over 100 bedgraphs in one panel [1] and helps to retrieve read counts from associated bam files without reliance on reference features (GTF annotations).


Features

  • Fast and voluminous
    Reduced resolution decreases memory use and speeds up comparison.
    Handle a large number of samples simultaneously.
    Count specified aligned reads [2].
    Count collapsed reads instead of single reads.
  • Input files are bedgraph files
    Bedgraph data are imported with Pandas for processing.
    Reads are collected by their mid-point into bins [4].
    A common index [3] enables comparison between all samples.
    Comparisons are done per chromosome [5] and strand.
    Specific reads (S, M) are separated from unspecific reads (U)
    (by first checking bin overlap, then signal difference).
    For fast reuse, the created data structures are stored in a binary format.
  • Interactive visualization
    All bedgraphs for a chromosome are shown (with Matplotlib).
    Toggle data display.
    Load GTF files for overlap with annotated features.
    Signal scale: normal or log2.
    Include reference RNA sequencing data (R).
    Save snapshots as svg, jpg, pdf or png.
  • Counting
    Map contiguous regions of unspecific or specific reads.
    These segment definitions are stored in tsv files, which are used to:
      ◦ Retrieve specified reads from bam files with Pysam.
      ◦ Split segments into a number of bins to profile coverage.
      ◦ Collect counts for various read properties and save them to tsv files.
    Obtain counts for particular chromosomal regions.
    Thus, counting relies on genome coordinates, not GTF references.
  • Analysis
    Count-outputs can be diagrammed (with Matplotlib and Seaborn).
    Compare numbers of reads, cDNAs, introns and multimappers.
    Check length distributions of reads, also for a particular genomic region.
    Annotate count files with gene-information from GTF references.
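The binning and comparison steps listed under "Input files are bedgraph files" can be illustrated with a short pandas sketch. It is a simplified illustration under stated assumptions, not Coalispr's own code: the bin width `BINSIZE`, the threshold `LOG2_MIN`, the pseudocount of 1 and the function names are hypothetical; each data frame holds one chromosome strand with columns `chrom`, `start`, `end` and `value`; and a single negative control stands in for the unspecific reference. The sketch only labels bins "specific" or "unspecific" and does not distinguish S from M.

```python
import numpy as np
import pandas as pd

BINSIZE = 50     # hypothetical bin width (bp); coarser bins use less memory
LOG2_MIN = 1.0   # hypothetical minimal log2 difference for specific signal


def bin_bedgraph(df: pd.DataFrame, binsize: int = BINSIZE) -> pd.Series:
    """Sum bedgraph values per bin, assigning each interval by its mid-point."""
    mid = (df["start"] + df["end"]) // 2
    return df["value"].groupby(mid // binsize).sum()


def compare(sample: pd.DataFrame, control: pd.DataFrame) -> pd.DataFrame:
    """Label the bins of one strand as specific or unspecific (sketch)."""
    s = bin_bedgraph(sample)
    c = bin_bedgraph(control)
    # Common index: union of all bins seen in either dataset; absent bins = 0.
    idx = s.index.union(c.index)
    out = pd.DataFrame({
        "sample": s.reindex(idx, fill_value=0),
        "control": c.reindex(idx, fill_value=0),
    })
    # Step 1 (bin overlap): sample signal where the control has none.
    no_overlap = (out["sample"] > 0) & (out["control"] == 0)
    # Step 2 (signal difference): overlapping bins need a large enough excess.
    diff = np.log2(out["sample"] + 1) - np.log2(out["control"] + 1)
    excess = (out["sample"] > 0) & (out["control"] > 0) & (diff >= LOG2_MIN)
    out["label"] = np.where(
        no_overlap | excess,
        "specific",
        np.where(out["control"] > 0, "unspecific", "empty"),
    )
    return out
```

Taking the union of bin indices gives every dataset the same index, so sample and control columns can be compared row by row; only bins with control signal need the second, quantitative check.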
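Splitting a segment into bins to profile its coverage, as listed under "Counting", can likewise be sketched in plain Python. Assumptions: segments are half-open `[start, end)` genome coordinates, each read is reduced to one position (for example its mid-point), and `segment_bins` and `profile` are hypothetical names.

```python
from bisect import bisect_right


def segment_bins(start: int, end: int, nbins: int) -> list:
    """Return nbins + 1 integer edges splitting the segment [start, end)."""
    if nbins < 1 or end <= start:
        raise ValueError("need nbins >= 1 and end > start")
    length = end - start
    return [start + (i * length) // nbins for i in range(nbins + 1)]


def profile(start: int, end: int, nbins: int, positions: list) -> list:
    """Count read positions (e.g. mid-points) falling in each segment bin."""
    edges = segment_bins(start, end, nbins)
    counts = [0] * nbins
    for pos in positions:
        if start <= pos < end:  # ignore reads outside the segment
            # bisect_right finds the edge to the right of pos; minus 1 = bin.
            counts[bisect_right(edges, pos) - 1] += 1
    return counts
```

With Pysam, such positions might be derived from the reads returned by `pysam.AlignmentFile(path).fetch(chrom, start, end)`, for example from each read's `reference_start` and `reference_end`; this is not necessarily how Coalispr itself selects or counts reads.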

For a rationale and application of Coalispr see the essay: ‘Bio‑informatics: Integrate negative controls to get the good data’.


Requirements

Numexpr, the fast numerical expression evaluator for NumPy, can help to get the most out of your machine's computing capabilities [7].
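As an illustration of what Numexpr does, the sketch below evaluates a log2 signal difference (the kind of arithmetic used when comparing bins) as a single Numexpr expression when the package is installed, and falls back to plain NumPy otherwise. The function name and the pseudocount are hypothetical; whether and where Coalispr calls Numexpr directly is not shown here (pandas, for one, can use Numexpr for some operations when it is installed).

```python
import importlib.util

import numpy as np


def log2_diff(a, b, pseudo: float = 1.0) -> np.ndarray:
    """Element-wise log2(a + pseudo) - log2(b + pseudo)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if importlib.util.find_spec("numexpr") is not None:
        import numexpr as ne

        # Numexpr compiles the expression once and evaluates it block-wise,
        # avoiding NumPy's full-size temporary arrays; it can also use threads.
        return ne.evaluate(
            "(log(a + p) - log(b + p)) / ln2",
            local_dict={"a": a, "b": b, "p": pseudo, "ln2": np.log(2.0)},
        )
    # Fallback when Numexpr is not installed: the same formula in NumPy.
    return np.log2(a + pseudo) - np.log2(b + pseudo)
```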


Installation

Coalispr can be downloaded from Codeberg.org and PyPI.org.

Configuration files with properties will have to be edited by the user to analyze their own data (see Tutorials). Therefore, this package is best installed locally in user space, not system wide. Alternatively, the program can be installed in a virtual environment [8].

After extracting the source archive, go to the coalispr project folder (containing the setup.py and pyproject.toml files) and run in a terminal (as user):

python3 -m pip install --editable .

This also makes it easy to adapt source code and directly test the changes.

A script that can be called from the command line as coalispr will be installed locally [9] (alternatively, run python3 -m coalispr instead of coalispr).

With pandas-2.x installed, please link coalispr/resources/numeric.py into python3/site-packages/pandas/core/indexes/.
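The link target depends on where pandas is installed. A small standard-library sketch can locate that folder instead of hard-coding a path such as python3.11. The helper names are hypothetical; `link_numeric` mimics `ln -s -r` and expects the path of the extracted coalispr project folder.

```python
import importlib.util
import os
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path
from typing import Optional


def pandas_indexes_dir() -> Optional[Path]:
    """Return the pandas/core/indexes folder, or None if pandas is absent."""
    spec = importlib.util.find_spec("pandas")  # does not import pandas
    if spec is None or not spec.submodule_search_locations:
        return None
    pkg_dir = Path(list(spec.submodule_search_locations)[0])
    return pkg_dir / "core" / "indexes"


def needs_link() -> bool:
    """True when the installed pandas has major version 2 or higher."""
    try:
        return int(version("pandas").split(".")[0]) >= 2
    except PackageNotFoundError:
        return False


def link_numeric(project_dir: Path) -> Path:
    """Hypothetical helper: relative symlink to numeric.py, like `ln -s -r`."""
    target_dir = pandas_indexes_dir()
    if target_dir is None:
        raise RuntimeError("pandas is not installed in this environment")
    src = (project_dir / "coalispr" / "resources" / "numeric.py").resolve()
    dest = target_dir / "numeric.py"
    # Raises FileExistsError if a numeric.py is already present.
    dest.symlink_to(os.path.relpath(src, target_dir.resolve()))
    return dest
```

Run inside the environment where Coalispr is installed, for example `if needs_link(): link_numeric(Path("/<path_to>/coalispr-VERSION"))`, with the placeholder path replaced by the actual project folder.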

Installation in a virtual environment is described in INSTALL.txt.


Run Coalispr

Run the following command in a terminal to show the various options for Coalispr:

coalispr -h

See the How-to guides and the Tutorials for more information.


Contribute

All resources for Coalispr are accessible at Codeberg.org.


Documentation


Licences


Author

  • The author was trained as a molecular biologist and, from that angle, got involved with high-throughput analysis (see About).



Notes

# cd to folder with virtual environments
# create environment 1 (env1) with module venv
    bash-5.2$ python3 -m venv env1
# activate env1
    bash-5.2$ source env1/bin/activate
    (env1) bash-5.2$
# extract dist/package
    (env1) bash-5.2$ tar -xvzf /<path_to>/coalispr-$VERSION.tar.gz
    (env1) bash-5.2$ cd coalispr-$VERSION
# install with:
    (env1) bash-5.2$ python3 -m pip install --editable .
# add link when using pandas-2.x (paths are relative to coalispr-$VERSION;
# replace python3.11 with the Python version used in env1)
    (env1) bash-5.2$ ln -s -r coalispr/resources/numeric.py \
                    -t ../env1/lib/python3.11/site-packages/pandas/core/indexes/
# run:
    (env1) bash-5.2$ coalispr -h
# stop the virtual environment:
    (env1) bash-5.2$ deactivate
    bash-5.2$