coalispr.resources.share.sub_gtf
Script to extract the lines for a particular property from a general annotation file and write them to a ‘sub’-GTF. It is a Python alternative to the bash script below.
#! /bin/bash
# Usage: pass a compressed annotation file (.gtf.gz) as the first argument.
inputgz=$1
if [[ -z "$inputgz" ]]; then
    echo "Please provide a compressed annotations file (.gtf.gz) as input"
    exit 1
fi
# collect entries for common ncRNAs
gunzip -cf "$inputgz" | grep snRNA > tmp
gunzip -cf "$inputgz" | grep snoRNA >> tmp
gunzip -cf "$inputgz" | grep tRNA >> tmp
gunzip -cf "$inputgz" | grep rRNA >> tmp
# sort by chromosome (characters after 'chr'), then start (ascending), then end (descending)
sort -k 1.4h,1 -k 4n,4 -k 5nr,5 tmp > mouse_ncRNAs.gtf
rm tmp
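For comparison, the same filtering and sorting can be sketched in a few lines of Python. This is only an illustration of the idea, not the code in this module: the names `NCRNA_KEYWORDS`, `chrom_sort_key` and `extract_ncrna_lines` are hypothetical, and the sort key only approximates the `sort -k 1.4h,1` behaviour of the bash script. Unlike the repeated `grep` calls, a line that matches several keywords is kept once.

```python
import gzip

# Keywords mirror the grep patterns of the bash script (assumed, illustrative).
NCRNA_KEYWORDS = ("snRNA", "snoRNA", "tRNA", "rRNA")


def chrom_sort_key(line):
    """Sort by chromosome, then start ascending, then end descending."""
    fields = line.rstrip("\n").split("\t")
    chrom = fields[0]
    # Drop a leading 'chr' so that 'chr2' sorts before 'chr10'.
    name = chrom[3:] if chrom.startswith("chr") else chrom
    numeric = name.isdigit()
    # Numbered chromosomes come first; named ones (X, Y, M) follow alphabetically.
    return (not numeric, int(name) if numeric else 0, name,
            int(fields[3]), -int(fields[4]))


def extract_ncrna_lines(gtf_gz_path, keywords=NCRNA_KEYWORDS):
    """Return sorted annotation lines that mention any of the keywords."""
    selected = []
    with gzip.open(gtf_gz_path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip header/comment lines
            if any(word in line for word in keywords):
                selected.append(line)
    return sorted(selected, key=chrom_sort_key)
```

The returned lines could then be written to an output file such as `mouse_ncRNAs.gtf` with `writelines`.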
Functions
| create_gtf(kind, get_all, reference, features) | Create a kind of GTF file by extracting features from reference gtf. |
Module Contents
- coalispr.resources.share.sub_gtf.create_gtf(kind, get_all, reference, features)
Create a kind of GTF file by extracting features from reference gtf.
- Parameters:
kind (str) – Kind of feature for which a GTF is made. Used as output name.
reference (str) – Filename of the reference annotation.
features (str) – Features to extract annotations for, passed as a string from which the list can be recovered.
- Returns:
An annotation with the following fields:
seqname - The name of the sequence. Must be a chromosome or scaffold.
source - The program that generated this feature.
feature - The name of this type of feature. Some examples of standard feature types are "CDS", "start_codon", "stop_codon", and "exon".
start - The starting position of the feature in the sequence. The first base is numbered 1.
end - The ending position of the feature (inclusive).
score - A score between 0 and 1000. If the track line useScore attribute is set to 1 for this annotation data set, the score value will determine the level of gray in which this feature is displayed (higher numbers = darker gray). If there is no score value, enter ".".
strand - Valid entries include '+', '-', or '.' (for don't know/don't care).
frame - If the feature is a coding exon, frame should be a number between 0-2 that represents the reading frame of the first base. If the feature is not a coding exon, the value should be '.'.
comments - gene_id "Em:U62317.C22.6.mRNA"; transcript_id "Em:U62317.C22.6.mRNA"; exon_number 1
- Return type:
GTF file
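To make the nine fields concrete, a tab-separated annotation line could be split into them roughly as follows. This is a minimal sketch under stated assumptions: `GtfRecord`, `parse_gtf_line` and `parse_attributes` are hypothetical names that are not part of coalispr, and the attribute parser assumes that values contain no semicolons.

```python
from typing import NamedTuple


class GtfRecord(NamedTuple):
    """One annotation line, using the field names listed above."""
    seqname: str
    source: str
    feature: str
    start: int   # 1-based start position
    end: int     # inclusive end position
    score: str   # "." when no score is given
    strand: str  # '+', '-' or '.'
    frame: str   # '0', '1', '2' or '.'
    comments: str


def parse_gtf_line(line):
    """Split a tab-separated annotation line into its nine fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, found {len(fields)}")
    seqname, source, feature, start, end, score, strand, frame, comments = fields
    return GtfRecord(seqname, source, feature, int(start), int(end),
                     score, strand, frame, comments)


def parse_attributes(comments):
    """Turn 'key "value"; key value' pairs into a dict of strings."""
    attributes = {}
    for item in comments.split(";"):
        item = item.strip()
        if not item:
            continue  # ignore the empty piece after a trailing ';'
        key, _, value = item.partition(" ")
        attributes[key] = value.strip().strip('"')
    return attributes
```

Applied to the comments example above, `parse_attributes` would map `gene_id` and `transcript_id` to `Em:U62317.C22.6.mRNA` and `exon_number` to the string `1`.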
- coalispr.resources.share.sub_gtf.main(args)