coalispr.resources.share.ncfasta2gtf
Script to convert a home-made FASTA file to GTF.
Only the FASTA headers of Cryptococcus_neoformans.ASM9104v1.ncrna_db.fa are used, which yields lines like this:
14 RWvN exon 347687 347851 . - . gene_id "boxCDsnorna-116nt";
transcript_id "boxCDsnorna-116nt"; gene-source "contains box-CD snoRNA; as
CNAG_12022"; gene_biotype "ncRNA"; transcript_name "boxCDsnorna-116nt";
transcript_source ""; transcript_biotype "ncRNA"; exon_id "boxCDsnorna-116nt"
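The layout of such a line can be sketched in Python. This is a minimal illustration, not the module's own code: the function name `format_gtf_line` and its `gene_source` parameter are hypothetical, and the attribute order simply mirrors the example above.

```python
def format_gtf_line(chrno, source, seqstart, seqend, strand, seqid,
                    gene_source="", biotype="ncRNA"):
    """Return one tab-separated GTF line; the feature is always 'exon'.

    Hypothetical helper: attribute names are copied from the example line
    in this page, not taken from the coalispr source.
    """
    attributes = (
        f'gene_id "{seqid}"; transcript_id "{seqid}"; '
        f'gene-source "{gene_source}"; gene_biotype "{biotype}"; '
        f'transcript_name "{seqid}"; transcript_source ""; '
        f'transcript_biotype "{biotype}"; exon_id "{seqid}"'
    )
    # GTF columns: seqname, source, feature, start, end, score, strand, frame, attributes
    columns = [str(chrno), source, "exon", str(seqstart), str(seqend),
               ".", strand, ".", attributes]
    return "\t".join(columns)


line = format_gtf_line(
    "14", "RWvN", 347687, 347851, "-", "boxCDsnorna-116nt",
    gene_source="contains box-CD snoRNA; as CNAG_12022",
)
print(line)
```

Score and frame are written as `.` because the example line carries no values for them.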
Attributes
- exts
Classes
- Gtfline: Class to deal with a line in a gtf file; all features are named 'exon'.
Functions
- fasta2gtf(infile, gtfout): Create GTF file from fasta-headers
- main(args)
Module Contents
- coalispr.resources.share.ncfasta2gtf.exts = ['.fa', '.fasta']
- class coalispr.resources.share.ncfasta2gtf.Gtfline(chrno, source, seqstart, seqend, strand, seqid, gentype)
Class to deal with a line in a gtf file; all features are named ‘exon’.
- chrno
Name of the chromosome (seqid).
- Type:
str
- source
Origin of the information for a feature.
- Type:
str
- seqstart
Start nucleotide of the featured region.
- Type:
int/str
- seqend
End nucleotide of the featured region.
- Type:
int/str
- strand
Strand carrying the feature, if applicable.
- Type:
str {‘+’, ‘-’, ‘.’}
- seqid
Gene ID (also used as the transcript ID).
- Type:
str
- gentype
Type of transcript: ‘exon’, ‘tRNA’, etc.
- lineobjects = []
- printme()
Print the GTF line.
- sort_and_remove_duplicates()
Sort the GTF lines and remove duplicate entries based on feature coordinates.
- equals(b)
Compare two Gtfline objects with respect to their feature coordinates.
- skip_duplicates()
Skip duplicate entries based on feature coordinates.
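Coordinate-based comparison and de-duplication can be sketched as follows. This is an assumption-laden illustration rather than the `Gtfline` implementation: the `Feature` dataclass is hypothetical, and "feature coordinates" is taken here to mean chromosome, start, end and strand.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """Hypothetical stand-in for a GTF line; not the coalispr Gtfline class."""
    chrno: str
    seqstart: int
    seqend: int
    strand: str
    seqid: str

    def coords(self):
        # Assumed definition of "feature coordinates"; start/end may arrive as str.
        return (self.chrno, int(self.seqstart), int(self.seqend), self.strand)

    def equals(self, other):
        """Two features are equal when their coordinates match."""
        return self.coords() == other.coords()


def sort_and_remove_duplicates(features):
    """Sort by coordinates and keep the first of each run of equal features."""
    unique = []
    for feat in sorted(features, key=Feature.coords):
        if unique and unique[-1].equals(feat):
            continue  # same coordinates as the previously kept feature
        unique.append(feat)
    return unique


features = [
    Feature("14", 347687, 347851, "-", "a"),
    Feature("2", 100, 200, "+", "b"),
    Feature("14", "347687", "347851", "-", "c"),  # duplicate of "a" by coordinates
]
deduplicated = sort_and_remove_duplicates(features)
```

Because sorting brings equal coordinates next to each other, each feature only needs to be compared with the last one kept. Chromosome names are compared as strings in this sketch, so "14" sorts before "2".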
- coalispr.resources.share.ncfasta2gtf.fasta2gtf(infile, gtfout)
Create a GTF file from FASTA headers.
- Parameters:
infile (str) – Path to the FASTA file
gtfout (str) – Path to the GTF file (default: ‘’)
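Since only the headers are used, the first step of such a conversion can be sketched as collecting the lines that start with `>`. The helpers below are hypothetical and do not reproduce `fasta2gtf`; the layout of the fields inside each header is not documented here, so the sketch leaves the headers unparsed. `EXTS` mirrors the module-level `exts` list.

```python
from pathlib import Path

EXTS = [".fa", ".fasta"]  # mirrors the documented `exts` attribute


def has_fasta_extension(path):
    """Check the file suffix against the accepted FASTA extensions."""
    return Path(path).suffix.lower() in EXTS


def fasta_headers(lines):
    """Return header text (without the leading '>') from an iterable of lines."""
    return [line[1:].strip() for line in lines if line.startswith(">")]


# Hypothetical records, for illustration only.
records = [">seqA hypothetical description\n", "ACGTACGT\n", ">seqB\n", "GGCC\n"]
headers = fasta_headers(records)
```

With a real file, the same function can be fed an open file handle, for example `with open(infile) as handle: headers = fasta_headers(handle)`.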
- coalispr.resources.share.ncfasta2gtf.main(args)