coalispr.resources.share.ncfasta2gtf

Script to convert a home-made FASTA file to GTF.

Only the FASTA headers of Cryptococcus_neoformans.ASM9104v1.ncrna_db.fa are used, to produce GTF lines like this:

14  RWvN    exon    347687  347851  .   -   .   gene_id "boxCDsnorna-116nt";
transcript_id "boxCDsnorna-116nt"; gene-source "contains box-CD snoRNA; as
CNAG_12022"; gene_biotype "ncRNA"; transcript_name "boxCDsnorna-116nt";
transcript_source ""; transcript_biotype "ncRNA"; exon_id "boxCDsnorna-116nt"
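
The layout of the example line above can be sketched as follows. This is a minimal illustration, not the module's own code: the helper name format_gtf_line and its parameters are hypothetical, and the attribute keys (including "gene-source") are simply copied from the example line.

```python
def format_gtf_line(chrno, source, seqstart, seqend, strand, seqid,
                    gene_source="", biotype="ncRNA"):
    """Return one tab-separated GTF line; the feature column is always 'exon'.

    Hypothetical helper: the attribute keys mirror the example line only.
    """
    attributes = (
        f'gene_id "{seqid}"; transcript_id "{seqid}"; '
        f'gene-source "{gene_source}"; gene_biotype "{biotype}"; '
        f'transcript_name "{seqid}"; transcript_source ""; '
        f'transcript_biotype "{biotype}"; exon_id "{seqid}"'
    )
    # Nine GTF columns: seqname, source, feature, start, end,
    # score, strand, frame, attributes.
    columns = [str(chrno), source, "exon", str(seqstart), str(seqend),
               ".", strand, ".", attributes]
    return "\t".join(columns)


line = format_gtf_line("14", "RWvN", 347687, 347851, "-", "boxCDsnorna-116nt",
                       gene_source="contains box-CD snoRNA; as CNAG_12022")
print(line)
```

The gene identifier is reused for transcript_id, transcript_name and exon_id, as in the example line.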

Attributes

Classes

Gtfline

Class to deal with a line in a gtf file; all features are named 'exon'.

Functions

fasta2gtf(infile, gtfout)

Create a GTF file from FASTA headers

main(args)

Module Contents

coalispr.resources.share.ncfasta2gtf.exts = ['.fa', '.fasta']
class coalispr.resources.share.ncfasta2gtf.Gtfline(chrno, source, seqstart, seqend, strand, seqid, gentype)

Class to deal with a line in a gtf file; all features are named ‘exon’.

chrno

Name of chromosome (seqid).

Type:

str

source

Origin of information for a feature.

Type:

str

seqstart

Start nucleotide of the featured region.

Type:

int/str

seqend

End nucleotide of the featured region.

Type:

int/str

strand

Strand carrying the feature, if applicable.

Type:

str {‘+’, ‘-’, ‘.’}

seqid

Gene-id (also used as ‘transcript-id’)

Type:

str

gentype

Type of transcript: ‘exon’, ‘tRNA’, etc.

lineobjects = []
chrno
source
seqstart
seqend
strand
seqid
gentype
printme()

Print the GTF line

sort_and_remove_duplicates()

Sort GTF lines and remove duplicate entries based on feature coordinates

equals(b)

Compare two Gtfline objects with respect to feature coordinates

skip_duplicates()

Skip duplicate entries based on feature coordinates
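
Coordinate-based comparison and de-duplication could look roughly like the sketch below. It does not reproduce Gtfline itself: the Feature class is a hypothetical stand-in, and treating chromosome, start, end and strand together as the "feature coordinates" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """Hypothetical stand-in for Gtfline, reduced to a few fields."""
    chrno: str
    seqstart: int
    seqend: int
    strand: str
    seqid: str

    def coords(self):
        # Assumption: these four fields define the feature coordinates.
        # int() allows start/end to be given as int or str.
        return (self.chrno, int(self.seqstart), int(self.seqend), self.strand)

    def equals(self, other):
        """Compare two features with respect to their coordinates only."""
        return self.coords() == other.coords()


def sort_and_remove_duplicates(features):
    """Sort by coordinates and keep the first entry of each duplicate run."""
    ordered = sorted(features, key=Feature.coords)  # stable sort
    unique = []
    for feat in ordered:
        if unique and unique[-1].equals(feat):
            continue  # same coordinates as the previous entry: skip it
        unique.append(feat)
    return unique


features = [
    Feature("14", 347687, 347851, "-", "a"),
    Feature("2", 100, 200, "+", "b"),
    Feature("14", "347687", "347851", "-", "c"),  # same coordinates as "a"
]
result = sort_and_remove_duplicates(features)
print([f.seqid for f in result])
```

Because sorting places entries with identical coordinates next to each other, a single pass that compares each entry with the last one kept is enough to drop duplicates. Chromosome names are compared as strings here, so "14" sorts before "2".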

coalispr.resources.share.ncfasta2gtf.fasta2gtf(infile, gtfout)

Create a GTF file from FASTA headers

Parameters:
  • infile (str) – Path to fasta file

  • gtfout (str) – Path to gtf file (default: ‘’)
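
Since only the headers are used, the reading step can be sketched as below. This is an illustration under assumptions, not the implementation of fasta2gtf: the helper iter_fasta_headers is hypothetical, and how the header text maps onto GTF fields is left out because the header layout is not described here.

```python
import io


def iter_fasta_headers(handle):
    """Yield FASTA header lines, without the leading '>', from an open handle.

    Hypothetical helper; sequence lines are ignored.
    """
    for line in handle:
        if line.startswith(">"):
            yield line[1:].strip()


# In-memory example with made-up headers; a real call would pass an open file.
example = io.StringIO(">header one\nACGT\nTTGA\n>header two\nGGCC\n")
headers = list(iter_fasta_headers(example))
print(headers)
```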

coalispr.resources.share.ncfasta2gtf.main(args)