coalispr.resources.share.ncfasta2gtf

Script to convert a home-made FASTA file to GTF format.

From Cryptococcus_neoformans.ASM9104v1.ncrna_db.fa, only the FASTA headers are used. The output consists of lines like this:

14  RWvN    exon    347687  347851  .   -   .   gene_id "boxCDsnorna-116nt"; transcript_id "boxCDsnorna-116nt"; gene-source "contains box-CD snoRNA; as CNAG_12022"; gene_biotype "ncRNA"; transcript_name "boxCDsnorna-116nt"; transcript_source ""; transcript_biotype "ncRNA"; exon_id "boxCDsnorna-116nt"
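In a GTF line, the first eight columns are tab-separated fixed fields. The ninth column holds `key "value";` attributes. The `gene-source` value above itself contains a semicolon, so naively splitting the attribute column on `;` would cut it in two.

The following minimal sketch reads such a line back by matching quoted `key "value"` pairs instead. It is not part of ncfasta2gtf, and `parse_gtf_line` is a hypothetical name:

```python
import re

# One attribute: a key, whitespace, then a double-quoted value.
# The value may contain ';' but not '"'.
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_gtf_line(line):
    """Split a GTF line into its 8 fixed columns and an attribute dict.

    Hypothetical helper for illustration; assumes tab-separated columns.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    seqname, source, feature, start, end, score, strand, frame, attrs = fields
    return {
        "seqname": seqname,
        "source": source,
        "feature": feature,
        "start": int(start),
        "end": int(end),
        "score": score,
        "strand": strand,
        "frame": frame,
        "attributes": dict(ATTR_RE.findall(attrs)),
    }


example = "\t".join([
    "14", "RWvN", "exon", "347687", "347851", ".", "-", ".",
    'gene_id "boxCDsnorna-116nt"; transcript_id "boxCDsnorna-116nt"; '
    'gene-source "contains box-CD snoRNA; as CNAG_12022"; gene_biotype "ncRNA"; '
    'transcript_name "boxCDsnorna-116nt"; transcript_source ""; '
    'transcript_biotype "ncRNA"; exon_id "boxCDsnorna-116nt"',
])
parsed = parse_gtf_line(example)
print(parsed["attributes"]["gene-source"])  # contains box-CD snoRNA; as CNAG_12022
```

The regular expression assumes that attribute values never contain a double quote, which holds for the example line.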

Attributes

exts

Classes

Gtfline

Class to deal with a line in a gtf file; all features are named 'exon'.

Functions

fasta2gtf(infile, gtfout)	Create a GTF file from FASTA headers.
main(args)

Module Contents

coalispr.resources.share.ncfasta2gtf.exts = ['.fa', '.fasta']
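This page does not show how ncfasta2gtf applies `exts`; presumably it decides which input files are treated as FASTA. Here is a minimal sketch of such a check, using a hypothetical helper `is_fasta`:

```python
import os

exts = [".fa", ".fasta"]  # same values as the module attribute above


def is_fasta(path):
    """Return True if *path* ends in one of the extensions in ``exts``.

    Hypothetical helper; the module's own check may differ.
    """
    _root, ext = os.path.splitext(path)  # splitext returns only the last suffix
    return ext in exts


print(is_fasta("Cryptococcus_neoformans.ASM9104v1.ncrna_db.fa"))  # True
print(is_fasta("annotation.gtf"))  # False
```

Because `os.path.splitext` only looks at the last suffix, a compressed file such as `x.fa.gz` would not match.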

class coalispr.resources.share.ncfasta2gtf.Gtfline(chrno, source, seqstart, seqend, strand, seqid, gentype)

Class to deal with a line in a gtf file; all features are named ‘exon’.

chrno

Name of the chromosome (seqid).

Type: str

source

Origin of the information for a feature.

Type: str

seqstart

Start nucleotide of the featured region.

Type: int/str

seqend

End nucleotide of the featured region.

Type: int/str

strand

Strand carrying the feature, if applicable.

Type: str {‘+’, ‘-’, ‘.’}

seqid

Gene id, also used as ‘transcript_id’.

Type: str

gentype

Type of transcript: ‘exon’, ‘tRNA’, etc.
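The attributes above map onto GTF columns roughly as shown below. The sketch is a simplified, hypothetical stand-in (`GtfLineSketch`, `to_gtf`), not the module's Gtfline.

It writes only `gene_id`, `transcript_id` and `gene_biotype`, and assumes that `gentype` ends up as the biotype. The real output shown at the top of this page carries more attributes.

```python
from dataclasses import dataclass


@dataclass
class GtfLineSketch:
    """Simplified stand-in for Gtfline (hypothetical; not the module's code)."""

    chrno: str     # chromosome name (GTF column 1)
    source: str    # origin of the feature (column 2)
    seqstart: int  # start nucleotide (column 4)
    seqend: int    # end nucleotide (column 5)
    strand: str    # '+', '-' or '.' (column 7)
    seqid: str     # used for both gene_id and transcript_id
    gentype: str   # assumed here to be written as gene_biotype

    def to_gtf(self):
        """Return one tab-separated GTF line; the feature is always 'exon'."""
        attrs = (
            f'gene_id "{self.seqid}"; transcript_id "{self.seqid}"; '
            f'gene_biotype "{self.gentype}";'
        )
        cols = [self.chrno, self.source, "exon", str(self.seqstart),
                str(self.seqend), ".", self.strand, ".", attrs]
        return "\t".join(cols)


line = GtfLineSketch("14", "RWvN", 347687, 347851, "-",
                     "boxCDsnorna-116nt", "ncRNA")
print(line.to_gtf())
```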

lineobjects = []
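Because `lineobjects` is a class attribute, one list is shared by all Gtfline instances. A plausible use, assumed here rather than documented, is that each new object appends itself to this list. The class methods could then sort and deduplicate everything collected from one input file.

The pattern looks like this, with a hypothetical `Registry` class:

```python
class Registry:
    """Hypothetical illustration of a class-level list shared by all instances."""

    lineobjects = []  # a single list object, shared by the class and every instance

    def __init__(self, name):
        self.name = name
        # Append to the shared class attribute, not to a per-instance copy.
        type(self).lineobjects.append(self)


Registry("a")
Registry("b")
print([obj.name for obj in Registry.lineobjects])  # ['a', 'b']
```

Such a list persists for the lifetime of the process. It would need clearing (for example `Registry.lineobjects.clear()`) before a second file is processed.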


printme()

Print the GTF line.

sort_and_remove_duplicates()

Sort the GTF lines and remove duplicate entries, based on feature coordinates.

equals(b)

Compare two Gtfline objects with respect to their feature coordinates.

skip_duplicates()

Skip duplicate entries, based on feature coordinates.
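`equals`, `sort_and_remove_duplicates` and `skip_duplicates` all compare features by their coordinates. The minimal sketch below illustrates that idea under two assumptions:

- A duplicate means the same chromosome, start, end and strand.
- The first feature of each duplicate run is kept; the real methods may keep a different one.

`Feature`, `coords` and the free functions are hypothetical:

```python
from collections import namedtuple

# Hypothetical minimal record; only the coordinate fields matter here.
Feature = namedtuple("Feature", "chrno seqstart seqend strand seqid")


def coords(f):
    """Coordinate key used to decide whether two features are duplicates.

    seqstart/seqend may be int or str (see the attribute types above), so
    they are converted to int for comparison and numeric sorting.
    """
    return (f.chrno, int(f.seqstart), int(f.seqend), f.strand)


def equals(a, b):
    """True if a and b describe the same region (ids are ignored)."""
    return coords(a) == coords(b)


def sort_and_remove_duplicates(features):
    """Sort by coordinates and keep the first feature of each duplicate run."""
    result = []
    for f in sorted(features, key=coords):  # sorted() is stable
        if result and equals(result[-1], f):
            continue  # same region as the last kept feature: skip it
        result.append(f)
    return result


feats = [
    Feature("14", "347687", "347851", "-", "boxCDsnorna-116nt"),
    Feature("14", 1200, 1300, "+", "ncRNA-a"),
    Feature("14", 347687, 347851, "-", "CNAG_12022"),
]
for f in sort_and_remove_duplicates(feats):
    print(f.seqid)  # ncRNA-a, then boxCDsnorna-116nt
```

Chromosome names are compared as strings, so '14' sorts before '2'. Numeric ordering would need a natural-sort key.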

coalispr.resources.share.ncfasta2gtf.fasta2gtf(infile, gtfout)

Create a GTF file from FASTA headers.

Parameters:

infile (str) – Path to the FASTA file.
gtfout (str) – Path to the output GTF file (default: ‘’).
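This page does not describe the header layout of the home-made FASTA file, so the sketch below only shows the surrounding plumbing:

1. Collect the header lines (those starting with `>`).
2. Pass each one to a caller-supplied `header_to_gtf` function.
3. Write the results to `gtfout`.

`header_to_gtf` is a hypothetical placeholder for the module's actual header parsing. The demo header format `<seqid> <chrno>:<start>-<end>(<strand>)` is invented purely for illustration:

```python
import os
import tempfile


def fasta_headers(path):
    """Yield FASTA header lines without the leading '>' and trailing newline."""
    with open(path) as fh:
        for line in fh:
            if line.startswith(">"):
                yield line[1:].rstrip("\n")


def fasta2gtf_sketch(infile, gtfout, header_to_gtf):
    """Write one GTF line per FASTA header (hypothetical, simplified)."""
    with open(gtfout, "w") as out:
        for header in fasta_headers(infile):
            out.write(header_to_gtf(header) + "\n")


def demo_header_to_gtf(header):
    """Parse the invented demo format '<seqid> <chrno>:<start>-<end>(<strand>)'."""
    seqid, loc = header.split()
    chrno, rest = loc.split(":")
    span, strand = rest[:-3], rest[-2]  # rest ends in '(+)', '(-)' or '(.)'
    start, end = span.split("-")
    attrs = f'gene_id "{seqid}"; transcript_id "{seqid}";'
    return "\t".join([chrno, "demo", "exon", start, end, ".", strand, ".", attrs])


with tempfile.TemporaryDirectory() as tmp:
    fa = os.path.join(tmp, "demo.fa")
    gtf = os.path.join(tmp, "demo.gtf")
    with open(fa, "w") as fh:
        fh.write(">boxCDsnorna-116nt 14:347687-347851(-)\nACGT\n")
    fasta2gtf_sketch(fa, gtf, demo_header_to_gtf)
    with open(gtf) as fh:
        written = fh.read()
print(written, end="")
```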

coalispr.resources.share.ncfasta2gtf.main(args)