THE CONTRIBUTION OF STOP CODON FREQUENCY AND PURINE BIAS TO THE CLASSIFICATION OF CODING SEQUENCES

N. CARELS; D. FRIAS

doi:10.1142/9789814520829_0018

What is it about?

We revisited the classification of coding sequences (CDS) based on nucleotide statistics using the Universal Feature Method (UFM). We show that two rules improve the success rate of CDS classification: (i) G1>G2, where G1 and G2 are the guanine levels in the 1st and 2nd positions of contiguous DNA triplets, respectively; and (ii) T1<A2, where T1 and A2 are the thymine level in the 1st position and the adenine level in the 2nd position of contiguous DNA triplets, respectively. Combining the G1>G2 and T1<A2 rules reduces the classification error caused by confusion between the +1 frame and the -1 or -2 frames, without significantly affecting the detection rate. We also show how the information provided by purine bias can be complemented by stop codon frequency to achieve a high success rate together with a low error rate.
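The two rules and the stop codon signal can be made concrete with a small sketch. This is not the UFM implementation; it is a minimal illustration that computes position-specific base frequencies over contiguous triplets, tests G1>G2 and T1<A2, and counts in-frame stop codons for each of six reading frames. The function names are hypothetical, the stop codons are assumed to come from the standard genetic code, and the frame labels assume that +1 to +3 are offsets 0 to 2 on the forward strand and -1 to -3 are offsets 0 to 2 on the reverse complement (conventions vary between tools).

```python
from collections import Counter

# Assumption: stop codons of the standard genetic code only.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an ACGT sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def position_composition(seq: str, offset: int = 0) -> list[dict[str, float]]:
    """Base frequencies at triplet positions 1, 2 and 3, reading complete
    triplets that start at `offset` (0, 1 or 2)."""
    seq = seq.upper()
    counts = [Counter(), Counter(), Counter()]
    for i in range(offset, len(seq) - 2, 3):
        for pos in range(3):
            counts[pos][seq[i + pos]] += 1
    freqs = []
    for c in counts:
        # Only A, C, G, T count toward the total; "or 1" avoids division by zero.
        total = sum(c[b] for b in "ACGT") or 1
        freqs.append({b: c[b] / total for b in "ACGT"})
    return freqs


def purine_bias_rules(seq: str, offset: int = 0) -> tuple[bool, bool]:
    """Evaluate rule (i) G1 > G2 and rule (ii) T1 < A2 for one reading frame."""
    f = position_composition(seq, offset)
    return f[0]["G"] > f[1]["G"], f[0]["T"] < f[1]["A"]


def stop_codon_count(seq: str, offset: int = 0) -> int:
    """Count stop codons among complete triplets starting at `offset`."""
    seq = seq.upper()
    return sum(seq[i:i + 3] in STOP_CODONS for i in range(offset, len(seq) - 2, 3))


def frame_profile(seq: str) -> dict[str, tuple[tuple[bool, bool], int]]:
    """Rule outcomes and stop codon counts for the six frames (hypothetical
    frame convention described in the lead-in)."""
    rc = reverse_complement(seq)
    profile = {}
    for k in range(3):
        profile[f"+{k + 1}"] = (purine_bias_rules(seq, k), stop_codon_count(seq, k))
        profile[f"-{k + 1}"] = (purine_bias_rules(rc, k), stop_codon_count(rc, k))
    return profile


if __name__ == "__main__":
    cds = "ATGGAAGCTGGTAAATGA"  # toy sequence for illustration, not real data
    for frame, (rules, stops) in frame_profile(cds).items():
        print(frame, rules, stops)
```

Each frame receives a pair of rule outcomes plus a stop codon count. A real CDS read in its correct frame has no internal stop codons, and, consistent with the summary, both rules are expected to favor that frame over the shifted ones; how UFM actually weights and combines these signals is described in the paper and is not reproduced here.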

Why is it important?

UFM provides a simple tool for gathering, from transcriptome data, the prior knowledge needed to train other investigative tools, such as Markov models or other machine learning methods, that can be used for de novo genome annotation of eukaryotic species. Alternatively, it could be used to extract coding information from bulk metagenomic sequencing samples.

Perspectives

This method requires no prior knowledge, so there is no theoretical impediment to sequencing any transcriptome or metagenomic sample. The only limitation is access to funding. With a MinION sequencer, a Land Rover and a laptop, it would be possible to go into the wild and sequence exomes on the fly with 95% sensitivity and a 95% success rate.
Nicolas Carels
Oswaldo Cruz Foundation

This page is a summary of: THE CONTRIBUTION OF STOP CODON FREQUENCY AND PURINE BIAS TO THE CLASSIFICATION OF CODING SEQUENCES, June 2013, World Scientific Pub Co Pte Ltd,
DOI: 10.1142/9789814520829_0018.
The full text can be read via the DOI above.

Resources

Biomat 2012 (original report)

Contributors

The following have contributed to this page

Nicolas Carels
Oswaldo Cruz Foundation