What is it about?
Mutual information theory was used to certify annotated rice coding sequences from both the GenBank and TIGR databases. Restricting the analysis to coding sequences longer than 600 bp, we screened out genes with aberrant compositional features. After removal of redundant genes, these represent about 10% of both datasets. In a plot of GC3% versus GC2% (GC content at the third and second codon positions), most of the rejected accessions follow a different trend from the accessions that have been published in international journals.
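The two quantities named above can be illustrated with a minimal Python sketch. It is not the procedure used in the paper: the function names `gc_at_codon_position` and `mutual_information` are hypothetical, the sequence is assumed to be in frame and to contain only A, C, G and T, and no acceptance threshold is implied.

```python
from collections import Counter
from math import log2


def gc_at_codon_position(cds: str, position: int) -> float:
    """Percent G or C at codon position 1, 2 or 3 of an in-frame sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    bases = cds[position - 1:usable:3]
    if not bases:
        return 0.0
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)


def mutual_information(seq: str, k: int) -> float:
    """Mutual information, in bits, between bases separated by k positions."""
    seq = seq.upper()
    pairs = [(seq[i], seq[i + k]) for i in range(len(seq) - k)]
    if not pairs:
        return 0.0
    n = len(pairs)
    joint = Counter(pairs)                  # counts of (base, base k later)
    left = Counter(a for a, _ in pairs)     # marginal counts of the first base
    right = Counter(b for _, b in pairs)    # marginal counts of the second base
    return sum(
        (c / n) * log2((c / n) / ((left[a] / n) * (right[b] / n)))
        for (a, b), c in joint.items()
    )


if __name__ == "__main__":
    toy_cds = "ATGGCCGCGTCCAAGGGCTGA"  # made-up sequence, not a real rice gene
    print("GC2%:", gc_at_codon_position(toy_cds, 2))
    print("GC3%:", gc_at_codon_position(toy_cds, 3))
    for k in range(1, 7):
        print("k =", k, "I(k) =", round(mutual_information(toy_cds, k), 3))
```

Computing `mutual_information` over a range of `k` gives a profile per sequence, and pairing `gc_at_codon_position(cds, 3)` with `gc_at_codon_position(cds, 2)` gives one point per accession for a GC3% versus GC2% plot. On sequences as short as the toy example the estimates are dominated by sampling noise, which is one reason a minimum length such as 600 bp is a sensible restriction.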
Why is it important?
We used these results to argue that coding sequence samples in public databases are contaminated with spurious non-coding sequences, a bias introduced by the pattern recognition algorithms of gene prediction software.
Read the Original
This page is a summary of: The mutual information theory for the certification of rice coding sequences, FEBS Letters, May 2004, Wiley, DOI: 10.1016/j.febslet.2004.05.026.