What is it about?
This study presents a new method for accurately identifying and classifying archaea, single-celled organisms found in a wide variety of habitats. Classifying these organisms is challenging because most have never been isolated in a laboratory and are known only from gene sequences recovered from environmental samples. The paper proposes a simple, highly accurate feature-based method for classifying sequence samples. It uses three features: the compressibility of a genomic sequence, its GC-content, and its length. Overall, the method achieved high accuracy at different taxonomic levels. For example, phylum classification reached 96% accuracy, and genus identification reached 91% accuracy across a pool of 55 genera. The method offers a fast and accurate approach to archaea identification and classification, which could have important implications for the medical, forensic, and exobiology fields.
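To make the three features concrete, here is a minimal sketch of how they might be computed for one sequence. It is an illustration under stated assumptions, not the paper's implementation: this summary does not name the compressor used, so Python's general-purpose `zlib` stands in for it, and the function names (`gc_content`, `compressibility`, `extract_features`) are hypothetical.

```python
import zlib


def gc_content(seq: str) -> float:
    """Fraction of bases in the sequence that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def compressibility(seq: str) -> float:
    """Compressed size in bits per base.

    Assumption: zlib is only a stand-in for whatever compressor the
    paper uses. Lower values mean the sequence is more compressible.
    """
    data = seq.upper().encode("ascii")
    if not data:
        return 0.0
    return 8 * len(zlib.compress(data, 9)) / len(data)


def extract_features(seq: str) -> tuple[float, float, int]:
    """Return (compressibility, GC-content, sequence length)."""
    return (compressibility(seq), gc_content(seq), len(seq))
```

A feature vector of this kind could then be passed to any standard classifier. A highly repetitive sequence such as `"ACGT" * 1000` yields a much lower bits-per-base value than a random sequence of the same length, which is the property the compressibility feature relies on.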
Why is it important?
The exponential growth of metagenomic analysis has affected many fields, including healthcare, pharmacology, and biotechnology. However, current reference-based methodologies sometimes fail to identify an organism conclusively. Our method is fast, highly accurate, and does not depend on a reference sequence. The results are also promising for metagenomics, especially for archaea, most of which can be identified only from environmental samples. Finally, the work is fully reproducible and replicable.
Read the Original
This page is a summary of: Feature-Based Classification of Archaeal Sequences Using Compression-Based Methods, January 2022, Springer Science + Business Media,
DOI: 10.1007/978-3-031-04881-4_25.