What is it about?
There is growing interest in building machine learning models that predict traits of interest in bacteria, most commonly antibiotic resistance. Several recently published attempts report promising levels of accuracy. However, little work has systematically examined how features of the training dataset affect a model's accuracy, or how that accuracy varies across contexts. This has important implications if these algorithms are to be used in clinical practice. Here, we analysed several large datasets spanning three species and tested three machine learning methods to see how performance changes with drug, dataset, resistance measure, accuracy measure and species.
Why is it important?
Our findings underscore the importance of incorporating relevant biological and epidemiological knowledge into model design and assessment and suggest that doing so can inform tailored modelling for individual drugs, pathogens, and clinical populations. We further suggest that continued comprehensive sampling and incorporation of up-to-date whole genome sequence data, resistance phenotypes, and treatment outcome data into model training will be crucial to the clinical utility and sustainability of machine learning-based molecular diagnostics.
Read the Original
This page is a summary of: Evaluation of parameters affecting performance and reliability of machine learning-based antibiotic susceptibility testing from whole genome sequencing data, April 2019, Cold Spring Harbor Laboratory Press, DOI: 10.1101/607127.