What is it about?

There is growing interest in building machine learning models to predict traits of interest in bacteria, most notably antibiotic resistance. Several recently published attempts report promising levels of accuracy. However, little work has systematically examined how features of the training dataset affect a model's accuracy, or how that accuracy varies across contexts. This has important implications if these algorithms are to be used in clinical practice. Here, we analysed several large datasets spanning three species and tested three machine learning methods to see how performance changes according to drug, dataset, resistance measure, accuracy measure and species.

Why is it important?

Our findings underscore the importance of incorporating relevant biological and epidemiological knowledge into model design and assessment and suggest that doing so can inform tailored modelling for individual drugs, pathogens, and clinical populations. We further suggest that continued comprehensive sampling and incorporation of up-to-date whole genome sequence data, resistance phenotypes, and treatment outcome data into model training will be crucial to the clinical utility and sustainability of machine learning-based molecular diagnostics.

Perspectives

Choosing the right dataset and accuracy measures for a machine learning algorithm is critical to making sure that the algorithm not only looks good on paper but also performs well in the "real world". This work is a step toward defining better practices for training machine learning models for clinical use.

Dr Nicole E Wheeler
Wellcome Sanger Institute

Read the Original

This page is a summary of: Evaluation of parameters affecting performance and reliability of machine learning-based antibiotic susceptibility testing from whole genome sequencing data, April 2019, Cold Spring Harbor Laboratory Press,
DOI: 10.1101/607127.