What is it about?
We learn many techniques and metrics for evaluating ML models, yet most of the time we end up using accuracy as the evaluation metric. Relying on accuracy alone is not always informative, because other factors can influence how well a model actually performs. One such factor is a biased dataset, in which the target classes are unevenly represented. This paper studies four class distributions of the target variable in the training set: (50% majority class, 50% minority class), (60% majority class, 40% minority class), (70% majority class, 30% minority class), and (80% majority class, 20% minority class). It focuses on choosing the right metric among accuracy, precision, recall, F1 score, and AUC score for a binary classification problem with a skewed class distribution. The algorithms considered are Logistic Regression, K-Nearest Neighbors, and Naïve Bayes.
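To illustrate why accuracy alone can mislead on a skewed class distribution, here is a minimal sketch in plain Python. It is not taken from the paper: the labels, the "always predict the majority class" baseline, and the helper names `classification_metrics` and `auc_score` are all hypothetical and chosen only for illustration. The label split mirrors the 80% / 20% case described above.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = minority class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when no positives are predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def auc_score(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: 80 majority-class (0) and 20 minority-class (1) samples.
y_true = [0] * 80 + [1] * 20

# Hypothetical baseline that ignores the input and always predicts the majority class.
always_majority = [0] * 100
constant_scores = [0.0] * 100  # the same score for every sample

baseline = classification_metrics(y_true, always_majority)
baseline_auc = auc_score(y_true, constant_scores)

print(baseline)      # accuracy is 0.8, while precision, recall and F1 are all 0.0
print(baseline_auc)  # 0.5, i.e. no better than chance
```

In this sketch the baseline never identifies a single minority-class sample, yet it still reaches 80% accuracy, whereas recall, F1 and AUC expose the failure.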
Read the Original
This page is a summary of: Choosing the best metrics for quantifying the quality of the model in skewed binary classification problems, January 2024, American Institute of Physics, DOI: 10.1063/5.0177799.