What is it about?

We learn different techniques and metrics for evaluating ML models, but usually end up using accuracy as the evaluation metric. Relying on accuracy alone to judge how well a model performs is not always appropriate, because other factors can influence model performance. One such factor is a biased dataset. This paper studies four class distributions of the target variable in the training set: (50% majority class, 50% minority class), (60% majority class, 40% minority class), (70% majority class, 30% minority class) and (80% majority class, 20% minority class). It focuses on choosing the right metric among accuracy, precision, recall, F1 score and AUC score for a binary classification problem with a skewed class distribution. The algorithms considered are Logistic Regression, K-Nearest Neighbors and Naïve Bayes.
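As a rough illustration of why accuracy alone can mislead on skewed classes, the sketch below scores a trivial baseline that always predicts the majority class at each of the four class ratios. The baseline, the helper name `binary_metrics`, the sample size of 100 and the convention of reporting 0.0 when a metric is undefined are all assumptions made for this example. They are not taken from the paper, which trains Logistic Regression, K-Nearest Neighbors and Naïve Bayes, and AUC is left out to keep the sketch short.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for binary labels (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Assumption: an undefined ratio (zero denominator) is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    n = 100  # illustrative sample size, not from the paper
    # Minority-class share (in percent) for the four ratios in the study.
    for minority_pct in (50, 40, 30, 20):
        y_true = [1] * minority_pct + [0] * (n - minority_pct)  # 1 = minority
        y_pred = [0] * n  # baseline: always predict the majority class
        m = binary_metrics(y_true, y_pred)
        print(f"minority={minority_pct}%  "
              f"accuracy={m['accuracy']:.2f}  recall={m['recall']:.2f}  "
              f"f1={m['f1']:.2f}")
```

In this toy setup the baseline's accuracy rises as the classes become more skewed, while its recall and F1 on the minority class stay at zero, because it never identifies a single minority example.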

Read the Original

This page is a summary of: Choosing the best metrics for quantifying the quality of the model in skewed binary classification problems, January 2024, American Institute of Physics,
DOI: 10.1063/5.0177799.
