What is it about?
We discover a surprising degree of systematic disagreement between true-positive and false-positive metrics, a discrepancy that previous authors occasionally noted but did not explain. We find an explanation in the effect of popularity biases, which impact the two families of metrics in very different ways: whereas true-positive metrics reward the recommendation of popular items, false-positive metrics penalize it.
Why is it important?
Psychological studies have reported that bad experiences usually weigh more heavily than good ones in a person's overall assessment. Recommender system evaluation has mostly focused on measuring good experiences, that is, on counting true positives: the goal is to recommend as many good items as possible. However, if bad experiences are known to have a stronger impact, the recommendation task can also be framed as avoiding the recommendation of bad items. One might think that true- and false-positive metrics are complementary, but we found that in recommender system evaluation this is not always the case. Metrics that measure bad experiences can therefore provide a broader perspective on evaluation. In our study, we found that the main cause of the disagreement is a strong popularity bias in the data used to train and test algorithms. Moreover, we determine under which circumstances true- or false-positive metrics should be used for offline evaluation.
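As a rough illustration of the two metric families discussed above (this is a hypothetical sketch, not code or data from the paper): a true-positive metric such as precision counts how many recommended items are relevant, while a false-positive metric such as fall-out counts how many non-relevant catalog items were recommended.

```python
def precision(recommended, relevant):
    """True-positive metric: fraction of recommended items that are relevant."""
    hits = sum(1 for item in recommended if item in relevant)
    return hits / len(recommended)

def fallout(recommended, relevant, catalog):
    """False-positive metric: fraction of non-relevant catalog items recommended."""
    non_relevant = [item for item in catalog if item not in relevant]
    false_hits = sum(1 for item in recommended if item not in relevant)
    return false_hits / len(non_relevant)

# Toy example (hypothetical data): a 10-item catalog, 3 relevant items,
# and a recommendation list containing 2 relevant and 1 non-relevant item.
catalog = list("abcdefghij")
relevant = {"a", "b", "c"}
recommended = ["a", "b", "d"]

print(precision(recommended, relevant))        # 2 of 3 recommendations are relevant
print(fallout(recommended, relevant, catalog)) # 1 of 7 non-relevant items recommended
```

Precision rewards placing relevant (often popular) items in the list, while fall-out penalizes the non-relevant items that slip in; as the paper argues, popularity bias in the test data pulls these two signals in different directions.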
Read the Original
This page is a summary of: Agreement and Disagreement between True and False-Positive Metrics in Recommender Systems Evaluation, July 2020, ACM (Association for Computing Machinery),
DOI: 10.1145/3397271.3401096.