Hierarchical Bayesian data selection

Simon L. Cotter

doi:10.1145/3699721

What is it about?

When we use data to make predictions or to understand how systems work, both the data and the models we use can be imperfect. Traditional methods for analyzing data often assume that all of the data fits the model, but real-world data can be messy, containing errors or regions that the model does not describe well. This can lead to inaccurate conclusions. In this study, we propose a new approach called hierarchical Bayesian data selection. Instead of manually cleaning the data or ignoring problematic regions, our method automatically determines which parts of the data are reliable for making predictions. It does this by estimating the model’s parameters and the trustworthiness of each part of the data at the same time, so that the model is fitted only to the regions where it works well.

We applied our method to several test cases, including linear regression and a model based on differential equations. Our approach handled challenges such as corrupted data and situations in which different parts of the data required different model parameters. The method is easy to implement and can be used in a wide range of fields, from engineering to the physical sciences, potentially reducing the time and effort needed to analyze complex data.
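The summary describes the idea of estimating model parameters and data reliability together only at a high level. As a purely illustrative sketch, and not the formulation used in the paper, the Python code below shows one standard way to do this for linear regression: a Gibbs sampler over a two-component mixture in which each observation either follows the linear model or a broad catch-all "untrusted" distribution. The function names `gibbs_data_selection` and `normal_logpdf`, the known noise level `sigma`, the catch-all width `tau`, and all priors are assumptions made for this example.

```python
import numpy as np


def normal_logpdf(y, mean, sd):
    """Log-density of a normal distribution, evaluated elementwise."""
    return -0.5 * np.log(2.0 * np.pi * sd**2) - 0.5 * ((y - mean) / sd) ** 2


def gibbs_data_selection(x, y, sigma=0.5, tau=20.0, prior_sd=10.0,
                         n_iter=2000, burn_in=500, seed=0):
    """Jointly sample line parameters and per-point 'trusted' indicators.

    Hypothetical mixture model (an assumption, not the paper's formulation):
      trusted point:   y_i ~ N(a + b * x_i, sigma^2)
      untrusted point: y_i ~ N(0, tau^2)   (broad catch-all component)
      z_i ~ Bernoulli(p), p ~ Beta(1, 1), (a, b) ~ N(0, prior_sd^2 I)
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    X = np.column_stack([np.ones(n), x])  # columns: intercept, slope
    z = np.ones(n, dtype=bool)            # start by trusting every point
    beta_samples = []
    trust_counts = np.zeros(n)

    for it in range(n_iter):
        # 1. Conjugate Gaussian update of (a, b) using trusted points only.
        Xs, ys = X[z], y[z]
        precision = Xs.T @ Xs / sigma**2 + np.eye(2) / prior_sd**2
        cov = np.linalg.inv(precision)
        mean = cov @ (Xs.T @ ys) / sigma**2
        beta = rng.multivariate_normal(mean, cov)

        # 2. Beta update of the overall fraction p of trusted points.
        k = z.sum()
        p = rng.beta(1 + k, 1 + n - k)

        # 3. Resample each indicator from its conditional probability,
        #    computed in log space for numerical stability.
        log_good = np.log(p) + normal_logpdf(y, X @ beta, sigma)
        log_bad = np.log1p(-p) + normal_logpdf(y, 0.0, tau)
        prob_good = np.exp(log_good - np.logaddexp(log_good, log_bad))
        z = rng.random(n) < prob_good

        if it >= burn_in:
            beta_samples.append(beta)
            trust_counts += z

    # Posterior mean of (a, b) and the fraction of kept samples
    # in which each point was marked as trusted.
    return np.mean(beta_samples, axis=0), trust_counts / (n_iter - burn_in)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = np.linspace(0, 10, 100)
    y = 1.0 + 2.0 * x + rng.normal(0, 0.5, size=100)
    y[::10] += 15.0  # corrupt every tenth point
    beta_hat, trust = gibbs_data_selection(x, y)
    print("estimated (intercept, slope):", beta_hat)
    print("trust of corrupted points:", trust[::10])
```

On synthetic data like this, the estimated line should lie close to the true one, and the corrupted points should be trusted in almost none of the samples. The two-component mixture is chosen here only because it is simple and fully conjugate; it labels individual points rather than regions of the data, and the hierarchical approach described in the paper may be structured differently.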

Why is it important?

This paper introduces a new method for automatically identifying the most reliable data for making accurate predictions, addressing the problem of messy or imperfect data. The approach improves the accuracy of model-based predictions, reduces manual data cleaning, and can be applied to a wide range of fields, making it a valuable tool for researchers and data scientists dealing with complex data.

This page is a summary of: Hierarchical Bayesian data selection, ACM Transactions on Probabilistic Machine Learning, October 2024, ACM (Association for Computing Machinery), DOI: 10.1145/3699721.

Contributors

The following have contributed to this page

Simon Cotter
University of Manchester

A Method for Automatically Choosing the Best Data for Accurate Predictions