What is it about?
A common problem in meta-learning is establishing a (good) collection of meta-features that best represent the dataset properties [5]. In other words, the quality of a meta-learning recommendation depends on the quality of the chosen meta-features and on their ability to reflect the actual data challenges in the given dataset. Hence, the research question of this study is: to what extent can meta-features describe the actual data difficulty without being affected by complex data challenges and thereby producing biased recommendations? According to the literature, this question has received little attention; instead, most work in this context focuses on validating the meta-learning recommendation by evaluating the prediction performance of the learning algorithms (i.e., identifying correlations between meta-learning outputs and learning algorithm performance).
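To make the idea of meta-features concrete, the sketch below computes a few simple, commonly used statistical meta-features (instance, feature, and class counts, class entropy, and mean pairwise feature correlation) for a toy dataset. These particular measures are a hedged illustration of the general concept, not the exact feature set studied in the paper.

```python
# Minimal sketch: extracting simple statistical meta-features from a
# classification dataset. Illustrative only; not the paper's feature set.
import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

n_instances, n_features = X.shape
classes, counts = np.unique(y, return_counts=True)
class_probs = counts / counts.sum()

meta_features = {
    "n_instances": n_instances,
    "n_features": n_features,
    "n_classes": len(classes),
    # Class entropy: higher values indicate more balanced classes.
    "class_entropy": float(-np.sum(class_probs * np.log2(class_probs))),
    # Mean absolute pairwise feature correlation (upper triangle only).
    "mean_abs_corr": float(
        np.abs(np.corrcoef(X.T))[np.triu_indices(n_features, k=1)].mean()
    ),
}
print(meta_features)
```

A meta-learning system would compute a vector like this for many datasets and use it to recommend a learning algorithm for a new dataset with similar characteristics.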
Why is it important?
To ensure the success of the meta-learning model, since the learning algorithm selection decision is based on the meta-features. However, little attention has been paid to validating whether the meta-feature decisions reflect the actual data properties. In particular, the meta-feature analysis can be negatively affected by complex data characteristics, such as class overlap caused by the distortion that noisy features impose at the decision boundary between the classes, and can thereby produce biased meta-learning recommendations that do not match the actual data characteristics (either by overestimating or underestimating the complexity).
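To make the class-overlap challenge concrete, the sketch below computes Fisher's discriminant ratio, a standard data complexity measure of the kind the paper evaluates: per feature, the ratio of between-class to within-class variance, where small values indicate heavily overlapping classes. The implementation and the synthetic data are illustrative assumptions, not the paper's exact procedure.

```python
# Hedged sketch of Fisher's discriminant ratio (the classic F1 complexity
# measure) for a two-class problem. Small ratios => strong class overlap.
import numpy as np

def fisher_discriminant_ratio(X, y):
    """Per-feature ratio of between-class to within-class variance."""
    X0, X1 = X[y == 0], X[y == 1]
    between = (X0.mean(axis=0) - X1.mean(axis=0)) ** 2
    within = X0.var(axis=0) + X1.var(axis=0)
    return between / within

rng = np.random.default_rng(0)
# Two Gaussian classes whose means are close, so the classes overlap heavily.
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)),
               rng.normal(0.5, 1.0, (100, 2))])
y = np.repeat([0, 1], 100)
print(fisher_discriminant_ratio(X, y))  # small values => strong overlap
```

If noisy features distort the decision boundary, a measure like this can over- or underestimate the true difficulty, which is exactly the kind of bias in meta-learning recommendations that the study investigates.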
Read the Original
This page is a summary of: Investigating the Performance of Data Complexity & Instance Hardness Measures as A Meta-Feature in Overlapping Classes Problem, August 2023, ACM (Association for Computing Machinery), DOI: 10.1145/3616131.3616132.