What is it about?
We propose a modification that corrects split-improvement variable importance measures in random forests and other tree-based methods. These measures have been shown to be biased towards inflating the importance of features with more potential splits. We show that by appropriately computing split-improvement on out-of-sample data, this bias can be corrected, yielding better summaries and screening tools.
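As a rough illustration of the idea, the sketch below re-estimates each node's impurity decrease in a fitted scikit-learn decision tree using held-out data rather than the training data, and accumulates the decreases per feature. This is only a minimal conceptual sketch, not the authors' implementation; the function name `heldout_importance` and the use of variance reduction as the impurity measure are assumptions made for the example.

```python
# Minimal sketch (assumed, not the paper's code): feature importance for a
# fitted decision tree, with impurity decreases measured on held-out data.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

def heldout_importance(tree, X_held, y_held):
    """Sum per-feature impurity (variance) decreases, re-estimated
    from held-out samples instead of the training data."""
    t = tree.tree_
    # Indicator of which held-out samples reach which node of the tree.
    node_indicator = tree.decision_path(X_held).toarray().astype(bool)
    importance = np.zeros(tree.n_features_in_)
    var = lambda y: y.var() if len(y) else 0.0
    for node in range(t.node_count):
        left, right = t.children_left[node], t.children_right[node]
        if left == -1:  # leaf node: no split, nothing to attribute
            continue
        y_node = y_held[node_indicator[:, node]]
        y_left = y_held[node_indicator[:, left]]
        y_right = y_held[node_indicator[:, right]]
        if len(y_node) == 0:
            continue
        # Impurity decrease measured on held-out data; unlike the in-sample
        # version it can be negative, which is what removes the optimistic bias.
        decrease = (len(y_node) * var(y_node)
                    - len(y_left) * var(y_left)
                    - len(y_right) * var(y_right))
        importance[t.feature[node]] += decrease / len(y_held)
    return importance

X, y = make_regression(n_samples=500, n_features=5, noise=1.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X_train, y_train)
print(heldout_importance(model, X_test, y_test))  # held-out importances
print(model.feature_importances_)                 # standard in-sample importances
```

Comparing the two printed vectors on data with an uninformative high-cardinality feature is one way to see the bias the paper targets: the in-sample measure tends to assign it spurious importance, while the held-out version does not.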
Featured Image
Photo by Kevin Ku on Unsplash
Why is it important?
Machine learning models have become ubiquitous in everyday applications. Practitioners often rely on feature importance measures to understand model behavior. However, a widely used feature importance metric for tree-based methods is inherently biased. In this paper we analyze this phenomenon and propose a simple yet effective correction.
Read the Original
This page is a summary of: Unbiased Measurement of Feature Importance in Tree-Based Methods, ACM Transactions on Knowledge Discovery from Data, April 2021, ACM (Association for Computing Machinery).
DOI: 10.1145/3429445.
You can read the full text:
Contributors
The following have contributed to this page