What is it about?
We propose a modification that corrects split-improvement variable importance measures in random forests and other tree-based methods. These measures have been shown to be biased towards inflating the importance of features with more potential splits. We show that by appropriately computing split-improvement on out-of-sample data, this bias can be corrected, yielding better summaries and screening tools.
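As a rough illustration of the idea, the sketch below re-estimates each node's impurity decrease in a fitted scikit-learn decision tree using held-out data rather than the training data, and accumulates the decreases per feature. This is only a minimal conceptual sketch, not the authors' implementation; the function name `heldout_importance` and the use of variance reduction as the impurity measure are assumptions made for the example.

```python
# Minimal sketch (assumed, not the paper's code): feature importance for a
# fitted decision tree, with impurity decreases measured on held-out data.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

def heldout_importance(tree, X_held, y_held):
    """Sum per-feature impurity (variance) decreases, re-estimated
    from held-out samples instead of the training data."""
    t = tree.tree_
    # Indicator of which held-out samples reach which node of the tree.
    node_indicator = tree.decision_path(X_held).toarray().astype(bool)
    importance = np.zeros(tree.n_features_in_)
    var = lambda y: y.var() if len(y) else 0.0
    for node in range(t.node_count):
        left, right = t.children_left[node], t.children_right[node]
        if left == -1:  # leaf node: no split, nothing to attribute
            continue
        y_node = y_held[node_indicator[:, node]]
        y_left = y_held[node_indicator[:, left]]
        y_right = y_held[node_indicator[:, right]]
        if len(y_node) == 0:
            continue
        # Impurity decrease measured on held-out data; unlike the in-sample
        # version it can be negative, which is what removes the optimistic bias.
        decrease = (len(y_node) * var(y_node)
                    - len(y_left) * var(y_left)
                    - len(y_right) * var(y_right))
        importance[t.feature[node]] += decrease / len(y_held)
    return importance

X, y = make_regression(n_samples=500, n_features=5, noise=1.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X_train, y_train)
print(heldout_importance(model, X_test, y_test))  # held-out importances
print(model.feature_importances_)                 # standard in-sample importances
```

Comparing the two printed vectors on data with an uninformative high-cardinality feature is one way to see the bias the paper targets: the in-sample measure tends to assign it spurious importance, while the held-out version does not.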
Featured Image
Photo by Kevin Ku on Unsplash
Why is it important?
Machine learning models have become ubiquitous in everyday applications. Practitioners often rely on feature importance measures to understand model behavior. However, a widely used feature importance metric for tree-based methods is inherently biased. In this paper we analyze this phenomenon and propose a simple yet effective correction.
Read the Original
This page is a summary of: Unbiased Measurement of Feature Importance in Tree-Based Methods, ACM Transactions on Knowledge Discovery from Data, April 2021, ACM (Association for Computing Machinery).
DOI: 10.1145/3429445.
You can read the full text:
Contributors
The following have contributed to this page