Learning from Uncurated Regular Expressions for Semantic Type Classification

Michael J Mior

doi:10.1145/3596225.3596226

What is it about?

When trying to identify values of a specific semantic type within a text corpus, one common approach is to use regular expressions (regexes). However, this normally requires writing a regex by hand for each type. Instead, we take a large corpus of regular expressions from a regex playground and use them as features for a machine learning algorithm, which automatically identifies the regexes that are useful for classifying types.
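To make the idea concrete, here is a minimal sketch in Python. It is not the paper's implementation. The regex list, the sample values, and the helper names (`compile_corpus`, `featurize`, `best_regex_per_type`) are all hypothetical, and a simple per-regex F1 ranking stands in for the machine learning algorithm. Each value becomes a vector of 0/1 match indicators, one per regex, and the regex whose matches best separate a type from the others is selected for that type.

```python
import re

# Hypothetical stand-in for an uncurated regex corpus (not taken from the paper).
# Uncurated corpora can contain overly broad or even invalid patterns.
REGEX_CORPUS = [
    r"\d{5}",
    r"[^@\s]+@[^@\s]+\.\w+",
    r"\d{4}-\d{2}-\d{2}",
    r".*",    # matches everything, so it carries no signal
    r"[A-Z",  # invalid pattern, skipped at compile time
]


def compile_corpus(patterns):
    """Compile every pattern that is valid, silently dropping the rest."""
    compiled = []
    for pattern in patterns:
        try:
            compiled.append(re.compile(pattern))
        except re.error:
            continue
    return compiled


def featurize(value, regexes):
    """One binary feature per regex: 1 if the regex matches the whole value."""
    return [1 if regex.fullmatch(value) else 0 for regex in regexes]


def f1_score(index, rows, labels, target):
    """F1 of using regex `index` alone as a detector for type `target`."""
    tp = fp = fn = 0
    for row, label in zip(rows, labels):
        hit = row[index] == 1
        if hit and label == target:
            tp += 1
        elif hit:
            fp += 1
        elif label == target:
            fn += 1
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def best_regex_per_type(values, labels, regexes):
    """Pick, for each type, the regex with the highest F1 on the labeled sample."""
    rows = [featurize(value, regexes) for value in values]
    best = {}
    for target in sorted(set(labels)):
        scores = [f1_score(i, rows, labels, target) for i in range(len(regexes))]
        winner = max(range(len(regexes)), key=scores.__getitem__)
        best[target] = regexes[winner].pattern
    return best


# Hypothetical labeled sample values.
VALUES = ["14623", "90210", "a@b.org", "mior@example.edu", "2023-06-18", "1999-12-31"]
LABELS = ["zip", "zip", "email", "email", "date", "date"]

REGEXES = compile_corpus(REGEX_CORPUS)
BEST = best_regex_per_type(VALUES, LABELS, REGEXES)
```

In this toy sample the catch-all `.*` pattern scores poorly for every type because it also matches values of the other types, while the invalid pattern never becomes a feature at all. A real classifier trained on the same binary features could additionally combine several regexes per type, which this ranking does not attempt.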

This page is a summary of: Learning from Uncurated Regular Expressions for Semantic Type Classification, June 2023, ACM (Association for Computing Machinery),
DOI: 10.1145/3596225.3596226.

Contributors

The following have contributed to this page

Assistant Professor Michael J. Mior
Rochester Institute of Technology

Large collections of user-generated regular expressions for text classification

Resources

Source code

arXiv paper
