Text Classification for Records Management

Jason Franks

doi:10.1145/3485846

What is it about?

Manual, library-style techniques for classifying digital records do not scale in this era of Big Data. Legislation offering users better rights over their data (such as GDPR) has only increased the compliance needs for organizations in charge of records data. This paper assesses a broad variety of algorithms for their skill at classifying real-world records based on their text content. A focus group of records professionals then discussed the findings and their usefulness and trustworthiness for their day-to-day work.

Why is it important?

This is perhaps the first study to compare the classification performance of statistical models, a variety of neural network architectures, and pre-trained language models on real-world records data. The newer technologies perform better, but the older algorithms remain competitive and are cheaper to run at scale. The focus group of records professionals is optimistic about adopting these technologies in their workplaces and sees it as a first step on a journey toward being able to synthesize meaningful narrative from a vast body of records.

Perspectives

People think records are dull reams of old paper, but they are much more than that. Records can be any form of data and they contain the whole of human knowledge. I hope this study will help to show the potential of machine learning systems to unlock insights from the ever-greater volumes of records that we generate as we navigate the challenges of the 21st century.
Jason Franks
Monash University

This page is a summary of: Text Classification for Records Management, Journal on Computing and Cultural Heritage, September 2022, ACM (Association for Computing Machinery), DOI: 10.1145/3485846.

Resources

Other
Source code
Source code and evaluation data for the experiments run in this paper.

Contributors

The following have contributed to this page

Jason Franks
Monash University

