What is it about?
Manual, library-style techniques for classifying digital records do not scale in this era of Big Data. Legislation that gives users stronger rights over their data (such as GDPR) has only increased the compliance needs of organizations in charge of records data. This paper assesses a broad variety of algorithms for their ability to classify real-world records based on their text content. A focus group of records professionals then discussed the findings, including how useful and trustworthy they are for day-to-day records work.
Why is it important?
This is perhaps the first study to compare the classification performance of statistical models, a variety of neural network architectures, and pre-trained language models on real-world records data. The newer technologies perform better, but the older algorithms remain competitive and are cheaper to run at scale. The focus group of records professionals is optimistic about adopting these technologies in their workplaces and sees this as a first step in a journey toward being able to synthesize meaningful narrative out of a vast body of records.
Read the Original
This page is a summary of: Text Classification for Records Management, Journal on Computing and Cultural Heritage, September 2022, ACM (Association for Computing Machinery), DOI: 10.1145/3485846.