What is it about?

This paper introduces a method for automatically inferring the configuration (dialect) of CSV files at the data import stage. On the dataset used for evaluation, the proposed method outperforms state-of-the-art solutions such as CleverCSV. It is robust enough to produce very accurate results from a small data sample.
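
To make the idea of dialect inference concrete, here is a minimal Python sketch. It is not the paper's algorithm: it is a simplified illustration that only guesses the delimiter, scoring each candidate by how uniform the resulting table is (the share of rows that have the most common field count). The names `CANDIDATE_DELIMITERS`, `uniformity_score`, and `guess_delimiter` are hypothetical, and the published method additionally uses data type inference, which this sketch omits.

```python
import csv
import io
from collections import Counter

# Assumption: a small, fixed set of candidate delimiters for illustration.
CANDIDATE_DELIMITERS = [",", ";", "\t", "|"]


def uniformity_score(sample: str, delimiter: str) -> float:
    """Toy score: fraction of non-empty rows that share the most common
    field count. Returns 0.0 when that count is 1, because a delimiter
    that never splits a row tells us nothing."""
    reader = csv.reader(io.StringIO(sample), delimiter=delimiter)
    rows = [row for row in reader if row]  # skip blank lines
    if not rows:
        return 0.0
    counts = Counter(len(row) for row in rows)
    modal_fields, modal_rows = counts.most_common(1)[0]
    if modal_fields < 2:
        return 0.0
    return modal_rows / len(rows)


def guess_delimiter(sample: str) -> str:
    """Pick the candidate delimiter that yields the most uniform table."""
    return max(CANDIDATE_DELIMITERS, key=lambda d: uniformity_score(sample, d))


# A small sample where commas appear inside values but ';' separates fields.
sample = "id;name;price\n1;Widget, small;2,50\n2;Gadget;10,00\n"
print(guess_delimiter(sample))  # prints ;
```

In this sample the comma splits the rows into 1, 3, and 2 fields, while the semicolon gives 3 fields on every row, so the semicolon wins. A real detector would also need to infer the quote and escape characters and to break ties between equally uniform candidates.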

Why is it important?

With this methodology, data ingestion pipelines that import CSV files can be almost fully automated, requiring minimal human intervention. Solutions that implement the method can detect file configurations with high precision and very little processing time.

Perspectives

CSV dialect detection is currently an open problem. This research proposes a method for automatically inferring CSV file dialects with high accuracy, outperforming even CleverCSV, which is regarded as the state of the art on this topic. The methodology opens the door for current CSV parsing tools to reach much higher accuracy with little integration effort.

Ing. Wilfredo García
ECP Solutions

Read the Original

This page is a summary of: Detecting CSV file dialects by table uniformity measurement and data type inference, Data Science, July 2024, SAGE Publications,
DOI: 10.3233/ds-240062.
