What is it about?
Until recently, most documentation, from receipts and financial records to healthcare documents, existed only on paper. These documents hold a wealth of raw data, much of it in tables. Because tables are a compact way to represent relational data, their contents should be made available in an indexable, searchable format. Our work focuses on accurately recognizing tabular data in a scanned document and extracting it into a standard format such as CSV or Excel while preserving the table's structure.
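To make "preserving the table's structure" concrete, here is a minimal sketch of the final export step only. It assumes a recognizer has already produced a list of cells with row and column positions; the cell format (`row`, `col`, `text`, `row_span`, `col_span`) and the function name `cells_to_csv` are hypothetical illustrations, not the paper's actual output format or method.

```python
import csv
import io

def cells_to_csv(cells):
    """Serialize recognized cells to CSV text, keeping row/column positions.

    `cells` is a hypothetical recognizer output: a list of dicts with
    zero-based `row` and `col`, the cell `text`, and optional
    `row_span` / `col_span` (both default to 1).
    """
    # Size the grid so that spanning cells still fit inside it.
    n_rows = max(c["row"] + c.get("row_span", 1) for c in cells)
    n_cols = max(c["col"] + c.get("col_span", 1) for c in cells)
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]

    # Place each cell's text at its top-left position; positions covered
    # only by a span stay empty so the columns remain aligned.
    for c in cells:
        grid[c["row"]][c["col"]] = c["text"]

    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(grid)
    return buf.getvalue()

# Illustrative data only: a header cell spanning two columns.
cells = [
    {"row": 0, "col": 0, "text": "Item"},
    {"row": 0, "col": 1, "text": "Amount", "col_span": 2},
    {"row": 1, "col": 0, "text": "Paper"},
    {"row": 1, "col": 1, "text": "12.50"},
    {"row": 1, "col": 2, "text": "USD"},
]
csv_text = cells_to_csv(cells)
print(csv_text)
```

Leaving the spanned position empty is one simple design choice; it keeps every row the same width so that a spreadsheet or search index reads each value under the right column.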
Why is it important?
Manually navigating large numbers of document images to find specific data is inefficient and costly. Moreover, the time and manual labor required to locate the needed document are not feasible for large organizations with ever-growing data. Our paper proposes a cost-effective, time-efficient approach that saves organizations time, money, and effort. It addresses a real-world industry problem and aims to create a meaningful impact using technology.
Read the Original
This page is a summary of: End-to-end table structure recognition and extraction in heterogeneous documents, Applied Soft Computing, May 2022, Elsevier,
DOI: 10.1016/j.asoc.2022.108942.