What is it about?

This article presents a dataset of 4,293 historical postcards from the Grand Est region of France, dating from 1899 to 1930. The dataset includes annotations for text regions, classified as printed, handwritten, or scene text, as well as manual transcriptions of the printed text for a subset of the postcards. Using this dataset, we benchmark open-source OCR models (EasyOCR, Tesseract OCR, docTR, PaddleOCR, and Calamari) to assess their performance without fine-tuning. Our results highlight the challenges posed by varied fonts, orientations, and image quality; EasyOCR stands out for its text recognition accuracy, while Tesseract OCR excels at orientation detection. The best-performing models are then used to complete the dataset by automatically transcribing the printed text of all postcards.
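As an illustration of how recognition accuracy can be scored against manual transcriptions, here is a minimal sketch in Python. It assumes character error rate (CER) as the metric, which is a common choice for OCR evaluation but is not named in this summary; the function names `edit_distance` and `character_error_rate` and the sample strings are hypothetical.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    # prev[j] holds the distance between the reference prefix processed so far
    # and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance divided by the length of the reference text."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example: a manual transcription versus an OCR output.
manual = "NANCY"
ocr_output = "NANGY"
print(character_error_rate(manual, ocr_output))
```

A lower CER means the OCR output is closer to the manual transcription; a value of 0 means the two strings are identical.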

Why is it important?

Postcards are invaluable sources of historical information, but that information is unstructured. Furthermore, traditional algorithms trained on contemporary sources struggle to extract information from older documents. This paper provides a resource for the analysis of historical postcards and lays the foundations for future advances in OCR adapted to them.

Perspectives

We intend to significantly expand this collection to cover a wider range of postcards. We also plan to add Named Entity Recognition and keyword identification for printed text. After that, we will add detailed annotations for other elements, including date stamps (through segmentation, binarization, and transcription), along with transcriptions of handwritten and scene text.

Matthieu PELINGRE
Université de Lorraine

Read the Original

This page is a summary of: Benchmarking OCR Tools for Historical Postcards: A Dataset and Evaluation, October 2025, ACM (Association for Computing Machinery),
DOI: 10.1145/3746273.3760201.