What is it about?
Documents convey ideas through more than text alone: font characteristics, geometric position, and indentation also indicate the function of words in a document. This paper aims to make group editing of related words easier by exploiting the linguistic, visual, and geometric similarities between words in a document. Words are clustered according to these features and, optionally, additional user constraints. Users can then quickly edit each cluster as a group. The method relies on an optimization orchestrated by an unsupervised siamese network; no training set is assumed.
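To make the idea of clustering words by their features, with optional user constraints, more concrete, here is a minimal sketch. It is not the paper's method: the paper learns affinities with an unsupervised siamese network, whereas this sketch substitutes hand-picked feature vectors and a simple distance threshold with union-find grouping. The function name `cluster_words`, the three features, and the threshold value are all assumptions made for illustration.

```python
from itertools import combinations
from math import dist


def cluster_words(features, threshold, must_link=()):
    """Group word indices whose feature vectors lie within `threshold`.

    features:  list of equal-length numeric tuples, one per word
               (hypothetical hand-picked features, not learned ones).
    must_link: optional pairs of indices a user forces into one group.
    """
    parent = list(range(len(features)))

    def find(i):
        # Follow parent links to the group representative (path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    # Merge every pair of words whose features are close enough.
    for i, j in combinations(range(len(features)), 2):
        if dist(features[i], features[j]) <= threshold:
            union(i, j)

    # Apply the user's constraints on top of the automatic grouping.
    for i, j in must_link:
        union(i, j)

    groups = {}
    for i in range(len(features)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


words = ["Introduction", "Methods", "we", "cluster", "words"]
# Assumed features: (font size in points, bold flag, left indent in em).
features = [(18, 1, 0), (18, 1, 0), (11, 0, 2), (11, 0, 2), (11, 0, 6)]

automatic = cluster_words(features, threshold=1.0)
refined = cluster_words(features, threshold=1.0, must_link=[(3, 4)])
```

In this toy input, the two headings share one group and the body words split by indentation; adding a single must-link constraint merges the differently indented word into the body group, which mirrors how a user might refine an unsatisfactory automatic result.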
Why is it important?
In terms of applications, our work paves the way for edit propagation among textual entities in a document. For example, users may highlight, delete, or indent a group of words at once, using our method to quickly determine which words make up which groups. Our work considers language, font, and geometric characteristics together, so that clusters represent meaningful word groups in a document. If the automatic clustering is unsatisfactory, a user may add constraints to refine it. We hope that conclusions from our work will also facilitate further research on unsupervised deep learning with multimodal features of language, vision, and geometry.
Read the Original
This page is a summary of: Learning Multimodal Affinities for Textual Editing in Images, ACM Transactions on Graphics, July 2021, ACM (Association for Computing Machinery),
DOI: 10.1145/3451340.