What is it about?

In-line semantic annotations are structured representations of the knowledge contained in a text. Adding them to textual excerpts requires first defining a layout, that is, a data model for how the annotations are written. In our work, we describe the layouts that have been used to structure and represent semantic annotations of textual resources about the COVID-19 outbreak, as available in the CORD-19 dataset. We then discuss these data models to identify their strengths and weaknesses and to provide insights on how layouts for semantic annotations should be developed in the next few years.
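
To make the idea of a layout concrete, here is a minimal sketch in Python. It is not one of the layouts examined in the paper: the `[span]{Label}` notation, the `Annotation` class, the helper functions, the example sentence, and the labels `Drug` and `Disease` are all assumptions chosen for illustration. It assumes annotations do not overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int  # offset of the first annotated character
    end: int    # offset one past the last annotated character
    label: str  # hypothetical semantic type, e.g. "Drug"


def annotate(text: str, phrase: str, label: str) -> Annotation:
    """Build an annotation for the first occurrence of `phrase` in `text`."""
    start = text.index(phrase)
    return Annotation(start, start + len(phrase), label)


def render_inline(text: str, annotations: list[Annotation]) -> str:
    """Write annotations into the text using a hypothetical [span]{Label} layout.

    Spans are inserted from right to left so that earlier offsets stay valid.
    Assumes the annotations do not overlap.
    """
    out = text
    for ann in sorted(annotations, key=lambda a: a.start, reverse=True):
        span = out[ann.start:ann.end]
        out = out[:ann.start] + f"[{span}]{{{ann.label}}}" + out[ann.end:]
    return out


if __name__ == "__main__":
    sentence = "Remdesivir was evaluated in patients with COVID-19."
    spans = [
        annotate(sentence, "Remdesivir", "Drug"),
        annotate(sentence, "COVID-19", "Disease"),
    ]
    print(render_inline(sentence, spans))
```

Even in this small sketch, the layout fixes several decisions in advance: where a span starts and ends, what is recorded about it, and how it is written back into the text.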

Why is it important?

Data modeling for in-line semantic annotations is an important step towards making the annotation of textual resources easier and more flexible. The data modeling layout is also a key component in producing reliable semantic annotations that machine learning algorithms can use to achieve higher accuracy. Resolving these issues could significantly advance many research fields in computer science, including natural language processing, knowledge engineering, and information retrieval.

Perspectives

Through this work, we identified several gaps in the guidelines for the semantic annotation of textual resources, particularly when multiple ways of annotating a statement are accurate in practice. These gaps concern text span granularity, the choice between active and passive voice, and the handling of adverbs, negation, and adjectives. We believe that these findings are important because they give computer scientists and linguists directions for solving major issues in annotation-based tasks such as named entity recognition, knowledge extraction, and topic modeling. Our work could consequently serve as the foundation of a large-scale set of guidelines for in-line semantic annotation, which should be widely used for better practical outcomes in computer science research.

Houcemeddine Turki
Universite de Sfax

Read the Original

This page is a summary of: Data Models for Annotating Biomedical Scholarly Publications: the Case of CORD-19, April 2022, ACM (Association for Computing Machinery),
DOI: 10.1145/3487553.3524675.