Data Models for Annotating Biomedical Scholarly Publications: the Case of CORD-19

Houcemeddine Turki; Mohamed Ali Hadj Taieb; Alejandro Piad-Morffis; Mohamed Ben Aouicha; René Fabrice Bile

doi:10.1145/3487553.3524675

What is it about?

In-line semantic annotations are structured representations of the knowledge contained in a given text. Adding such information to textual excerpts requires defining a layout, or data model, for the semantic annotations. In our work, we describe the layouts that have been used to structure and represent the semantic annotations of textual resources about the COVID-19 outbreak in the CORD-19 dataset. We then discuss these data models, identify their strengths and weaknesses, and offer insights on how layouts for semantic annotations should be developed in the coming years.

Why is it important?

Data modeling for in-line semantic annotations is an important step towards making the annotation of textual resources more flexible and easier. The data model is also a key component in producing a reliable final set of semantic annotations that machine learning algorithms can use to achieve higher accuracy. Addressing these issues could significantly advance several research fields in computer science, including natural language processing, knowledge engineering, and information retrieval.

Perspectives

Through this work, we identified several gaps in the guidelines for the semantic annotation of textual resources, particularly when a statement can be annotated accurately in more than one way. These gaps concern text span granularity, the choice between active and passive voice, and the handling of adverbs, negation, and adjectives. We believe these findings are important because they give computer scientists and linguists directions for solving major issues in annotation-based tasks such as named entity recognition, knowledge extraction, and topic modeling. Our work should therefore serve as the foundation for a large-scale set of guidelines for in-line semantic annotation, which should be widely adopted to achieve better practical outcomes in computer science research.
Houcemeddine Turki
Université de Sfax

This page is a summary of: Data Models for Annotating Biomedical Scholarly Publications: the Case of CORD-19, April 2022, ACM (Association for Computing Machinery).
DOI: 10.1145/3487553.3524675.

Resources

Presentation Slides
These slides were used to present the work at the Sci-K Workshop of the 2022 ACM Web Conference. They present the findings of our research in a user-friendly, visually enhanced format.
