What is it about?
The article discusses a way to check whether identity links are correct. An identity link is a statement that two pieces of data refer to the same thing in the real world. Many datasets today are connected by such links, most often expressed with the OWL property owl:sameAs. However, not all of these links are accurate, because they are often created by automated processes rather than verified by experts. A further difficulty is that different datasets may describe the same entity in different ways, using different vocabularies and sometimes incomplete information, which makes it hard to tell whether a given identity link is correct. The paper proposes a method that invalidates incorrect identity links by measuring how dissimilar the linked descriptions are and detecting outliers within groups of identity links. The framework aims to improve the reliability of data connections without relying on a central authority or expert review.
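To make the idea of outlier detection within a group of identity links more concrete, here is a minimal sketch. It does not reproduce the paper's actual dissimilarity measure or outlier test, which this summary does not specify. It assumes that each entity is described by a plain set of values, that dissimilarity is the Jaccard distance between two such sets, and that a link is an outlier when its dissimilarity exceeds the group median by more than `k` median absolute deviations. The function names, the parameter `k`, and the example data are all hypothetical.

```python
from statistics import median


def jaccard_dissimilarity(a: set, b: set) -> float:
    """Return 1 - |a ∩ b| / |a ∪ b| (0.0 = identical sets, 1.0 = disjoint)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def flag_outlier_links(links, descriptions, k=3.0):
    """Return the links whose dissimilarity is unusually high for the group.

    Assumption (not taken from the paper): a link is an outlier when its
    score exceeds median + k * MAD, where MAD is the median absolute
    deviation of all scores in the group. If MAD is 0, every link scoring
    above the median is flagged.
    """
    scores = {
        (s, t): jaccard_dissimilarity(descriptions[s], descriptions[t])
        for s, t in links
    }
    med = median(scores.values())
    mad = median(abs(v - med) for v in scores.values())
    threshold = med + k * mad
    return [link for link, v in scores.items() if v > threshold]


# Hypothetical example: four identity links from one entity to others.
descriptions = {
    "a:Paris": {"paris", "france", "capital", "city"},
    "b:Paris": {"paris", "france", "capital"},
    "c:Paris": {"paris", "france", "city"},
    "e:Paris": {"paris", "france", "capital", "city", "seine"},
    "d:ParisTX": {"paris", "texas", "usa", "city"},
}
links = [
    ("a:Paris", "b:Paris"),
    ("a:Paris", "c:Paris"),
    ("a:Paris", "e:Paris"),
    ("a:Paris", "d:ParisTX"),
]

print(flag_outlier_links(links, descriptions))
# → [('a:Paris', 'd:ParisTX')]
```

In this toy group, three links connect near-identical descriptions, while the link to the Texas town is far more dissimilar than the rest and is the only one flagged. A median-based threshold is used here only because it is not distorted by the very outliers it is meant to find; other outlier tests would fit the same outline.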
Why is it important?
It's important to ensure the accuracy of identity links in datasets for several reasons:
(1) Data Integrity: Accurate identity links ensure that the information in different datasets correctly refers to the same real-world entities. This maintains the integrity and reliability of the data.
(2) Improved Decision Making: High-quality, reliable data is crucial for making informed decisions. If identity links are incorrect, decisions based on that data could be flawed.
(3) Enhanced Data Integration: Many systems and applications rely on the integration of various datasets. Accurate identity links enable seamless integration, allowing for more comprehensive and useful datasets.
(4) Resource Efficiency: Automating the validation of identity links reduces the need for manual review by experts, saving time and resources.
(5) Trust and Credibility: Ensuring the accuracy of identity links enhances the trust and credibility of the datasets and the systems that use them.
(6) Reusability of Data: Correct identity links facilitate the reuse of data across different applications and systems, promoting innovation and the development of new solutions.
Read the Original
This page is a summary of: Dissimilarity-based approach for Identity Link Invalidation, September 2020, Institute of Electrical & Electronics Engineers (IEEE), DOI: 10.1109/wetice49692.2020.00056.