What is it about?

The article presents a way to check whether identity links are correct, that is, whether the two pieces of data they connect truly refer to the same thing in the real world. Many datasets published today connect their data through such links, often using the owl:sameAs property. An identity link declares that two different descriptions are about the same entity. However, not all of these links are accurate, because they are often created by automated processes rather than verified by experts. Different datasets may also describe the same entity in different ways, using different vocabularies and sometimes incomplete information, which makes it hard to decide whether a given link is correct. The paper proposes a framework that invalidates incorrect identity links by measuring how dissimilar the linked descriptions are and detecting outliers within groups of identity links. The aim is to make data connections more reliable without relying on a central authority or expert review.
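As a rough illustration of the general idea (not the paper's actual algorithm), the sketch below scores each identity link by how dissimilar the two linked descriptions are, then flags links whose score is unusually high compared with the rest of the group. Everything specific here is an assumption made for the example: the Jaccard dissimilarity over (property, value) pairs, the median-plus-deviation threshold, the function names, and the toy data.

```python
from statistics import median


def jaccard_dissimilarity(a: set, b: set) -> float:
    """Return 1 - |a & b| / |a | b|; two empty descriptions count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def flag_outlier_links(links, descriptions, factor=3.0):
    """Return the links whose dissimilarity is an outlier within the group.

    Assumed rule (not from the paper): a link is suspicious when its score
    exceeds median + factor * MAD, where MAD is the median absolute deviation.
    """
    scores = {
        (s, t): jaccard_dissimilarity(descriptions[s], descriptions[t])
        for s, t in links
    }
    if not scores:
        return []
    med = median(scores.values())
    mad = median(abs(v - med) for v in scores.values())
    threshold = med + factor * mad
    return [link for link, v in scores.items() if v > threshold]


# Hypothetical descriptions: each resource is a set of (property, value) pairs.
descriptions = {
    "a:Paris": {("type", "City"), ("country", "France"), ("label", "Paris")},
    "b:Paris": {("type", "City"), ("country", "France"), ("label", "Paris"),
                ("population", "2M")},
    "c:Paris": {("type", "City"), ("country", "France"), ("label", "Paris")},
    "e:Paris": {("type", "City"), ("label", "Paris")},
    "d:Paris_Hilton": {("type", "Person"), ("label", "Paris Hilton")},
}

# Hypothetical group of identity links that all involve "a:Paris".
links = [
    ("a:Paris", "b:Paris"),
    ("a:Paris", "c:Paris"),
    ("a:Paris", "e:Paris"),
    ("a:Paris", "d:Paris_Hilton"),
]

suspicious = flag_outlier_links(links, descriptions)
print(suspicious)
```

In this toy group, the link to the description of a person shares no (property, value) pair with the city description, so it stands out from the other links. A real system would need a dissimilarity measure that copes with differing vocabularies and missing values, which this sketch does not attempt.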

Why is it important?

It's important to ensure the accuracy of identity links in datasets for several reasons:
(1) Data Integrity: Accurate identity links ensure that the information in different datasets correctly refers to the same real-world entities. This maintains the integrity and reliability of the data.
(2) Improved Decision Making: High-quality, reliable data is crucial for making informed decisions. If identity links are incorrect, decisions based on that data could be flawed.
(3) Enhanced Data Integration: Many systems and applications rely on the integration of various datasets. Accurate identity links enable seamless integration, allowing for more comprehensive and useful datasets.
(4) Resource Efficiency: Automating the validation of identity links reduces the need for manual review by experts, saving time and resources.
(5) Trust and Credibility: Ensuring the accuracy of identity links enhances the trust and credibility of the datasets and the systems that use them.
(6) Reusability of Data: Correct identity links facilitate the reuse of data across different applications and systems, promoting innovation and the development of new solutions.

Perspectives

From my perspective, ensuring the accuracy of identity links in datasets is fundamental for several key reasons:
(1) Foundation of Knowledge: In our increasingly data-driven world, accurate identity links form the foundation of reliable knowledge. When datasets correctly link information about the same entities, we can build a more accurate and cohesive understanding of the world.
(2) Interconnected World: As more systems and services rely on interconnected data, the importance of precise identity links grows. For instance, in healthcare, accurate links between patient records from different providers can mean the difference between life and death. In research, correctly linked datasets can lead to groundbreaking discoveries.
(3) Data Economy: The value of data lies in its quality and reliability. Businesses and organizations that depend on data for operations, marketing, and strategic decisions need to trust that their data is accurate. Mistakes in identity links can lead to poor decisions, financial losses, and damaged reputations.
(4) User Experience: For end users, whether they are consumers using a product or researchers working on a project, the seamless integration of data enhances usability and experience. Accurate identity links ensure that users can trust the information they interact with, leading to better engagement and outcomes.
(5) Ethical Implications: Incorrect identity links can have serious ethical implications. For example, in social services, mislinked data can result in individuals not receiving the help they need. Ensuring accuracy respects the dignity and rights of individuals represented in the data.

In essence, accurate identity links are vital not just for technical reasons but for their broader impact on society, business, and individual lives. They enable a more informed, efficient, and ethical use of data, which is crucial as we navigate an era where data increasingly drives our decisions and actions.

Dr. HDR. Frederic ANDRES, IEEE Senior Member, IEEE CertifAIEd Authorized Lead Assessor (Affective Computing)
National Institute of Informatics

Read the Original

This page is a summary of: Dissimilarity-based approach for Identity Link Invalidation, September 2020, Institute of Electrical & Electronics Engineers (IEEE),
DOI: 10.1109/wetice49692.2020.00056.
