What is it about?
Dataset search systems rely mainly on metadata and ignore dataset contents. For tasks such as data integration and enrichment, however, the contents have to be considered. For instance, dataset owners often want to enrich their dataset by selecting other datasets that provide complementary information. We propose an approach that relies on a) a set of pre-constructed (and periodically refreshed) semantics-aware indexes, and b) "lattice-based" incremental algorithms that exploit the posting lists of these indexes, as well as set-theory properties, to enable efficient responses at query time. We discuss the efficiency of the proposed methods through comparative results, and we report measurements of the proposed metrics for 400 real RDF datasets (containing over 2 billion triples).
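To make the idea of content-based union and complement metrics more concrete, here is a minimal sketch in Python. It is not the paper's implementation: the toy index, the dataset IDs and all function names are hypothetical, and the subset search is plain brute force rather than the incremental lattice-based traversal described above. It only illustrates how posting lists (entity to the set of datasets containing it) can be grouped and reused to answer union and complement questions without rescanning dataset contents.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy index: entity -> posting list (IDs of datasets containing it).
index = {
    "e1": {"D1", "D2"},
    "e2": {"D1"},
    "e3": {"D2", "D3"},
    "e4": {"D3"},
    "e5": {"D1", "D2", "D3"},
}


def direct_counts(index):
    """Group entities by identical posting list and count each group."""
    return Counter(frozenset(posting) for posting in index.values())


def union_size(counts, subset):
    """Distinct entities covered by at least one dataset in `subset`."""
    subset = frozenset(subset)
    return sum(n for posting, n in counts.items() if posting & subset)


def complement_size(counts, target, subset):
    """Entities offered by `subset` that are absent from the `target` dataset."""
    subset = frozenset(subset)
    return sum(
        n
        for posting, n in counts.items()
        if target not in posting and posting & subset
    )


def best_union(counts, datasets, k):
    """Brute-force search for the k datasets with the largest union (assumption:
    the paper replaces this exhaustive scan with incremental algorithms)."""
    return max(
        combinations(sorted(datasets), k),
        key=lambda s: union_size(counts, s),
    )


counts = direct_counts(index)
print(union_size(counts, {"D1", "D3"}))             # 5
print(complement_size(counts, "D1", {"D2", "D3"}))  # 2
print(best_union(counts, {"D1", "D2", "D3"}, 2))    # ('D1', 'D3')
```

Grouping entities by posting list means each metric is a sum over distinct posting lists rather than over all entities, which is the kind of saving that makes a precomputed index attractive at query time.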
Why is it important?
The approach helps improve dataset discoverability, interlinking and reusability.
Read the Original
This page is a summary of: Content-based Union and Complement Metrics for Dataset Search over RDF Knowledge Graphs, Journal of Data and Information Quality, April 2020, ACM (Association for Computing Machinery), DOI: 10.1145/3372750.