Open Access BASE2021

Toward Generic Abstractions for Data of Any Model

Abstract

International audience ; Digital data sharing leads to unprecedented opportunities to develop data-driven systems for supporting economic activities, the social and political life, and science. Many open-access datasets are RDF graphs, but others are CSV files, Neo4J property graphs, JSON or XML documents, etc. Potential users need to understand a dataset in order to decide if it is useful for their goal. While some datasets come with a schema and/or documentation, this is not always the case. Data summarization or schema inference tools have been proposed, specializing in XML, or JSON, or the RDF data models. In this work, we present a dataset abstraction approach, which () applies on relational, CSV, XML, JSON, RDF or Property Graph data; () computes an abstraction meant for humans (as opposed to a schema meant for a parser); () integrates Information Extraction data profiling, to also classify dataset content among a set of categories of interest to the user. Our abstractions are conceptually close to an Entity-Relationship diagram, if one allows nested and possibly heterogeneous structure within entities.

Problem melden

Wenn Sie Probleme mit dem Zugriff auf einen gefundenen Titel haben, können Sie sich über dieses Formular gern an uns wenden. Schreiben Sie uns hierüber auch gern, wenn Ihnen Fehler in der Titelanzeige aufgefallen sind.