Despite the emergence of RDF knowledge bases, exposed via SPARQL endpoints or as Linked Data, formats such as CSV, JSON and XML are still the most widely used for exposing data on the web. Several solutions have been proposed to describe and integrate these resources using declarative mapping languages (e.g. RML, CSVW, KR2RML), and many of them come with associated RDF generators (e.g. RMLMapper, CSVW generator). These technologies enable the construction of knowledge graphs in a declarative way, but they have a steep learning curve for new users. Our aim in this tutorial is to explain in detail, from a practical perspective, the process of constructing knowledge graphs, from writing mappings to using them with suitable tools. First, we describe the structure of mappings and Mapeathor, a tool that eases writing them, and give attendees the main guidelines for creating their own mappings. Then, we present Morph-CSV, a framework for virtual knowledge graph access over tabular data. Finally, we present Helio, a Linked Data publisher that provides unified, real-time access to multiple heterogeneous data sources.
The target audience for this tutorial is researchers and practitioners from both industry and academia who are interested in integrating heterogeneous data on the web using semantic technologies. The tools presented during the tutorial will be available online. Prior knowledge of standard mapping languages such as (R2)RML is not required: we will give a brief summary of their practical use, since they are a key element of all the solutions presented in the tutorial.
Declarative mappings are commonly used to establish the relationships between the original data sources and a target ontology. A wide variety of languages serve this purpose. Mapeathor is a tool designed to make it easier to write mappings in a language-independent way and then translate them into the language of the user's choice.
An example dataset and guidelines will be provided so that attendees can create the appropriate mappings, which will be used in the following sessions of the tutorial.
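To make the idea of a declarative mapping concrete, the following minimal Python sketch applies a toy mapping to a small CSV file and emits N-Triples. Everything in it is an assumption made for illustration: the dictionary layout is neither RML nor Mapeathor syntax, and the dataset, the `example.org` IRIs and the `apply_mapping` helper are hypothetical.

```python
import csv
import io

# Hypothetical example data; not the dataset used in the tutorial.
CSV_TEXT = """id,name,city
1,Alice,Madrid
2,Bob,Sevilla
"""

# A toy, language-independent mapping: one subject template, one class and
# predicate -> column pairs. This is NOT RML or Mapeathor syntax.
MAPPING = {
    "subject": "http://example.org/person/{id}",
    "class": "http://xmlns.com/foaf/0.1/Person",
    "properties": {
        "http://xmlns.com/foaf/0.1/name": "name",
        "http://example.org/vocab/city": "city",
    },
}

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def apply_mapping(csv_text, mapping):
    """Return N-Triples lines obtained by applying the mapping to every CSV row."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Fill the subject template with the values of the current row.
        subject = mapping["subject"].format(**row)
        triples.append(f"<{subject}> <{RDF_TYPE}> <{mapping['class']}> .")
        for predicate, column in mapping["properties"].items():
            # Escape backslashes and double quotes for the literal.
            value = row[column].replace("\\", "\\\\").replace('"', '\\"')
            triples.append(f'<{subject}> <{predicate}> "{value}" .')
    return triples


if __name__ == "__main__":
    for line in apply_mapping(CSV_TEXT, MAPPING):
        print(line)
```

The point of the sketch is the separation of concerns: the rules live in data (`MAPPING`), not in code, so the same rules could in principle be serialised into any concrete mapping language.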
Morph-CSV is a tool that enhances ontology-based data access (OBDA) techniques for querying tabular data. The engine exploits the information in its typical inputs (i.e. the query, mapping rules and annotations) to query tabular data directly, incorporating recent ideas from the state of the art. Morph-CSV can be placed on top of any SPARQL-to-SQL engine, such as Morph-RDB or Ontop. The engine is part of the Morph suite.
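The general idea of virtual access over tabular data can be illustrated with a short Python sketch. It is not Morph-CSV's implementation or API: the table name, the `ex:` predicates, the annotations and both helper functions are assumptions chosen for the example. The annotations are used to create a typed SQL table from the CSV, and a single triple pattern `?s <predicate> ?o` is then answered by rewriting it into SQL, without materialising any RDF.

```python
import csv
import io
import sqlite3

# Hypothetical example data; not the dataset used in the tutorial.
CSV_TEXT = """id,name,age
1,Alice,34
2,Bob,27
"""

# Assumed annotations (column datatypes) and mapping rules (predicate -> column).
ANNOTATIONS = {"id": "INTEGER", "name": "TEXT", "age": "INTEGER"}
PREDICATE_TO_COLUMN = {"ex:name": "name", "ex:age": "age"}
SUBJECT_TEMPLATE = "http://example.org/person/{}"


def load_table(conn, csv_text, annotations):
    """Create a typed table from the annotations and load the CSV rows into it."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = ", ".join(f'"{col}" {sqltype}' for col, sqltype in annotations.items())
    conn.execute(f"CREATE TABLE person ({columns})")
    placeholders = ", ".join("?" for _ in annotations)
    conn.executemany(
        f"INSERT INTO person VALUES ({placeholders})",
        [tuple(row[col] for col in annotations) for row in rows],
    )


def answer_pattern(conn, predicate):
    """Answer the pattern `?s <predicate> ?o` by rewriting it into a SQL query."""
    column = PREDICATE_TO_COLUMN[predicate]  # the mapping rule selects the column
    cursor = conn.execute(f'SELECT id, "{column}" FROM person ORDER BY id')
    return [(SUBJECT_TEMPLATE.format(pk), value) for pk, value in cursor]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_table(conn, CSV_TEXT, ANNOTATIONS)
    for subject, value in answer_pattern(conn, "ex:age"):
        print(subject, value)
```

Because the `age` column is declared as `INTEGER`, SQLite stores the CSV strings as integers, which is a small-scale version of how annotations can make the underlying data better suited to a SPARQL-to-SQL engine.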
Helio is a tool that publishes a knowledge graph as a service. It integrates heterogeneous datasets in real time, providing unified access to, and a unified view of, the gathered data. Helio follows a mapping-based approach that supports complex function expressions for cleaning data and fuzzy linking rules for generating links among the published instances. It is also aligned with standard mapping specifications such as RML.
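As a rough illustration of how a cleaning function and a fuzzy linking rule can work together, consider the Python sketch below. It does not use Helio's API or rule syntax: the `clean`, `similarity` and `link` helpers, the 0.85 threshold, the `ex:` identifiers and the choice of `owl:sameAs` as the link predicate are all assumptions made for the example, and string similarity is computed with the standard library's `difflib`.

```python
from difflib import SequenceMatcher


def clean(label):
    """Cleaning function: trim, collapse inner whitespace and lowercase."""
    return " ".join(label.split()).lower()


def similarity(a, b):
    """Similarity in [0, 1] between two labels after cleaning."""
    return SequenceMatcher(None, clean(a), clean(b)).ratio()


def link(source, target, threshold=0.85):
    """Fuzzy linking rule: link two instances when their labels are similar enough.

    `source` and `target` map instance identifiers to labels; the result is a
    list of (source_id, "owl:sameAs", target_id) triples.
    """
    links = []
    for s_id, s_label in source.items():
        for t_id, t_label in target.items():
            if similarity(s_label, t_label) >= threshold:
                links.append((s_id, "owl:sameAs", t_id))
    return links


# Hypothetical instances coming from two different data sources.
SOURCE = {"ex:a1": "  Universidad Politecnica   de Madrid "}
TARGET = {
    "ex:b1": "universidad politécnica de madrid",
    "ex:b2": "Universidad de Sevilla",
}

if __name__ == "__main__":
    for triple in link(SOURCE, TARGET):
        print(triple)
```

In this sketch only `ex:a1` and `ex:b1` are linked: after cleaning, their labels differ by a single accented character, whereas the label of `ex:b2` falls below the threshold.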
David has been a researcher and PhD student, now in his final year, at the Ontology Engineering Group (Universidad Politécnica de Madrid) since 2016. His work currently focuses on the generation of virtual knowledge graphs from tabular data on the web using mapping languages. He is also an assistant lecturer in several Semantic Web courses, chair of the W3C Community Group on Knowledge Graph Construction, a member of the organising committees of the Knowledge Graph Building Workshop 2019 and the Workshop on Semantics for Transport 2019, and the main coordinator of the international training program Open Summer of Code in Spain.
Andrea has been a postdoctoral researcher at the Ontology Engineering Group (Universidad Politécnica de Madrid) since 2019. He holds a PhD from the Universidad de Sevilla, focused on dynamic link discovery between RDF instances using Genetic Programming algorithms. His main research interest is data integration, in particular providing efficient access to heterogeneous services and resources, mainly in the IoT context. He is also interested in the generation of SHACL shapes for RDF validation. He currently teaches the Computer Science course in a Master's program at a business school in Madrid. In recent years he has collaborated with several standardisation groups, such as OpenADR.
Oscar is a full professor at Universidad Politécnica de Madrid and one of the coordinators of the Ontology Engineering Group. He leads the group's Data Integration area, to which all the presenters belong. He has been working on and teaching Data Integration, Knowledge Graphs and Linked Data since 2004.
Ana has been a researcher and PhD student at the Ontology Engineering Group (Universidad Politécnica de Madrid) since 2019. Her work focuses mainly on the construction and development of declarative mappings and on the integration of data from diverse sources in the biological domain.