KGC2020

Knowledge Graph Construction Using Declarative Mapping Rules

Despite the emergence of RDF knowledge bases, exposed via SPARQL endpoints or as Linked Data, formats such as CSV, JSON and XML are still the most widely used for exposing data on the web. Several solutions have been proposed to describe and integrate these resources using declarative mapping languages (e.g. RML, CSVW, KR2RML), and many of them come with associated RDF generators (e.g. RMLMapper, CSVW generator). These technologies enable the construction of knowledge graphs in a declarative way. However, they have a steep learning curve for new users. Our aim in this tutorial is to explain in detail, from a practical perspective, the process of constructing knowledge graphs, from writing mappings to using them with suitable tools. First, we describe the mapping structure and Mapeathor, a tool that eases the writing of mappings, and give the main guidelines for attendees to create their own mappings. Then, we present Morph-CSV, a framework for virtual knowledge graph access over tabular data. Finally, we present Helio, a Linked Data publisher that provides unified access in real time to multiple heterogeneous data sources.


Audience

The target audience for this tutorial is researchers and practitioners, from both industry and academia, who are interested in integrating heterogeneous data on the web using semantic technologies. The tools presented during the tutorial will be available online. Prior knowledge of standard mapping languages such as (R2)RML is not required. We will give a brief summary of their practical use, since they underpin all the solutions presented in the tutorial.


Program

Getting started with mapping creation

Part I: Knowledge graph and declarative mapping introduction

Declarative mappings are commonly used to establish the relationships between original data sources and a target ontology. A wide variety of languages fulfil this purpose. The tool Mapeathor is designed to make it easier to write mappings in a language-independent way and later translate them into a language of the user's choice.
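To make the idea of a declarative mapping concrete, the following minimal Python sketch applies a small set of rules to CSV rows and emits N-Triples. It is only an illustration under stated assumptions: the rule structure (a plain dictionary), the sample data and the example.org IRIs are hypothetical, and the code follows neither RML syntax nor Mapeathor's input format.

```python
import csv
import io

# Hypothetical sample data; in practice this would be a CSV file on disk.
CSV_TEXT = """id,name,city
1,Alice,Madrid
2,Bob,Seville
"""

# Declarative rules: a subject IRI template, a class, and predicate/column pairs.
# This structure and the example.org IRIs are assumptions made for illustration.
MAPPING = {
    "subject": "http://example.org/person/{id}",
    "class": "http://xmlns.com/foaf/0.1/Person",
    "predicate_objects": [
        ("http://xmlns.com/foaf/0.1/name", "name"),
        ("http://example.org/vocab/city", "city"),
    ],
}

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def apply_mapping(csv_text, mapping):
    """Return the N-Triples lines obtained by applying the rules to each row."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Fill the subject template with values from the current row.
        subject = mapping["subject"].format(**row)
        triples.append(f"<{subject}> <{RDF_TYPE}> <{mapping['class']}> .")
        for predicate, column in mapping["predicate_objects"]:
            # Escape backslashes and quotes so the literal stays well-formed.
            value = row[column].replace("\\", "\\\\").replace('"', '\\"')
            triples.append(f'<{subject}> <{predicate}> "{value}" .')
    return triples


triples = apply_mapping(CSV_TEXT, MAPPING)
for line in triples:
    print(line)
```

The point of the sketch is that the rules are data, separate from the code that executes them, so the same rules could in principle be translated into different mapping languages.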

Creating declarative mappings

Part II: Mapping creation for knowledge graph construction

An example dataset and guidelines will be provided so that attendees can create the appropriate mappings, which will be used in the following sessions of the tutorial.

Virtual access over tabular data

Part III: Enhancing virtual knowledge graph access for tabular data

Morph-CSV is a tool that enhances OBDA techniques for querying tabular data. The engine exploits the information in its usual inputs (i.e. query, mapping rules and annotations) to query tabular data directly, incorporating recent ideas from the state of the art. Morph-CSV can be embedded on top of any SPARQL-to-SQL engine, such as Morph-RDB or Ontop. The engine is part of the Morph suite.
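The general idea of using annotations to prepare tabular data for a SQL-based engine can be sketched as follows. This is not Morph-CSV's implementation or API: the annotation format, the sample data and the hand-written SQL query (standing in for what a SPARQL-to-SQL engine might produce) are all assumptions made for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical sample data; in practice this would be a CSV file.
CSV_TEXT = """id,name,age
1,Alice,34
2,Bob,27
3,Carol,41
"""

# Hypothetical annotations: column name -> SQL datatype.
ANNOTATIONS = {"id": "INTEGER", "name": "TEXT", "age": "INTEGER"}

# Python conversions matching the annotated datatypes.
CASTS = {"INTEGER": int, "TEXT": str}


def load_table(conn, table, csv_text, annotations):
    """Create a typed table from the annotations and load the CSV rows into it."""
    columns = ", ".join(f'"{name}" {sql_type}' for name, sql_type in annotations.items())
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in annotations)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Cast each raw string to the datatype declared in the annotations.
        values = [CASTS[sql_type](row[name]) for name, sql_type in annotations.items()]
        conn.execute(f'INSERT INTO "{table}" VALUES ({placeholders})', values)


conn = sqlite3.connect(":memory:")
load_table(conn, "person", CSV_TEXT, ANNOTATIONS)

# Hand-written SQL standing in for the output of a SPARQL-to-SQL translation
# of a query such as "names of people older than 30".
rows = conn.execute("SELECT name FROM person WHERE age > 30 ORDER BY name").fetchall()
names = [row[0] for row in rows]
print(names)
```

Because the age column is typed as an integer when the table is created, the comparison in the query is numeric rather than textual, which is the kind of benefit that datatype annotations can bring to queries over CSV files.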

Helio: Linked Data Service for publishing heterogeneous data as RDF

Part IV: Semantic annotation and efficient knowledge graph generation

Helio is a tool that publishes a knowledge graph as a service. It integrates heterogeneous datasets in real time, providing unified access to, and a unified view of, the gathered data. Helio follows a mapping-based approach that supports complex function expressions to clean data and fuzzy linking rules to generate links among published instances. It is also aligned with standard mapping specifications such as RML.
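As a rough illustration of what a fuzzy linking rule does, the sketch below compares the labels of instances from two sources and emits an owl:sameAs link when their similarity exceeds a threshold. It does not use Helio's rule syntax or API: the similarity measure (Python's difflib), the threshold of 0.8, the sample labels and the example.org IRIs are all assumptions.

```python
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical instances from two sources: IRI -> label.
SOURCE_A = {
    "http://example.org/a/1": "Madrid City Hall",
    "http://example.org/a/2": "Seville Cathedral",
}
SOURCE_B = {
    "http://example.org/b/9": "madrid city hal",  # differs by case and a typo
    "http://example.org/b/7": "Ghent University",
}


def similarity(left, right):
    """Case-insensitive string similarity in the range [0, 1]."""
    return SequenceMatcher(None, left.lower(), right.lower()).ratio()


def fuzzy_links(source_a, source_b, threshold=0.8):
    """Return (iri_a, iri_b) pairs whose labels are at least `threshold` similar."""
    links = []
    for iri_a, label_a in source_a.items():
        for iri_b, label_b in source_b.items():
            if similarity(label_a, label_b) >= threshold:
                links.append((iri_a, iri_b))
    return links


links = fuzzy_links(SOURCE_A, SOURCE_B)
for iri_a, iri_b in links:
    # Serialise each link as an N-Triples statement.
    print(f"<{iri_a}> <{OWL_SAME_AS}> <{iri_b}> .")
```

The threshold trades precision for recall: a lower value links more near-matches but also admits more false positives.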


Presenters

David Chaves-Fraga

Universidad Politécnica de Madrid

dchaves (at) fi.upm.es

David has been a researcher and PhD student (now in his final year) at the Ontology Engineering Group - Universidad Politécnica de Madrid since 2016. His work currently focuses on the generation of virtual knowledge graphs from tabular data on the web using mapping languages. He is also an assistant lecturer in several Semantic Web courses, chair of the W3C Knowledge Graph Construction Community Group, a member of the organising committees of the Knowledge Graph Building Workshop 2019 and the Workshop on Semantics for Transport 2019, and the main coordinator of the international training programme Open Summer of Code in Spain.

Andrea Cimmino

Universidad Politécnica de Madrid

cimmino (at) fi.upm.es

Andrea has been a postdoctoral researcher at the Ontology Engineering Group - Universidad Politécnica de Madrid since 2019. He holds a PhD from the Universidad de Sevilla, focused on dynamic link discovery of RDF instances by applying Genetic Programming algorithms. His main research interest is data integration, providing efficient access to heterogeneous services and resources, mainly in the IoT context. He is also interested in the generation of SHACL shapes for RDF validation. He currently teaches Computer Science in a Master's programme at a business school in Madrid. In recent years he has collaborated with several standardisation groups, such as OpenADR.

Oscar Corcho

Universidad Politécnica de Madrid

ocorcho (at) fi.upm.es

Oscar is a full professor at Universidad Politécnica de Madrid and one of the coordinators of the Ontology Engineering Group. He leads the Data Integration area of the group, to which all the presenters belong. He has been working on and teaching Data Integration, Knowledge Graphs and Linked Data since 2004.

Ana Iglesias-Molina

Universidad Politécnica de Madrid

ana.iglesiasm (at) upm.es

Ana has been a researcher and PhD student at the Ontology Engineering Group (Universidad Politécnica de Madrid) since 2019. Her work focuses mainly on the construction and development of declarative mappings, and on the integration of data from diverse sources in the biological domain.


Materials

All the necessary materials for the tutorial are available at: https://github.com/oeg-upm/kgc-tutorial-iswc2020
Technologies
Software
  • Mapeathor: User-friendly mapping generator
  • Morph-CSV: Enhancing virtual knowledge graph construction techniques for tabular data
  • Helio: Linked Data Publishing in Real-Time

Sponsors