VKG2019

Generating and Querying Virtual Knowledge Graphs from heterogeneous data sources

Despite the emergence of RDF knowledge bases, exposed via SPARQL endpoints or as Linked Data, formats like CSV, JSON or XML are still the most widely used for exposing data on the web. Some solutions have been proposed to describe and integrate these resources using mapping languages (e.g., RML, CSVW, kR2RML), and many of them come with associated RDF generators (e.g., RML-Mapper, CSVW generator). Because these solutions generate materialized RDF, they cannot deal efficiently with volatile data or provide a SPARQL entry point directly to the data sources. In this tutorial, we explain how to use a suite of tools to manage and exploit data in heterogeneous formats (CSV, RDB, JSON or REST API) without loading the resulting RDF into a triple store in order to query it. First, we present TADA, a tool for automatically annotating CSV files using existing Knowledge Graphs. Second, we present Helio, a Linked Data publisher that provides unified, real-time access to multiple heterogeneous data sources. Finally, we present an OBDA approach to exploit CSV files published on the Web, providing access via SPARQL or GraphQL.


Audience

The target audience for this tutorial is researchers and practitioners from both industry and academia who are interested in integrating heterogeneous data on the web using semantic technologies. The tools, slides and data presented during the tutorial will be available online on this web page soon. Knowledge of the standard mapping languages, (R2)RML, is not required, but we will give a brief summary of their practical use, as they are a key element of all the solutions presented in the tutorial.


Automatic tabular data annotation

Part I: Semantic Annotation and Efficient Knowledge Graph Generation

TADA performs semantic annotation on tabular datasets. Given a tabular data file and an existing Knowledge Graph, it automatically annotates the columns of the input data file with types from the given Knowledge Graph, exploiting the data within it.
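
As a rough illustration of column type annotation, the sketch below assigns a type to a column by majority vote over the types of its cell values. It is not TADA's actual algorithm: the `TOY_KG` dictionary and the `annotate_column` function are hypothetical stand-ins for lookups against a real Knowledge Graph.

```python
from collections import Counter

# Hypothetical toy "knowledge graph": entity label -> set of class names.
# A real annotator would look these up in an existing Knowledge Graph.
TOY_KG = {
    "madrid": {"City"},
    "seville": {"City"},
    "paris": {"City"},
    "spain": {"Country"},
}


def annotate_column(cells):
    """Return the most frequent type among the cells found in TOY_KG, or None."""
    votes = Counter()
    for cell in cells:
        # Cells that are not in the toy graph simply cast no vote.
        for kg_type in TOY_KG.get(cell.strip().lower(), ()):
            votes[kg_type] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


column = ["Madrid", "Seville", "Spain", "Unknown place"]
print(annotate_column(column))
```

The majority vote makes the result tolerant of a few cells that are missing from the graph or that belong to another type.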

Real-Time Linked Data Generation

Part I: Semantic Annotation and Efficient Knowledge Graph Generation

Helio publishes Linked Data from heterogeneous datasets in real time, providing unified access to, and a unified view of, the gathered data. Helio is a mapping-based approach that supports complex function expressions to clean data and fuzzy linking rules to generate links among the published instances.
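
To give an intuition of what a fuzzy linking rule does, the following minimal sketch links labels from two sources when their string similarity exceeds a threshold. It does not use Helio's rule syntax or its similarity measures: the `similarity` and `link` functions, the threshold value, and the sample labels are assumptions made for illustration, built on Python's standard `difflib` module.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity ratio between two strings, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link(left, right, threshold=0.8):
    """Return every (left, right) pair whose similarity reaches the threshold."""
    return [
        (l, r)
        for l in left
        for r in right
        if similarity(l, r) >= threshold
    ]


# Hypothetical labels coming from two different data sources.
source_a = ["Universidad Politecnica de Madrid"]
source_b = ["Universidad Politécnica de Madrid", "Universidad de Sevilla"]

for pair in link(source_a, source_b):
    print(pair)
```

A threshold below 1.0 lets instances match despite small spelling differences, such as a missing accent, at the cost of possible false links when the threshold is set too low.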

Querying CSV from OBDA

Part II: OBDA approach for querying heterogeneous data

Morph is a suite of OBDA tools focused on optimizing the SPARQL-to-SQL translation process. It started as a framework for querying relational databases using R2RML mappings and SPARQL queries, and it now also supports access to CSV files using state-of-the-art proposals such as mapping languages (RML with FnO) and annotations (CSVW).
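
The idea behind query translation in OBDA can be sketched as follows: a single triple pattern is rewritten into SQL using a dictionary that plays the role of an R2RML mapping. This is a toy under stated assumptions, not Morph's implementation: the `MAPPING` structure, the `translate` function, the `stops` table, and the `ex:name` predicate are all hypothetical.

```python
import sqlite3

# Hypothetical mapping: predicate -> (table, subject key column, object column).
# It stands in for what an R2RML mapping document would declare.
MAPPING = {
    "ex:name": ("stops", "id", "name"),
}


def translate(predicate):
    """Translate the triple pattern `?s <predicate> ?o` into a SQL query string."""
    table, subject_col, object_col = MAPPING[predicate]
    return f"SELECT {subject_col} AS s, {object_col} AS o FROM {table}"


# In-memory database standing in for the relational data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stops (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO stops VALUES (?, ?)", [(1, "Sol"), (2, "Atocha")])

sql = translate("ex:name")
rows = conn.execute(sql + " ORDER BY s").fetchall()
print(sql)
print(rows)
```

The data never leaves the source: answers are computed at query time, so no RDF is materialized or loaded into a triple store.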

Semantic Web for developers

Part II: OBDA approach for querying heterogeneous data

GraphQL is a query language that allows querying heterogeneous data and services. During the last part of the tutorial, we will show how to query the virtual Knowledge Graph using GraphQL.

Ontology-Driven Access to Linked Data

Part II: OBDA approach for querying heterogeneous data

Agora allows querying sparse Linked Data from different datasets, efficiently providing long query results that combine data from the different datasets.


Ahmad Alobaid

aalobaid (at) fi.upm.es

Ahmad Alobaid has been a researcher and PhD student at the Ontology Engineering Group - Universidad Politécnica de Madrid since 2016. His main research area is the semantic labelling of tabular data using machine learning and existing knowledge graphs. He co-organized a tutorial about OnToology, a widely used ontology development tool, at EKAW2018.

David Chaves-Fraga

dchaves (at) fi.upm.es

David Chaves-Fraga has been a researcher and PhD student at the Ontology Engineering Group - Universidad Politécnica de Madrid since 2016. His work is currently focused on the generation of virtual knowledge graphs from CSV on the Web using mapping languages. He is also an assistant lecturer in Semantic Web courses and the main coordinator of the international training program Open Summer of Code in Spain.

Andrea Cimmino

cimmino (at) fi.upm.es

Andrea Cimmino has been a researcher at the Ontology Engineering Group since 2018 and is a PhD student at Universidad de Sevilla. His main research interest is providing efficient access to heterogeneous services in the IoT domain. He has also been an assistant lecturer in several Software Engineering courses.

Oscar Corcho

ocorcho (at) fi.upm.es

Oscar Corcho is a full professor at Universidad Politécnica de Madrid and one of the leaders of the Ontology Engineering Group. He also leads the Data Integration area of the group, where all the presenters come from. He has been working on and teaching Data Integration, OBDA, and Linked Data since 2004.

Freddy Priyatna

fpriyatna (at) fi.upm.es

Freddy Priyatna has been at the Ontology Engineering Group - Universidad Politécnica de Madrid since 2012, where he is a postdoctoral researcher. He holds a PhD from the same research group, where he developed the R2RML engine morph-RDB together with the R2RML mapping generator MIRROR. His main research area is providing query access to virtual knowledge graphs. He is also an assistant lecturer, teaching OBDA in several Semantic Web courses.


Materials

Technologies
Software
  • TADA: Semantic Annotation on Tabular Datasets
  • Helio: Linked Data Publishing in Real-Time
  • Morph Suite: Optimization of the SPARQL-to-SQL Translation Process
  • GraphQL Generator: Query Heterogeneous Data and Services
  • Agora: Ontology-Driven Access to Linked Data