Aalborg University
Optimizing RDF Data Cubes for Efficient Processing of Analytical Queries
Kim Ahlstrøm Jakobsen, Alex B. Andersen, Katja Hose, Torben Bach Pedersen
Database Technology, Department of Computer Science, Aalborg University
Kim Ahlstrøm Jakobsen | Optimizing RDF Data Cubes | 1 / 19
Motivation
Future Goal
Goal:
- Analytical queries on internal data & external linked data
Benefits:
- Enables exploratory queries
- Increasing amount of linked data
- Integrates with heterogeneous data
- Semantic reasoning
The First Steps
Efficient processing of analytical queries on RDF data cubes.
- Denormalize the cube dimensions
- Reduce the subject-object joins (expensive)
- Increase the subject-subject joins
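To make the two join types concrete, the following minimal sketch counts them for a list of triple patterns. The pattern lists mirror the two queries shown later under "Query rewriting"; the helper names `is_var` and `count_joins`, and the property name `:nation_name`, are assumptions made for illustration and are not taken from the talk.

```python
from itertools import combinations


def is_var(term):
    """A term is a variable if it starts with '?' (SPARQL convention)."""
    return term.startswith("?")


def count_joins(patterns):
    """Count joins between pairs of (subject, predicate, object) patterns.

    subject-subject: both patterns share the same subject variable.
    subject-object: the subject variable of one is the object of the other.
    """
    counts = {"subject-subject": 0, "subject-object": 0}
    for (s1, _, o1), (s2, _, o2) in combinations(patterns, 2):
        if is_var(s1) and s1 == s2:
            counts["subject-subject"] += 1
        if (is_var(s1) and s1 == o2) or (is_var(s2) and s2 == o1):
            counts["subject-object"] += 1
    return counts


# Patterns of the original (snowflake) query.
snowflake = [
    ("?lineitem", ":extendedprice", "?price"),
    ("?lineitem", ":hasorder", "?order"),
    ("?order", "skos:broader", "?customer"),
    ("?customer", "skos:broader", "?nation"),
    ("?nation", ":name", "?name"),
]

# Patterns of the rewritten (star) query.
star = [
    ("?lineitem", ":extendedprice", "?price"),
    ("?lineitem", ":hasorder", "?order"),
    ("?order", ":nation_name", "?name"),
]

print(count_joins(snowflake))  # {'subject-subject': 1, 'subject-object': 3}
print(count_joins(star))       # {'subject-subject': 1, 'subject-object': 1}
```

In this small example the rewrite removes two of the three subject-object joins. A query that reads several attributes of the same dimension would additionally gain subject-subject joins on the flattened node.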
Workflow
- Internal optimization
Building the Cube
Purpose:
- Organize data for the purpose of analysis
- Easier to understand
What is a cube:
- Facts: The subject of the analysis
- Dimensions: Perspectives of the data
- Levels: Concepts in the dimensions
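The three cube concepts can be pictured as a small data structure. This is only a sketch under assumptions: the class names, the dimension name `customer`, and the level names `order`, `customer`, and `nation` are hypothetical, chosen to match the variables in the example query later in the talk.

```python
from dataclasses import dataclass, field


@dataclass
class Dimension:
    """A perspective on the data, with levels ordered from finest to coarsest."""
    name: str
    levels: list


@dataclass
class Cube:
    """A cube: one kind of fact plus the dimensions that describe it."""
    fact: str
    dimensions: list = field(default_factory=list)

    def rollup_path(self, dimension, from_level, to_level):
        """Return the levels passed when rolling up from one level to another."""
        dim = next(d for d in self.dimensions if d.name == dimension)
        start = dim.levels.index(from_level)
        end = dim.levels.index(to_level)
        return dim.levels[start:end + 1]


# Hypothetical cube: line items analysed along a customer dimension.
cube = Cube(
    fact="lineitem",
    dimensions=[Dimension("customer", ["order", "customer", "nation"])],
)

path = cube.rollup_path("customer", "order", "nation")
print(path)           # ['order', 'customer', 'nation']
print(len(path) - 1)  # 2 hierarchy steps between order and nation
```

Each step in the roll-up path corresponds to one hierarchy hop that a query has to follow when the dimension is stored in normalized form.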
Analytical Queries
- Example Query 1: What is the revenue per country?
- Example Query 2: What are the top k products bought by customers from Denmark?
Patterns: Snowflake Pattern
Patterns: Star Pattern
Patterns: Fully Denormalized Pattern
Special Cases: Unbalanced Hierarchies
Special Cases: Property Collision
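A property collision can be illustrated with two levels that both use a `name` property. When their attributes are merged onto one node, the values can no longer be told apart unless the properties are renamed. The sketch below assumes a `level_property` naming scheme, which is suggested by the `:nation name` property in the rewritten query but is not confirmed by the slides; the function name `flatten` and the sample values are hypothetical.

```python
def flatten(attributes, rename):
    """Merge (level, property, value) attributes onto a single node.

    With rename=False the original property names are kept, so equal
    names from different levels end up under the same key.
    With rename=True each property is prefixed with its level name.
    """
    merged = {}
    for level, prop, value in attributes:
        key = f"{level}_{prop}" if rename else prop
        merged.setdefault(key, []).append(value)
    return merged


attributes = [
    ("customer", "name", "Alice"),
    ("nation", "name", "Denmark"),
]

colliding = flatten(attributes, rename=False)
renamed = flatten(attributes, rename=True)

print(colliding)  # {'name': ['Alice', 'Denmark']}
print(renamed)    # {'customer_name': ['Alice'], 'nation_name': ['Denmark']}
```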
Semantic Web OLAP Denormalization Algorithm
Input:
- QB4OLAP ontology
- Snowflake pattern RDF data cube
Output:
- Star pattern RDF data cube
- Fully Denormalized pattern RDF data cube
Features:
- Top-down traversal
- Property renaming
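The following is a minimal sketch of how a snowflake-to-star transformation with top-down traversal and property renaming might look. It is not the authors' algorithm. It assumes that triples are plain Python tuples, that hierarchy steps use `skos:broader`, that a hypothetical `level_of` mapping stands in for the level information a QB4OLAP ontology would provide, and that renamed properties follow a `:level_property` scheme.

```python
BROADER = "skos:broader"


def denormalize_to_star(triples, level_of, base_level):
    """Copy attributes of higher-level members down to base-level members.

    triples    -- iterable of (subject, predicate, object) tuples
    level_of   -- dict mapping each dimension member to its level name
    base_level -- the level that facts link to directly
    """
    children = {}       # parent member -> list of child members
    attrs = {}          # member -> list of (property, value)
    has_parent = set()
    for s, p, o in triples:
        if p == BROADER:
            children.setdefault(o, []).append(s)
            has_parent.add(s)
        elif s in level_of:
            attrs.setdefault(s, []).append((p, o))

    # Fact triples (subjects that are not dimension members) stay unchanged.
    star = [t for t in triples if t[0] not in level_of]

    def visit(member, inherited):
        level = level_of[member]
        # Rename each property with its level to avoid property collisions.
        own = [(f":{level}_{p.lstrip(':')}", o) for p, o in attrs.get(member, [])]
        combined = inherited + own
        if level == base_level:
            star.extend((member, p, o) for p, o in combined)
        for child in children.get(member, []):
            visit(child, combined)

    # Top-down: start at members without a parent and walk towards the base.
    for top in sorted(m for m in level_of if m not in has_parent):
        visit(top, [])
    return star


triples = [
    (":li1", ":extendedprice", 100),
    (":li1", ":hasorder", ":o1"),
    (":o1", BROADER, ":c1"),
    (":c1", BROADER, ":dk"),
    (":c1", ":name", "Alice"),
    (":dk", ":name", "Denmark"),
]
level_of = {":o1": "order", ":c1": "customer", ":dk": "nation"}

star = denormalize_to_star(triples, level_of, base_level="order")
for triple in star:
    print(triple)
```

Because the level of each member is looked up rather than derived from its depth, a member that skips a level (an unbalanced hierarchy) still receives correctly named properties in this sketch.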
Unbalanced Hierarchies Example
Query rewriting
Original query:
    SELECT ?name (SUM(?price) AS ?revenue)
    WHERE {
      ?lineitem :extendedprice ?price ;
                :hasorder ?order .
      ?order skos:broader ?customer .
      ?customer skos:broader ?nation .
      ?nation :name ?name .
    }
    GROUP BY ?name
Rewritten query:
    SELECT ?name (SUM(?price) AS ?revenue)
    WHERE {
      ?lineitem :extendedprice ?price ;
                :hasorder ?order .
      ?order :nation_name ?name .
    }
    GROUP BY ?name
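To check that the rewrite preserves the answer, both query shapes can be evaluated by hand over a toy dataset. The sketch below uses plain Python tuples instead of a triple store; all data values are made up, and the property name `:nation_name` assumes an underscore was lost from the slide text.

```python
from collections import defaultdict

# Hypothetical snowflake data: three line items, two orders, two nations.
snowflake = {
    (":li1", ":extendedprice", 10.0), (":li1", ":hasorder", ":o1"),
    (":li2", ":extendedprice", 5.0), (":li2", ":hasorder", ":o2"),
    (":li3", ":extendedprice", 2.5), (":li3", ":hasorder", ":o1"),
    (":o1", "skos:broader", ":c1"), (":o2", "skos:broader", ":c2"),
    (":c1", "skos:broader", ":dk"), (":c2", "skos:broader", ":de"),
    (":dk", ":name", "Denmark"), (":de", ":name", "Germany"),
}

# Star data: same facts, nation name copied onto each order.
facts = {t for t in snowflake if t[1] in (":extendedprice", ":hasorder")}
star = facts | {
    (":o1", ":nation_name", "Denmark"),
    (":o2", ":nation_name", "Germany"),
}


def objects(triples, subject, predicate):
    """All objects of triples matching the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def revenue_snowflake(triples):
    """Original query: follow order -> customer -> nation -> name."""
    totals = defaultdict(float)
    for lineitem, p, price in triples:
        if p != ":extendedprice":
            continue
        for order in objects(triples, lineitem, ":hasorder"):
            for customer in objects(triples, order, "skos:broader"):
                for nation in objects(triples, customer, "skos:broader"):
                    for name in objects(triples, nation, ":name"):
                        totals[name] += price
    return dict(totals)


def revenue_star(triples):
    """Rewritten query: read the nation name directly from the order."""
    totals = defaultdict(float)
    for lineitem, p, price in triples:
        if p != ":extendedprice":
            continue
        for order in objects(triples, lineitem, ":hasorder"):
            for name in objects(triples, order, ":nation_name"):
                totals[name] += price
    return dict(totals)


print(revenue_snowflake(snowflake))
print(revenue_star(star))
```

Both functions return the same revenue per nation on this data; the star version simply needs two fewer lookups per line item.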
Results
Virtuoso                            Star     Denormalized
Increase in triples                 16 %     173 %
Avg. decrease in query time         600 %    700 %
Geo. mean decrease in query time    110 %    140 %
- Cost of triple storage
- Static and frequently changing data
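The table reports both an arithmetic average and a geometric mean. One plausible reading of the gap between them is that a few queries improve far more than the rest, since the arithmetic mean is pulled up by outliers while the geometric mean is not. The sketch below shows the effect with made-up per-query ratios; these numbers are purely hypothetical and are not measurements from the talk.

```python
import math


def arithmetic_mean(values):
    """Sum of the values divided by their count."""
    return sum(values) / len(values)


def geometric_mean(values):
    """n-th root of the product, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical improvement ratios for four queries: three unchanged,
# one much faster.
ratios = [1.0, 1.0, 1.0, 27.0]

print(arithmetic_mean(ratios))  # 7.5
print(geometric_mean(ratios))   # about 2.28
```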
Future Work
- More cube optimizations
- Consider data provenance and quality
Thank you
SWOD Abstract Example
Figure Credits
- Workman (Licence: CC BY 3.0; Credit: www.clipartbest.com)
- Cube (Licence: CC BY 3.0; Credit: www.clipartbest.com)
- Turing machine: http://www.felienne.com/
- Steps: http://www.cliparthut.com/
- Future-work: http://www.horsesforsources.com/
