Federated Data Stores using Semantic Web Technology. Steve Ray, Distinguished Research Fellow, Carnegie Mellon University
Interoperability is all about DATA. Three technology trends that could help*: 1. Semantic Web technologies 2. Cloud 3. Natural Language Processing. I will focus on Semantic Web technologies. *Inspired by “Top Three Technologies to Tame the Big Data Beast,” Huffington Post, 11/22/2011
Representation Trends [Figure: chart of representation technologies grouped as Legacy, Current Practice, and Exploratory. Labels recoverable from the slide: IBM Card Format, EDI, XML, XML Schema, Metadata, Metamodels, Meta-meta-models, Info Modeling, RDF/OWL, FOL, BPML/BPEL, CBA, Semantic Mediation, Web Services Protocols, SOA.] (Slide adapted from Donald Hall, Logistics Enterprise Services Office, DLA)
Why Consider RDF & OWL Semantic Web Technology? RDF = Resource Description Framework OWL = Web Ontology Language 1. Simple representation – Everything is a triple: <subject – predicate – object> 2. Self-describing models – Schemas and data coexist in data stores 3. Easy to interrogate – SPARQL queries (over schema and data) 4. Easy to validate – Supports automated reasoning 5. Easy to interoperate – Natively supports distributed data stores
Simple Representation Everything is stored as triples: <subject predicate object>
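To make the triple idea concrete, here is a minimal Python sketch. It is an illustration only, not how an RDF store is implemented: the `Triple` type and the `store` set are hypothetical names, and the two facts are borrowed from the "george" example used later in the deck.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: <subject predicate object> (hypothetical helper type)."""
    subject: str
    predicate: str
    object: str


# A "data store" in this sketch is nothing more than a set of triples.
store = {
    Triple("george", "is-a", "Human"),
    Triple("george", "marriedTo", "lisa"),
}

# Adding a fact means adding one more triple; a repeated fact collapses
# into the existing one because sets hold no duplicates.
store.add(Triple("george", "marriedTo", "lisa"))

print(len(store))  # prints 2
```

Because every fact has the same three-part shape, no table layout has to be designed before data can be stored.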
Self-Describing Models • The schema (model) and the data are stored in the same place • Schema: – Mammal subClassOf Animal – Human subClassOf Mammal • Data: – george is-a Human – george marriedTo lisa
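The following Python sketch shows one consequence of keeping schema and data together: the same collection can be walked to work out that george is also a Mammal and an Animal. The function names `superclasses` and `types_of` are assumptions made for this illustration; a real OWL reasoner does considerably more.

```python
# Schema and data triples from the slide live in one collection.
triples = {
    ("Mammal", "subClassOf", "Animal"),   # schema
    ("Human", "subClassOf", "Mammal"),    # schema
    ("george", "is-a", "Human"),          # data
    ("george", "marriedTo", "lisa"),      # data
}


def superclasses(cls, triples):
    """Return every class reachable from cls by following subClassOf links."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def types_of(individual, triples):
    """Return the stated classes of an individual plus all their superclasses."""
    direct = {o for s, p, o in triples if s == individual and p == "is-a"}
    inferred = set(direct)
    for cls in direct:
        inferred |= superclasses(cls, triples)
    return inferred


print(sorted(types_of("george", triples)))  # prints ['Animal', 'Human', 'Mammal']
```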
Easy to Interrogate SPARQL†: a language to query an RDF database (Just matches against patterns of triples) SELECT ?x WHERE { george marriedTo ?x . } Returns a table: x lisa SELECT ?y WHERE { ?y subClassOf Animal . } Returns a table: y Mammal † SPARQL = SPARQL Protocol and RDF Query Language
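The "just matches against patterns of triples" idea can be sketched in a few lines of Python. This is a toy matcher for a single triple pattern, not a SPARQL engine; the `match` function and the convention that terms starting with `?` are variables are assumptions of this sketch.

```python
triples = {
    ("Mammal", "subClassOf", "Animal"),
    ("Human", "subClassOf", "Mammal"),
    ("george", "is-a", "Human"),
    ("george", "marriedTo", "lisa"),
}


def match(pattern, triples):
    """Return one {variable: value} binding per triple that fits the pattern.

    Terms starting with '?' are variables; every other term must be equal
    to the corresponding part of the triple.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must bind to the same value both times.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results


print(match(("george", "marriedTo", "?x"), triples))  # prints [{'?x': 'lisa'}]
print(match(("?y", "subClassOf", "Animal"), triples))  # prints [{'?y': 'Mammal'}]
```

Note that the second query returns only Mammal, as on the slide: plain pattern matching finds the stated triple and does not follow the chain from Human to Animal.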
Easy to Validate SPARQL can be used for reasoning, not just interrogating. If George sonOf Fred and Fred siblingOf Mary, then George nephewOf Mary. In SPARQL: CONSTRUCT { ?a nephewOf ?c . } WHERE { ?a sonOf ?b . ?b siblingOf ?c . }
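The nephew rule amounts to joining two triple patterns on the shared variable and emitting a new triple for each match. A rough Python equivalent, assuming a hypothetical `construct_nephews` helper and the two facts from the slide, might look like this:

```python
facts = {
    ("George", "sonOf", "Fred"),
    ("Fred", "siblingOf", "Mary"),
}


def construct_nephews(triples):
    """Derive (a, nephewOf, c) wherever a sonOf b and b siblingOf c both hold."""
    derived = set()
    for a, p1, b in triples:
        if p1 != "sonOf":
            continue
        for b2, p2, c in triples:
            # The join condition: both patterns share the same middle person.
            if p2 == "siblingOf" and b2 == b:
                derived.add((a, "nephewOf", c))
    return derived


print(construct_nephews(facts))  # prints {('George', 'nephewOf', 'Mary')}
```

The derived triples are returned separately rather than written back, which mirrors how a CONSTRUCT query produces a new graph instead of modifying the queried one.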
Easy to Interoperate • A single query can interact with more than one RDF database – Linked Movie Database contains movies, actors – DBpedia contains people and birthdates • Find the birthdates of all Star Trek actors – Answer does not exist in one source
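The sketch below imitates that kind of cross-source question in plain Python with two in-memory triple sets. Everything in it is a placeholder: the identifiers, film titles, and dates are invented for illustration and are not taken from the Linked Movie Database or DBpedia, and `birthdates_of_cast` is a hypothetical helper. In real SPARQL, the `SERVICE` keyword from SPARQL 1.1 Federated Query is the usual way to reach a second endpoint from one query.

```python
# Placeholder data: source 1 knows films and their cast ...
movie_db = {
    ("film_1", "title", "Example Film"),
    ("film_1", "hasActor", "person_a"),
    ("film_1", "hasActor", "person_b"),
    ("film_2", "title", "Other Film"),
    ("film_2", "hasActor", "person_c"),
}

# ... and source 2 knows people and birthdates (invented values).
people_db = {
    ("person_a", "birthDate", "1950-01-01"),
    ("person_b", "birthDate", "1960-02-02"),
    ("person_c", "birthDate", "1970-03-03"),
}


def birthdates_of_cast(title, movies, people):
    """Join the two sources on the shared person identifier."""
    films = {s for s, p, o in movies if p == "title" and o == title}
    actors = {o for s, p, o in movies if p == "hasActor" and s in films}
    return {s: o for s, p, o in people if p == "birthDate" and s in actors}


print(birthdates_of_cast("Example Film", movie_db, people_db))
```

The join only works because both sources use the same identifier for the same person, which is the role shared URIs play across RDF data stores.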
DBpedia is just one of many RDF data stores on the Web. We are not alone.
Implications • OWL/RDF provides a representation that can natively support transformations from other modeling languages and native formats for product and process models • The API is SPARQL • Storage can be local or web-based
Take-away • Poor interoperability is expensive • Interoperability solutions can be expensive • Semantic technology can make interoperability solutions easier and cheaper to implement
