Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge Graphs
The document is a tutorial on linked data and knowledge graphs presented by Jeff Z. Pan and others from the University of Aberdeen. It covers topics such as the current status of linked data, methods for constructing and understanding knowledge graphs, and applications, as well as research challenges in the field. Key examples discussed include DBpedia, Wikidata, and GoodRelations as linked data knowledge repositories.
1.
JIST2014 Tutorial on Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge Graphs
Presenter: Jeff Z. Pan (University of Aberdeen)
Contributors: Honghan Wu (University of Aberdeen), Yuan Ren (University of Aberdeen), Panos Alexopoulos (iSOCO)
2.
Jeff Z. Pan (University of Aberdeen)
Agenda
PART I: LINKED DATA & KNOWLEDGE GRAPHS
1:00pm – 1:20pm Overview & Applications
1:20pm – 1:35pm Example Linked Data Knowledge Repositories
1:35pm – 1:45pm The Current Status of Linked Data: the Good, the Bad and the Ugly
1:45pm – 2:00pm Research Challenges
3.
Agenda
PART II: METHODS & TECHNIQUES
2:00pm – 3:05pm Constructing Knowledge Graphs
2:30pm – 2:45pm Coffee Break
3:05pm – 3:40pm Understanding Knowledge Graphs
3:40pm – 3:45pm Outlook
4.
PART I: LINKED DATA & KNOWLEDGE GRAPHS
• Overview
• Applications
• Linked Data Knowledge Repositories
• Knowledge Graph on Linked Data
• Research Challenges
5.
Knowledge
• What is knowledge?
  • Something that is known
  • Structured information
  • About certain aspects of the (real) world
6.
Semantic Networks
A semantic network is a graph structure for representing knowledge in patterns of interconnected nodes and arcs,
• with nodes representing objects, concepts, or situations, and
• arcs representing relationships
7.
RDF: Standard for Directed Labelled Graph KBs for the Web
• RDF is
  • a modern version of the semantic network, with formal syntax and semantics
  • a standard model for data interchange on the Web
• RDF statements: subject-property-value triples
  [my-chair colour tan .]
  [my-chair rdf:type chair .]
  [chair rdfs:subClassOf furniture .]
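As a minimal sketch of the triple model on this slide, the three example statements can be held as plain Python tuples, and one RDFS-style rule (an instance of a class is also an instance of its superclasses) can be applied by hand. The tuple encoding and the `infer_types` helper are illustrative assumptions, not part of the tutorial or of any RDF library.

```python
# Toy encoding of the slide's statements as (subject, property, value) tuples.
TRIPLES = {
    ("my-chair", "colour", "tan"),
    ("my-chair", "rdf:type", "chair"),
    ("chair", "rdfs:subClassOf", "furniture"),
}


def infer_types(triples):
    """Return the triples plus rdf:type facts implied by rdfs:subClassOf."""
    inferred = set(triples)
    changed = True
    while changed:  # repeat until no new fact can be derived
        changed = False
        for s, p, o in list(inferred):
            if p != "rdf:type":
                continue
            for c, q, d in list(inferred):
                if q == "rdfs:subClassOf" and c == o:
                    new_fact = (s, "rdf:type", d)
                    if new_fact not in inferred:
                        inferred.add(new_fact)
                        changed = True
    return inferred
```

Calling `infer_types(TRIPLES)` adds the statement that `my-chair` is also of type `furniture`, which is the kind of conclusion the formal semantics of RDF(S) licenses.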
8.
Linked Data and Knowledge Graphs
• Linked Data refers to (RDF) data published on the web
  • with its meaning explicitly defined with ontological (OWL) vocabulary
  • can be inter-linked with external datasets
• A knowledge graph is a set of interconnected typed entities and their attributes
9.
Knowledge Graph (KG) Services and Related Research Problems
• KG construction: how to construct high-quality knowledge graphs?
  • Knowledge acquisition
  • Knowledge evaluation
• KG understanding: how to make it easier to access and reuse knowledge?
  • for end users
  • for data engineers
• KG reasoning: how to bridge the gap between the vocabulary used in the graphs and that used in queries?
  • Scalability
  • Efficiency
10.
APPLICATIONS OF KNOWLEDGE GRAPHS
Summary of Entities, Faceted Fact, From Best to List, Entity Associations, Structured Queries, and Question Answering
11.
ENTITY UNDERSTANDING: THINGS, NOT STRINGS
12.
What is it? (Entity Understanding)
13.
FACETED FACT: GETTING THE VALUE OF SOME ATTRIBUTE
14.
What is the time there? (Faceted Fact)
15.
FROM BEST TO LIST: NOT ONLY THE BEST
16.
Give a List instead of Best
17.
ENTITY ASSOCIATION: SHOW THE CONNECTIONS
18.
How are they connected? (Entity Association)
Gong Cheng, Yanan Zhang, and Yuzhong Qu. Explass: Exploring Associations between Entities via Top-K Ontological Patterns and Facets. In Proc. of ISWC 2014, pp. 422–437. http://ws.nju.edu.cn/explass/
19.
STRUCTURED QUERIES: EVEN WHEN THE INPUTS ARE KEYWORDS
20.
From keywords to structural queries
Example: “Capin SVG” is interpreted as: find specifications about “SVG” whose author's name is “Capin”
Wang, Haofen, Kang Zhang, Qiaoling Liu, Thanh Tran, and Yong Yu. Q2Semantic: A lightweight keyword interface to semantic search. In Proc. of ESWC 2008, pp. 584-598.
21.
QUESTION ANSWERING: COMPUTE ANSWERS WITH THE KG
22.
Question Answering
Example: “films starring Brad Pitt”
Christina Unger, Lorenz Bühmann, Jens Lehmann, Axel-Cyrille Ngonga Ngomo, Daniel Gerber, and Philipp Cimiano. "Template-based question answering over RDF data." In Proceedings of the 21st International Conference on World Wide Web, pp. 639-648. ACM, 2012.
23.
SAMPLE LINKED DATA KNOWLEDGE REPOSITORIES
DBpedia, Wikidata, GoodRelations
24.
DBpedia
• A crowd-sourced community effort to extract structured information from Wikipedia
  • allows users to ask structured queries against Wikipedia
  • and to link the different data sets on the Web to Wikipedia data.
25.
DBpedia – the content
Entities and their attributes from Wikipedia infobox templates, categorisation information, images, geo-coordinates, etc.
Classification Schemas
• Wikipedia Categories are represented using the SKOS vocabulary and DCMI terms.
• YAGO Classification is derived from the Wikipedia category system using WordNet.
• WordNet Synset Links were generated by manually relating Wikipedia infobox templates and WordNet synsets.
The DBpedia 2014 release consists of 3 billion RDF triples.
26.
DBpedia – services
• SPARQL endpoint: http://dbpedia.org/sparql
• Query Builders (e.g. Leipzig query builder at http://querybuilder.dbpedia.org)
• Public Faceted Web Service Interface
• Dump Downloads
  • DBpedia dumps in 125 languages at the DBpedia download server
  • DBpedia Ontology
27.
DBpedia – use cases
• Nucleus for the Web of Data
• Revolutionise access to Wikipedia information, e.g. “Give me all cities in New Jersey with more than 10,000 inhabitants”
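The New Jersey question above can be sketched as a SPARQL query sent to the public endpoint. The snippet below only builds the request URL and makes no network call. The choice of `dbo:City`, `dbo:isPartOf` and `dbo:populationTotal` is an assumption about how DBpedia models this information; individual resources may be described differently, so the query is illustrative rather than guaranteed to return every city.

```python
from urllib.parse import urlencode

# Public endpoint listed on the DBpedia services slide.
ENDPOINT = "http://dbpedia.org/sparql"

# Assumed modelling: the class and properties below are DBpedia ontology
# terms, but how a given city is linked to New Jersey may vary.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?city ?population WHERE {
  ?city a dbo:City ;
        dbo:isPartOf dbr:New_Jersey ;
        dbo:populationTotal ?population .
  FILTER (?population > 10000)
}
ORDER BY DESC(?population)
"""


def build_request_url(endpoint, query):
    """Encode the query as a GET URL using the SPARQL protocol's 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})
```

The resulting URL can be fetched with any HTTP client; the result format and any endpoint-specific parameters are left out here on purpose.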
28.
Wikidata
• A collaboratively edited knowledge base operated by the Wikimedia Foundation.
• Can be read and edited by both humans and machines.
• Acts as central storage for the structured data of its Wikimedia sister projects including Wikipedia, Wikivoyage, Wikisource, and others
29.
Wikidata – the content
Wikidata is a document-oriented database, focused around topics.
• Information is added to items by creating statements (key-value pairs)
30.
Wikidata - to Linked Data Web (1)
Exporting Statements as Triples
• Faithful representations: with additional qualifiers and references
• Simplified representations: without additional qualifiers and references
Fredo Erxleben, Michael Günther, Markus Krötzsch, Julian Mendez, and Denny Vrandečić. Introducing Wikidata to the Linked Data Web. In Proc. of ISWC 2014, pp. 50-65.
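The difference between the two export styles can be sketched as follows. A simplified export emits one triple per statement, while a faithful export introduces a node for the statement itself so that its additional annotations (qualifiers, in Wikidata's terminology) and references have something to attach to. All identifiers below (`ex:Item1`, `ex:hasStatement`, and so on) are hypothetical placeholders; the vocabulary used by the actual export in the cited paper is different.

```python
# Hypothetical statement with one qualifier and one reference.
STATEMENT = {
    "subject": "ex:Item1",
    "property": "ex:educatedAt",
    "value": "ex:Item2",
    "qualifiers": [("ex:endTime", "1974")],
    "references": [("ex:statedIn", "ex:SomeSource")],
}


def export_simplified(statement):
    """One triple per statement; qualifiers and references are dropped."""
    return [(statement["subject"], statement["property"], statement["value"])]


def export_faithful(statement, statement_id):
    """Reify the statement as its own node and hang the extra data off it."""
    triples = [
        (statement["subject"], "ex:hasStatement", statement_id),
        (statement_id, statement["property"], statement["value"]),
    ]
    triples += [(statement_id, q, v) for q, v in statement["qualifiers"]]
    triples += [(statement_id, r, v) for r, v in statement["references"]]
    return triples
```

The trade-off is visible in the triple counts: the simplified form is easy to query but loses context, while the faithful form keeps everything at the price of an extra hop through the statement node.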
31.
Wikidata - to Linked Data Web (2)
Extracting Schema Information from Wikidata
• instance of (P31) → rdf:type and subclass of (P279) → rdfs:subClassOf
• constraints for the use of properties → OWL axioms
Fredo Erxleben, Michael Günther, Markus Krötzsch, Julian Mendez, and Denny Vrandečić. Introducing Wikidata to the Linked Data Web. In Proc. of ISWC 2014, pp. 50-65.
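The property mapping on this slide amounts to a small lookup table. The sketch below applies it to a list of (item, property, value) statements; the item identifiers are hypothetical placeholders, and statements using other properties are simply skipped rather than translated.

```python
# Mapping stated on the slide: P31 -> rdf:type, P279 -> rdfs:subClassOf.
SCHEMA_MAP = {"P31": "rdf:type", "P279": "rdfs:subClassOf"}

# Hypothetical statements; ex:* identifiers stand in for real item IDs.
STATEMENTS = [
    ("ex:Aberdeen", "P31", "ex:City"),
    ("ex:City", "P279", "ex:HumanSettlement"),
    ("ex:Aberdeen", "ex:population", "200000"),
]


def to_schema_triples(statements):
    """Translate schema-level statements via SCHEMA_MAP; skip all others."""
    return [(s, SCHEMA_MAP[p], o) for s, p, o in statements if p in SCHEMA_MAP]
```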
32.
Wikidata – use case & data access
Use Cases
• Information about the sources helps support the notion of verifiability
• Collecting structured data: allows easy reuse of that data
• Support for Wikimedia projects: reducing the workload in Wikipedia and increasing its quality
• Support well beyond that: everyone can use Wikidata
Accessing the data
• Mediawiki Lua Scribunto interface
• Wikibase/API
• RDF Dump: http://tools.wmflabs.org/wikidata-exports/rdf/exports/20141013/
33.
GoodRelations
GoodRelations is a lightweight ontology for annotating offerings and other aspects of e-commerce on the Web.
[Slide credit: Martin Hepp]
34.
GoodRelations – use cases
[Slide credit: Martin Hepp]
35.
GoodRelations – use cases (2)
• Google, Bing, Yahoo, and Yandex will improve the rendering of your page directly in the search results
• Rich Snippets: search engines use your markup to augment the preview of your site
• Targeted Searching: profile and preferences of the person behind the query
36.
GoodRelations – who is using it
• Search engines and 10,000+ small and large shops
• Publishers
• Software: OpenLink (Virtuoso)
37.
CURRENT STATUS OF ONLINE LINKED DATA
The good, the bad and the ugly
38.
The Good
Flexible linked data eco-system
• RDF / OWL
  • Flexible schema setting: schemaless -> simple schema -> rich schema
  • Universal unique ID for data entities: URI
  • Shared vocabularies
• Ontology mapping
  • Schema mapping
• Data linkage
  • Instance mapping
• Querying and reasoning techniques
  • SPARQL entailment regimes
  • Distributed SPARQL endpoints
39.
The Good
• Flexible linked data eco-system
  • Facilities for sharing and linking knowledge in an open environment
  • Knowledge representation: various levels of expressive power
  • Services, tools, and approaches for knowledge generation, understanding, and consumption
• Interlinked knowledge repositories across various domains
40.
The Bad
• Knowledge quality (errors, provenance, quantifier, freshness…)
• Data protection (license, access control)
• Data business model
41.
The Ugly
• Linked data excels in knowledge representation
  • But a large number of datasets are missing schema information
• RDF is a triple-based model
  • But it is hard and time-consuming (even for SW geeks) to understand an RDF knowledge repository
42.
RESEARCH CHALLENGES
43.
Research Challenges
• KG Construction
  • Ontology / Schema Construction
  • Data Lifting
  • Quality Evaluation
• Understanding KG
  • User Understanding
  • Data Understanding
• Dynamic Knowledge in KG
  • Stream Data / Prediction
  • Belief Revision
• Intelligent Services for KG
  • Ontology Reasoning (see my tutorial at ISWC2014)
  • Problem Solving / Workflow
44.
Challenges in Automatic Construction
• Incompleteness of data: is the constructed schema generic enough to accommodate new data?
• Inconsistency of data: what if data items conflict with each other?
E.g. birthdates of people: some people may not have a birthdate asserted in the dataset; should the schema specify that each person has a birthdate? Some people may have different birthdates asserted in different datasets; should the schema specify that the birthdate is unique?
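Before deciding whether a schema should require a birthdate or make it unique, one can at least measure how often the data would violate either constraint. The following is a minimal sketch over made-up assertions; the dataset names, people and dates are all hypothetical.

```python
from collections import defaultdict

# Hypothetical assertions gathered from two datasets: (dataset, person, birthdate).
ASSERTIONS = [
    ("dataset_a", "alice", "1980-01-01"),
    ("dataset_b", "alice", "1980-01-01"),
    ("dataset_a", "bob", "1975-05-05"),
    ("dataset_b", "bob", "1976-05-05"),
]
PEOPLE = {"alice", "bob", "carol"}


def birthdate_report(people, assertions):
    """Return (people with no birthdate, people with conflicting birthdates)."""
    values = defaultdict(set)
    for _dataset, person, date in assertions:
        values[person].add(date)
    missing = {p for p in people if not values[p]}        # incompleteness
    conflicting = {p for p in people if len(values[p]) > 1}  # inconsistency
    return missing, conflicting
```

A schema that states "every person has a birthdate" would be violated by the first set under a closed-world reading, and one that states "birthdate is unique" by the second.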
45.
Challenges in Manual Construction
• Expertise of ontology engineers: do the engineers have sufficient understanding and experience of ontology technologies (RDF(S), OWL, SPARQL, RIF, etc.)?
• Workload of ontology engineers: how much time does it take to manually construct a large ontology? E.g. SNOMED CT has about 400,000 concepts
• Collaboration: when multiple ontology engineers work together, how to make sure they have a consistent understanding of the ontology?
46.
Challenges for both Automatic and Manual Construction
• Requirement and evaluation: how to specify the requirements of ontology construction and test whether the requirements have been fulfilled?
• Expressiveness vs. efficiency: which knowledge representation should we use? Is it sufficient to describe the domain? Are efficient reasoning and query answering mechanisms and systems available?
• Ontology reuse: do we have to construct everything from scratch? Is there an ontology available that partially covers the domain?
47.
Challenge in Data Lifting
Data lifting enriches unstructured data with structural annotations, thereby extracting the entities and their relations and properties for the knowledge graph.
Key challenges:
• Entity identification: certain entities can be hard to identify, e.g. movie titles
• AVP (attribute-value pair) identification: an entity, its attribute and the attribute's value may be scattered across the text or dataset, making it hard to establish the relation
48.
Challenge in Entity Identification
• There are different ways to identify entities: e.g. “The President of the U.S.” and “Barack Obama”
• The same name can refer to different entities
• People may use acronyms or abbreviations for entities: e.g. “K-Drive” is the acronym for the “Knowledge-driven Data Exploitation” project rather than the drive labelled K in my computer.
• Natural language text may have typos, and values may use different notations
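The first three points can be made concrete with a small alias table: several surface forms may map to one entity, and one surface form may map to several candidates. This is only a sketch; the `ex:` identifiers and the table itself are hypothetical, and the "president" entry reflects the situation at the time of the tutorial (2014).

```python
# Hypothetical alias table: normalised surface form -> candidate entities.
ALIASES = {
    "barack obama": {"ex:Barack_Obama"},
    "the president of the u.s.": {"ex:Barack_Obama"},  # as of 2014
    "k-drive": {"ex:K-Drive_Project", "ex:Drive_K"},   # ambiguous acronym
}


def candidates(mention):
    """Look up a mention after lower-casing and collapsing whitespace."""
    key = " ".join(mention.lower().split())
    return ALIASES.get(key, set())
```

A lookup that returns more than one candidate, as for "K-Drive", still needs context to be resolved, and typos or unseen notations fall through to the empty set.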
49.
Challenge in Data Understanding
• Users are unfamiliar with the content of knowledge graphs:
  • What is the vocabulary?
  • What is described by the knowledge graph?
  • How is the content organised?
  • How is it connected to the other datasets I have?
• Users do not know how to exploit the knowledge graph:
  • Which query can I ask this knowledge graph?
  • Which query can be answered with this knowledge graph?
50.
Challenge in Knowledge Dynamics
• Validity of knowledge: is a piece of information permanent or temporary?
• Representation: how to represent the temporal dependency of knowledge, e.g. “George W. Bush was the president of the U.S. until Barack Obama became the president.”
• Updating of the knowledge graph: when and how do we retract a previously unknown mistake from the knowledge graph? Which knowledge should become obsolete after the current update?
• Querying: how to query w.r.t. the temporal properties of knowledge, e.g. “Who was the last president of the U.S.?”
• Predicting the dynamics: which change is likely to occur given the history of the knowledge graph?
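One simple way to represent temporary knowledge is to attach a validity interval to each fact and make queries time-aware. The sketch below assumes year granularity and half-open intervals, both of which are simplifications chosen for illustration; the `ex:` names are hypothetical.

```python
# (subject, property, value, valid_from, valid_to); None means "still valid".
FACTS = [
    ("ex:USA", "ex:president", "George W. Bush", 2001, 2009),
    ("ex:USA", "ex:president", "Barack Obama", 2009, None),
]


def value_at(facts, subject, prop, year):
    """Return the value valid in `year`, treating intervals as [from, to)."""
    for s, p, v, start, end in facts:
        if s == subject and p == prop and start <= year and (end is None or year < end):
            return v
    return None


def previous_value(facts, subject, prop, year):
    """Return the most recent value whose validity ended at or before `year`."""
    ended = [
        (end, v)
        for s, p, v, _start, end in facts
        if s == subject and p == prop and end is not None and end <= year
    ]
    return max(ended)[1] if ended else None
```

`previous_value` is one possible reading of "Who was the last president?"; the question is ambiguous between the current and the preceding office holder, which is itself part of the querying challenge.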
51.
Challenge in Intelligent Services
The large amount of information and its inter-connection in a knowledge graph can be used to provide intelligent services; e.g. reasoning can be used to discover hidden relations in a knowledge graph.
Key challenges:
• Efficiency of the services: knowledge graphs are usually accessed by multiple users in real time. Efficiency is crucial to the quality of service.
• Scalability of the services: knowledge graphs are usually large, while even basic reasoning services, e.g. transitive closure, can already consume a large amount of time and resources.
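The remark about transitive closure can be illustrated with a naive implementation: a chain of n + 1 nodes has only n asserted edges, but its closure contains n(n + 1)/2 pairs, so the output alone grows quadratically. The function below is a deliberately simple fixpoint computation for illustration, not an efficient algorithm.

```python
def transitive_closure(edges):
    """Naive fixpoint: keep joining pairs (a, b) and (b, d) until nothing new appears."""
    closure = set(edges)
    while True:
        new_pairs = {
            (a, d) for a, b in closure for c, d in closure if b == c
        } - closure
        if not new_pairs:
            return closure
        closure |= new_pairs


# A chain 0 -> 1 -> 2 -> 3 -> 4 -> 5, e.g. a deep subclass hierarchy.
CHAIN = [(i, i + 1) for i in range(5)]
```

For `CHAIN`, five asserted edges expand to fifteen pairs; at knowledge-graph scale this growth is what makes even "basic" reasoning expensive.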
52.
Agenda
PART II: METHODS & TECHNIQUES
2:00pm – 3:05pm Constructing Knowledge Graphs
2:30pm – 2:45pm Coffee Break
3:05pm – 3:40pm Understanding Knowledge Graphs
3:40pm – 3:45pm Outlook
53.
CONSTRUCTING KNOWLEDGE GRAPHS
• Test-Driven Ontology Construction
  • Methodology
  • A Protégé plug-in
• Handling Entity Disambiguation
  • Approach
  • Some evaluation results
• Bridging Requirements and Authoring Tests
  • Competency Questions as Informal Requirement Specification
  • Some evaluation results
54.
Uschold & King’s (1995) Methodology on Ontology Construction
• Key steps: capturing, coding, integrating and evaluating/testing
• Ontology evaluation/testing:
  • to make a technical judgment of the ontologies
  • w.r.t. a frame of reference
• A frame of reference can be:
  • requirement specifications
  • competency questions
  • or, the real world
55.
Ontology and Tests
• Uschold & King’s methodology
  • Test the ontology after axioms are written
• Test-driven ontology authoring
  • Write authoring tests before writing axioms
  • Writing authoring tests before axioms does not take any more effort than writing them after axioms
  • Forces authors to think about requirements before writing axioms
  • Writing authoring tests first helps authors to detect and remove errors sooner
  • Helps to understand how good an existing/reused ontology is
56.
Gruninger & Fox’s (1995) Methodology
Key steps:
1. Motivating scenarios
2. Informal competency questions
3. FOL terminology (classes, properties, objects)
4. Formal competency questions (2 -> 4?)
5. FOL axioms
6. Completeness theorem (defining the conditions under which the solutions to the questions are complete)
57.
The METHONTOLOGY (2003) Methodology
• Key steps:
  1. specification of requirements
  2. terminology with tabular and/or graph notations
  3. formalisation with a logic-based ontology language
  4. maintenance (including evaluation/testing)
• Ontology evaluation/testing:
  • checking consistency, completeness, redundancy
58.
The DKAP (2007) Methodology
• Key steps:
  1. determine the domain and scope
  2. check availability of existing ontologies
  3. collect and analyse data for knowledge extraction
  4. develop initial ontology
  5. refine and validate ontology
• Ontology validation/testing:
  • consistency and accuracy checking
59.
Limitations of Existing Methodologies
• Methodology level:
  • Lack of details about the transitions
    • from requirements to tests
    • from requirements to terminology
    • from terminology to axioms
• Tool level:
  • lack of tools to guide the above transitions
60.
An approach to Test-Driven Ontology Authoring (presented in an invited talk at BMIR, Stanford University, June 2013)
• An ontology contains not only OWL files, but also a test suite
  • A test suite contains a set of tests as SPARQL 1.1 queries
  • not all requirements can be represented in SPARQL 1.1 though
• Ontology reuse
  • check the associated test suite before ontology reuse, to better understand the original intention
• Collaborative ontology authoring
  • all authors agree upon a common test suite
  • each author can have an extra test suite locally
61.
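Authoring Tests
[Diagram: a test suite (Test 1, Test 2, …) in which each test pairs a SPARQL 1.1 query with expected results; a reasoner evaluates the query over the ontology to produce actual results, which are compared with the expected results to give pass/fail.]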
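The workflow in this diagram can be sketched in a few lines. In the sketch, a plain Python function stands in for the SPARQL 1.1 query and no reasoner is involved, so it only illustrates the compare-actual-with-expected loop; `AuthoringTest`, `run_suite` and the toy ontology are hypothetical names, not the plug-in's API.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet


@dataclass
class AuthoringTest:
    name: str
    query: Callable[[set], FrozenSet]  # stands in for a SPARQL 1.1 query
    expected: FrozenSet


def run_suite(ontology, tests):
    """Evaluate every test's query and compare actual with expected results."""
    report = {}
    for test in tests:
        actual = frozenset(test.query(ontology))
        report[test.name] = "pass" if actual == test.expected else "fail"
    return report


# Toy ontology as a set of (subclass, relation, superclass) statements.
ONTOLOGY = {("Margherita", "subClassOf", "Pizza"), ("Pizza", "subClassOf", "Food")}


def direct_subclasses_of_pizza(ontology):
    return frozenset(s for s, p, o in ontology if p == "subClassOf" and o == "Pizza")


SUITE = [
    AuthoringTest("pizza-subclasses", direct_subclasses_of_pizza,
                  frozenset({"Margherita"})),
    AuthoringTest("pizza-includes-hawaiian", direct_subclasses_of_pizza,
                  frozenset({"Margherita", "Hawaiian"})),
]
```

Running `run_suite(ONTOLOGY, SUITE)` reports one passing and one failing test; the failing one signals that the ontology has not yet been extended to meet that expectation.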
62.
A Protégé Plug-in for Authoring Tests (based on the TrOWL reasoner)
63.
Loading the Manifest File
• A manifest file specifies queries and expected results
• Running the reasoner to get the results for each test
• Clicking on a test to show the expected and actual results
64.
Compute Justifications for Errors Related to Failed Tests
• with the justification plug-in (and reasoners, such as TrOWL)
65.
Modify the Ontology
• so that CheeseTopping is no longer disjoint with VegetableTopping
66.
Key Issue (to be revisited after the Entity Disambiguation part)
• Understanding the intention of ontology authors
  • How to generate authoring tests?
  • How to judge the quality of the authoring tests?
67.
Entity Recognition and Disambiguation
• Challenge revisited:
  • There are different ways to identify entities: e.g. “The President of the U.S.” and “Barack Obama”
  • The same name can refer to different entities
• Contextual hypothesis used in many existing approaches
  • terms with similar meanings are often used in similar contexts
• The role of these contexts is typically played by already annotated documents (e.g. Wikipedia articles), which are used to train term classifiers
68.
Alternative Context: Evidence Model
• Idea: semantic entities that may serve as disambiguation evidence for the scenario’s target entities
69.
Evidence Model Construction (Manual)
• The identification of target concepts whose instances we wish to disambiguate (e.g. locations)
• The determination of related concepts whose instances may serve as contextual disambiguation evidence.
  • For example, in texts that describe historical events, some concepts whose instances may act as location evidence are related locations, historical events, and historical groups and persons.
• The identification, for each pair of evidence and target concept, of the relation paths that link them.
70.
Evidence-Target Paths
71.
Term Extraction (Automatic)
Extraction is performed with Knowledge Tagger (from iSOCO), based on GATE.
72.
Evaluation Results: Football Match Scenario
• 50 texts describing football matches.
• E.g. “It's the 70th minute of the game and after a magnificent pass by Pedro, Messi managed to beat Claudio Bravo. Barcelona now leads 1-0 against Real."
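The example text shows why evidence matters: "Real" on its own is ambiguous, but other entities mentioned nearby can tip the balance. Below is a minimal sketch of evidence-based scoring, assuming a hand-built evidence model in which each candidate team is linked to a few players; the model contents and the simple count-based score are illustrative assumptions, not the tutorial's actual system.

```python
# Hypothetical evidence model: ambiguous mention -> candidate entity -> evidence
# entities (here, players linked to the team by a "plays for" path).
EVIDENCE_MODEL = {
    "Real": {
        "Real Madrid": {"Cristiano Ronaldo", "Iker Casillas"},
        "Real Sociedad": {"Claudio Bravo", "Xabi Prieto"},
    }
}

TEXT = (
    "It's the 70th minute of the game and after a magnificent pass by Pedro, "
    "Messi managed to beat Claudio Bravo. Barcelona now leads 1-0 against Real."
)


def disambiguate(mention, text, model=EVIDENCE_MODEL):
    """Pick the candidate with the most evidence entities found in the text."""
    scores = {
        entity: sum(1 for evidence in evidence_set if evidence in text)
        for entity, evidence_set in model[mention].items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

With this toy model the mention of the goalkeeper is the only evidence found, so "Real" resolves to the candidate linked to him; with no evidence at all the function declines to answer.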
73.
Evaluation Results: Military Conflict Scenario
• 50 historical texts describing military conflicts.
• E.g. “The Siege of Augusta was a significant battle of the American Revolution. Fought for control of Fort Cornwallis, a British fort near Augusta, the battle was a major victory for the Patriot forces of Lighthorse Harry Lee and a stunning reverse to the British and Loyalist forces in the South”.
74.
Future Work
• Fully automated construction of the disambiguation evidence model.
  • The challenge here is how to automatically identify the text’s domain/topic.
• Combination with statistical methods for cases where the available domain semantic information is incomplete.
  • The challenge here is how to select the optimal ratio of ontological evidence vs. statistical evidence.
• Development of a tool to enable users to dynamically build such models out of existing semantic data and use them for disambiguation purposes
75.
Issues in Test-Driven Ontology Authoring
1. How to generate tests
2. How to judge the quality of tests
  • why they are relevant
  • how to provide the correct expected answers
76.
Requirement Driven?
• How about starting from requirements instead of tests?
[Diagram labels: Ontology Authoring Requirements, Ontology Authoring Tests, Test Results]
77.
Requirement-Driven Ontology Authoring [Ren et al., 2014]
• Key questions
  • RQ1: what forms of requirements should we consider?
  • RQ2: how to generate authoring tests from requirements?
78.
Competency Question
• Questions that people expect the constructed ontologies to answer
• Useful for novice users
  • in natural languages
  • about domain knowledge
  • requires little understanding of ontology technologies
• A typical CQ: Which pizza has some cheese topping?
79.
RQ1: what forms of requirements should we consider?
Competency Questions (CQs) can be regarded as a functional requirement of the ontology
RQ1’: How are CQs formulated?
80.
Key Idea 1: Identification of CQ Patterns
• A typical CQ: Which pizza has some cheese topping?
• Hypothesis: CQs usually have clear syntactic patterns
• Features and elements can be extracted
  • Feature: Type of question
  • Feature: Binary predicate
  • Element: Class expression CE1
  • Element: Object property expression OPE
  • Element: Class expression CE2
• Resulting pattern: CE1 OPE CE2
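To make the hypothesis concrete, a single CQ pattern can be approximated with a regular expression that pulls out the three elements. The regex below covers only questions of the form "Which X has/contains some Y?" and is an illustrative assumption; the tutorial's actual pattern set is broader and is not reproduced here.

```python
import re

# One hypothetical pattern: "Which <CE1> <OPE> some <CE2>?"
CQ_PATTERN = re.compile(
    r"^Which (?P<ce1>.+?) (?P<ope>has|contains) some (?P<ce2>.+?)\?$",
    re.IGNORECASE,
)


def parse_cq(question):
    """Return the features and elements of a matching CQ, or None."""
    match = CQ_PATTERN.match(question.strip())
    if match is None:
        return None
    return {
        "question_type": "selection",   # a "Which ...?" question
        "predicate_arity": "binary",    # relates CE1 to CE2
        **match.groupdict(),
    }
```

For the slide's example the parse yields `ce1 = "pizza"`, `ope = "has"` and `ce2 = "cheese topping"`; questions outside the pattern return `None` and would need further patterns.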
81.
Result 1: A Feature-based Framework for CQ Formulation
• Based on CQs collected from the Software Ontology Project (75 CQs) and Manchester OWL Workshops (70 CQs)
• Primary features -> CQ Archetypes
• Secondary features -> CQ Subtypes
Features and their values:
• Question Type: Selection, Boolean, Counting
• Element Visibility: Explicit, Implicit
• Predicate Arity: Unary, Binary, N-ary
• Relation Type: Object, Datatype
• Modifier: Quantity, Numeric
• Domain-Independent Element: Spatial, Temporal
• Question Polarity: Positive, Negative
82.
Result 2: Archetypes of CQ Patterns
83.
Answerability of CQs
• A typical CQ: Which pizza has some cheese topping?
• Existing work focused on answering CQs directly
  • But is the answer meaningful?
• The ability to answer CQs meaningfully can be regarded as a functional requirement of the ontology
• What if the answer is an empty set?
  • Possible scenarios
    • Pizza does not exist
    • Cheese topping does not exist
    • Pizzas are not allowed to have cheese topping
    • The ontology has not been populated with any cheesy pizza yet
    • …
84.
RQ2: how to generate authoring tests from requirements?
RQ2’: How can we automatically test whether a CQ can be meaningfully answered?
85.
Key Idea 2: Presuppositions of CQ
• A typical CQ: Which pizza has some cheese topping?
• A CQ comes with certain presuppositions
  • Some conditions the speakers assume to be met
• A CQ can be meaningfully answered only when its presuppositions are satisfied
  • Classes Pizza, CheeseTopping should occur in the ontology
  • Property has(Topping) should occur in the ontology
  • The ontology should allow Pizza to have CheeseTopping
  • The ontology should also allow Pizza to not have CheeseTopping
86.
CQs and Authoring Tests
• A typical CQ: Which pizza has some cheese topping? (pattern: CE1 OPE CE2)
• Satisfiability of CQ presuppositions can be verified by authoring tests generated based on its features and elements
• Classes Pizza, CheeseTopping should occur in the ontology
  • [CE1], [CE2] should both occur in the class vocabulary
• Property has(Topping) should occur in the ontology
  • [OPE] should occur in the property vocabulary
• The ontology should allow Pizza to have CheeseTopping
  • the corresponding class expression should be satisfiable
• The ontology should also allow Pizza to not have CheeseTopping
  • the corresponding class expression should be satisfiable
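Given the elements CE1, OPE and CE2 of a CQ, the authoring tests listed on this slide can be generated mechanically. In the sketch below the two satisfiability tests are written as Manchester-syntax-like strings; their exact form is an assumption reconstructed from the slide's wording, and checking them would require an OWL reasoner, which is outside the sketch. Only the occurrence tests are evaluated, against a toy vocabulary.

```python
def authoring_tests(ce1, ope, ce2):
    """Generate (kind, subject) authoring tests for a 'CE1 OPE CE2' question."""
    return [
        ("class_occurrence", ce1),
        ("class_occurrence", ce2),
        ("property_occurrence", ope),
        # Assumed expression forms; a reasoner would check satisfiability.
        ("satisfiable", f"{ce1} and ({ope} some {ce2})"),
        ("satisfiable", f"{ce1} and not ({ope} some {ce2})"),
    ]


def check_occurrences(tests, classes, properties):
    """Evaluate only the occurrence tests against a vocabulary."""
    results = {}
    for kind, subject in tests:
        if kind == "class_occurrence":
            results[subject] = subject in classes
        elif kind == "property_occurrence":
            results[subject] = subject in properties
    return results
```

For example, `check_occurrences(authoring_tests("Pizza", "hasTopping", "CheeseTopping"), {"Pizza"}, {"hasTopping"})` flags `CheeseTopping` as missing, which tells the author what to add before the CQ can be answered meaningfully.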
87.
Result 3: Associate Presuppositions with Features
• Features in a CQ are associated with the presuppositions of the CQ.
• An example on the question type feature:
  • Selection: Which pizza contains pork?
  • Boolean: Can pizza contain pork?
  • Counting: How many pizzas contain pork?
• Presuppositions shown for these questions: occurrence of “Pizza”, “Pork”, “contains”; some pizza can contain pork; some pizza can contain no pork
88.
Result 4: Formal Authoring Tests
• All tests can be automated
89.
WhatIf Gadget
[Screenshot labels: Class Hierarchy, Verbaliser, Competency Questions, User/System Dialogue History, User Input]
90.
Input (Manchester Syntax)
1. User selects a speech act by clicking or selecting a shortcut.
2. We need to evaluate their usefulness.
3. Examples:
  ● Class: Pizza SubClassOf: Food
  ● Class: Fruit DisjointWith: Pizza
91.
Input (OWL Simplified English)
1. A set of restricted natural language patterns.
2. System recognises the speech act.
3. Capable of accepting Competency Questions.
4. Examples:
  ● Which pizza has topping a tomato topping?
  ● An apple is a fruit.
92.
Modelling User Goals (1)
1. Users can import or write their own CQs in OWL Simplified English
2. Based on the inserted CQ, a list of Authoring Tests (ATs) will be generated.
3. A tree structure displays these CQs and ATs.
4. The system is constantly monitoring these CQs and ATs. Any change in satisfiability of ATs:
  a. Will be reported by changing the icon of ATs in the tree. Red/Green respectively represent fail/pass of each AT.
  b. Will be reported in the “history” pane.
93.
Modelling User Goals (2)
• CQ + AT hierarchical representation. Icons represent the satisfiability state.
• Written feedback presented to the user in the “history” pane.
94.
Further Challenges
● Maintaining a continuous and meaningful interaction with the user
● Generating a coherent and comprehensive set of entailments in response to What-If questions
  ❖ Content selection
  ❖ Grouping and aggregation
  ❖ Ordering
95.
UNDERSTANDING KNOWLEDGE GRAPHS
• Data understanding
• Data summarisation
• Query generation
96.
Data Understanding: A Core Activity in Data Exploitation
• Traditional focus in semantic web research: data understanding for machines and programs.
• More importantly: data understanding for humans
  • humans are the ultimate owners and consumers of data
  • systems such as knowledge graphs, Watson, Siri, etc.
  • to help human users to understand the contents, implications and applications of data
• More than HCI, we want interesting and insightful data!
97.
Semantic Datasets Are HARD to Understand
• Non-expert users might not be familiar with RDF, OWL and SPARQL
  • RDF(S) has 6 core documents
  • OWL 2 has 6 core documents
  • SPARQL 1.1 has 11 core documents
• Users are unfamiliar with datasets
  • That are too large to explore
  • That are external to their organisation
  • …
• It is HARD for novice users to construct queries
98.
Challenges of Data Understanding
• Challenges
  • Expressing needs (keywords/SPARQL)
  • Describing datasets
  • Only retrieve the relevant parts
• 9.96% SPARQL / 8.19% DUMP
99.
Solution – Summary based profiling for LD
• Key idea: building block based information space modelling
• Decomposing & Constructing
100.
The philosophy of interpreting information
• Task: explain the data to human users
• Entity centric
101.
Entity-centric View of RDF Data
Entity Description Block
102.
Concrete to abstract
Entity Description Pattern
103.
Data Summarisation – EDP Graph
• Reveal the schema-level information
  • What concepts are there (nodes) and how are they related to each other (edges)?
• Disclose individual-level distribution
  • Statistics attached to nodes and edges
Example: Jamendo dataset
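The step from concrete entity descriptions to abstract patterns can be sketched as grouping entities by the classes and properties they use, then counting how many entities share each pattern. This is a simplified reading of entity description patterns for illustration: it ignores incoming relations and the edges between patterns, and the music-style sample data is made up rather than taken from Jamendo.

```python
from collections import Counter, defaultdict

# Hypothetical music-style data.
MUSIC_TRIPLES = [
    ("artist1", "rdf:type", "Artist"), ("artist1", "name", "A"),
    ("artist1", "made", "record1"),
    ("artist2", "rdf:type", "Artist"), ("artist2", "name", "B"),
    ("artist2", "made", "record2"),
    ("record1", "rdf:type", "Record"), ("record1", "title", "R1"),
    ("record2", "rdf:type", "Record"), ("record2", "title", "R2"),
]


def entity_patterns(triples):
    """Map each subject to its pattern: (set of classes, set of properties)."""
    classes = defaultdict(set)
    properties = defaultdict(set)
    for s, p, o in triples:
        if p == "rdf:type":
            classes[s].add(o)
        else:
            properties[s].add(p)
    subjects = set(classes) | set(properties)
    return {e: (frozenset(classes[e]), frozenset(properties[e])) for e in subjects}


def summarise(triples):
    """Count how many entities share each pattern (node statistics)."""
    return Counter(entity_patterns(triples).values())
```

Ten triples collapse into two patterns with a count of two each, which is the kind of node-level statistic a summary graph would display.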
104.
Understanding Data Redundancy [Wu et al., 2014]
105.
Related Paper at JIST2014
• Graph Pattern based RDF Data Compression. Jeff Z. Pan, Jose Manuel Gomez-Perez, Yuan Ren, Honghan Wu, Haofen Wang and Man Zhu
• (Monday afternoon)
106.
Understanding How Data Can be Used
• Given a knowledge graph, generate candidate insightful queries
  • Manual generation / automatic generation
  • Generation based on schema / actual data
  • With / without user interference
• Our aim: automatic generation based on data without user interference
  • Most friendly to new, novice users
  • Complementary to inference (heavily based on schema)
107.
Candidate Insightful Queries [Pan et al., 2013]
• Graph patterns are summarisations that represent many subsets of the RDF graph
• Pattern structure
  • Structured knowledge, which is difficult to express with schema
  • Such as star, chain, tree, loop
• Correspondences between multiple graph patterns
  • Strongly corresponding patterns (large overlap)
  • Weakly corresponding patterns (little overlap)
  • Exceptions
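One way to read "large" and "little" overlap is as a ratio between the sets of entities matched by two graph patterns. The sketch below uses the Jaccard ratio with arbitrary thresholds (0.8 and 0.2); both the measure and the cut-offs are assumptions chosen for illustration, not values from the cited work.

```python
def overlap(matches_a, matches_b):
    """Jaccard ratio between the entity sets matched by two patterns."""
    a, b = set(matches_a), set(matches_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(matches_a, matches_b, strong=0.8, weak=0.2):
    """Label the correspondence; thresholds are illustrative assumptions."""
    ratio = overlap(matches_a, matches_b)
    if ratio == 0:
        return "none"
    if ratio >= strong:
        return "strong"
    if ratio <= weak:
        return "weak"
    return "moderate"


def exceptions(matches_a, matches_b):
    """Entities matched by exactly one of the two patterns."""
    return set(matches_a) ^ set(matches_b)
```

When two patterns correspond strongly, the few entities returned by `exceptions` are natural candidates for an "insightful" query, since they break an otherwise regular relationship.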
108.
Query Generation Framework
1. Data summarisation
  • Significantly decreases the search space in rule mining
2. Data analytics
  • First-order inductive learning
  • Association rule mining
3. Query generation
  • Exploiting the relations between queries and rules
Another Example
• Given the university data set in LUBM, the following two queries have the same results (when no reasoning is applied)
111.
Summary and Future Work
• Take-home message
  • Data summarisation and data analytics technologies not only help people to find answers, but also help people to ask questions!
• Future work
  • Integrate with application scenario background knowledge
  • Integrate with reasoning
  • Integrate with user preferences
112.
OUTLOOK
Outlook of Knowledge Graphs: from the application point of view
113.
Outlook
What knowledge graphs are good at:
• Maintaining factual knowledge in a structured manner and answering queries about it
What knowledge graphs still need:
• “How to…” knowledge in addition to “What is…” knowledge
• Operations associated with the entities
114.
JIST2014 Tutorial on Constructing and Understanding Knowledge Graphs
Thank you! Questions?