GRAPHITE: An Extensible Graph Traversal Framework for Relational Database Management Systems Marcus Paradies*, Wolfgang Lehner*, and Christof Bornhoevd° | SSDBM‘ 15 *TU Dresden °Risk Management Solutions, Inc.
2 Graph Processing on Enterprise Data Relational + Application Logic Relational + Graph + Application Logic Data already in RDBMS SQL as the only interface/no graph abstraction Data transfer to application Efficient processing in GDBMS Processing on replicated data Data transfer to application No combination with other data models possible
3 Integration of Graph Processing into an RDBMS How could a deep integration of graph functionality into an RDBMS look like? Graph operators can be seamlessly combined with other plan operators
4 Columnar Graph Storage All available data types can be used as vertex/edge attributes
5 Columnar Graph Storage All available data types can be used as vertex/edge attributes Lightweight compression techniques (RLE)
6 Graph Traversal Workflow Traversal Configuration Traversal Kernels • S – set of start vertices • φ – edge predicate • c – collection boundary • r – recursion boundary • d – traversal direction Pluggable physical traversal kernels as implementations of a logical traversal operator
7 Graph Traversal Formalism FORMAL DESCRIPTION (SET-BASED) • A traversal operation is a totally ordered set P of path steps • Each path step pi receives a vertex set Di-1 discovered at level (i-1) and returns a set of adjacent vertices Di (1 ≤ i ≤ r, r is recursion boundary) • Initally, D0 = 𝑆 Di = 𝑣 ∃𝑢 ∈ 𝐷i−1 : 𝑒= (𝑢,𝑣) ∈ 𝐸 ˄ 𝑒𝑣𝑎𝑙(𝑒, φ) • The final output R is defined as R = 𝑖=𝑐 𝑟 𝐷𝑖 𝑖=0 𝑐−1 𝐷𝑖 Target vertices Visited vertices
8 Graph Traversals by Example Traversal Configuration Result { { A }, “type=‘a‘“, 1, 1,  } { B, C, D } Root vertex Discovered vertex
9 Graph Traversals by Example Traversal Configuration Result { { E }, “type=‘b‘“, 2, 2,  } { D } Root vertex Discovered vertex
10 Graph Traversals by Example Traversal Configuration Result { { A }, “type=‘a‘“, 1, ∞,  } { B, C, D, F} Root vertex Discovered vertex
11 Graph Traversals by Example Traversal Configuration Result { {A}, “type=‘a‘ OR type=‘b‘“, 2, 2,  } { E, F } Root vertex Discovered vertex
12 Graph Traversals by Example Traversal Configuration Result { { A }, “type=‘a‘“, 0, 1,  } { A, B, C, D } Root vertex Discovered vertex
13 Level-Synchronous (LS) Traversal SCAN-BASED GRAPH TRAVERSAL Scan partitions (dictionary-encoded)
14 Level-Synchronous (LS) Traversal SCAN-BASED GRAPH TRAVERSAL Keep (intermediate) results as bitsets
15 Level-Synchronous (LS) Traversal SCAN-BASED GRAPH TRAVERSAL Fetch neighbors by position
16 Level-Synchronous (LS) Traversal SCAN-BASED GRAPH TRAVERSAL Set operations on vectorized bitsets
17 Level-Synchronous (LS) Traversal SCAN-BASED GRAPH TRAVERSAL EDGE CLUSTERING Clustering by edge type Clustering by source vertex
18 Fragmented-Incremental (FI) Traversal • Partition column into fragments • Track dependencies between fragments in index structure • Goal: Minimize number of fragment reads Fragment For a fragment size equal to |E|, FI-traversal degenerates to LS- traversal
19 Fragmented-Incremental (FI) Traversal • Partition column into fragments • Track dependencies between fragments in index structure • Goal: Minimize number of fragment reads Transition 8  13  12 For a fragment size equal to |E|, FI-traversal degenerates to LS- traversal
20 Fragmented-Incremental (FI) Traversal • Partition column into fragments • Track dependencies between fragments in index structure • Goal: Minimize number of fragment reads Transition Graph Probabilistic Fragment Synopsis For a fragment size equal to |E|, FI-traversal degenerates to LS- traversal
21 Fragmented-Incremental (FI) Traversal • Partition column into fragments • Track dependencies between fragments in index structure • Goal: Minimize number of fragment reads Transition Graph Fragment Queue Execution Chain For a fragment size equal to |E|, FI-traversal degenerates to LS- traversal
22 Experimental Evaluation EVALUATED REAL-WORLD DATA SETS • Six real-world data sets with different topology characteristics EVALUATED SYSTEMS • Implementation in main-memory column store prototype (C++) • Graph database (Neo4j) • RDF DBMS (Virtuoso 7.0 with columnar storage layout) • Commerical columnar RDBMS (via chained self-joins, with and without index support)
23 Experimental Evaluation COMPARISON OF LS-TRAVERSAL AND FI-TRAVERSAL Traversal performance depends on the traversal depth and the topology FI-Traversal outperforms LS-Traversal by one order of magnitude
24 Experimental Evaluation COMPARISON OF LS-TRAVERSAL AND FI-TRAVERSAL Traversal performance depends on the traversal depth and the topology LS-Traversal starts outperforming FI-Traversal
25 Experimental Evaluation COMPARISON OF LS-TRAVERSAL AND FI-TRAVERSAL Traversal performance depends on the traversal depth and the topology Different performance characteristics for different fragment sizes
26 Experimental Evaluation COMPARISON OF LS-TRAVERSAL AND FI-TRAVERSAL Traversal performance depends on the traversal depth and the topology
27 Experimental Evaluation COMPARISON OF LS-TRAVERSAL AND FI-TRAVERSAL Traversal performance depends on the traversal depth and the topology Scan-based traversal outperforms fragmented traversal depending on traversal depth and graph topology
28 Experimental Evaluation SYSTEM-LEVEL BENCHMARK Combination of LS and FI-traversal outperforms native graph systems by up to two orders of magnitude
29 Summary GRAPHITE • Graph processing tightly integrated into RDBMS • Extensions of core components by graph extensions (operators, cost model, index structures) • Topology characteristics-aware traversal operators GRAPH-SPECIFIC DATA STATISTICS AND ALGORITHMS• Diverse graph topologies demand different algorithmic design decisions • Index scan versus full column scan decision also applies for graph traversals FUTURE WORK • Integration with temporal, spatial, and text data • Language extensions for custom code executed during graph traversal Integration of GRAPHITE into RDBMS
Contact Marcus Paradies, TU Dresden m.paradies@sap.com https://wwwdb.inf.tu-dresden.de/team/external-members/marcus-paradies/

GRAPHITE: An Extensible Graph Traversal Framework for Relational Database Management Systems

  • 1.
    GRAPHITE: An ExtensibleGraph Traversal Framework for Relational Database Management Systems Marcus Paradies*, Wolfgang Lehner*, and Christof Bornhoevd° | SSDBM‘ 15 *TU Dresden °Risk Management Solutions, Inc.
  • 2.
    2 Graph Processing onEnterprise Data Relational + Application Logic Relational + Graph + Application Logic Data already in RDBMS SQL as the only interface/no graph abstraction Data transfer to application Efficient processing in GDBMS Processing on replicated data Data transfer to application No combination with other data models possible
  • 3.
    3 Integration of GraphProcessing into an RDBMS How could a deep integration of graph functionality into an RDBMS look like? Graph operators can be seamlessly combined with other plan operators
  • 4.
    4 Columnar Graph Storage Allavailable data types can be used as vertex/edge attributes
  • 5.
    5 Columnar Graph Storage Allavailable data types can be used as vertex/edge attributes Lightweight compression techniques (RLE)
  • 6.
    6 Graph Traversal Workflow Traversal Configuration Traversal Kernels •S – set of start vertices • φ – edge predicate • c – collection boundary • r – recursion boundary • d – traversal direction Pluggable physical traversal kernels as implementations of a logical traversal operator
  • 7.
    7 Graph Traversal Formalism FORMALDESCRIPTION (SET-BASED) • A traversal operation is a totally ordered set P of path steps • Each path step pi receives a vertex set Di-1 discovered at level (i-1) and returns a set of adjacent vertices Di (1 ≤ i ≤ r, r is recursion boundary) • Initally, D0 = 𝑆 Di = 𝑣 ∃𝑢 ∈ 𝐷i−1 : 𝑒= (𝑢,𝑣) ∈ 𝐸 ˄ 𝑒𝑣𝑎𝑙(𝑒, φ) • The final output R is defined as R = 𝑖=𝑐 𝑟 𝐷𝑖 𝑖=0 𝑐−1 𝐷𝑖 Target vertices Visited vertices
  • 8.
    8 Graph Traversals byExample Traversal Configuration Result { { A }, “type=‘a‘“, 1, 1,  } { B, C, D } Root vertex Discovered vertex
  • 9.
    9 Graph Traversals byExample Traversal Configuration Result { { E }, “type=‘b‘“, 2, 2,  } { D } Root vertex Discovered vertex
  • 10.
    10 Graph Traversals byExample Traversal Configuration Result { { A }, “type=‘a‘“, 1, ∞,  } { B, C, D, F} Root vertex Discovered vertex
  • 11.
    11 Graph Traversals byExample Traversal Configuration Result { {A}, “type=‘a‘ OR type=‘b‘“, 2, 2,  } { E, F } Root vertex Discovered vertex
  • 12.
    12 Graph Traversals byExample Traversal Configuration Result { { A }, “type=‘a‘“, 0, 1,  } { A, B, C, D } Root vertex Discovered vertex
  • 13.
    13 Level-Synchronous (LS) Traversal SCAN-BASEDGRAPH TRAVERSAL Scan partitions (dictionary-encoded)
  • 14.
    14 Level-Synchronous (LS) Traversal SCAN-BASEDGRAPH TRAVERSAL Keep (intermediate) results as bitsets
  • 15.
    15 Level-Synchronous (LS) Traversal SCAN-BASEDGRAPH TRAVERSAL Fetch neighbors by position
  • 16.
    16 Level-Synchronous (LS) Traversal SCAN-BASEDGRAPH TRAVERSAL Set operations on vectorized bitsets
  • 17.
    17 Level-Synchronous (LS) Traversal SCAN-BASEDGRAPH TRAVERSAL EDGE CLUSTERING Clustering by edge type Clustering by source vertex
  • 18.
    18 Fragmented-Incremental (FI) Traversal •Partition column into fragments • Track dependencies between fragments in index structure • Goal: Minimize number of fragment reads Fragment For a fragment size equal to |E|, FI-traversal degenerates to LS- traversal
  • 19.
    19 Fragmented-Incremental (FI) Traversal •Partition column into fragments • Track dependencies between fragments in index structure • Goal: Minimize number of fragment reads Transition 8  13  12 For a fragment size equal to |E|, FI-traversal degenerates to LS- traversal
  • 20.
    20 Fragmented-Incremental (FI) Traversal •Partition column into fragments • Track dependencies between fragments in index structure • Goal: Minimize number of fragment reads Transition Graph Probabilistic Fragment Synopsis For a fragment size equal to |E|, FI-traversal degenerates to LS- traversal
  • 21.
    21 Fragmented-Incremental (FI) Traversal •Partition column into fragments • Track dependencies between fragments in index structure • Goal: Minimize number of fragment reads Transition Graph Fragment Queue Execution Chain For a fragment size equal to |E|, FI-traversal degenerates to LS- traversal
  • 22.
    22 Experimental Evaluation EVALUATED REAL-WORLDDATA SETS • Six real-world data sets with different topology characteristics EVALUATED SYSTEMS • Implementation in main-memory column store prototype (C++) • Graph database (Neo4j) • RDF DBMS (Virtuoso 7.0 with columnar storage layout) • Commerical columnar RDBMS (via chained self-joins, with and without index support)
  • 23.
    23 Experimental Evaluation COMPARISON OFLS-TRAVERSAL AND FI-TRAVERSAL Traversal performance depends on the traversal depth and the topology FI-Traversal outperforms LS-Traversal by one order of magnitude
  • 24.
    24 Experimental Evaluation COMPARISON OFLS-TRAVERSAL AND FI-TRAVERSAL Traversal performance depends on the traversal depth and the topology LS-Traversal starts outperforming FI-Traversal
  • 25.
    25 Experimental Evaluation COMPARISON OFLS-TRAVERSAL AND FI-TRAVERSAL Traversal performance depends on the traversal depth and the topology Different performance characteristics for different fragment sizes
  • 26.
    26 Experimental Evaluation COMPARISON OFLS-TRAVERSAL AND FI-TRAVERSAL Traversal performance depends on the traversal depth and the topology
  • 27.
    27 Experimental Evaluation COMPARISON OFLS-TRAVERSAL AND FI-TRAVERSAL Traversal performance depends on the traversal depth and the topology Scan-based traversal outperforms fragmented traversal depending on traversal depth and graph topology
  • 28.
    28 Experimental Evaluation SYSTEM-LEVEL BENCHMARK Combinationof LS and FI-traversal outperforms native graph systems by up to two orders of magnitude
  • 29.
    29 Summary GRAPHITE • Graph processingtightly integrated into RDBMS • Extensions of core components by graph extensions (operators, cost model, index structures) • Topology characteristics-aware traversal operators GRAPH-SPECIFIC DATA STATISTICS AND ALGORITHMS• Diverse graph topologies demand different algorithmic design decisions • Index scan versus full column scan decision also applies for graph traversals FUTURE WORK • Integration with temporal, spatial, and text data • Language extensions for custom code executed during graph traversal Integration of GRAPHITE into RDBMS
  • 30.
    Contact Marcus Paradies, TUDresden m.paradies@sap.com https://wwwdb.inf.tu-dresden.de/team/external-members/marcus-paradies/

Editor's Notes

  • #14 MOM: Message Oriented Middlewared Middleware bezeichnet Middleware, die auf der asynchronen oder synchronen Kommunikation, also der Übertragung von Nachrichten (Messages) beruht. WSMS: Web Service Management Syste
  • #15 MOM: Message Oriented Middlewared Middleware bezeichnet Middleware, die auf der asynchronen oder synchronen Kommunikation, also der Übertragung von Nachrichten (Messages) beruht. WSMS: Web Service Management Syste
  • #16 MOM: Message Oriented Middlewared Middleware bezeichnet Middleware, die auf der asynchronen oder synchronen Kommunikation, also der Übertragung von Nachrichten (Messages) beruht. WSMS: Web Service Management Syste
  • #17 MOM: Message Oriented Middlewared Middleware bezeichnet Middleware, die auf der asynchronen oder synchronen Kommunikation, also der Übertragung von Nachrichten (Messages) beruht. WSMS: Web Service Management Syste
  • #18 MOM: Message Oriented Middlewared Middleware bezeichnet Middleware, die auf der asynchronen oder synchronen Kommunikation, also der Übertragung von Nachrichten (Messages) beruht. WSMS: Web Service Management Syste
  • #19 MOM: Message Oriented Middlewared Middleware bezeichnet Middleware, die auf der asynchronen oder synchronen Kommunikation, also der Übertragung von Nachrichten (Messages) beruht. WSMS: Web Service Management Syste
  • #20 MOM: Message Oriented Middlewared Middleware bezeichnet Middleware, die auf der asynchronen oder synchronen Kommunikation, also der Übertragung von Nachrichten (Messages) beruht. WSMS: Web Service Management Syste
  • #21 MOM: Message Oriented Middlewared Middleware bezeichnet Middleware, die auf der asynchronen oder synchronen Kommunikation, also der Übertragung von Nachrichten (Messages) beruht. WSMS: Web Service Management Syste
  • #22 MOM: Message Oriented Middlewared Middleware bezeichnet Middleware, die auf der asynchronen oder synchronen Kommunikation, also der Übertragung von Nachrichten (Messages) beruht. WSMS: Web Service Management Syste
  • #24 MOM: Message Oriented Middlewared Middleware bezeichnet Middleware, die auf der asynchronen oder synchronen Kommunikation, also der Übertragung von Nachrichten (Messages) beruht. WSMS: Web Service Management Syste
  • #25 MOM: Message Oriented Middlewared Middleware bezeichnet Middleware, die auf der asynchronen oder synchronen Kommunikation, also der Übertragung von Nachrichten (Messages) beruht. WSMS: Web Service Management Syste
  • #26 MOM: Message Oriented Middlewared Middleware bezeichnet Middleware, die auf der asynchronen oder synchronen Kommunikation, also der Übertragung von Nachrichten (Messages) beruht. WSMS: Web Service Management Syste
  • #27 MOM: Message Oriented Middlewared Middleware bezeichnet Middleware, die auf der asynchronen oder synchronen Kommunikation, also der Übertragung von Nachrichten (Messages) beruht. WSMS: Web Service Management Syste
  • #28 MOM: Message Oriented Middlewared Middleware bezeichnet Middleware, die auf der asynchronen oder synchronen Kommunikation, also der Übertragung von Nachrichten (Messages) beruht. WSMS: Web Service Management Syste