Graph Gurus 26 Using Graph Algorithms for Advanced Analytics Part 1 - Shortest Paths 1
© 2020 TigerGraph. All Rights Reserved Some Housekeeping Items ● Although your phone is muted we do want to answer your questions - submit your questions at any time using the Q&A tab in the menu ● The webinar is being recorded and will uploaded to our website shortly (https://www.tigergraph.com/webinars-and-events/) and the URL will be emailed you ● If you have issues with Zoom please contact the panelists via chat 2
© 2020 TigerGraph. All Rights Reserved Today's Presenter 3 Victor Lee Head of Product Strategy ● BS in Electrical Engineering and Computer Science from UC Berkeley, MS in Electrical Engineering from Stanford University ● PhD in Computer Science from Kent State University focused on graph data mining ● 20+ years in tech industry
© 2020 TigerGraph. All Rights Reserved Move Faster with TigerGraph Cloud Built for agile teams who would rather build innovative applications than procure hardware or configure and manage databases 4
© 2020 TigerGraph. All Rights Reserved Today’s Outline 5 1 3 2 Graph Analytics - uses and benefits Shortest Path Algorithms How to select and use the right one What are Graph Algorithms? Overview of different types 4 Demo GSQL Path Algorithms
© 2020 TigerGraph. All Rights Reserved Data-Driven Business Collect more data. See more. Understand more. Decide better. 6
© 2020 TigerGraph. All Rights Reserved Analyzing Connected Data ● All data processing is "connecting data" A + B = C ● Linking datasets together enriches your knowledge. ● Statistical Analysis and Machine Learning are about correlation: seeing how one set of things relates to another set of things. ⇒ Graphs are an ideal way to organize and analyze complex data. 7 Online sales In-store salesStore Demographics Web analytics
© 2020 TigerGraph. All Rights Reserved 8 What is a Graph? ● A Graph is a collection of Vertices and Vertex-to-Vertex Connections, called Edges ● Each Vertex (sometimes referred to as a node) and each Edge has a type and characteristic properties. ● Natural storage model for modeling data and its interconnections or relationships ● Natural model for representing transactions ● Natural model for knowledge/analysis/ learning – through following and studying connections Matt Customer Chris Customer John Customer Bob Customer Stephanie Partner Jack Sales Dan Partner Tom Sales W orkswith Friends w ith Partners with W orkswith Sells to Friends with Friends with Knows Product A Product B Owns Loves Bought Resells Property Graph
© 2020 TigerGraph. All Rights Reserved Why Are Graphs Important for Business? 9 100% Annual Growth in Graph through 2022 Graph Named Trend No. 5 on Gartner’s Top 10 Data and Analytics Technology Trends for 2019 What is graph analytics? A set of analytic techniques that allows for the exploration of relationships between entities of interest such as organizations, people and transactions. What’s unique about graph? Graph data stores can efficiently model, explore and query data with complex interrelationships across data silos. Why is graph one of the top 10 trends for data and analytics? Graph analytics is growing due to the need to ask complex questions across complex data, which is not always practical or even possible at scale using SQL queries.
© 2020 TigerGraph. All Rights Reserved Graph algorithms are functions for measuring characteristics of graphs, vertices, or relationships. A graph algorithm library serves as a toolkit and as building blocks for analyzing your data. Specialized functions Combine to make something greater Graph Algorithms 10
© 2020 TigerGraph. All Rights Reserved GSQL Graph Algorithm Library ● Written in GSQL - high-level, parallelized ● Open-source, user-extensible ● Well-documented 11 docs.tigergraph.com/graph-algorithm-library
© 2020 TigerGraph. All Rights Reserved Example Questions/Analyses for Graph Algorithms Which entity is most centrally located? ● For delivery logistics or greatest visibility ● Closeness Centrality, Betweenness Centrality algorithms 12 How much influence does this entity exert over the others? ● For market penetration & buyer influence ● PageRank algorithm Which entity has similar relationships to this entity? ● For grouping customers, products, etc. ● Cosine Similarity, SimRank, RoleSim algorithms What are the natural community groupings in the graph? ● For partitioning risk groups, workgroups, product offerings, etc. ● Community Detection, MinCut algorithms
© 2020 TigerGraph. All Rights Reserved Picking the Right Algorithm for the Job To become a master designer: ● Learn what each tool can do ● Learn to combine building blocks to solve complex issues This webinar series is designed to show you what each tool can do. We'll also show you some application examples. 13
© 2020 TigerGraph. All Rights Reserved Case Study: Finding Regions of Influence Step 1: Hub Detection/Ranking - Finding the Influencers 14 Additional details at https://www.tigergraph.com/solutions/product-service-marketing/ Source: Pharmaceutical/Life Science company seeking the most influential prescribers.
© 2020 TigerGraph. All Rights Reserved Case Study: Finding Regions of Influence Step 1: Community Detection 15 Additional details at https://www.tigergraph.com/solutions/product-service-marketing/
© 2020 TigerGraph. All Rights Reserved Some Types of Graph Algorithms ● Search ● Path Finding & Analytics ● Centrality / Ranking ● Clustering / Community Detection ● Similarity ● Classification 16
© 2020 TigerGraph. All Rights Reserved Path Analytics A B C D E F H G Is there a path from A to H? ● How many paths? ● What is the shortest or cheapest path? ● What is the degree of separation (number of hops)? ○ The smaller the degree, the greater the knowledge/influence/trust of the other ○ Likelihood of encountering one another ○ Referral for jobs, etc. ● Games: ○ Kevin Bacon number: https://oracleofbacon.org/ ○ Erdős number 17
© 2020 TigerGraph. All Rights Reserved Common Path Questions ● What is the shortest distance from vertex A to vertex B? ● What is a shortest path (sequence of vertices) from A to B? ● Case 1: Unweighted edges ○ We only care if a relationship exists, not how strong or long it is A B C D E F H G 18
© 2020 TigerGraph. All Rights Reserved Shortest Distance in an Unweighted Graph A B C D E F H G Short Distance algorithm: ● Count levels in breadth-first search ● Side benefit: Also finds the shortest distance to other vertices along the way 1 1 1 2 2 3 3 function: shortest_distance_unweighted(A, H) output: 3 19
© 2020 TigerGraph. All Rights Reserved Shortest Path in an Unweighted Graph A B C D E F H G Short Path algorithm: ● Use breadth-first search ● Record how you got to each vertex ● Variation: return 1 shortest path or all shortest paths? A A A A,B A,C A,B,E A,B,E; A,C,F function: shortest_path_unweighted(A, H) output: {A,B,E,H},{A,C,F,H} ● Side benefit: Also finds the shortest path to other vertices along the way ● Improved version: Search simultaneously from both source and destination (bidirectional search) 20
© 2020 TigerGraph. All Rights Reserved Shortest Path in a Weighted Graph ● Case 2: Weighted edges ○ Each edge has a numerical weight, which is the "cost" to traverse that edge. ○ Weight could represent distance or elapsed time or cost A B C D E F H G 2 4 2 1 2 3 5 2 3 2 3 21
© 2020 TigerGraph. All Rights Reserved Dijkstra's Algorithm for Shortest Path ● Finds a shortest distance or path from a source to every other vertex. ● Works by assuming the worst case (infinite distance) and then incrementally finding better (shorter) paths. ● Procedure: ○ Initialize every distance to ∞. ○ Use breadth-first search BUT go in order of shortest distance. ○ Every time an edge reaches a destination vertex, update the shortest distance AND the path to get there. ● More complex (slower) than unweighted shortest path. A B C D E F H G 2 4 2 1 2 3 5 2 3 2 3 22
© 2020 TigerGraph. All Rights Reserved Dijkstra's Algorithm for Shortest Path A B C D E F H G 2 4 2 1 2 3 5 2 3 2 3 2,A 4,A 2,A ∞ ∞ ∞ ∞ A B C D E F H G 2 4 2 1 2 3 5 2 3 2 3 2,A 3,AD 2,A 4,AB 6,ADC ∞ ∞ 23
© 2020 TigerGraph. All Rights Reserved Dijkstra's Algorithm for Shortest Path A B C D E F H G 2 4 2 1 2 3 5 2 3 2 3 2,A 3,AD 2,A 4,AB 6,ADC|ABE 7,ABE 9,ABE A B C D E F H G 2 4 2 1 2 3 5 2 3 2 3 2,A 3,AD 2,A 4,AB 6,ADC|ABE 7,ABE 8,ADCF|ABEF function: shortest_path_weighted(A, H) output: {A,D,C,F,H},{A,B,E,F,H} 24
© 2020 TigerGraph. All Rights Reserved Review: Analytics with Graph Algorithms ● Graph Algorithms answer fundamental questions about connected data ● Each algorithm in a library is tool in an analytics toolkit. ● Building Blocks for more complex business questions ● Some questions/algorithms are faster or slower than others 25
© 2020 TigerGraph. All Rights Reserved Practical SW Issues with Graph Algorithms Wish List: ● Algorithms are functions you can call with your application ● Run the algorithms in-database (don't export the data) ● Option to update the graph with the algorithm results ● Be able to modify/customize the algorithms ● Massively parallel processing to handle big graphs 26
© 2020 TigerGraph. All Rights Reserved TigerGraph GSQL Graph Algorithm Library ✓ Call each algorithm as a GSQL function or as a RESTful endpoint ✓ Run the algorithms in-database (don't export the data) ✓ Option to update the graph with the algorithm results ✓ Able to modify/customize the algorithms. Turing-complete language. ✓ Massively parallel processing to handle big graphs 27
DEMO GSQL Graph Algorithms in TigerGraph Cloud 28
© 2020 TigerGraph. All Rights Reserved Real-World Example: Airline Routes ● Dataset: Global commercial airline routes from openflights.org ○ 7,698 airports ○ 67,664 direct flight routes ● Unweighted search: find the route(s) from A to B with the fewest stops, ignoring actual distance traveled ○ Simplifications: not considering actual flight schedules or which airlines or prices ● Add weight: distance from city to city ○ BUT dataset doesn't have that info. Write a simple GSQL to calculate the distance, based on locations of source and destination, and add it to the graph data ● Weighted search: find the shortest route from A to B 29
© 2020 TigerGraph. All Rights Reserved Enhancements (Might or might not do, if we have time to develop and time to demo) ● ALL the shortest routes (e.g., there might be multiple 1-stop routes from Des Moines to Cleveland ● Find the Top K shortest routes ● Find routes only on a given airline ● Find multi-hop routes that don't change airlines 30
© 2020 TigerGraph. All Rights Reserved Summary 31 1 4 3 Graph Algorithms - tools and building blocks for analyzing graph data GSQL Algorithm Library - runs in-database, high-performance, easy to read and modify Shortest Path Algorithms - different algorithms for weighted and unweighted graphs 2 Learning To Use Algorithms - know what problem they solve, pros and cons
Q&A Please submit your questions via the Q&A tab in Zoom 32
© 2020 TigerGraph. All Rights Reserved More Questions? Join our Developer Forum https://groups.google.com/a/opengsql.org/forum/#!forum/gsql-users Sign up for our Developer Office Hours (every Thursday at 11 AM PST) https://info.tigergraph.com/officehours 33
© 2020 TigerGraph. All Rights Reserved Additional Resources Start Free at TigerGraph Cloud Today! https://www.tigergraph.com/cloud/ Test Drive Online Demo https://www.tigergraph.com/demo Download the Developer Edition https://www.tigergraph.com/download/ Guru Scripts https://github.com/tigergraph/ecosys/tree/master/guru_scripts 34
© 2020 TigerGraph. All Rights Reserved Upcoming Graph Guru Events 35 Coming to Dallas, Houston, Austin, Seattle, San Francisco… and Munich! More are in the works. View all events and request your own here: https://www.tigergraph.com/graphguruscomestoyou/ Graph Gurus 27: An In-Database Machine Learning Solution For Real-Time Recommendations https://info.tigergraph.com/graph-gurus-27
Thank You 36

Graph Gurus Episode 26: Using Graph Algorithms for Advanced Analytics Part 1

  • 1.
    Graph Gurus 26 UsingGraph Algorithms for Advanced Analytics Part 1 - Shortest Paths 1
  • 2.
    © 2020 TigerGraph.All Rights Reserved Some Housekeeping Items ● Although your phone is muted we do want to answer your questions - submit your questions at any time using the Q&A tab in the menu ● The webinar is being recorded and will uploaded to our website shortly (https://www.tigergraph.com/webinars-and-events/) and the URL will be emailed you ● If you have issues with Zoom please contact the panelists via chat 2
  • 3.
    © 2020 TigerGraph.All Rights Reserved Today's Presenter 3 Victor Lee Head of Product Strategy ● BS in Electrical Engineering and Computer Science from UC Berkeley, MS in Electrical Engineering from Stanford University ● PhD in Computer Science from Kent State University focused on graph data mining ● 20+ years in tech industry
  • 4.
    © 2020 TigerGraph.All Rights Reserved Move Faster with TigerGraph Cloud Built for agile teams who would rather build innovative applications than procure hardware or configure and manage databases 4
  • 5.
    © 2020 TigerGraph.All Rights Reserved Today’s Outline 5 1 3 2 Graph Analytics - uses and benefits Shortest Path Algorithms How to select and use the right one What are Graph Algorithms? Overview of different types 4 Demo GSQL Path Algorithms
  • 6.
    © 2020 TigerGraph.All Rights Reserved Data-Driven Business Collect more data. See more. Understand more. Decide better. 6
  • 7.
    © 2020 TigerGraph.All Rights Reserved Analyzing Connected Data ● All data processing is "connecting data" A + B = C ● Linking datasets together enriches your knowledge. ● Statistical Analysis and Machine Learning are about correlation: seeing how one set of things relates to another set of things. ⇒ Graphs are an ideal way to organize and analyze complex data. 7 Online sales In-store salesStore Demographics Web analytics
  • 8.
    © 2020 TigerGraph.All Rights Reserved 8 What is a Graph? ● A Graph is a collection of Vertices and Vertex-to-Vertex Connections, called Edges ● Each Vertex (sometimes referred to as a node) and each Edge has a type and characteristic properties. ● Natural storage model for modeling data and its interconnections or relationships ● Natural model for representing transactions ● Natural model for knowledge/analysis/ learning – through following and studying connections Matt Customer Chris Customer John Customer Bob Customer Stephanie Partner Jack Sales Dan Partner Tom Sales W orkswith Friends w ith Partners with W orkswith Sells to Friends with Friends with Knows Product A Product B Owns Loves Bought Resells Property Graph
  • 9.
    © 2020 TigerGraph.All Rights Reserved Why Are Graphs Important for Business? 9 100% Annual Growth in Graph through 2022 Graph Named Trend No. 5 on Gartner’s Top 10 Data and Analytics Technology Trends for 2019 What is graph analytics? A set of analytic techniques that allows for the exploration of relationships between entities of interest such as organizations, people and transactions. What’s unique about graph? Graph data stores can efficiently model, explore and query data with complex interrelationships across data silos. Why is graph one of the top 10 trends for data and analytics? Graph analytics is growing due to the need to ask complex questions across complex data, which is not always practical or even possible at scale using SQL queries.
  • 10.
    © 2020 TigerGraph.All Rights Reserved Graph algorithms are functions for measuring characteristics of graphs, vertices, or relationships. A graph algorithm library serves as a toolkit and as building blocks for analyzing your data. Specialized functions Combine to make something greater Graph Algorithms 10
  • 11.
    © 2020 TigerGraph.All Rights Reserved GSQL Graph Algorithm Library ● Written in GSQL - high-level, parallelized ● Open-source, user-extensible ● Well-documented 11 docs.tigergraph.com/graph-algorithm-library
  • 12.
    © 2020 TigerGraph.All Rights Reserved Example Questions/Analyses for Graph Algorithms Which entity is most centrally located? ● For delivery logistics or greatest visibility ● Closeness Centrality, Betweenness Centrality algorithms 12 How much influence does this entity exert over the others? ● For market penetration & buyer influence ● PageRank algorithm Which entity has similar relationships to this entity? ● For grouping customers, products, etc. ● Cosine Similarity, SimRank, RoleSim algorithms What are the natural community groupings in the graph? ● For partitioning risk groups, workgroups, product offerings, etc. ● Community Detection, MinCut algorithms
  • 13.
    © 2020 TigerGraph.All Rights Reserved Picking the Right Algorithm for the Job To become a master designer: ● Learn what each tool can do ● Learn to combine building blocks to solve complex issues This webinar series is designed to show you what each tool can do. We'll also show you some application examples. 13
  • 14.
    © 2020 TigerGraph.All Rights Reserved Case Study: Finding Regions of Influence Step 1: Hub Detection/Ranking - Finding the Influencers 14 Additional details at https://www.tigergraph.com/solutions/product-service-marketing/ Source: Pharmaceutical/Life Science company seeking the most influential prescribers.
  • 15.
    © 2020 TigerGraph.All Rights Reserved Case Study: Finding Regions of Influence Step 1: Community Detection 15 Additional details at https://www.tigergraph.com/solutions/product-service-marketing/
  • 16.
    © 2020 TigerGraph.All Rights Reserved Some Types of Graph Algorithms ● Search ● Path Finding & Analytics ● Centrality / Ranking ● Clustering / Community Detection ● Similarity ● Classification 16
  • 17.
    © 2020 TigerGraph.All Rights Reserved Path Analytics A B C D E F H G Is there a path from A to H? ● How many paths? ● What is the shortest or cheapest path? ● What is the degree of separation (number of hops)? ○ The smaller the degree, the greater the knowledge/influence/trust of the other ○ Likelihood of encountering one another ○ Referral for jobs, etc. ● Games: ○ Kevin Bacon number: https://oracleofbacon.org/ ○ Erdős number 17
  • 18.
    © 2020 TigerGraph.All Rights Reserved Common Path Questions ● What is the shortest distance from vertex A to vertex B? ● What is a shortest path (sequence of vertices) from A to B? ● Case 1: Unweighted edges ○ We only care if a relationship exists, not how strong or long it is A B C D E F H G 18
  • 19.
    © 2020 TigerGraph.All Rights Reserved Shortest Distance in an Unweighted Graph A B C D E F H G Short Distance algorithm: ● Count levels in breadth-first search ● Side benefit: Also finds the shortest distance to other vertices along the way 1 1 1 2 2 3 3 function: shortest_distance_unweighted(A, H) output: 3 19
  • 20.
    © 2020 TigerGraph.All Rights Reserved Shortest Path in an Unweighted Graph A B C D E F H G Short Path algorithm: ● Use breadth-first search ● Record how you got to each vertex ● Variation: return 1 shortest path or all shortest paths? A A A A,B A,C A,B,E A,B,E; A,C,F function: shortest_path_unweighted(A, H) output: {A,B,E,H},{A,C,F,H} ● Side benefit: Also finds the shortest path to other vertices along the way ● Improved version: Search simultaneously from both source and destination (bidirectional search) 20
  • 21.
    © 2020 TigerGraph.All Rights Reserved Shortest Path in a Weighted Graph ● Case 2: Weighted edges ○ Each edge has a numerical weight, which is the "cost" to traverse that edge. ○ Weight could represent distance or elapsed time or cost A B C D E F H G 2 4 2 1 2 3 5 2 3 2 3 21
  • 22.
    © 2020 TigerGraph.All Rights Reserved Dijkstra's Algorithm for Shortest Path ● Finds a shortest distance or path from a source to every other vertex. ● Works by assuming the worst case (infinite distance) and then incrementally finding better (shorter) paths. ● Procedure: ○ Initialize every distance to ∞. ○ Use breadth-first search BUT go in order of shortest distance. ○ Every time an edge reaches a destination vertex, update the shortest distance AND the path to get there. ● More complex (slower) than unweighted shortest path. A B C D E F H G 2 4 2 1 2 3 5 2 3 2 3 22
  • 23.
    © 2020 TigerGraph.All Rights Reserved Dijkstra's Algorithm for Shortest Path A B C D E F H G 2 4 2 1 2 3 5 2 3 2 3 2,A 4,A 2,A ∞ ∞ ∞ ∞ A B C D E F H G 2 4 2 1 2 3 5 2 3 2 3 2,A 3,AD 2,A 4,AB 6,ADC ∞ ∞ 23
  • 24.
    © 2020 TigerGraph.All Rights Reserved Dijkstra's Algorithm for Shortest Path A B C D E F H G 2 4 2 1 2 3 5 2 3 2 3 2,A 3,AD 2,A 4,AB 6,ADC|ABE 7,ABE 9,ABE A B C D E F H G 2 4 2 1 2 3 5 2 3 2 3 2,A 3,AD 2,A 4,AB 6,ADC|ABE 7,ABE 8,ADCF|ABEF function: shortest_path_weighted(A, H) output: {A,D,C,F,H},{A,B,E,F,H} 24
  • 25.
    © 2020 TigerGraph.All Rights Reserved Review: Analytics with Graph Algorithms ● Graph Algorithms answer fundamental questions about connected data ● Each algorithm in a library is tool in an analytics toolkit. ● Building Blocks for more complex business questions ● Some questions/algorithms are faster or slower than others 25
  • 26.
    © 2020 TigerGraph.All Rights Reserved Practical SW Issues with Graph Algorithms Wish List: ● Algorithms are functions you can call with your application ● Run the algorithms in-database (don't export the data) ● Option to update the graph with the algorithm results ● Be able to modify/customize the algorithms ● Massively parallel processing to handle big graphs 26
  • 27.
    © 2020 TigerGraph.All Rights Reserved TigerGraph GSQL Graph Algorithm Library ✓ Call each algorithm as a GSQL function or as a RESTful endpoint ✓ Run the algorithms in-database (don't export the data) ✓ Option to update the graph with the algorithm results ✓ Able to modify/customize the algorithms. Turing-complete language. ✓ Massively parallel processing to handle big graphs 27
  • 28.
    DEMO GSQL Graph Algorithmsin TigerGraph Cloud 28
  • 29.
    © 2020 TigerGraph.All Rights Reserved Real-World Example: Airline Routes ● Dataset: Global commercial airline routes from openflights.org ○ 7,698 airports ○ 67,664 direct flight routes ● Unweighted search: find the route(s) from A to B with the fewest stops, ignoring actual distance traveled ○ Simplifications: not considering actual flight schedules or which airlines or prices ● Add weight: distance from city to city ○ BUT dataset doesn't have that info. Write a simple GSQL to calculate the distance, based on locations of source and destination, and add it to the graph data ● Weighted search: find the shortest route from A to B 29
  • 30.
    © 2020 TigerGraph.All Rights Reserved Enhancements (Might or might not do, if we have time to develop and time to demo) ● ALL the shortest routes (e.g., there might be multiple 1-stop routes from Des Moines to Cleveland ● Find the Top K shortest routes ● Find routes only on a given airline ● Find multi-hop routes that don't change airlines 30
  • 31.
    © 2020 TigerGraph.All Rights Reserved Summary 31 1 4 3 Graph Algorithms - tools and building blocks for analyzing graph data GSQL Algorithm Library - runs in-database, high-performance, easy to read and modify Shortest Path Algorithms - different algorithms for weighted and unweighted graphs 2 Learning To Use Algorithms - know what problem they solve, pros and cons
  • 32.
    Q&A Please submit yourquestions via the Q&A tab in Zoom 32
  • 33.
    © 2020 TigerGraph.All Rights Reserved More Questions? Join our Developer Forum https://groups.google.com/a/opengsql.org/forum/#!forum/gsql-users Sign up for our Developer Office Hours (every Thursday at 11 AM PST) https://info.tigergraph.com/officehours 33
  • 34.
    © 2020 TigerGraph.All Rights Reserved Additional Resources Start Free at TigerGraph Cloud Today! https://www.tigergraph.com/cloud/ Test Drive Online Demo https://www.tigergraph.com/demo Download the Developer Edition https://www.tigergraph.com/download/ Guru Scripts https://github.com/tigergraph/ecosys/tree/master/guru_scripts 34
  • 35.
    © 2020 TigerGraph.All Rights Reserved Upcoming Graph Guru Events 35 Coming to Dallas, Houston, Austin, Seattle, San Francisco… and Munich! More are in the works. View all events and request your own here: https://www.tigergraph.com/graphguruscomestoyou/ Graph Gurus 27: An In-Database Machine Learning Solution For Real-Time Recommendations https://info.tigergraph.com/graph-gurus-27
  • 36.