Graph Algorithms For Developers Amy Hodler & William Lyon, @Neo4j Seattle Graph Tour July 2019 bit.ly/seattlealgos
Relationships The Strongest Predictors of Behavior! “Increasingly we're learning that you can make better predictions about people by getting all the information from their friends and their friends’ friends than you can from the information you have about the person themselves” James Fowler David Burkus Albert-László Barabási 3
William Lyon @lyonwj 4 Neo4j Labs Engineer Amy Hodler @amyhodler Analytics & AI Program Manager
Graph Platform ● Database management system (DBMS) ● Property Graph data model ● Cypher query language ● Graph analytics ● Data visualization ● Developer tool for building applications What is Neo4j? neo4j.com/
Labeled Property Graph
Query (e.g. Cypher/Python) Real-time, local decisioning and pattern matching Graph Algorithms Libraries Global analysis and iterations You know what you’re looking for and making a decision You’re learning the overall structure of a network, updating data, and predicting Local Patterns Global Computation
What Do People Do with Graph Algorithms?
Understand & Predict Complex Behavior Requires Understanding Relationships and Structures Flow & Dynamics Interactions & Resiliency Propagation Pathways
Using Graph Algorithms Explore, Plan, Measure Find significant patterns and plan for optimal structures Score outcomes and set a threshold value for a prediction Feature Engineering for Machine Learning The measures as features to train 1st Node 2nd Node Common Neighbors Preferential Attachment label 1 2 4 15 1 3 4 7 12 1 5 6 1 1 0
• Current data science models ignore network structure & complex relationships • Graphs add highly predictive features to existing ML models • Otherwise unattainable predictions based on relationships More Accurate Predictions with the Data You Already Have Machine Learning Pipeline
Financial Crimes Recommendations Cybersecurity Predictive Maintenance Customer Segmentation Churn Prediction Search & MDM Graph Data Science Applications EXAMPLES Drug Discovery 12
Why Not Use Other Types of Analytics?
Structure is Hard to Unfold
15 NumberofNODES Number of RELATIONSHIPS per Node Average Distribution Most nodes have the same number of relationships “No Network in Nature that we know of that would be described by the Random network model.” –Albert-László Barabási Random Network
16 NumberofNODES Number of RELATIONSHIPS per Node Power-Law Distribution Some nodes have a lot of relationships but MOST have very few Structured Network
17 NumberofNODES Number of RELATIONSHIPS per Node Many approaches erroneously focus on the average population where few entities actually exist Graphs help us invest in populous areas Find strategic entities Uncover structural information
Graph Algorithms Extract Structure and Infer Behavior Source: “Communities, modules and large-scale structure in networks“ - Mark Newman Source: “Hierarchical structure and the prediction of missing links in networks”; ”Structure and inference in annotated networks” - A. Clauset, C. Moore, and M.E.J. Newman.
Neo4j Graph Algorithms
Graph Algorithm Categories in Neo4j neo4j.com/ graph-algorithms- book/ Book Signing 3:15 Pathfinding & Search Centrality / Importance Community Detection Link Prediction Finds optimal paths or evaluates route availability and quality Determines the importance of distinct nodes in the network Detects group clustering or partition options Evaluates how alike nodes are Estimates the likelihood of nodes forming a future relationship Similarity
Graph Algorithms in Neo4j • Parallel Breadth First Search & DFS • Shortest Path • Single-Source Shortest Path • All Pairs Shortest Path • Minimum Spanning Tree • A* Shortest Path • Yen’s K Shortest Path • K-Spanning Tree (MST) • Random Walk • Degree Centrality • Closeness Centrality • CC Variations: Harmonic, Dangalchev, Wasserman & Faust • Betweenness Centrality • Approximate Betweenness Centrality • PageRank • Personalized PageRank • ArticleRank • Eigenvector Centrality • Triangle Count • Clustering Coefficients • Connected Components (Union Find) • Strongly Connected Components • Label Propagation • Louvain Modularity – 1 Step & Multi- Step • Balanced Triad (identification) • Euclidean Distance • Cosine Similarity • Jaccard Similarity • Overlap Similarity • Pearson Similarity Pathfinding & Search Centrality / Importance Community Detection Similarity neo4j.com/docs/ graph-algorithms/current/ Updated June 2019 Link Prediction • Adamic Adar • Common Neighbors • Preferential Attachment • Resource Allocations • Same Community • Total Neighbors +35
Demo: Business Reviews Application
Yelp Public Dataset (as a graph) https://yelp.com/dataset
24 How To Use Graph Algos To Enhance An Application? ● Category hierarchy - knowledge graph ○ Overlap similarity algorithm ● Showing relevant reviews ○ Personalized PageRank ● Personalized recommendations ○ Jaccard similarity ○ Label propagation
25 Business Reviews Application Neo4j JavaScript Driver
26 Cypher Query Language
27 Cypher Query Language average(5,4,2) = 3.67
Neo4j Graph Algorithms Plugin One-click install in Neo4j Desktop … or grab plugin directly https://neo4j.com/download-center/
1. Call as Cypher procedure 1. Pass in specification (Label, Prop, Query) and configuration 1. stream variant returns (a lot) of results CALL algo.<name>.stream('Label','TYPE',{conf}) YIELD nodeId, score 2. non-stream variant writes results to graph returns statistics CALL algo.<name>('Label','TYPE',{conf}) How To… Pathfinding & Search Centrality / Importance Community Detection Link Prediction Similarity
30
31 Find All Categories
Overlap Similarity Algorithm Ideal choice for finding hierarchy in data and developing super and sub- categories Overlap similarity coefficient represents the co-occurrence of items between groups A B A B
Jaccard Similarity Algorithm Often used to find recommendations of similar items as well as part of link prediction Jaccard similarity measures the similarity between sets A B A B
35 Category Co-Occurence What categories show up together?
36 Running Overlap Similarity In Neo4j 1) Find category co- occurences 1) Pass to overlap similarity procedure
37 Find Category Hierarchy
38 Find Top Level Categories Only return categories without an outgoiong NARROWER_THAN relationship
39
40
PageRank & Personalized PageRank Measures the transitive (directional) influence of nodes and considers the influence of neighbors and their neighbors Personalized PageRank CALL algo.pageRank('Page', 'LINKS', {iterations:20, dampingFactor:0.85, sourceNodes: [siteA]})
44 Using Personalized PageRank To Surface Relevant Reviews ● Find influential reviewers in my network ● Find users who have reviewed the same businesses as me
45 Inferred Relationships
46 Projected User-User Graph Leverage inferred relationships. We can run graph algorithms on these projected graphs.
47 Running Personalized PageRank
48 Running Personalized PageRank
49 Running Personalized PageRank
50 Running Personalized PageRank
51 Query Using TRUSTS Relationships Order reviews by PageRank score on TRUSTS relationship
53 Personalized Recommendations Content based vs collaborative filtering ● Photo based recommendations: 1) Similar photos using Jaccard similarity 2) Cluster similar photos using Label Propagation 3) Recommend businesses connected to photos in the same community
Label Propagation Nodes adopt labels based on neighbors to infer clusters IterateIterate Community Detection
55 Business Photos
56 Photo Labels Labeled using Google’s Vision API https://cloud.google.com/vision/ https://github.com/moxious/vision-api
57 Photo Labels
A B A BB Jaccard Similarity
59 How Similar Are These Photos?
60 Jaccard Similarity
Label Propagation IterateIterate Community Detection Adds partition property to group photos by community Group by community and count photos per community
Label Propagation Community Detection
Photo Based Recommendations What photos do you like?
64
A Visual Evaluation of GoT
Enter the NEuler (Algorithms Playground)
install.graphapp.io neo4j.com/developer/ neo4j-desktop
69 Game of Thrones (TV-Series) • Based on Andrew Beveridge's script to graph work • 400 Nodes (people) • 3,550 Relationships (interactions)
Triangles and Clustering Coefficient Communities Measures can be counted/normalized globally u Triangles = 2 CC= 0.2 Triangle Count determines the number of triangles passing through a node in the graph Clustering Coefficient is the probability that neighbors of a particular node are connected to each other u Triangles = 2 CC= 0.33
Triangles and Clustering Coefficient Communities Use When Basic network analysis, e.g. does the network exhibit small-world structures? Estimating stability Finding structural holes Scoring for machine learning Spam Classification Semi-streaming web page analysis (local triangle and CC) I don’t like spam!
NEuler Demo
A C D E B F G Strongly Connected Components All nodes can reach each other following direction, but not necessarily directly Find cycles, collapse tight communities, estimate similarity in-group Connected Components All nodes can each other when disregarding direction Find disconnected subgraphs or nodes in common, preprocess data Communities
NEuler Demo
Betweenness Centrality Influence Tip / Caution Computationally intensive: use RA Brandes approximation on large graphs. Assumes all communication between nodes happens along the shortest path and with the same frequency (not always the case in real life) The sum of the % shortest paths that pass through a node, calculated by pairs
Betweenness Centrality - Uses Influence Use When Identify bridges Uncover control points Find bottlenecks and vulnerabilities Network Resilience Key points of cascading failure
NEuler Demo
Resources
80 neo4jsandbox.com
81 Online Training! Applied Graph Algorithms Data Science With Neo4j https://neo4j.com/graphacademy/online-training/data-science https://neo4j.com/graphacademy/online-training/applied-graph-algorithms
Free Digital Book neo4j.com/ graph-algorithms-book • Spark & Neo4j Examples • Machine Learning Chapter
neo4j.com/online-summit/ 30+ Sessions. Online. October 10.
Questions? Amy Hodler amy.hodler@neo4j.com Will Lyon will@neo4j.com bit.ly/seattlealgos

Graph Algorithms for Developers