How Graph Algorithms Answer your Business Questions in Banking and Beyond
This document provides an agenda and overview for a presentation on using graph algorithms in banking. The presentation introduces graphs and the Neo4j graph database, demonstrates sample banking data modeled as a graph, and reviews several graph algorithms that could be used for applications like fraud detection, including PageRank, weakly connected components, node similarity, and Louvain modularity. The document concludes with a demo and Q&A section.
1.
Graph Algorithms in Banking
Joe Depeau, Sr. Presales Consultant, UK
15th April, 2020
@joedepeau
http://linkedin.com/in/joedepeau
2.
Agenda
• Introduction to Graphs and Neo4j
• Introduction to the Neo4j Graph Data Science Library
• Demo Data Overview
• Review of Graph Algorithms for Demo
• Demo
• Q&A
Anatomy of a Property Graph Database
Nodes
• Represent the objects in the graph
• Can be labeled
Relationships
• Relate nodes by type and direction
Properties
• Name-value pairs that can go on nodes and relationships
[Figure: two Person nodes (name: “Dan”, born: May 29, 1970, twitter: “@dan”; name: “Ann”, born: Dec 5, 1975) and a Car node (brand: “Volvo”, model: “V70”), connected by LOVES, LIVES WITH, DRIVES and OWNS relationships; one relationship carries the property since: Jan 10, 2011.]
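The anatomy above can be sketched in a few lines of plain Python. This is only an illustration of the model, not how Neo4j stores data: the `Node` and `Relationship` classes and the `targets` helper are hypothetical names, and which Person drives or owns the car (and which relationship carries `since`) is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                      # e.g. {"Person"}
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node                      # direction runs start -> end
    type: str
    end: Node
    properties: dict = field(default_factory=dict)


dan = Node({"Person"}, {"name": "Dan", "born": "1970-05-29", "twitter": "@dan"})
ann = Node({"Person"}, {"name": "Ann", "born": "1975-12-05"})
car = Node({"Car"}, {"brand": "Volvo", "model": "V70"})

# Assumption: who drives or owns the car is a guess made for illustration.
relationships = [
    Relationship(dan, "LOVES", ann),
    Relationship(ann, "LOVES", dan),
    Relationship(dan, "LIVES_WITH", ann),
    Relationship(dan, "DRIVES", car, {"since": "2011-01-10"}),
    Relationship(ann, "OWNS", car),
]


def targets(node, rel_type):
    """Return the nodes reached from `node` over outgoing `rel_type` relationships."""
    return [r.end for r in relationships if r.start is node and r.type == rel_type]


print(targets(dan, "DRIVES")[0].properties["model"])  # V70
```

Relationships are directed, so `targets(car, "OWNS")` is empty even though Ann owns the car.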
What the heck are graph algorithms?
Graph algorithms are calculations that describe the topology and connectivity of your graph:
- Global traversals & computations
- Learning overall structure
- Typically heuristics and approximations
- Extracting new data from what you already have
What’s important? What’s similar? What are efficient traversals?
10.
...and what do I do with them?
Explore, plan, measure
• Find significant patterns and plan for optimal structures
• Score outcomes and set a threshold value for a prediction
Machine learning
• Use the measures as features to train an ML model

1st node   2nd node   Common neighbors   Preferential attachment   Label
1          2          4                  15                        1
3          4          7                  12                        1
5          6          1                  1                         0
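As a rough sketch of how the two feature columns in the table can be derived, the snippet below computes common neighbors and preferential attachment for candidate node pairs. It uses a small made-up graph, not the data behind the slide, and the helper names are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Map each node to the set of its neighbours (undirected)."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def common_neighbors(adjacency, a, b):
    """Number of neighbours that a and b share."""
    return len(adjacency[a] & adjacency[b])


def preferential_attachment(adjacency, a, b):
    """Product of the two nodes' degrees."""
    return len(adjacency[a]) * len(adjacency[b])


# Toy graph (an assumption for illustration only).
edges = [("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("b", "e")]
adjacency = build_adjacency(edges)

candidate_pairs = [("a", "b"), ("c", "e")]
features = [
    (x, y, common_neighbors(adjacency, x, y), preferential_attachment(adjacency, x, y))
    for x, y in candidate_pairs
]
print(features)  # [('a', 'b', 2, 6), ('c', 'e', 1, 2)]
```

Each tuple is one row of a feature table; adding a known label per pair would give training data for an ML model.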
11.
Tell me more!
• Pathfinding & Search: Finds optimal paths or evaluates route availability and quality.
• Centrality / Importance: Determines the importance of distinct nodes in the network.
• Community Detection: Detects group clustering or partition options.
• Similarity: Evaluates how alike nodes are by neighbors and relationships.
• Link Prediction: Estimates the likelihood of nodes forming a future relationship.
12.
Graph and ML algorithms in Neo4j
Pathfinding & Search
• Minimum Weight Spanning Tree
• Shortest Path
• Single Source Shortest Path
• All Pairs Shortest Path
• A*
• Yen’s K-shortest Paths
• Random Walk
• Breadth First Search
• Depth First Search
Centrality / Importance
• Degree Centrality
• Closeness Centrality
• Betweenness Centrality
• PageRank
• ArticleRank
• Eigenvector Centrality
Community Detection
• Triangle Count / Clustering Coefficient
• Weakly Connected Components
• Strongly Connected Components
• Label Propagation
• Louvain Modularity
• K-1 Colouring
• Modularity Optimisation
Similarity
• Node Similarity
• Approximate Nearest Neighbours
• Cosine Similarity
• Euclidean Similarity
• Jaccard Similarity
• Overlap Similarity
• Pearson Similarity
Link Prediction
• Adamic Adar
• Common Neighbours
• Preferential Attachment
• Resource Allocations
• Same Community
• Total Neighbours
https://neo4j.com/docs/graph-data-science/1.0/
Some Examples of Typical Bank Data
Organisational Data
• Documentation
• Employee Data
• Processes
• Systems and Databases
• KPIs and Reports
Customer Data
• Address
• Personal Data
• Documents
• Relationships
• Assets
Product and Services Data
• Documentation
• Processes
• Product / Service Details
• Product / Service Hierarchy
• Pricing
Event Data
• Money Movements
• Web / App Activity
• Customer Contact
3rd Party Data
• Social Media
• Credit Rating Agencies
• Market Data
• Organisational Hierarchy
• Corporate Data
PageRank
What: Finds important nodes based on their relationships.
Why: Identify important or influential Client nodes by quantifying the flows of money towards them.
Uses:
- Fraud detection
- Anti-money laundering
- Inform prioritization during analysis and investigation
The PageRank Algorithm
PageRank: which nodes can be considered ‘important’ in our graph based on money flows?
[Figure: inputs, and the output written as a .pagerank property.]
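To make the idea concrete, here is a minimal textbook PageRank by power iteration in plain Python. It is a sketch under stated assumptions, not the Neo4j Graph Data Science implementation: the client names and transfers are invented, edges are unweighted (transfer amounts are ignored), and scores are normalised to sum to 1.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Textbook PageRank over directed (source, target) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is spread evenly.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1 - damping) / len(nodes) + damping * dangling / len(nodes)
        new_rank = {n: base for n in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Hypothetical money transfers: (sender, receiver).
transfers = [
    ("alice", "carol"),
    ("bob", "carol"),
    ("dave", "carol"),
    ("alice", "bob"),
]
scores = pagerank(transfers)
most_important = max(scores, key=scores.get)
print(most_important)  # carol
```

Here `carol` receives money from every other client, directly or indirectly, so she ends up with the highest score. In the demo the equivalent score would be stored on each node as the `.pagerank` property.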
22.
Weakly Connected Components
What: Finds disconnected community subgraphs in our data.
Why: Identify communities based on connections with shared pieces of identity.
Uses:
- Householding
- Synthetic identities
- Stolen identities
24.
The Weakly Connected Components Algorithm
Weakly Connected Components: what communities exist in the data based on connections to pieces of identity?
[Figure: inputs, and the output written as a .component_id property.]
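One common way to compute weakly connected components is a union-find structure, sketched below in plain Python. This is an illustration rather than the library's implementation; the client and identity values are invented, and the component id is simply whichever node ends up as the root of its group.

```python
def weakly_connected_components(edges):
    """Return {node: component_id}, ignoring edge direction."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    return {node: find(node) for node in list(parent)}


# Hypothetical links between clients and pieces of identity.
has_identity = [
    ("client:1", "phone:555-0100"),
    ("client:2", "phone:555-0100"),
    ("client:2", "email:x@example.com"),
    ("client:3", "email:x@example.com"),
    ("client:4", "phone:555-0199"),
]
component_id = weakly_connected_components(has_identity)

same_group = component_id["client:1"] == component_id["client:3"]
print(same_group)  # True
```

Clients 1 and 3 share no identity element directly, but both are linked through client 2, so all three fall into one component; client 4 stays separate.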
25.
Node Similarity
What: Computes similarity between nodes based on their neighbours. Writes a new relationship to the graph.
Why: Identify similar nodes that share common pieces of identity.
Uses:
- Entity resolution
- Synthetic identities
- Stolen identities
26.
26 The Node SimilarityAlgorithm Node Similarity : how similar are two Client nodes based on pieces of shared identity ?
27.
The Node Similarity Algorithm
Node Similarity: how similar are two Client nodes based on pieces of shared identity?
[Figure: inputs, and the output written as a SIMILAR relationship with a .score property.]
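A neighbour-based similarity of this kind can be sketched with the Jaccard coefficient: the number of shared neighbours divided by the number of distinct neighbours across both nodes. The snippet below is a plain-Python illustration with invented clients and identity values; the `cutoff` parameter and function name are hypothetical.

```python
from itertools import combinations


def node_similarity(neighbours, cutoff=0.0):
    """Return (node_a, node_b, score) for every pair sharing a neighbour."""
    results = []
    for a, b in combinations(sorted(neighbours), 2):
        shared = neighbours[a] & neighbours[b]
        if not shared:
            continue  # no common identity, so no SIMILAR relationship
        score = len(shared) / len(neighbours[a] | neighbours[b])
        if score >= cutoff:
            results.append((a, b, score))
    return results


# Hypothetical identity elements attached to each client.
identity = {
    "client_1": {"phone:555-0100", "email:a@example.com", "address:1 High St"},
    "client_2": {"phone:555-0100", "email:a@example.com", "address:9 Low Rd"},
    "client_3": {"phone:555-0199"},
}
similar = node_similarity(identity)
print(similar)  # [('client_1', 'client_2', 0.5)]
```

Each result tuple corresponds to one SIMILAR relationship, with the third element playing the role of the `.score` property.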
28.
Louvain Modularity
What: Finds communities of connected nodes in our graph. Can return intermediate results.
Why: Useful for identifying communities based on transaction behaviour rather than identity.
Uses:
- Fraud ring detection
- Anti-money laundering
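Louvain works by repeatedly moving nodes between communities to increase a modularity score. The sketch below does not implement Louvain itself; it only computes that score for a given partition of an undirected, unweighted graph, so two candidate groupings can be compared. The graph and community labels are invented for illustration.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity of a partition: sum over communities of
    (internal edges / m) - (total degree / 2m) squared."""
    m = len(edges)
    internal = defaultdict(int)   # edges with both ends in the community
    degree = defaultdict(int)     # sum of node degrees per community
    for a, b in edges:
        degree[community_of[a]] += 1
        degree[community_of[b]] += 1
        if community_of[a] == community_of[b]:
            internal[community_of[a]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles of accounts joined by a single transaction (hypothetical).
edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("x", "y"), ("y", "z"), ("x", "z"),
    ("c", "x"),
]
two_rings = {"a": 0, "b": 0, "c": 0, "x": 1, "y": 1, "z": 1}
one_blob = {node: 0 for node in two_rings}

q_split = modularity(edges, two_rings)
q_merged = modularity(edges, one_blob)
print(q_split > q_merged)  # True
```

Splitting the graph into the two triangles scores higher than treating everything as one community, which is the kind of improvement Louvain searches for.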