GRAPH ANALYTICS AND MACHINE LEARNING
STANLEY WANG, SOLUTION ARCHITECT, TECH LEAD @SWANG68 http://www.linkedin.com/in/stanley-wang-a2b143b
Mathematics on Graph • An abstract representation of a set of entities where some pairs are connected by links; o Entity (Vertex, Node); o Link (Edge, Relationship);
What is Graph?
Constructing a Graph
Graph Affinity Matrix
Graph Laplacian Matrix
Update Function on Graph
Magic Properties of the Laplacian Matrix
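Two standard properties of the Laplacian can be checked by hand on a small example. The sketch below assumes the unnormalized definition L = D - A (degree matrix minus adjacency matrix) for an undirected, unweighted graph; the helper names and the sample graph are hypothetical and not taken from the slides. It shows that every row of L sums to zero and that the quadratic form x^T L x equals the sum of squared differences of x across the edges.

```python
def laplacian(n, edges):
    """Build the unnormalized Laplacian L = D - A as a list of rows."""
    L = [[0] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1      # degree of u
        L[v][v] += 1      # degree of v
        L[u][v] -= 1      # minus adjacency entry
        L[v][u] -= 1
    return L


def quadratic_form(L, x):
    """Compute x^T L x with plain loops."""
    n = len(x)
    return sum(x[i] * L[i][j] * x[j] for i in range(n) for j in range(n))


# Hypothetical graph: a triangle (0, 1, 2) plus a separate edge (3, 4).
edges = [(0, 1), (1, 2), (2, 0), (3, 4)]
L = laplacian(5, edges)
x = [1.0, 2.0, 4.0, 0.0, 3.0]   # an arbitrary value per node

row_sums = [sum(row) for row in L]
smoothness = sum((x[u] - x[v]) ** 2 for u, v in edges)

print(row_sums)                 # every row of L sums to zero
print(quadratic_form(L, x))     # equals the edge-wise smoothness below
print(smoothness)
```

The second property is why the Laplacian appears in smoothing and spectral clustering: a small x^T L x means that connected nodes carry similar values.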
What is a Graph Database? • A Database with an Explicit Graph Structure; • Each Node Knows its Adjacent Nodes; • As the Number of Nodes Increases, the Cost of a Local Step Remains the Same, O(1); • An Index for Lookups;
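The points above can be sketched with an in-memory structure. This is an illustration only, not how any particular graph database is implemented; the `Node` class, `get_or_create` helper, and sample names are hypothetical. Each node holds direct references to its neighbours, so one local step does not touch the rest of the graph, and the index is used only to find the starting node.

```python
class Node:
    """A graph node that holds direct references to its neighbours."""

    def __init__(self, name):
        self.name = name
        self.adjacent = []          # neighbouring Node objects

    def connect(self, other):
        """Add an undirected link between self and other."""
        self.adjacent.append(other)
        other.adjacent.append(self)


index = {}                          # name -> Node, for lookups only


def get_or_create(name):
    """Return the node with this name, creating it on first use."""
    if name not in index:
        index[name] = Node(name)
    return index[name]


for a, b in [("alice", "bob"), ("bob", "carol"), ("alice", "dave")]:
    get_or_create(a).connect(get_or_create(b))

# One index lookup to find the start, then a local step over its links.
start = index["alice"]
neighbour_names = sorted(n.name for n in start.adjacent)
print(neighbour_names)
```

The cost of the local step depends on the degree of the starting node, not on how many nodes the graph contains.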
Relational Model vs Graph Model • Relational Model: Optimized for Aggregation; • Graph Model: Optimized for Connections;
RDBMS SQL vs NoSQL • [Figure: data stores plotted by Size against Complexity. Labels: Key-Value Store, BigTable / Column Family, Document Databases, Graph Databases, Relational Databases, 90% of Use Cases]
Performance Comparison
Why Graph Databases? • [Figure: data models placed along a "Value in Relationships" axis from Low to High. Labels: Key-Value, BigTable, Document, Relational, Graph]
NoSQL and Big Data • Traditional databases handle big data sets, too, but focus more on structured data; • NoSQL databases have poor analytics support; • HDFS and MapReduce often work from text files; • NoSQL is geared more toward high throughput: basically AP from the CAP theorem, instead of CP; • In practice, Big Data is likely to be a mix of text files, NoSQL, and SQL RDBMS;
Graph Terminology • Graph Computation (Analytics): o The whole graph is processed, typically for several iterations: vertex-centric computation; o Examples: Belief Propagation, PageRank, Community Detection, Triangle Counting, Matrix Factorization, Machine Learning… • Graph Database (Queries): o Selective graph queries (comparable to SQL queries); o Traversals: shortest-path, friends-of-friends,…
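The two traversal queries named above can be sketched with a breadth-first search over an adjacency dictionary. This is a minimal illustration for an unweighted, undirected graph; the function names and the sample `friends` data are hypothetical. For simplicity the search explores everything reachable from the source rather than stopping at depth 2.

```python
from collections import deque


def bfs_distances(adj, source):
    """Return the hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:           # first visit gives the shortest hop count
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def friends_of_friends(adj, person):
    """Nodes exactly two hops away: friends of friends who are not friends."""
    dist = bfs_distances(adj, person)
    return {v for v, d in dist.items() if d == 2}


# Hypothetical friendship graph.
friends = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan"},
    "cat": {"ann", "dan"},
    "dan": {"bob", "cat", "eve"},
    "eve": {"dan"},
}

print(friends_of_friends(friends, "ann"))
print(bfs_distances(friends, "ann")["eve"])   # shortest-path length in hops
```

Unlike the whole-graph analytics in the first bullet, a query like this starts from one node and touches only its neighbourhood.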
GRAPH ANALYTICS
What Graph Can Model?
Graphs are Essential to ML • Identify influential people and information; • Discover communities; • Understand people’s shared interests; • Model complex real-life data dependencies; It’s all about GRAPH: The Value of Data is Proportional to the Number of Meaningful Relationships!
Complex Big Data Graph ML Algorithms
Graph Social Network Model • The model can easily be used in real-life applications for customer classification, profiling, segmentation, and product recommendations.
Identifying Key People
Social Network Tie Recommendation
Full Stack Graph ML Algorithms
Typical Graph Analytics
Graph Analytics - PageRank • PageRank measures the importance of nodes in a graph (link analysis). A node’s rank is defined as the probability of landing on that node, which depends on: o The probability of landing on one of the node’s neighbors; o The probability of following the link from that neighbor to the node; • Typical use: identifying influential leaders;
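The definition above can be sketched as a power iteration. This is one common formulation, not necessarily the one the slides had in mind: the damping factor of 0.85, the fixed iteration count, the even redistribution from nodes without outgoing links, and the sample `links` graph are all assumptions made for illustration.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Approximate PageRank by repeated propagation of rank along links.

    out_links maps each node to the list of nodes it links to.
    """
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}          # start from a uniform guess
    for _ in range(iterations):
        # Random-jump share that every node receives.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for u in nodes:
            targets = out_links[u]
            if targets:
                # Probability of being at u times probability of following
                # one particular outgoing link of u.
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
            else:
                # Node without outgoing links: spread its rank evenly.
                share = damping * rank[u] / n
                for v in nodes:
                    new_rank[v] += share
        rank = new_rank
    return rank


# Hypothetical link graph.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
for node, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(node, round(score, 3))
```

In this example node `c` ends up with the highest rank because three of the four nodes link to it.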
Graph Analytics - Triangle Count • The clustering coefficient (CC) is a measure of the degree to which nodes in a graph tend to cluster together; • Computing the CC of a node reduces to counting the triangles around that node; • The CC indicates the degree to which a node’s neighbors are themselves neighbors; • The CC of a graph is closely related to the transitivity of the graph;
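The link between triangles and the clustering coefficient can be sketched as follows, assuming the usual local definition for an undirected graph: the number of triangles through a node divided by the number of neighbor pairs, k(k - 1)/2 for a node of degree k. The function name and the sample graph are hypothetical.

```python
from itertools import combinations


def triangles_and_cc(adj, v):
    """Return (triangle count, local clustering coefficient) for node v.

    adj maps each node to the set of its neighbors (undirected graph).
    """
    neighbors = adj[v]
    k = len(neighbors)
    # A triangle through v exists for every pair of neighbors that are
    # themselves connected.
    triangles = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    # Nodes with fewer than two neighbors have no neighbor pairs.
    cc = 2.0 * triangles / (k * (k - 1)) if k > 1 else 0.0
    return triangles, cc


# Hypothetical graph: triangle a-b-c plus a pendant node d attached to a.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

for node in sorted(graph):
    print(node, triangles_and_cc(graph, node))
```

Node `b` has a coefficient of 1.0 because its two neighbors are connected, while node `a` has one triangle out of three possible neighbor pairs.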
Graph Analytics - Connected Components • A connected component is a subgraph in which any two vertices are connected by a path and which is connected to no additional vertices in the supergraph; • A directed graph is strongly connected if every vertex is reachable from every other vertex. The strongly connected components partition the graph into subgraphs that are themselves strongly connected; • A spanning tree is a subgraph of the original graph that is a tree and connects all the vertices that were originally connected; • A minimum spanning tree (MST) is a spanning tree whose total edge weight is not greater than that of any other spanning tree;
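For an undirected, weighted graph, both the connected components and a minimum spanning tree can be sketched with a union-find structure; the MST part is Kruskal's algorithm, chosen here for brevity rather than taken from the slides. Strongly connected components of a directed graph need a different algorithm and are not shown. The function names and sample data are hypothetical.

```python
def find(parent, x):
    """Return the representative of x's set, shortening the path as we go."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def connected_components(nodes, edges):
    """Group nodes into connected components; edge weights are ignored."""
    parent = {v: v for v in nodes}
    for u, v, _weight in edges:
        parent[find(parent, u)] = find(parent, v)
    groups = {}
    for v in nodes:
        groups.setdefault(find(parent, v), set()).add(v)
    return list(groups.values())


def minimum_spanning_forest(nodes, edges):
    """Kruskal: take edges by increasing weight unless they close a cycle."""
    parent = {v: v for v in nodes}
    chosen = []
    for u, v, weight in sorted(edges, key=lambda e: e[2]):
        root_u, root_v = find(parent, u), find(parent, v)
        if root_u != root_v:            # endpoints are in different trees
            parent[root_u] = root_v
            chosen.append((u, v, weight))
    return chosen


# Hypothetical graph with two components: {a, b, c} and {d, e}.
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b", 1), ("b", "c", 2), ("a", "c", 3), ("d", "e", 4)]

components = connected_components(nodes, edges)
mst = minimum_spanning_forest(nodes, edges)
print(components)
print(mst, sum(w for _, _, w in mst))
```

When the graph has several components, as here, the result is one minimum spanning tree per component; the heaviest edge of the triangle, `("a", "c", 3)`, is left out.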
Graph Analytics - Betweenness Centrality • Betweenness centrality is an indicator of a node's centrality in a network. It is based on the number of shortest paths between all pairs of other vertices that pass through that node; • A node with high betweenness centrality has a large influence on the transfer of items through the network; • Betweenness centrality is related to a network's connectivity;
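A minimal sketch of this measure for an unweighted, undirected graph is given below. It uses Brandes' algorithm, which is an assumption about method, not something stated in the slides. When several shortest paths exist between a pair, each node on them is credited with the fraction of those paths that pass through it. The function name and sample graph are hypothetical.

```python
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness centrality for an undirected graph."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Phase 1: BFS from s, counting shortest paths (sigma) to each node
        # and recording each node's predecessors on those paths.
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:                 # w discovered for the first time
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:      # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: walk back from the farthest nodes, accumulating how much
        # each node contributes to shortest paths that start at s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2.0 for v, c in score.items()}


# Hypothetical path graph a - b - c - d.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
centrality = betweenness(path)
print(centrality)
```

On this path the two inner nodes each lie on two shortest paths between other nodes, while the end nodes lie on none.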
Graph Social Media Recommendation
Graph Computing Opportunity Combined with leading tools such as Graph Databases, Machine Learning, High Performance Computing, Clustering, and Streaming, Graph Computing technology is ready to take off in the Big Data era!
Distributed Graph Analytics System
How to Construct Graph?
Graph ETL Data Flow
Graph ETL Example
Graph ETL Architecture
