Graph Gurus 21: Integrating Real-Time Deep-Link Graph Analytics With Spark AI
© 2019 TigerGraph. All Rights Reserved
Today’s Presenter
Emma Liu, Product Manager
● BS in Engineering from Harvey Mudd College, MS in Engineering Systems from MIT
● Prior engineering leadership experience at Oracle and MarkLogic
● Areas of specialty include cloud, containers, enterprise infrastructure, monitoring, management, and connectors
Some Housekeeping Items
● Although your phone is muted, we do want to answer your questions. Submit them at any time using the Q&A tab in the menu
● The webinar is being recorded and will be uploaded to our website shortly (https://www.tigergraph.com/webinars-and-events/), and the URL will be emailed to you
● If you have issues with Zoom, please contact the panelists via chat
“Graph analysis is possibly the single most effective competitive differentiator for organizations pursuing data-driven operations and decisions after the design of data capture.”
Source: https://www.gartner.com/doc/2852717/it-market-clock-database-management
7 Key Data Science Capabilities Powered By a Native Parallel Graph
Power Explainable AI with TigerGraph
Why Spark + TigerGraph?
Spark + TigerGraph Data Pipeline
Typical Spark + TigerGraph Integration
● Data Preparation and Integration (TigerGraph/Spark)
● Unsupervised Learning (TigerGraph)
● Feature Extraction for Supervised Learning (TigerGraph/Spark)
● Model Training (Spark)
● Validate and Apply Model (TigerGraph)
● Visualize and Explore Interconnected Data (TigerGraph)
Machine Learning with TigerGraph
Real-Time Phone-Based Fraud Detection
Massive, Worldwide Problem
● 18 billion robocalls in the US in 2017 (hiya.com)
● Spam/scam callers are agile and use spoofed numbers
Customer:
● 600M subscribers
● 300M calls/day, peak 10K calls/sec
● Need: Real-time detection of various types of phone-based fraud
Real-Time Phone Anti-Spam/Scam Detection
TigerGraph Solution: Real-time graph-based machine learning and decision system
Graph Analytics
● Real-time machine learning
  ○ 118 graph features per call
  ○ Retrained periodically with 2M calls
● Real-time decisions
  ○ Call recipient sees an alert if the ML system says the call is suspicious
● In production since Dec 2016
Graph Database
● 600M phone numbers (inside and outside network)
● 15B phone-phone call edges (2-month sliding window)
  ○ Time
  ○ Duration
● Real-time graph updates, peak 10K+ calls/sec
● 118 graph features per phone
Examples of Graph Features for Machine Learning
Bad Phone Features
(1) Short term call duration
(2) Empty stable group
(3) No call back phone
(4) Many rejected calls
(5) Average distance > 3
Good Phone Features
(1) High call back phone
(2) Stable group
(3) Long term phone
(4) Many in-group connections
(5) 3-step friend relation
[Figure: example call graphs illustrating these features, with good phones and bad phones marked]
China Mobile - Detecting Phone-Based Fraud by Analyzing Network or Graph Pattern Features
● Each phone node has a fraud flag indicating whether it is a good phone or a bad phone and, if bad, the type of fraud: scam, harassment, or advertisement
● Run a real-time GSQL query for each call:
  ○ Collect 118 features
  ○ Compute composite score
  ○ Update fraud flag
  ○ Return fraud type
[Diagram: a real-time call event (caller, callee, time) is sent to the query, which returns the fraud type; call detail records (caller, callee, time, duration) feed a continuous graph update]
Phone Fraud Real-Time Detection System
● phone vertex: fraud flag, expiration time
● phone_phone edge: num of calls, total duration, call date list, num of rejections
● 600 million vertices
● 15+ billion edges
● 300 million daily updates
[Diagram: a phone vertex connected by phone_phone edges to target1 through target4]
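The vertex and edge attributes on this slide can be pictured with a small in-memory model. This is a minimal Python sketch, not TigerGraph's schema: the class names, field names, the directed-edge choice, and the `record_call` helper are all hypothetical, chosen only to show how one call event updates the aggregated edge.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Phone:
    """Phone vertex: holds the fraud flag and the time the flag expires."""
    number: str
    fraud_flag: Optional[str] = None   # e.g. "scam"; None means not flagged
    flag_expires_at: float = 0.0       # epoch seconds (assumed representation)


@dataclass
class PhonePhone:
    """phone_phone edge: aggregates all calls from one caller to one callee."""
    num_calls: int = 0
    total_duration: float = 0.0        # seconds
    call_dates: List[float] = field(default_factory=list)
    num_rejections: int = 0


class CallGraph:
    """Toy call graph keyed by phone number (assumption: edges are directed)."""

    def __init__(self) -> None:
        self.phones: Dict[str, Phone] = {}
        self.edges: Dict[Tuple[str, str], PhonePhone] = {}

    def record_call(self, caller: str, callee: str, time: float,
                    duration: float, rejected: bool = False) -> PhonePhone:
        """Apply one call detail record to the graph and return the edge."""
        for number in (caller, callee):
            self.phones.setdefault(number, Phone(number))
        edge = self.edges.setdefault((caller, callee), PhonePhone())
        edge.num_calls += 1
        edge.total_duration += duration
        edge.call_dates.append(time)
        if rejected:
            edge.num_rejections += 1
        return edge
```

Keeping one aggregated edge per phone pair, rather than one edge per call, is what lets per-pair features such as call counts and total duration be read in a single hop.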
Case 1: Call type was recently flagged
[Diagram: a real-time call event (call time, caller ID, callee ID) is checked first against the caller's fraud flag. If the caller was recently flagged as “bad”, the stored result is used. The remaining path (real-time query to collect the caller's graph features, classifier, and flag update if the caller is classified as “bad”) is not needed in this case.]
Case 2: Call needs to be classified
[Diagram: a real-time call event (call time, caller ID, callee ID) is checked against the caller's fraud flag. If the caller was not recently flagged as “bad”, a real-time query collects the caller's graph features, the classifier scores them, and the flag is updated if the caller is classified as “bad”.]
Input: list of calls with phone pairs and call time (batch)
Output:
1. Call fraud type
2. Scoring and feature vector of fraud calls for supporting evidence (explainable AI)
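The two cases can be sketched as a single decision function. This is an illustrative Python sketch under stated assumptions, not the production GSQL query: `handle_call`, the `flags` dictionary, the `collect_features` and `classify` callbacks, and the `flag_ttl` lifetime are hypothetical names introduced here.

```python
from typing import Callable, Dict, List, Optional, Tuple

# caller -> (fraud type, expiration time); stands in for the phone vertex attributes
FlagStore = Dict[str, Tuple[str, float]]


def handle_call(
    caller: str,
    call_time: float,
    flags: FlagStore,
    collect_features: Callable[[str], List[float]],
    classify: Callable[[List[float]], Optional[str]],
    flag_ttl: float = 3600.0,  # assumed lifetime of a "bad" flag, in seconds
) -> Optional[str]:
    """Return the fraud type for a call, or None if the caller looks good."""
    stored = flags.get(caller)
    # Case 1: the caller was recently flagged and the flag has not expired,
    # so the stored fraud type is returned without collecting features.
    if stored is not None and call_time < stored[1]:
        return stored[0]
    # Case 2: collect the caller's graph features and classify them.
    features = collect_features(caller)
    fraud_type = classify(features)
    # Only a "bad" classification updates the flag.
    if fraud_type is not None:
        flags[caller] = (fraud_type, call_time + flag_ttl)
    return fraud_type
```

The expiration time on the flag is what separates the two cases: while it is in the future, the expensive feature collection is skipped entirely.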
China Mobile Machine Learning Workflow
1. Data labels come from police reports and online third-party sources
2. A total of 118 graph features are analyzed to build the fraud detection model
3. All 118 graph features are collected by one GSQL query
4. The training data’s features are collected in GSQL in batch processing and stored as a CSV file for future model training
5. TigerGraph performs fraud scoring with multiple machine learning models in real time
6. Machine learning models are trained offline, and model parameters are stored as configuration files for GSQL to use for real-time scoring (Future: training ML models in Spark)
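Step 6 separates offline training from real-time scoring through a parameter file. The slides do not say which model types or file format were used, so the following Python sketch assumes a logistic-regression model stored as JSON; the function names, the JSON keys (`bias`, `weights`, `threshold`), and the feature names in the usage note are hypothetical.

```python
import json
import math
from typing import Dict


def load_model(config_text: str) -> Dict:
    """Parse model parameters that were exported after offline training."""
    return json.loads(config_text)


def score(features: Dict[str, float], model: Dict) -> float:
    """Logistic score in (0, 1); features missing from the input count as 0."""
    z = model["bias"] + sum(
        weight * features.get(name, 0.0)
        for name, weight in model["weights"].items()
    )
    return 1.0 / (1.0 + math.exp(-z))


def is_fraud(features: Dict[str, float], model: Dict) -> bool:
    """Apply the configured decision threshold to the score."""
    return score(features, model) >= model["threshold"]
```

For example, with the configuration `{"bias": 0.0, "weights": {"rejected_calls": 1.0, "callbacks": -1.0}, "threshold": 0.5}`, a phone with three rejected calls and no callbacks scores above the threshold, while a phone with three callbacks and no rejected calls scores below it. Because the weights live in a file, retraining only replaces the file; the scoring code stays unchanged.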
Machine Learning with TigerGraph
Real-time Scoring with Multiple ML models in GSQL
● Fast: Real-time response for both feature collection and scoring
● Efficient: Aggregation during traversal - multiple features in one
● Easy: Collect complex features without multiple RDBMS joins
China Mobile Anti-Fraud Results from TigerGraph Machine Learning Solutions
● 3.2 million fraud notifications in Shandong Province (Dec 2016 to July 2019)
● Potential losses prevented: ~39.86 million RMB (~6 million US dollars)
Why Spark + TigerGraph?
Why TigerGraph + Spark For Machine Learning?
● Parallel processing and distributed systems for training, ETL, and feature collection, at scale
● Capture business moments with real-time response and explainable AI, at scale
● Enrich machine learning with complex graph features, at scale
Spark and TigerGraph Data Pipeline
[Diagram components: Static Data Sources, Streaming Data Sources, TigerGraph JDBC Driver]
JDBC Driver (v1.2)
● Type 4 driver
● Supports bi-directional data flow (read and write) with TigerGraph
● Read: Converts ResultSet to DataFrame
● Write: Loads DataFrames and files into vertices/edges in TigerGraph
● Supports REST endpoints of built-in, compiled, and interpreted GSQL queries from TigerGraph
● Open Source: https://github.com/tigergraph/ecosys/tree/master/etl/tg-jdbc-driver
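A read through the driver from Spark might be configured roughly as follows. This is a hedged Python sketch: the driver class name, the `jdbc:tg:` URL scheme, and the `graph` and `dbtable` option names are recalled from the driver's README and should be verified against the repository for the version in use. The helper `tg_jdbc_options`, the host, port, graph name, and vertex type are hypothetical.

```python
from typing import Dict


def tg_jdbc_options(host: str, port: int, graph: str, dbtable: str,
                    user: str, password: str) -> Dict[str, str]:
    """Build Spark JDBC reader options for TigerGraph.

    Assumption: option names and the URL scheme follow the tg-jdbc-driver
    README; check them against the driver version you deploy.
    """
    return {
        "driver": "com.tigergraph.jdbc.Driver",  # assumed driver class name
        "url": f"jdbc:tg:http://{host}:{port}",  # assumed URL scheme
        "user": user,
        "password": password,
        "graph": graph,
        # Assumed syntax: "vertex <Type>" to read vertices,
        # or "query <name>(<args>)" to run an installed GSQL query.
        "dbtable": dbtable,
    }


# Usage with an existing SparkSession named `spark` (not executed here):
#   opts = tg_jdbc_options("127.0.0.1", 14240, "my_graph", "vertex Phone",
#                          user="...", password="...")
#   df = spark.read.format("jdbc").options(**opts).load()
```

`spark.read.format("jdbc").options(...).load()` is the standard Spark JDBC reader; only the option values are specific to TigerGraph.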
DEMO: Graph Feature Extraction from TigerGraph to Spark via TigerGraph’s JDBC Driver
Graph Features: Stable Group & InGroup Connection
● Stable Group: phones in the target group that have regular calls (a stable connection) with the source phone
● Stable InGroup Connections: phones in the target group that have regular calls (a stable connection) among themselves
A stable connection is defined as one that
● Has both a call and a callback
● Has a number of calls larger than a given limit
● Has a total duration larger than a given limit
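The three conditions for a stable connection translate directly into code. This is a minimal Python sketch, not the GSQL used in the demo: the function names and the `CallStats` layout are hypothetical, and it assumes that call counts and durations are summed over both directions of a phone pair, which the slide does not specify.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

# (caller, callee) -> (number of calls, total duration in seconds)
CallStats = Dict[Tuple[str, str], Tuple[int, float]]


def is_stable(a: str, b: str, stats: CallStats,
              min_calls: int, min_duration: float) -> bool:
    """True if a and b have call and callback, and exceed both limits."""
    forward = stats.get((a, b))
    backward = stats.get((b, a))
    if forward is None or backward is None:
        return False  # no call in one direction, so no callback
    num_calls = forward[0] + backward[0]        # assumption: both directions
    total_duration = forward[1] + backward[1]   # assumption: both directions
    return num_calls > min_calls and total_duration > min_duration


def stable_group(source: str, targets: Iterable[str], stats: CallStats,
                 min_calls: int, min_duration: float) -> Set[str]:
    """Target phones that have a stable connection with the source phone."""
    return {t for t in targets
            if is_stable(source, t, stats, min_calls, min_duration)}


def stable_in_group_connections(targets: Iterable[str], stats: CallStats,
                                min_calls: int, min_duration: float
                                ) -> Set[Tuple[str, str]]:
    """Pairs of target phones that have a stable connection with each other."""
    return {(a, b) for a, b in combinations(sorted(set(targets)), 2)
            if is_stable(a, b, stats, min_calls, min_duration)}
```

The sizes of the two returned sets correspond to the "Stable group" and "Many in-group connections" features; an empty stable group is one of the bad-phone signals listed earlier.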
Resources
● TigerGraph Cloud Machine Learning Starter Kit: register at tgcloud.us
● JDBC Driver (Open Source): https://github.com/tigergraph/ecosys/tree/master/etl/tg-jdbc-driver
“Graph analysis is possibly the single most effective competitive differentiator for organizations pursuing data-driven operations and decisions after the design of data capture.”
Real-time deep-link graph analytics at scale is the differentiator for your machine learning pipeline!
Q&A
Please submit your questions via the Q&A tab in Zoom
Additional Resources
● Start Free at TigerGraph Cloud Today! https://www.tigergraph.com/cloud/
● Test Drive Online Demo: https://www.tigergraph.com/demo
● Download the Developer Edition: https://www.tigergraph.com/download/
● Guru Scripts: https://github.com/tigergraph/ecosys/tree/master/guru_scripts
● Join our Developer Forum: https://groups.google.com/a/opengsql.org/forum/#!forum/gsql-users
Coming To A City Near You
Let us know if you would like to help organize a Graph Gurus Comes To You workshop in your city: https://info.tigergraph.com/graph-gurus-request
Thank You
