How to Build An AI Based Customer Data Platform: Learn the design patterns for Real Time Use Cases

Kumar Ramanathan, Gautam Gupta
September 2020

Personalize the journey for everyone in the ecosystem

1. What do we know about our customer?
2. What do our customers need?

©2020 Intuit Inc. All rights reserved.

Unlock Relationships and ML-driven Insights

● Identity Resolution: Who is this user or visitor? What is their intent?
● Financial Ownership: What is the financial ownership of this user?
● Customer Journey: Where is this user in their journey with us/life?

[Figure: customer graph. Nodes and attributes: visitor, user, company, persona, intent, offerings, journey stage, transactions, financial attributes, financial ownership. Capabilities: Graph Queries, Analytics & ML, Graph Mining]

Why did we switch to TigerGraph?

● Developer-friendly platform
● Better 1-hop query performance and efficient multi-hop queries
● Improved OPEX savings: 77% reduction in AWS infrastructure costs; 10x fewer IOPS
● Excellent, responsive customer support

Use Case: Increase Sign-in Success Rate for Tax Prep

● If we: leverage the Identity Graph in the Risk Scoring Service sign-in flow for TurboTax Online
● Then we: can recognize more unknown visitors
● So that we: can provide a lower-friction sign-in experience for those visitors

Intuit Confidential and Proprietary

Identity Graph: Stitching anonymous visitor to known user

Use cases: Returning Customer Recognition, Frictionless Sign In/up, Personalization

[Figure: visitor and user nodes joined by a "stitch" edge (user <> visitor :: stitch)]

Input
● Clickstream: 159 columns x ∞ rows
● Users: 142M nodes

Model
● Pairwise binary classification
● Learn whether a pair (IVID, UID) is "matched", where Θ is the parameter vector of the learned model
● Optimize the resulting quadratic complexity by selecting a subset of candidate pairs
● Final prediction function:
  ● Choose the unique UID, if one exists: 99.9982%
  ● Rank multiple UID candidates: 98.8609%

Results
● Identity graph able to recognize ~4% more visitors
● Sign-in success rate for the unrecognized cohort went from 89% to 94%
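The slide names the technique (pairwise binary classification over (IVID, UID) pairs, with a subset of pairs selected to avoid quadratic cost) but its formulas and features did not survive extraction. The following is a minimal Python sketch of that general idea, not the presenters' model: the pair features (`same_device`, `same_ip`), the weights in `THETA`, the blocking keys and the 0.5 threshold are all hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical parameter vector: a bias plus one weight per pair feature.
THETA = {"bias": -3.0, "same_device": 4.0, "same_ip": 1.5}


def candidate_pairs(visitors, users):
    """Select a subset of pairs: only users sharing a device or IP with the
    visitor are considered, instead of every visitor x user combination."""
    by_key = defaultdict(set)
    for uid, user in users.items():
        by_key[("device", user["device"])].add(uid)
        by_key[("ip", user["ip"])].add(uid)
    for ivid, visitor in visitors.items():
        uids = by_key.get(("device", visitor["device"]), set()) | by_key.get(
            ("ip", visitor["ip"]), set()
        )
        for uid in sorted(uids):
            yield ivid, uid


def match_probability(visitor, user, theta=THETA):
    """Logistic score that the pair (visitor, user) is 'matched'."""
    z = theta["bias"]
    z += theta["same_device"] * (visitor["device"] == user["device"])
    z += theta["same_ip"] * (visitor["ip"] == user["ip"])
    return 1.0 / (1.0 + math.exp(-z))


def stitch(visitors, users, threshold=0.5):
    """Return, per visitor, the UID candidates above threshold, best first.
    A single-element list corresponds to the 'unique UID' case."""
    ranked = defaultdict(list)
    for ivid, uid in candidate_pairs(visitors, users):
        p = match_probability(visitors[ivid], users[uid])
        if p >= threshold:
            ranked[ivid].append((p, uid))
    return {
        ivid: [uid for _, uid in sorted(cands, reverse=True)]
        for ivid, cands in ranked.items()
    }


users = {
    "u1": {"device": "d1", "ip": "1.1.1.1"},
    "u2": {"device": "d2", "ip": "1.1.1.1"},
    "u3": {"device": "d9", "ip": "9.9.9.9"},
}
visitors = {
    "v1": {"device": "d1", "ip": "1.1.1.1"},
    "v2": {"device": "d7", "ip": "7.7.7.7"},
}
print(stitch(visitors, users))
```

In this toy data only two of the six possible pairs are scored, and visitor `v2` stays unrecognized because no user shares a device or IP with it. In practice the weights would be learned from labeled pairs rather than set by hand.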

Top 3 challenges for creating Identity Graph

1. Readily Accessible Data: Publishing data from source systems through batch and eventing to stream processing infrastructure & data lake
2. Lack of Universal First-class Entities: Creating a universal definition of key entities like User, Visitor, Account etc. across product lines
3. Entity Resolution & Attribute Normalization: Data across multiple sources is not resolved or normalized; doing so requires deterministic & predictive algorithms

1. Data Movement: Platform not Pipelines

Why?
● Rapid increase of new data sources, with existing ones changing fast
● Domain ownership for publishing data
● High-quality, large-scale, domain-agnostic data infrastructure

How?
● Generic event processing pipeline with built-in configurable stages
● Standardized implementation of domain-agnostic stages: sessionization, geo-coding, entity standardization & resolution, encryption/decryption, compression, schema validation, authentication/security controls, governance & compliance checks and many more
● Metadata repository with a beautiful UI for discoverability, lineage tracking & data trust
● Scalable operational platform: ability to consume data from batch and stream sources with built-in auto scaling, monitoring, alerting, error handling etc.
● Support for adding custom stream computation stages for the unique needs of specialized pipelines, such as ML feature computation and dynamic traits computation

[Figure: example pipeline. Step 1: Authentication, Step 2: Deduplication, Step 3: Data Validation, Step 4: Notification]
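A generic pipeline with configurable stages can be sketched in Python as a registry of reusable stage implementations plus a per-pipeline list of stage names. This is an illustrative sketch only: the stage names, the event fields (`id`, `token`, `payload`) and `build_pipeline` are hypothetical, and the figure's notification step is left out for brevity.

```python
from typing import Callable, Dict, List, Optional

Event = dict
# A stage returns the (possibly transformed) event, or None to drop it.
Stage = Callable[[Event], Optional[Event]]


def authenticate(event: Event) -> Optional[Event]:
    # Placeholder check; a real stage would verify credentials.
    return event if event.get("token") == "valid" else None


def make_deduplicator() -> Stage:
    seen = set()

    def deduplicate(event: Event) -> Optional[Event]:
        if event["id"] in seen:
            return None
        seen.add(event["id"])
        return event

    return deduplicate


def validate(event: Event) -> Optional[Event]:
    return event if isinstance(event.get("payload"), dict) else None


# Registry of domain-agnostic stages. Each entry is a factory so that
# stateful stages (deduplication) get fresh state per pipeline.
STAGE_REGISTRY: Dict[str, Callable[[], Stage]] = {
    "authenticate": lambda: authenticate,
    "deduplicate": make_deduplicator,
    "validate": lambda: validate,
}


def build_pipeline(stage_names: List[str]) -> Callable[[List[Event]], List[Event]]:
    stages = [STAGE_REGISTRY[name]() for name in stage_names]

    def run(events: List[Event]) -> List[Event]:
        out = []
        for event in events:
            for stage in stages:
                event = stage(event)
                if event is None:
                    break  # dropped by this stage
            else:
                out.append(event)  # survived every stage
        return out

    return run


events = [
    {"id": 1, "token": "valid", "payload": {}},
    {"id": 1, "token": "valid", "payload": {}},   # duplicate
    {"id": 2, "token": "bad", "payload": {}},     # fails authentication
    {"id": 3, "token": "valid", "payload": "x"},  # fails validation
]
full = build_pipeline(["authenticate", "deduplicate", "validate"])
print([e["id"] for e in full(events)])
```

A new pipeline is then a different list of names over the same registry, which is the sense in which the platform, not each pipeline, owns the stage implementations.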

2. Data Storage: Polyglot not Monolithic

Why?
● Performance at scale is critical in a real-time data platform serving customer experiences
● Shaping data to match access patterns allows for optimal and efficient access
● Tools for operating distributed systems and native NoSQL (KV, Search, Graph) DBs have matured

How?
One can use the following patterns to create a data store that handles all the information about the Customer:
● Leverage a KV data store for entities and attributes
● Use search-based persistence for search queries on the attributes
● Store relationships using a graph database

This lets us solve a wide variety of use cases with low latency. Using the same database for entities, relationships and search capabilities leads to performance bottlenecks and higher latencies.
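The three storage patterns can be illustrated with a small in-memory Python sketch in which each access pattern gets its own structure behind one facade. The class and method names are hypothetical, and the dictionaries stand in for a real KV store, search index and graph database.

```python
from collections import defaultdict


class PolyglotCustomerStore:
    """Facade over three stand-in stores, one per access pattern."""

    def __init__(self):
        self.kv = {}                          # entity_id -> attributes
        self.search_index = defaultdict(set)  # (attr, value) -> entity_ids
        self.graph = defaultdict(set)         # entity_id -> {(edge_type, dst)}

    def put_entity(self, entity_id, attributes):
        # Drop stale index entries before overwriting the entity.
        for item in self.kv.get(entity_id, {}).items():
            self.search_index[item].discard(entity_id)
        self.kv[entity_id] = dict(attributes)
        # Attribute values must be hashable to be indexed this way.
        for item in attributes.items():
            self.search_index[item].add(entity_id)

    def add_relationship(self, src, edge_type, dst):
        self.graph[src].add((edge_type, dst))

    def get(self, entity_id):
        """Key lookup: served by the KV store."""
        return self.kv.get(entity_id)

    def search(self, attr, value):
        """Attribute query: served by the search index."""
        return sorted(self.search_index.get((attr, value), set()))

    def neighbors(self, entity_id, edge_type):
        """1-hop traversal: served by the graph store."""
        edges = self.graph.get(entity_id, set())
        return sorted(dst for etype, dst in edges if etype == edge_type)


store = PolyglotCustomerStore()
store.put_entity("user:1", {"state": "CA"})
store.put_entity("user:2", {"state": "CA"})
store.put_entity("company:9", {"name": "Acme"})
store.add_relationship("user:1", "owns", "company:9")

print(store.get("user:1"))
print(store.search("state", "CA"))
print(store.neighbors("user:1", "owns"))
```

The cost of this design is that every write fans out to more than one store, so a real implementation has to decide how to keep them consistent.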

3. Data Access: Right-for-me not One-size-fits-all

Why?
● Data products are used in very different contexts
● Each context needs the right interface

How?
Support as many of the patterns below as possible:
● UI Widgets: Ability to data-enable products/features by embedding winning experiences quickly
● Request/Response: Provide direct API access for synchronous communication
● Pub/Sub: Publish CDC notifications/messages to consumers who store data for a specific use case
● Data Lake: To support offline model training or historical bootstrap for new pub/sub consumers
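Two of these patterns, request/response and pub/sub of change notifications, can be sketched side by side over the same data. This Python example is illustrative only: `CustomerProfileService`, its methods and the shape of the change message are hypothetical, and in-process callbacks stand in for a real message broker.

```python
from collections import defaultdict


class CustomerProfileService:
    def __init__(self):
        self._profiles = {}
        self._subscribers = defaultdict(list)  # entity_type -> callbacks

    # Request/response: synchronous read of the current state.
    def get_profile(self, entity_id):
        profile = self._profiles.get(entity_id)
        return dict(profile) if profile is not None else None

    # Pub/sub: consumers register once and receive every change.
    def subscribe(self, entity_type, callback):
        self._subscribers[entity_type].append(callback)

    def upsert(self, entity_type, entity_id, attributes):
        before = self._profiles.get(entity_id)
        after = {**(before or {}), **attributes}
        if after == before:
            return  # nothing changed, so nothing is published
        self._profiles[entity_id] = after
        change = {
            "entity_type": entity_type,
            "entity_id": entity_id,
            "before": before,
            "after": dict(after),
        }
        for callback in self._subscribers[entity_type]:
            callback(change)


service = CustomerProfileService()
received = []  # stands in for a consumer's own use-case-specific store
service.subscribe("user", received.append)

service.upsert("user", "u1", {"plan": "free"})
service.upsert("user", "u1", {"plan": "free"})  # no-op, not published
service.upsert("user", "u1", {"plan": "paid"})

print(service.get_profile("u1"))
print(len(received))
```

A new subscriber sees only changes made after it registers, which is why the slide pairs pub/sub with a data lake for historical bootstrap.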

4. AI Toolchain: Deeply integrated not Bolted on

Why?
● AI models evolve over time and need access to new data
● AI data needs grow with the evolution of models
● AI models need real-time access to data in production

How?
Develop a self-serve mechanism for onboarding new AI models and the features they require:
● Create an input for a rich feature store from entities, attributes and relationships
● Provide aggregation and functional formulation on attributes
● Capture feedback from AI models to provide a 360-degree view of model performance
● Solve for real-time availability of data to AI models
● Data Exploration -> Featurization -> Training -> Model Optimization -> Model Deployment -> Model Execution
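As one way to picture "aggregation and functional formulation on attributes", the Python sketch below registers features as small functions over an entity's attributes and relationships, materializes them, and serves them as a vector. The `FeatureStore` class, the feature names and the entity shape are hypothetical and not taken from the presentation.

```python
from statistics import mean

# Each feature is a function of one entity (attributes plus relationships).
FEATURE_DEFINITIONS = {
    "txn_count": lambda e: len(e["transactions"]),
    "txn_total": lambda e: sum(e["transactions"]),
    "txn_mean": lambda e: mean(e["transactions"]) if e["transactions"] else 0.0,
    "linked_companies": lambda e: len(e["companies"]),
}


class FeatureStore:
    def __init__(self, definitions):
        self._definitions = dict(definitions)
        self._values = {}  # entity_id -> {feature_name: value}

    def register(self, name, fn):
        """Self-serve onboarding: a model owner adds a feature definition."""
        self._definitions[name] = fn

    def materialize(self, entity_id, entity):
        """Compute every registered feature for one entity."""
        self._values[entity_id] = {
            name: fn(entity) for name, fn in self._definitions.items()
        }

    def get_features(self, entity_id, names):
        """Low-latency read path used by a model at inference time."""
        row = self._values[entity_id]
        return [row[name] for name in names]


feature_store = FeatureStore(FEATURE_DEFINITIONS)
entity = {"transactions": [10.0, 30.0], "companies": ["c1"]}
feature_store.materialize("u1", entity)
print(feature_store.get_features("u1", ["txn_count", "txn_mean", "linked_companies"]))

# Onboarding a new feature requires only a definition, then re-materializing.
feature_store.register("txn_max", lambda e: max(e["transactions"], default=0.0))
feature_store.materialize("u1", entity)
print(feature_store.get_features("u1", ["txn_max"]))
```

Separating `materialize` from `get_features` mirrors the real-time requirement: features are computed as data arrives so that the read at prediction time is a plain lookup.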

5. Data Entities: Self-serve not Product Backlog

Why?
● Time to market for new data products, or for expanding the features of existing ones, is an important driver of growth

How?
● Mindset: Data is a product. Domain teams think of data consumers as their customers, just like the end users of their products through the UI and other developers through their APIs.
● Producer and consumer work directly with each other to create value quickly using shared domain knowledge
● No new engineering work is needed in the domain-agnostic part of the platform to add new data entities or attributes, or to infer new relationships
● Self-describing semantics for data: a set of robust metadata capabilities that configures the platform behavior for specific use cases. Metadata determines which stages of the pipeline are executed and which version of business logic is run in a particular stage.
● You're successful when non-technical business users like product managers are able to discover data, understand its business meaning and expand the data set in a self-serve form
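The idea that metadata selects both the stages and the version of business logic can be sketched in Python as follows. Everything here is hypothetical (the stage names, versions, entity types and metadata layout); the point is only that onboarding a new entity type touches metadata, not platform code.

```python
# Platform-owned stage implementations, keyed by (name, version).
STAGES = {
    ("normalize_email", "v1"): lambda r: {**r, "email": r["email"].strip()},
    ("normalize_email", "v2"): lambda r: {**r, "email": r["email"].strip().lower()},
    ("mask_ssn", "v1"): lambda r: {**r, "ssn": "***-**-" + r["ssn"][-4:]},
}

# Domain-owned metadata: describes each entity and configures its pipeline.
ENTITY_METADATA = {
    "user": {
        "description": "A signed-in customer",
        "stages": [
            {"name": "normalize_email", "version": "v2"},
            {"name": "mask_ssn", "version": "v1"},
        ],
    },
    "visitor": {
        "description": "An anonymous browser session",
        "stages": [{"name": "normalize_email", "version": "v1"}],
    },
}


def process(entity_type, record, metadata=ENTITY_METADATA):
    """Run only the stages, at the versions, that the metadata names."""
    for stage in metadata[entity_type]["stages"]:
        record = STAGES[(stage["name"], stage["version"])](record)
    return record


print(process("user", {"email": " A@B.com ", "ssn": "123-45-6789"}))
print(process("visitor", {"email": " A@B.com "}))

# Adding an entity type is a metadata change only.
extended = {
    **ENTITY_METADATA,
    "account": {
        "description": "A billing account",
        "stages": [{"name": "normalize_email", "version": "v2"}],
    },
}
print(process("account", {"email": "Ops@Example.COM"}, metadata=extended))
```

The `description` field hints at the discoverability goal in the last bullet: the same metadata that drives execution can be surfaced to non-technical users.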
