Oracle Machine Learning: From Oracle Data Professional to Oracle Data Scientist. Move the Algorithms; Not the Data! Charlie Berger, Sr. Director Product Management, Machine Learning, AI and Cognitive Analytics, charlie.berger@oracle.com, www.twitter.com/CharlieDataMine. Copyright © 2019 Oracle and/or its affiliates.
Safe harbor statement: The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, timing, and pricing of any features or functionality described for Oracle’s products may change and remains at the sole discretion of Oracle Corporation. Copyright © 2020 Oracle and/or its affiliates.
Goal: Share an attainable, logical, evolutionary path for Oracle data professionals to add machine learning to their valuable Oracle data skills, so they can extract more information and insights and make predictions.
Oracle Database Converged Features: Oracle Machine Learning
Oracle Mission Statement: “Our mission is to help people see data in new ways, discover insights, unlock endless possibilities.” © 2020 Oracle - Portland OUG Training Day 10/22/2020
Operational DBAs spend a lot of time…
• Security (85%): 85% of security breaches occurred after the CVE was published (DB Maestro)
• Reliability (91%): 91% experience unplanned data center outages (Healthcare IT News); database downtime costs $7,900 / minute (DB Maestro)
• Maintenance (72%): 72% of IT budget is spent on generic maintenance tasks vs innovation (ComputerWorld)
Oracle Autonomous Database Can Help
• Self-Driving: Automates all database and infrastructure management, monitoring, tuning
• Self-Securing: Protects from both external attacks and malicious internal users
• Self-Repairing: Protects from all downtime including planned maintenance
Oracle Autonomous Database does the grunt work for YOU!
Automation moves the DBA up the value chain… HIGHER AGILITY, LOWER RISK: You are more VALUABLE
The Evolution of the DBA/Database Developer Role
• Data Engineer: Architecture, “data wrangler”
• Machine Learning: Solving data-driven problems, discovering insights, making predictions
• Data Security: Data classification, data life-cycle mgmt
• Application Tuning: SQL tuning, connection mgmt
“Why Oracle? Because that’s where the data is!” Larry Ellison, Executive Chairman and CTO of Oracle Corporation
What is Machine Learning?
Algorithms automatically sift through large amounts of data to discover hidden patterns and new insights and to make predictions.
Supervised Learning
• Identify most important factor (Attribute Importance)
• Predict customer behavior (Classification)
• Find profiles of targeted people or items (Classification)
• Predict or estimate a value (Regression)
Unsupervised Learning
• Segment a population (Clustering)
• Find fraudulent or “rare events” (Anomaly Detection)
• Determine co-occurring items in a “basket” (Associations)
Example Machine Learning Use Cases http://www.slideshare.net/bigdataelephants/big-data-elephants-strategic-consulting-engineering-services-34175779
Machine Learning Algorithms Need Data. Move the Algorithms, Not the Data! An “AI Database”? A “Thinking Database”? It Changes Everything!
Oracle Machine Learning
Oracle Machine Learning extends Oracle Database(s) and enables users to build “AI” applications and analytics dashboards. OML delivers powerful in-database machine learning algorithms, automated ML functionality, and integration with open source Python* and R.
• OML Services*: Model Deployment and Management, Cognitive Image and Text
• OML4SQL: SQL API
• OML4Py*: Python API
• OML4R: R API
• OML Notebooks: with Apache Zeppelin on Autonomous Database
• OML4Spark: R API on Big Data
• Oracle Data Miner: Oracle SQL Developer extension
* Coming soon
Operationalizing and Embedding ML: How long does it take to put a defined model into operational use?
[Chart: length of time to put a model into production, based on 141 respondents who stated they are doing this today]
Why Do 87% of Data Science Projects Never Make It Into Production? But now that it’s a team sport, … work is now being embedded into the fabric of the company, it’s essential that every person on the team is able to collaborate with everyone else: the data engineers, the data stewards, people that understand the data science, or analytics, or BI specialists, all the way up to DevOps and engineering. “This is a big place that holds companies back because they’re not used to collaborating in this way,” Leff says. “Because when they take those insights, and they flip them over the wall, now you’re asking an engineer to rewrite a data science model created by a data scientist, how’s that work out, usually?” “Well,” Chapo says, “It doesn’t.” “Oftentimes people imagine a world where we’re doing this amazing, fancy, unicorn, sprinkling-pixie-dust sort of AI projects,” he said. https://venturebeat.com/2019/07/19/why-do-87-of-data-science-projects-never-make-it-into-production/
OML Functionality and Supported Languages
Oracle Machine Learning Algorithms
• CLASSIFICATION: Naïve Bayes; Logistic Regression (GLM); Decision Tree; Random Forest; Neural Network; Support Vector Machine; Explicit Semantic Analysis
• CLUSTERING: Hierarchical K-Means; Hierarchical O-Cluster; Expectation Maximization (EM)
• ANOMALY DETECTION: One-Class SVM
• TIME SERIES: Forecasting - Exponential Smoothing; includes popular models, e.g. Holt-Winters with trends, seasonality, irregularity, missing data
• REGRESSION: Linear Model; Generalized Linear Model; Support Vector Machine (SVM); Stepwise Linear Regression; Neural Network
• ATTRIBUTE IMPORTANCE: Minimum Description Length; Principal Comp Analysis (PCA); Unsupervised Pair-wise KL Div; CUR decomposition for row & AI
• ASSOCIATION RULES: Apriori / market basket
• PREDICTIVE QUERIES: Predict, cluster, detect, features
• SQL ANALYTICS: SQL Windows; SQL Patterns; SQL Aggregates
• FEATURE EXTRACTION: Principal Comp Analysis (PCA); Non-negative Matrix Factorization; Singular Value Decomposition (SVD); Explicit Semantic Analysis (ESA)
• TEXT MINING SUPPORT: Algorithms support text; Tokenization and theme extraction; Explicit Semantic Analysis (ESA) for document similarity
• STATISTICAL FUNCTIONS: Basic statistics: min, max, median, stdev, t-test, F-test, Pearson’s, Chi-Sq, ANOVA, etc.
• R & PYTHON: Third-party R & Python Packages through Embedded Execution; Spark MLlib algorithm integration
• MODEL DEPLOYMENT & MONITORING: SQL (1st Class Objects); Oracle RESTful API (ORDS); OML Services
Includes support for Partitioned Models, Transactional data and aggregations, Unstructured data, Geo-spatial data, Graph data
* Coming soon
Statistical Functions and Analytical SQL
STATISTICAL FUNCTIONS
• Descriptive statistics (e.g. median, stdev, mode, sum, etc.)
• Hypothesis testing (t-test, F-test, Kolmogorov-Smirnov test, Mann Whitney test, Wilcoxon Signed Ranks test)
• Correlations analysis (parametric and nonparametric, e.g. Pearson’s test for correlation, Spearman's rho coefficient, Kendall's tau-b correlation coefficient)
• Ranking functions
• Cross Tabulations with Chi-square statistics
• Linear regression
• ANOVA (Analysis of variance)
• Test Distribution fit (e.g., Normal distribution test, Binomial test, Weibull test, Uniform test, Exponential test, Poisson test)
• Statistical Aggregates (min, max, mean, median, stdev, mode, quantiles, plus x sigma, minus x sigma, top n outliers, bottom n outliers)
ANALYTICAL SQL
• SQL Windows
• SQL Aggregate functions
• LAG/LEAD functions
• SQL for Pattern Matching
• Additional approximate query processing: APPROX_COUNT, APPROX_SUM, APPROX_RANK
• Regular Expressions
Goal: Manage and Analyze All Your Data
Architecturally, Many Options and Flexibility
[Diagram: Big Data SQL; SQL / R / Python; Object Store; Coming soon]
Boil down the Data Lake
“Engineered Features”: derived attributes that reflect domain knowledge, key to best models, e.g.:
• Counts
• Totals
• Changes over time
OML for SQL Model Build & SQL Apply
Simple SQL Syntax: Classification Model

ML Model Build (PL/SQL)

    BEGIN
      -- Build a classification model on CUST_INSUR_LTV; algorithm settings
      -- are read from the CUST_INSUR_LTV_SET settings table.
      DBMS_DATA_MINING.CREATE_MODEL(
        model_name          => 'BUY_INSUR1',
        mining_function     => DBMS_DATA_MINING.CLASSIFICATION,
        data_table_name     => 'CUST_INSUR_LTV',
        case_id_column_name => 'CUST_ID',
        target_column_name  => 'BUY_INSURANCE',
        settings_table_name => 'CUST_INSUR_LTV_SET');
    END;

Model Apply (SQL query)

    -- Probability of the 'Yes' class for one customer described inline
    SELECT prediction_probability(BUY_INSUR1, 'Yes'
             USING 3500      AS bank_funds,
                   825       AS checking_amount,
                   400       AS credit_balance,
                   22        AS age,
                   'Married' AS marital_status,
                   93        AS MONEY_MONTLY_OVERDRAWN,
                   1         AS house_ownership)
    FROM dual;
OML for SQL Model Build
Simple SQL Syntax: Attribute Importance

ML Model Build (PL/SQL)

    BEGIN
      DBMS_DATA_MINING.CREATE_MODEL(
        model_name          => 'BUY_INSURANCE_AI',
        mining_function     => DBMS_DATA_MINING.ATTRIBUTE_IMPORTANCE,
        data_table_name     => 'CUST_INSUR_LTV',
        case_id_column_name => 'cust_id',
        target_column_name  => 'BUY_INSURANCE',
        settings_table_name => 'Att_Import_Model_Settings');
    END;

Model Results (SQL query)

    SELECT attribute_name, explanatory_value, rank
    FROM BUY_INSURANCE_AI
    ORDER BY rank, attribute_name;

    ATTRIBUTE_NAME            RANK  ATTRIBUTE_VALUE
    BANK_FUNDS                   1           0.2161
    MONEY_MONTLY_OVERDRAWN       2           0.1489
    N_TRANS_ATM                  3           0.1463
    N_TRANS_TELLER               4           0.1156
    T_AMOUNT_AUTOM_PAYMENTS      5           0.1095
OML for R Model Build
Simple R Language Syntax: Attribute Importance

ML Model Build (R)

    > ore.odmAI(BUY_INSURANCE ~ ., CUST_INSUR_LTV)

Model Results (R)

    Call:
    ore.odmAI(formula = BUY_INSURANCE ~ ., data = CUST_INSUR_LTV)

    Importance:
                              importance rank
    BANK_FUNDS              0.2161187797    1
    MONEY_MONTLY_OVERDRAWN  0.1489347141    2
    N_TRANS_ATM             0.1463026512    3
    N_TRANS_TELLER          0.1155879786    4
    T_AMOUNT_AUTOM_PAYMENTS 0.1095178647    5
OML for Python Model Build (Coming soon!)
Simple Python Language Syntax: Attribute Importance

ML Model Build (Python)

    > ai_mod = ai(**setting)  # Create AI model object
    > ai_mod = ai_mod.fit(train_x, train_y)

Model Results (Python)

    Importance:
    variable                  importance rank
    BANK_FUNDS              0.2161187797    1
    MONEY_MONTLY_OVERDRAWN  0.1489347141    2
    N_TRANS_ATM             0.1463026512    3
    N_TRANS_TELLER          0.1155879786    4
    T_AMOUNT_AUTOM_PAYMENTS 0.1095178647    5
Oracle Data Miner UI: Drag and Drop, Workflows, Easy to Use UI for “Citizen Data Scientist”
• Easy to use to define analytical methodologies that can be shared
• SQL Developer Extension
• Workflow API and generates SQL code for immediate deployment
OML4R R languageSQL “push down” Transparency layer for “push down” to equivalent SQL for parallelized in-DB processing Direct access to DB data ROracle pkg for OCI connectivity “Embedded R” call outs to R packages R Language API to OML Algorithms and Integration with R Copyright © 2020 Oracle and/or its affiliates.
Oracle Machine Learning: Machine Learning Notebooks included in Autonomous Databases
Key Features:
• Collaborative UI for data scientists and analysts
• Packaged with Autonomous Databases
• Quick start example notebooks
• Easy access to shared notebooks, templates, permissions, scheduler, etc.
• OML4SQL
• OML4Py coming soon
• Supports deployment of OML models
Oracle Machine Learning for R / Python
Multiple Components/APIs of Oracle Machine Learning
Transparency layer
– Leverage proxy objects so data remain in database
– Overload native functions translating functionality to SQL
– Use familiar R/Python syntax to manipulate database data
Parallel, distributed algorithms
– Scalability and performance
– Exposes in-database algorithms available from OML4SQL
Embedded execution
– Manage and invoke R or Python scripts in Oracle Database
– Data-parallel, task-parallel, and non-parallel execution
– Use open source packages to augment functionality
OML4Py, Automated Machine Learning - AutoML
– Feature selection, model selection, hyper-parameter tuning
[Diagram: Client (SQL Interfaces: SQL*Plus, SQL Developer; OML4Py; OML4R) and Database Server]
* Coming soon
Coming Soon! | AutoML – new with OML4Py
Increase data scientist productivity – reduce overall compute time. Enables non-expert users to leverage Machine Learning.
Auto Feature Selection (de-noise data and reduce # of features)
– Reduce # of features by identifying most predictive
– Improve performance and accuracy
Auto Algorithm Selection (much faster than exhaustive search)
– Identify in-database algorithm that achieves highest model quality
– Find best algorithm faster than with exhaustive search
AutoTune Hyperparameters (significant accuracy improvement)
– Significantly improve model accuracy
– Avoid manual or exhaustive search techniques
[Diagram: pipeline from Data Table through Auto Algorithm Selection, Auto Feature Selection and AutoTune to ML Model]
Coming Soon! | OML AutoML User Interface
“Code-free” user interface supporting automated end-to-end machine learning
• Automate production and deployment of ML models
• Enhance Data Scientist productivity and user-experience
• Enable non-expert users to leverage ML
• Unify model deployment and monitoring
• Support model management
Features
• Minimal user input: data, target
• Model leaderboard
• Model deployment via REST
• Model monitoring
• Cognitive features for image and text
Coming Soon! | Algorithms for Database 20c: Two major new ML algorithms
Gradient Boosted Trees (XGBoost)
• Highly popular and powerful algorithm – Kaggle winners
• Classification, regression, ranking, survival analysis
MSET-SPRT: Multivariate State Estimation Technique - Sequential Probability Ratio Test
• Nonlinear, nonparametric anomaly detection algorithm designed to monitor critical processes
• Detects subtle anomalies while also producing minimal false alarms
• Calibrates expected behavior from historical normal operational sequence of monitored signals
• Re-implemented and sped up in-DB and based on original Oracle Labs algorithm
Oracle Applications that Embed Oracle Machine Learning Algorithms
Enabling Predictive Enterprise Applications: HCM Predictive Workforce
Integrated data management + embedded predictive analytics
• Full 360 degree employee view
• Single source of HCM data
• Interactive dashboards and “What if” analysis
• Customizable if desired to add input variables to predictive models
• Mobile + Oracle Cloud solutions
[Diagram: Oracle Database combines assembled historical data, additional relevant data and “engineered features” (sensor data, text, unstructured data, transactional data, spatial data, etc.), and historical or current data to be “scored” for predictions, producing Predictions & Insights]
Link to HCM Predictive Workforce demo
Oracle Adaptive Intelligent (AI) Apps for Manufacturing: Achieve Manufacturing Operational Excellence using Machine Learning & AI
Insights (Patterns and Correlations Analysis)
– Discover key influencers and patterns that affect yield & quality
Predictive Analytics
– Predict critical outcomes during manufacturing to minimize losses
Reasons for using OAA’s ML
– Easy-to-integrate R & PL/SQL APIs for many ML algorithms
– In-database execution & scalable performance
– Enterprise grade support for OAA ML
– GA Q4FY18
From Database Developer to Data Scientist in 6 Weeks! The Changing Role of the DBA
https://www.datacamp.com/community/blog/data-scientist-vs-data-engineer
https://www.kdnuggets.com/2020/02/poll-automl-replace-data-scientists.html
Database Developer to Data Scientist Journey: You are Likely Already Doing Much of The Work!
“Most data scientists spend only 20 percent of their time on actual data analysis and 80 percent of their time finding, cleaning, and reorganizing huge amounts of data, which is an inefficient data strategy” [1]
• Data extraction, data wrangling, deriving new attributes (“feature engineering”): typically 80% of the work!
• … …: Where the Machine Learning “Magic” Happens
• Import predictions & insights, translate and deploy ML models: eliminated or minimized w/ Oracle
• Automate
Data Management platform becomes combined/hybrid data management + machine learning platform
[1] https://www.infoworld.com/article/3228245/data-science/the-80-20-data-science-dilemma.html
CRISP-DM Methodology: Six Major Steps (BUSINESS UNDERSTANDING, DATA UNDERSTANDING, DATA PREPARATION, MODELING, EVALUATION, DEPLOYMENT) https://en.wikipedia.org/wiki/Cross-industry_standard_process_for_data_mining
CRISP-DM Methodology: Six Major Steps
https://en.wikipedia.org/wiki/Cross-industry_standard_process_for_data_mining
BUSINESS UNDERSTANDING
• Well-defined business problem
DATA UNDERSTANDING
• Assemble the “right data”
• Data profiling: data visualization, univariate statistics/group by, bi-variate statistics
DATA PREPARATION
• Sampling/Stratified
• Algorithm req’d transforms: Auto Data Preparation; Missing Values, Binning, Normalization, etc.; Unstructured data; Aggregations
• Domain specific transforms: “Engineered Features”
• Features Selection
MODELING
• Algorithm settings/defaults: stratified sampling, feature selection, build model(s)
EVALUATION
• Model evaluation
• Model comparison
• Model selection
DEPLOYMENT
• In-DB ML model apply: real-time ML apply; in-database, REST
• Embed methodology: applications, dashboards
*Automated and/or system defaults
Database Developer to Data Scientist Journey: Six Major Steps (Oracle Machine Learning POV)
• Week 1: Business Understanding
• Week 2: Data Understanding
• Week 3: Data Preparation
• Week 4: Modeling (ML)
• Week 5: Evaluation
• Week 6: Deployment
https://en.wikipedia.org/wiki/Cross-industry_standard_process_for_data_mining
Oracle Machine Learning SQL Developer Extension: Oracle Data Miner UI
Business Understanding: Target customers most likely to Buy Insurance
[Screenshot: Oracle Data Miner workflow annotated with Data Understanding, Data Preparation, Modeling (ML), Evaluation, Deployment]
Week 1: Business Understanding
Start with a Well-Defined Business Problem Statement
• Predict employees that voluntarily churn
• Predict customers that are likely to churn
• Target “best” customers
• Find items that will help me sell more of my most profitable items
• What is a specific customer most likely to purchase next?
• Who are my “best customers”?
• How can I combat fraud?
• I’ve got all this data; can you “mine” it and find useful insights?
Week 1: Business Understanding
Start with a Well-Defined Business Problem Statement
“If I had an hour to solve a problem I'd spend 55 minutes thinking about the problem and 5 minutes thinking about solutions.” - Albert Einstein
The Sand Trap of Poorly Formed Problem Statements I’ve got all this data; can you “mine” it and find useful insights?
Week 1: Business Understanding
Be Extremely Specific in Problem Statement
• Poorly defined: Predict employees that leave. Better: Based on past employees that voluntarily left, create new attribute EmplTurnover → 0/1. ML function: Classification
• Poorly defined: Predict customers that churn. Better: Based on past customers that have churned, create new attribute Churn → YES/NO. ML function: Classification
• Poorly defined: Target “best” customers. Better: Recency, Frequency, Monetary (RFM) analysis; specific dollar amount over time window, e.g. who has spent $500+ in most recent 18 months. ML function: Classification
• Poorly defined: How can I make more $$? Better: What helps me sell soft drinks & coffee? ML function: Association Rules
• Poorly defined: Which customers are likely to buy? Better: How much is each customer likely to spend? ML function: Regression
• Poorly defined: Who are my “best customers”? Better: What descriptive “rules” describe “best customers”? ML function: Classification
• Poorly defined: How can I combat fraud? Better: Which transactions are the most anomalous? Then roll up to physician, claimant, employee, etc. ML function: Anomaly Detection
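Turning a “better” problem statement into a target attribute can be sketched in a few lines. The Python below is a minimal illustration of the “$500+ in most recent 18 months” rule, not Oracle code: the function names, the `(cust_id, purchase_date, amount)` tuple layout, and the day-clamping shortcut in `months_before` are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import date


def months_before(d, months):
    """Step back whole calendar months (day clamped to 28 so the date stays valid)."""
    total = d.year * 12 + (d.month - 1) - months
    return date(total // 12, total % 12 + 1, min(d.day, 28))


def label_best_customers(purchases, as_of, months=18, threshold=500.0):
    """Return {cust_id: 0/1}; 1 means spend within the window reached the threshold.

    `purchases` is a hypothetical list of (cust_id, purchase_date, amount) tuples.
    """
    cutoff = months_before(as_of, months)
    spend = defaultdict(float)
    for cust_id, purchase_date, amount in purchases:
        in_window = cutoff <= purchase_date <= as_of
        spend[cust_id] += amount if in_window else 0.0
    return {cust_id: int(total >= threshold) for cust_id, total in spend.items()}


sample_purchases = [
    (1, date(2020, 1, 15), 300.0),
    (1, date(2020, 6, 1), 250.0),
    (2, date(2018, 1, 1), 900.0),   # outside the 18-month window
    (3, date(2020, 9, 1), 499.99),  # just under the threshold
]
labels = label_best_customers(sample_purchases, as_of=date(2020, 10, 22))
```

The resulting 0/1 column plays the role of the new target attribute that a classification model would then be trained to predict.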
Week 1: Business Understanding
Be Extremely Specific in your Problem Statement: Target “best” customers who have GOOD CREDIT and make payments
“Good_Credit” customers who complete all their payments are hard to find.
Week 2: Data Understanding
Review the Data; Does it Make Sense?
• Are AGEs all positive, 0-120?
• Are INCOME values weekly or monthly?
• Are the LOAN_AMOUNTS reasonable?
• Etc.
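Sanity checks like these are easy to automate. The Python sketch below profiles one column against an expected range; the function name `profile_column`, the sample ages, and the use of `None` for missing values are assumptions for illustration, and the function expects at least one non-missing value.

```python
from statistics import median


def profile_column(values, low, high):
    """Summarize a numeric column and count values outside [low, high]."""
    present = [v for v in values if v is not None]
    return {
        "n": len(values),
        "missing": len(values) - len(present),
        "out_of_range": sum(1 for v in present if not low <= v <= high),
        "min": min(present),
        "max": max(present),
        "median": median(present),
    }


# Hypothetical AGE values, including a missing one and two implausible ones.
ages = [34, 51, None, -1, 130, 27]
age_profile = profile_column(ages, low=0, high=120)
```

A non-zero `out_of_range` or `missing` count is a prompt to go back to the data owner before any modeling starts.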
Week 2: Data Understanding
Review the Data; Does it Make Sense? Simple, exploratory graphs to understand the data
Week 3: Data Preparation
Prepare the Data, Create New Derived Attributes or “Engineered Features”
Source Attribute → New Attribute / “Engineered Feature”
• Date of Birth → AGE
• Address → DISTANCE_TO_DESTINATION, COMMUTE_TIME
• Call detail records (CDRs) → #_DROPPED_CALLS, PERCENT_INTERNATIONAL
• Salary → PERCENT_VS_PEERS
• Purchases → TOTALS_PER_CATEGORY (e.g. Food, Clothing)
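A few of these derived attributes can be sketched directly. The Python below is an illustrative sketch only: the record layouts (dicts with `dropped` and `international` flags, `(category, amount)` tuples) and the function names are assumptions, and `N_DROPPED_CALLS` stands in for the slide's #_DROPPED_CALLS.

```python
from collections import defaultdict
from datetime import date


def age_on(date_of_birth, as_of):
    """Date of Birth -> AGE in whole years as of a reference date."""
    had_birthday = (as_of.month, as_of.day) >= (date_of_birth.month, date_of_birth.day)
    return as_of.year - date_of_birth.year - (0 if had_birthday else 1)


def call_features(cdrs):
    """Call detail records -> dropped-call count and percent international."""
    total = len(cdrs)
    dropped = sum(1 for c in cdrs if c["dropped"])
    international = sum(1 for c in cdrs if c["international"])
    percent = 100.0 * international / total if total else 0.0
    return {"N_DROPPED_CALLS": dropped, "PERCENT_INTERNATIONAL": percent}


def totals_per_category(purchases):
    """Purchases -> total amount per category."""
    totals = defaultdict(float)
    for category, amount in purchases:
        totals[category] += amount
    return dict(totals)


# Hypothetical sample records.
sample_cdrs = [
    {"dropped": True, "international": False},
    {"dropped": False, "international": True},
    {"dropped": False, "international": False},
    {"dropped": False, "international": False},
]
sample_purchases = [("Food", 10.0), ("Clothing", 25.0), ("Food", 5.5)]
```

In an Oracle setting the same derivations would typically be written as SQL expressions and aggregations, so that the features are computed where the data lives.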
Week 3: Data Preparation
Prepare the Data, Create New Derived Attributes or “Engineered Features”
Oracle Data Miner’s Column Filter Node does automated data profiling to highlight issues and make recommendations:
– Missing values
– Outliers
– Too many distinct values
– Too many constants
– Correlated data
Week 3: Data Preparation
Prepare the Data, Create New Derived Attributes or “Engineered Features”
Oracle Machine Learning’s Auto Data Prep (ADP) and ML algorithms are designed with intelligent defaults and can automatically deal with:
– Missing values
– Outliers
– Binning
– Too many distinct values
– Too many constants
– Transactional data/aggregations
– Unstructured data
– Correlated data
Week 4: Modeling (Machine Learning)
First, Identify the Key Attributes That Most Influence the Target Attribute
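To make the idea of ranking attributes by their influence on a target concrete, here is a small Python sketch. It is not Oracle's implementation: the deck lists Minimum Description Length for Attribute Importance, whereas this example swaps in plain mutual information between each categorical attribute and the target. The function names and the sample rows are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length categorical sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def rank_attributes(rows, target):
    """Rank every non-target attribute by mutual information with the target."""
    target_values = [r[target] for r in rows]
    scores = {
        name: mutual_information([r[name] for r in rows], target_values)
        for name in rows[0]
        if name != target
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical rows: A determines the target, B is unrelated to it.
sample_rows = [
    {"A": "a", "B": "x", "TARGET": 1},
    {"A": "a", "B": "y", "TARGET": 1},
    {"A": "b", "B": "x", "TARGET": 0},
    {"A": "b", "B": "y", "TARGET": 0},
]
ranking = rank_attributes(sample_rows, "TARGET")
```

Attributes at the top of such a ranking are the first candidates to keep; those near zero carry little information about the target on their own.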
Week 4: Modeling (Machine Learning)
Training and Testing ML Models using 60/40% Random Samples
[Diagram: Historical Data is split into Train and Test samples; Build Model / Train Model on the Train sample, then Test Model and Evaluate Model on the Test sample]
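A 60/40 random split can be sketched as follows. This is a minimal Python illustration; the function name, the fixed seed, and the use of case IDs as the unit being split are assumptions for the example.

```python
import random


def train_test_split(case_ids, train_fraction=0.6, seed=42):
    """Shuffle case IDs reproducibly and cut them into train and test lists."""
    shuffled = list(case_ids)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


train_ids, test_ids = train_test_split(range(100))
```

Fixing the seed keeps the split repeatable, so the same held-out cases are used every time alternative models are compared.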
Week 4: Modeling (Machine Learning)
Build multiple models with different algorithms and settings
Week 5: Model Evaluation (ML)
Next, test model accuracy
• Use a randomly selected “hold out” sample of data that was not used to train the ML model
• Compute Cumulative Gains, Lift, Accuracy, etc.
• Review the attributes used in the model and model coefficients
• Make sure the model makes sense
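The evaluation measures named above can be computed by hand on a small hold-out sample. The Python sketch below uses common textbook definitions (accuracy as the share of correct predictions; cumulative gain as the share of all positives captured in the top-scored fraction; lift as that gain divided by the fraction), which may differ in detail from what a given tool reports. Function names and sample data are hypothetical, and labels are assumed to be 0/1 with at least one positive.

```python
def accuracy(actual, predicted):
    """Share of cases where the predicted label equals the actual label."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def cumulative_gain_and_lift(actual, scores, top_fraction):
    """Gain and lift for the top-scored fraction of the hold-out sample."""
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(len(ranked) * top_fraction))
    positives_in_top = sum(label for _, label in ranked[:n_top])
    gain = positives_in_top / sum(actual)
    lift = gain / (n_top / len(ranked))
    return gain, lift


# Hypothetical hold-out sample: actual 0/1 labels and model scores.
holdout_actual = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
holdout_scores = [0.9, 0.8, 0.7, 0.2, 0.1, 0.6, 0.3, 0.4, 0.05, 0.15]
holdout_predicted = [1 if s >= 0.5 else 0 for s in holdout_scores]

acc = accuracy(holdout_actual, holdout_predicted)
gain, lift = cumulative_gain_and_lift(holdout_actual, holdout_scores, top_fraction=0.4)
```

A lift well above 1 in the top fractions means the model concentrates the positives near the top of the ranking, which is what matters when only the highest-scored customers will be targeted.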
Week 6: Deployment
Apply the Models to Predict “Best Customers”
Simple SQL Apply scripts run 100% inside the Database for immediate ML model deployment (Model Apply/“Scoring”)
Week 6: Deployment
Apply the Models to Predict “Best Customers”
Simple SQL Apply scripts run 100% inside the Database for model build, model apply and immediate ML model deployment
[Screenshot panels: Model Build, Model Apply, Results]
Congratulations! You are an Oracle Data Scientist!
Wait, there is more!
OML + APEX: Interactively Explore Data and OML Insights and Predictions
OML + Analytics Cloud: Interactively Explore Data and OML Insights and Predictions
Predictions, Probabilities and Insights
* Oracle Analytics Cloud screen from “Predicting a Good Wine” by Francesco Tisiot, Rittman Mead and Charlie Berger, Oracle
ML Model Deployment via ORDS REST API (Launch, Development, APEX)
For More Information: Google “Oracle Machine Learning on OTN” or visit https://www.oracle.com/machinelearning
Where should I start? Quick Starts, HOLs, Docs and Oracle Learning Library Tutorials
• Hands-on Lab: How to Pick a Good Wine for < $30 using Oracle Autonomous Database, Oracle Machine Learning, APEX, Oracle Analytics Cloud and REST Services
• Oracle Machine Learning for R Learning Path
• Autonomous Data Warehouse For Developers. Get Hands on with Oracle Public Cloud
• Learn How to Use Oracle Data Miner UI in 45 Minutes
• Hands-on Lab: Learn to Use Oracle Machine Learning Notebooks
• OML Getting Started Documentation:
Thank You. Charlie Berger, Senior Director, Product Management, Machine Learning, AI and Cognitive Analytics

Oracle Machine Learning Overview and From Oracle Data Professional to Oracle Data Scientist in 6 Weeks!

  • 1.
    Oracle Machine Learning FromOracle Data Professional to Oracle Data Scientist Charlie Berger Sr. Director Product Management, Machine Learning, AI and CognitiveAnalytics, charlie.berger@oracle.com www.twitter.com/CharlieDataMine Copyright © 2019 Oracle and/or its affiliates. Move the Algorithms; Not the Data!
  • 2.
    Safe harbor statement Thefollowing is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, timing, and pricing of any features or functionality described for Oracle’s products may change and remains at the sole discretion of Oracle Corporation. 2 Copyright © 2020 Oracle and/or its affiliates.
  • 3.
    Goal Share an attainable,logical, evolutionary path for Oracle data professionals to add machine learning to their valuable Oracle data skills to extract more information, insights and to make predictions. Copyright © 2020 Oracle and/or its affiliates.Copyright © 2020 Oracle and/or its affiliates.
  • 4.
    Oracle Database ConvergedFeatures Oracle Machine Learning Copyright © 2020 Oracle and/or its affiliates.
  • 5.
    Oracle Mission Statement “Ourmission is to help people see data in new ways, discover insights, unlock endless possibilities” © 2020 Oracle - Portland OUG Training Day 10/22/2020Copyright © 2020 Oracle and/or its affiliates.
  • 6.
    Operational DBAs spenda lot of time… 85% of security breaches occurred after the CVE was published DB Maestro Security 85% 91% experience unplanned data center outages Healthcare IT News Database downtime costs $7,900 / minute DB Maestro Reliability 91% 72% of IT budget is spent on generic maintenance tasks vs innovation ComputerWorld Maintenance 72% Copyright © 2020 Oracle and/or its affiliates.
  • 7.
    Oracle Autonomous DatabaseCan Help 7 Self-Driving Automates all database and infrastructure management, monitoring, tuning Self-Securing Protects from both external attacks and malicious internal users Self-Repairing Protects from all downtime including planned maintenance Oracle Autonomous Database does the grunt work forYOU! Copyright © 2020 Oracle and/or its affiliates.
  • 8.
    HIGHER AGILITY LOWER RISKYou are more VALUABLE Automation moves DBA up the value chain… Copyright © 2020 Oracle and/or its affiliates.
  • 9.
    Data Engineer Architecture, “data wrangler” MachineLearning Solving data-driven problems Discovering insights Making predictions Data Security Data classification, Data life-cycle mgmt ApplicationTuning SQL tuning, connection mgmt The Evolution of the DBA/Database Developer Role Copyright © 2020 Oracle and/or its affiliates.
  • 10.
    “Why Oracle? Because that’swhere the data is!” Larry Ellison, Executive Chairman and CTO of Oracle Corporation Copyright © 2020 Oracle and/or its affiliates.
  • 11.
    Algorithms automatically siftthrough large amounts of data to discover hidden patterns, new insights and make predictions What is Machine Learning? Identify most important factor (Attribute Importance) Predict customer behavior (Classification) Find profiles of targeted people or items (Classification Predict or estimate a value (Regression) Segment a population (Clustering) Find fraudulent or “rare events” (Anomaly Detection) Determine co-occurring items in a “basket” (Associations) X1 X2 A1A2A3A4 A5A6 A7 SupervisedLearningUnsupervisedLearning Copyright © 2020 Oracle and/or its affiliates.
  • 12.
    Example Machine Learning Use Cases: http://www.slideshare.net/bigdataelephants/big-data-elephants-strategic-consulting-engineering-services-34175779
  • 13.
    Machine Learning Algorithms Need Data. Move the Algorithms, Not the Data! An “AI Database”? A “Thinking Database”? It Changes Everything!
  • 14.
    Oracle Machine Learning. Oracle Machine Learning extends Oracle Database(s) and enables users to build “AI” applications and analytics dashboards. OML delivers powerful in-database machine learning algorithms, automated ML functionality, and integration with open source Python* and R. Components: OML4SQL (SQL API); OML4Py* (Python API); OML4R (R API); OML Notebooks (with Apache Zeppelin on Autonomous Database); OML4Spark (R API on Big Data); Oracle Data Miner (Oracle SQL Developer extension); OML Services* (model deployment and management, cognitive image and text). * Coming soon
  • 15.
    Operationalizing and Embedding ML. How long does it take to put a defined model into operational use? Length of time to put a model into production, based on 141 respondents who stated they are doing this today.
  • 16.
    Why Do 87% of Data Science Projects Never Make It Into Production? But now that it’s a team sport, … work is now being embedded into the fabric of the company, it’s essential that every person on the team is able to collaborate with everyone else: the data engineers, the data stewards, people that understand the data science, or analytics, or BI specialists, all the way up to DevOps and engineering. “This is a big place that holds companies back because they’re not used to collaborating in this way,” Leff says. “Because when they take those insights, and they flip them over the wall, now you’re asking an engineer to rewrite a data science model created by a data scientist, how’s that work out, usually?” “Well,” Chapo says, “It doesn’t.” “Oftentimes people imagine a world where we’re doing this amazing, fancy, unicorn, sprinkling-pixie-dust sort of AI projects,” he said. Source: https://venturebeat.com/2019/07/19/why-do-87-of-data-science-projects-never-make-it-into-production/
  • 17.
    OML Functionality and Supported Languages
  • 18.
    Oracle Machine Learning Algorithms
    CLASSIFICATION: Naïve Bayes; Logistic Regression (GLM); Decision Tree; Random Forest; Neural Network; Support Vector Machine; Explicit Semantic Analysis
    CLUSTERING: Hierarchical K-Means; Hierarchical O-Cluster; Expectation Maximization (EM)
    ANOMALY DETECTION: One-Class SVM
    TIME SERIES: Forecasting with Exponential Smoothing; includes popular models, e.g. Holt-Winters with trends, seasonality, irregularity, missing data
    REGRESSION: Linear Model; Generalized Linear Model; Support Vector Machine (SVM); Stepwise Linear Regression; Neural Network
    ATTRIBUTE IMPORTANCE: Minimum Description Length; Principal Component Analysis (PCA); Unsupervised Pair-wise KL Divergence; CUR decomposition for row & AI
    ASSOCIATION RULES: Apriori / market basket
    PREDICTIVE QUERIES: Predict, cluster, detect, features
    SQL ANALYTICS: SQL Windows; SQL Patterns; SQL Aggregates
    FEATURE EXTRACTION: Principal Component Analysis (PCA); Non-negative Matrix Factorization; Singular Value Decomposition (SVD); Explicit Semantic Analysis (ESA)
    TEXT MINING SUPPORT: Algorithms support text; tokenization and theme extraction; Explicit Semantic Analysis (ESA) for document similarity
    STATISTICAL FUNCTIONS: Basic statistics: min, max, median, stdev, t-test, F-test, Pearson’s, Chi-Sq, ANOVA, etc.
    R & PYTHON: Third-party R & Python packages through Embedded Execution; Spark MLlib algorithm integration
    MODEL DEPLOYMENT & MONITORING: SQL (1st class objects); Oracle RESTful API (ORDS); OML Services
    Includes support for partitioned models, transactional data and aggregations, unstructured data, geo-spatial data, graph data.
    * Coming soon
  • 19.
    Statistical Functions and Analytical SQL
    STATISTICAL FUNCTIONS: Descriptive statistics (e.g. median, stdev, mode, sum, etc.); hypothesis testing (t-test, F-test, Kolmogorov-Smirnov test, Mann Whitney test, Wilcoxon Signed Ranks test); correlations analysis (parametric and nonparametric, e.g. Pearson’s test for correlation, Spearman's rho coefficient, Kendall's tau-b correlation coefficient); ranking functions; cross tabulations with Chi-square statistics; linear regression; ANOVA (analysis of variance); test distribution fit (e.g. Normal distribution test, Binomial test, Weibull test, Uniform test, Exponential test, Poisson test); statistical aggregates (min, max, mean, median, stdev, mode, quantiles, plus x sigma, minus x sigma, top n outliers, bottom n outliers)
    ANALYTICAL SQL: SQL Windows; SQL aggregate functions; LAG/LEAD functions; SQL for pattern matching; additional approximate query processing (APPROX_COUNT, APPROX_SUM, APPROX_RANK); regular expressions
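As a rough illustration of what two of the listed statistics compute, the sketch below derives descriptive statistics and Pearson's correlation coefficient in plain Python. The sample values are invented, and `pearson_r` is a hypothetical helper written for this example; inside the database the same quantities come from the built-in SQL statistical functions instead.

```python
import math
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Invented sample values, for illustration only.
ages = [22, 35, 41, 58, 63]
bank_funds = [500, 1200, 1500, 3100, 3500]

summary = {
    "median_age": statistics.median(ages),
    "stdev_age": statistics.stdev(ages),
    "age_vs_funds_r": pearson_r(ages, bank_funds),
}
print(summary)
```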
  • 20.
    Goal: Manage and Analyze All Your Data. Architecturally, Many Options and Flexibility. Big Data SQL; SQL / R / Python; Object Store. “Engineered Features”: derived attributes that reflect domain knowledge, key to best models, e.g.: • Counts • Totals • Changes over time. Boil down the Data Lake. Coming soon
  • 21.
    OML for SQL: Model Build & SQL Apply. Simple SQL Syntax: Classification Model
    ML Model Build (PL/SQL):
      BEGIN
        DBMS_DATA_MINING.CREATE_MODEL(
          model_name          => 'BUY_INSUR1',
          mining_function     => dbms_data_mining.classification,
          data_table_name     => 'CUST_INSUR_LTV',
          case_id_column_name => 'CUST_ID',
          target_column_name  => 'BUY_INSURANCE',
          settings_table_name => 'CUST_INSUR_LTV_SET');
      END;
    Model Apply (SQL query):
      SELECT prediction_probability(BUY_INSUR1, 'Yes'
               USING 3500 AS bank_funds, 825 AS checking_amount, 400 AS credit_balance,
                     22 AS age, 'Married' AS marital_status,
                     93 AS MONEY_MONTLY_OVERDRAWN, 1 AS house_ownership)
      FROM dual;
  • 22.
    OML for SQL: Model Build. Simple SQL Syntax: Attribute Importance
    ML Model Build (PL/SQL):
      BEGIN
        DBMS_DATA_MINING.CREATE_MODEL(
          model_name          => 'BUY_INSURANCE_AI',
          mining_function     => DBMS_DATA_MINING.ATTRIBUTE_IMPORTANCE,
          data_table_name     => 'CUST_INSUR_LTV',
          case_id_column_name => 'cust_id',
          target_column_name  => 'BUY_INSURANCE',
          settings_table_name => 'Att_Import_Model_Settings');
      END;
    Model Results (SQL query):
      SELECT attribute_name, explanatory_value, rank
      FROM BUY_INSURANCE_AI
      ORDER BY rank, attribute_name;
    Results:
      ATTRIBUTE_NAME            RANK  ATTRIBUTE_VALUE
      BANK_FUNDS                1     0.2161
      MONEY_MONTLY_OVERDRAWN    2     0.1489
      N_TRANS_ATM               3     0.1463
      N_TRANS_TELLER            4     0.1156
      T_AMOUNT_AUTOM_PAYMENTS   5     0.1095
  • 23.
    OML for R: Model Build. Simple R Language Syntax: Attribute Importance
    ML Model Build (R):
      > ore.odmAI(BUY_INSURANCE ~ ., CUST_INSUR_LTV)
    Model Results (R):
      Call:
      ore.odmAI(formula = BUY_INSURANCE ~ ., data = CUST_INSUR_LTV)
      Importance:
                                 importance  rank
      BANK_FUNDS               0.2161187797     1
      MONEY_MONTLY_OVERDRAWN   0.1489347141     2
      N_TRANS_ATM              0.1463026512     3
      N_TRANS_TELLER           0.1155879786     4
      T_AMOUNT_AUTOM_PAYMENTS  0.1095178647     5
  • 24.
    OML for Python: Model Build (coming soon!). Simple Python Language Syntax: Attribute Importance
    ML Model Build (Python):
      > ai_mod = ai(**setting)  # Create AI model object
      > ai_mod = ai_mod.fit(train_x, train_y)
    Model Results (Python):
      Importance:
      variable                   importance  rank
      BANK_FUNDS               0.2161187797     1
      MONEY_MONTLY_OVERDRAWN   0.1489347141     2
      N_TRANS_ATM              0.1463026512     3
      N_TRANS_TELLER           0.1155879786     4
      T_AMOUNT_AUTOM_PAYMENTS  0.1095178647     5
  • 25.
    Oracle Data Miner UI: SQL Developer Extension. Drag and Drop, Workflows, Easy to Use UI for “Citizen Data Scientist”. Easy to use to define analytical methodologies that can be shared. Workflow API; generates SQL code for immediate deployment.
  • 27.
    OML4R R languageSQL “push down” Transparencylayer for “push down” to equivalent SQL for parallelized in-DB processing Direct access to DB data ROracle pkg for OCI connectivity “Embedded R” call outs to R packages R Language API to OML Algorithms and Integration with R Copyright © 2020 Oracle and/or its affiliates.
  • 29.
    Oracle Machine Learning: Machine Learning Notebooks included in Autonomous Databases. Key Features: • Collaborative UI for data scientists and analysts • Packaged with Autonomous Databases • Quick start example notebooks • Easy access to shared notebooks, templates, permissions, scheduler, etc. • OML4SQL • OML4Py coming soon • Supports deployment of OML models
  • 31.
    Oracle Machine Learning for R / Python: Multiple Components/APIs of Oracle Machine Learning. Transparency layer: leverage proxy objects so data remain in the database; overload native functions, translating functionality to SQL; use familiar R/Python syntax to manipulate database data. Parallel, distributed algorithms: scalability and performance; exposes in-database algorithms available from OML4SQL. Embedded execution: manage and invoke R or Python scripts in Oracle Database; data-parallel, task-parallel, and non-parallel execution; use open source packages to augment functionality. OML4Py Automated Machine Learning (AutoML): feature selection, model selection, hyper-parameter tuning. Client interfaces to the database server: SQL interfaces (SQL*Plus, SQL Developer), OML4Py, OML4R. * Coming soon
  • 32.
    Coming Soon! | AutoML, new with OML4Py. Increase data scientist productivity; reduce overall compute time. Enables non-expert users to leverage Machine Learning. Auto Feature Selection: de-noise data and reduce # of features by identifying the most predictive; improve performance and accuracy. Auto Algorithm Selection: identify the in-database algorithm that achieves the highest model quality; find the best algorithm much faster than with exhaustive search. AutoTune Hyperparameters: significantly improve model accuracy; avoid manual or exhaustive search techniques. Input: data table. Output: ML model.
  • 33.
    Coming Soon! | OML AutoML User Interface: “code-free” user interface supporting automated end-to-end machine learning. Goals: automate production and deployment of ML models; enhance data scientist productivity and user experience; enable non-expert users to leverage ML; unify model deployment and monitoring; support model management. Features: minimal user input (data, target); model leaderboard; model deployment via REST; model monitoring; cognitive features for image and text.
  • 35.
    Coming Soon! | Algorithms for Database 20c: Two major new ML algorithms. Gradient Boosted Trees (XGBoost): highly popular and powerful algorithm (Kaggle winners); classification, regression, ranking, survival analysis. MSET-SPRT: Multivariate State Estimation Technique - Sequential Probability Ratio Test; nonlinear, nonparametric anomaly detection algorithm designed to monitor critical processes. Detects subtle anomalies while also producing minimal false alarms. Calibrates expected behavior from a historical normal operational sequence of monitored signals. Re-implemented and sped up in-DB, based on the original Oracle Labs algorithm.
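The sequential probability ratio test part of MSET-SPRT can be illustrated on its own. The sketch below is a textbook Wald SPRT for a shift in the mean of Gaussian residuals with known standard deviation. It is not Oracle's in-database implementation, it omits the MSET estimation step entirely, and the function name, parameters and thresholds are assumptions chosen for the example.

```python
import math


def sprt_mean_shift(residuals, mu0, mu1, sigma, alpha=0.01, beta=0.01):
    """Wald SPRT for H0: mean == mu0 versus H1: mean == mu1 (known sigma).

    alpha is the tolerated false-alarm rate, beta the tolerated miss rate.
    Returns (decision, number_of_samples_used).
    """
    upper = math.log((1 - beta) / alpha)  # decide "anomaly" at or above this
    lower = math.log(beta / (1 - alpha))  # decide "normal" at or below this
    llr = 0.0  # cumulative log-likelihood ratio
    for n, x in enumerate(residuals, start=1):
        # Log-likelihood ratio increment for one Gaussian observation.
        llr += (mu1 - mu0) / sigma ** 2 * (x - (mu0 + mu1) / 2)
        if llr >= upper:
            return "anomaly", n
        if llr <= lower:
            return "normal", n
    return "undecided", len(residuals)


# Residuals sitting at the shifted mean trigger an alarm after a few samples.
print(sprt_mean_shift([1.0] * 20, mu0=0.0, mu1=1.0, sigma=1.0))
```

The test stops as soon as the evidence crosses either threshold, which is what lets it flag subtle shifts quickly while keeping the false-alarm rate bounded by `alpha`.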
  • 36.
    Oracle Applications that Embed Oracle Machine Learning Algorithms
  • 37.
    Enabling Predictive Enterprise Applications: HCM Predictive Workforce. Integrated data management + embedded predictive analytics. Full 360-degree employee view. Single source of HCM data. Interactive dashboards and “what if” analysis. Customizable if desired to add input variables to predictive models. Mobile + Oracle Cloud solutions. Inputs to Oracle Database: additional relevant data and “engineered features” (sensor data, text, unstructured data, transactional data, spatial data, etc.); assembled historical data; historical or current data to be “scored” for predictions. Output: predictions & insights. Link to HCM Predictive Workforce demo
  • 38.
    Oracle Adaptive Intelligent (AI) Apps for Manufacturing: Achieve Manufacturing Operational Excellence using Machine Learning & AI. Insights (patterns and correlations analysis): discover key influencers and patterns that affect yield & quality. Predictive analytics: predict critical outcomes during manufacturing to minimize losses. Reasons for using OAA’s ML: easy-to-integrate R & PL/SQL APIs for many ML algorithms; in-database execution & scalable performance; enterprise-grade support for OAA ML. GA Q4 FY18.
  • 39.
    From Database Developer to Data Scientist in 6 Weeks! The Changing Role of the DBA. https://www.datacamp.com/community/blog/data-scientist-vs-data-engineer https://www.kdnuggets.com/2020/02/poll-automl-replace-data-scientists.html
  • 40.
    Database Developer to Data Scientist Journey: You Are Likely Already Doing Much of the Work! Data extraction, data wrangling, deriving new attributes (“feature engineering”): typically 80% of the work! … … Import predictions & insights, translate and deploy ML models: eliminated or minimized w/ Oracle. Automate. Where the Machine Learning “Magic” Happens. The data management platform becomes a combined/hybrid data management + machine learning platform. Most data scientists spend only 20 percent of their time on actual data analysis and 80 percent of their time finding, cleaning, and reorganizing huge amounts of data, which is an inefficient data strategy [1]. [1] https://www.infoworld.com/article/3228245/data-science/the-80-20-data-science-dilemma.html
  • 41.
    CRISP-DM Methodology: Six Major Steps. BUSINESS UNDERSTANDING; DATA UNDERSTANDING; DATA PREPARATION; MODELING; EVALUATION; DEPLOYMENT. https://en.wikipedia.org/wiki/Cross-industry_standard_process_for_data_mining
  • 42.
    CRISP-DM Methodology: Six Major Steps. https://en.wikipedia.org/wiki/Cross-industry_standard_process_for_data_mining
    BUSINESS UNDERSTANDING: Well-defined business problem
    DATA UNDERSTANDING: Assemble the “right data”; data profiling (data visualization, univariate statistics/group by, bi-variate statistics)
    DATA PREPARATION: Sampling/stratified; algorithm-required transforms (Auto Data Preparation; missing values, binning, normalization, etc.; unstructured data; aggregations); domain-specific transforms (“engineered features”); feature selection
    MODELING: Algorithm settings/defaults (stratified sampling, feature selection, build model(s))
    EVALUATION: Model evaluation; model comparison; model selection
    DEPLOYMENT: In-DB ML model apply (real-time ML apply; in-database, REST); embed methodology (applications, dashboards)
  • 43.
    CRISP-DM Methodology: Six Major Steps. https://en.wikipedia.org/wiki/Cross-industry_standard_process_for_data_mining
    BUSINESS UNDERSTANDING: Well-defined business problem
    DATA UNDERSTANDING: Assemble the “right data”; data profiling (data visualization, univariate statistics/group by, bi-variate statistics)
    DATA PREPARATION: Sampling/stratified; algorithm-required transforms (Auto Data Preparation; missing values, binning, normalization, etc.; unstructured data; aggregations); domain-specific transforms (“engineered features”); feature selection
    MODELING: Algorithm settings/defaults (stratified sampling, feature selection, build model(s))
    EVALUATION: Model evaluation; model comparison; model selection
    DEPLOYMENT: In-DB ML model apply (real-time ML apply; in-database, REST); embed methodology (applications, dashboards)
    * Automated and/or system defaults
  • 44.
    Database Developer to Data Scientist Journey: Six Major Steps (Oracle Machine Learning POV) • Week 1: Business Understanding • Week 2: Data Understanding • Week 3: Data Preparation • Week 4: Modeling (ML) • Week 5: Evaluation • Week 6: Deployment. https://en.wikipedia.org/wiki/Cross-industry_standard_process_for_data_mining
  • 45.
    Oracle Machine Learning SQL Developer Extension: Oracle Data Miner UI. Business Understanding: target customers most likely to buy insurance. Workflow stages shown: Data Understanding, Data Preparation, Modeling (ML), Evaluation, Deployment.
  • 47.
    Week 1: Business Understanding. Start with a Well-Defined Business Problem Statement • Predict employees that voluntarily churn • Predict customers that are likely to churn • Target “best” customers • Find items that will help me sell more of the most profitable items • What is a specific customer most likely to purchase next? • Who are my “best customers”? • How can I combat fraud? • I’ve got all this data; can you “mine” it and find useful insights?
  • 48.
    Week 1: Business Understanding. Start with a Well-Defined Business Problem Statement. “If I had an hour to solve a problem I'd spend 55 minutes thinking about the problem and 5 minutes thinking about solutions.” Albert Einstein
  • 49.
    The Sand Trap of Poorly Formed Problem Statements: I’ve got all this data; can you “mine” it and find useful insights?
  • 50.
    Week 1: Business Understanding. Be Extremely Specific in Problem Statement
    Poorly Defined | Better | ML Function
    Predict employees that leave | Based on past employees that voluntarily left: create new attribute EmplTurnover = 0/1 | Classification
    Predict customers that churn | Based on past customers that have churned: create new attribute Churn = YES/NO | Classification
    Target “best” customers | Recency, Frequency, Monetary (RFM) analysis; specific dollar amount over a time window: who has spent $500+ in the most recent 18 months | Classification
    How can I make more $$? | What helps me sell soft drinks & coffee? | Association Rules
    Which customers are likely to buy? | How much is each customer likely to spend? | Regression
    Who are my “best customers”? | What descriptive “rules” describe “best customers”? | Classification
    How can I combat fraud? | Which transactions are the most anomalous? Then roll up to physician, claimant, employee, etc. | Anomaly Detection
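Turning “who has spent $500+ in the most recent 18 months” into a 0/1 target attribute could look like the sketch below. The purchase records, the as-of date and the function names are hypothetical; in practice this derivation would more likely be a SQL aggregation over the transaction table.

```python
from datetime import date

# Hypothetical purchase records: (customer_id, purchase_date, amount).
purchases = [
    ("C1", date(2020, 9, 1), 300.0),
    ("C1", date(2020, 3, 15), 250.0),
    ("C2", date(2018, 1, 10), 900.0),  # falls outside the 18-month window
    ("C2", date(2020, 6, 1), 100.0),
    ("C3", date(2019, 12, 24), 500.0),
]


def months_between(earlier, later):
    """Whole calendar months elapsed from `earlier` to `later`."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1
    return months


def label_best_customers(purchases, as_of, window_months=18, min_spend=500.0):
    """Return {customer_id: 1 or 0}; 1 means spend in the window reached min_spend."""
    totals = {}
    for customer, when, amount in purchases:
        totals.setdefault(customer, 0.0)
        if when <= as_of and months_between(when, as_of) < window_months:
            totals[customer] += amount
    return {customer: int(total >= min_spend) for customer, total in totals.items()}


best_customer = label_best_customers(purchases, as_of=date(2020, 10, 22))
print(best_customer)
```

The point of the exercise is that the threshold, the window and the as-of date are all explicit, so the classification target is unambiguous.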
  • 51.
    Week 1: Business Understanding. Be Extremely Specific in your Problem Statement. Target “best” customers who have GOOD CREDIT and make payments
  • 52.
    “Good_Credit” customers who complete all their payments are hard to find.
  • 53.
    Week 2: Data Understanding. Review the Data; Does It Make Sense? Are AGEs all positive, 0-120? Are INCOME values weekly or monthly? Are the LOAN_AMOUNTS reasonable? Etc.
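Checks like these are easy to script. The sketch below flags missing and out-of-range values in a handful of made-up rows; the column names follow the slide, while the plausible ranges and the `find_issues` helper are assumptions for illustration.

```python
# Hypothetical rows; only the column names come from the slide.
rows = [
    {"CUST_ID": 1, "AGE": 34, "INCOME": 5200, "LOAN_AMOUNT": 15000},
    {"CUST_ID": 2, "AGE": -3, "INCOME": 4100, "LOAN_AMOUNT": 9000},
    {"CUST_ID": 3, "AGE": 47, "INCOME": None, "LOAN_AMOUNT": 22000},
    {"CUST_ID": 4, "AGE": 130, "INCOME": 3900, "LOAN_AMOUNT": 0},
]

# Assumed plausible ranges (inclusive); adjust to the business context.
RANGES = {
    "AGE": (0, 120),
    "INCOME": (0, 1_000_000),
    "LOAN_AMOUNT": (1, 5_000_000),
}


def find_issues(rows, ranges):
    """Return (CUST_ID, column, problem) for every missing or out-of-range value."""
    issues = []
    for row in rows:
        for column, (low, high) in ranges.items():
            value = row.get(column)
            if value is None:
                issues.append((row["CUST_ID"], column, "missing"))
            elif not low <= value <= high:
                issues.append((row["CUST_ID"], column, "out of range"))
    return issues


issues = find_issues(rows, RANGES)
for issue in issues:
    print(issue)
```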
  • 54.
    Week 2: Data Understanding. Review the Data; Does It Make Sense? Simple, exploratory graphs to understand the data
  • 56.
    Week 3: Data Preparation. Prepare the Data, Create New Derived Attributes or “Engineered Features”
    Source Attribute | New Attribute / “Engineered Feature”
    Date of Birth | AGE
    Address | DISTANCE_TO_DESTINATION, COMMUTE_TIME
    Call detail records (CDRs) | #_DROPPED_CALLS, PERCENT_INTERNATIONAL
    Salary | PERCENT_VS_PEERS
    Purchases | TOTALS_PER_CATEGORY (e.g. Food, Clothing)
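Two of these derivations, AGE from date of birth and the call-record aggregates, are sketched below in plain Python. The record layout (`dropped` and `international` flags) and the function names are assumptions made for this example; the same features are commonly derived with SQL expressions and GROUP BY.

```python
from datetime import date


def age_on(date_of_birth, as_of):
    """Age in completed years on the `as_of` date."""
    years = as_of.year - date_of_birth.year
    if (as_of.month, as_of.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1  # birthday has not happened yet this year
    return years


def call_features(cdrs):
    """Aggregate one customer's call detail records into two features.

    Each record is a dict with hypothetical boolean keys 'dropped' and
    'international'.
    """
    total = len(cdrs)
    dropped = sum(1 for call in cdrs if call["dropped"])
    international = sum(1 for call in cdrs if call["international"])
    return {
        "N_DROPPED_CALLS": dropped,
        "PERCENT_INTERNATIONAL": 100.0 * international / total if total else 0.0,
    }


# Made-up call records for one customer.
cdrs = [
    {"dropped": False, "international": True},
    {"dropped": True, "international": False},
    {"dropped": False, "international": True},
    {"dropped": False, "international": False},
]

print(age_on(date(1998, 11, 5), date(2020, 10, 22)))
print(call_features(cdrs))
```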
  • 57.
    Week 3: Data Preparation. Prepare the Data, Create New Derived Attributes or “Engineered Features”. Oracle Data Miner’s Column Filter Node does automated data profiling to highlight issues and make recommendations: missing values; outliers; too many distinct values; too many constants; correlated data.
  • 58.
    Week 3: Data Preparation. Prepare the Data, Create New Derived Attributes or “Engineered Features”. Oracle Machine Learning’s Auto Data Prep (ADP) and ML algorithms are designed with intelligent defaults and can automatically deal with: missing values; outliers; binning; too many distinct values; too many constants; transactional data/aggregations; unstructured data; correlated data.
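For readers who have not met these transformations before, the sketch below shows generic versions of three of them: missing-value replacement, normalization and binning. It is only meant to show what each step does to a column; it does not describe how ADP chooses or implements its transformations, and the function names and sample values are invented.

```python
import statistics


def impute_missing(values):
    """Replace None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]


def min_max_normalize(values):
    """Rescale values linearly onto the interval [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def equal_width_bins(values, n_bins):
    """Assign each value a bin index from 0 to n_bins - 1 using equal-width bins."""
    low, high = min(values), max(values)
    if high == low:
        return [0 for _ in values]
    width = (high - low) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - low) / width), n_bins - 1) for v in values]


# Made-up monthly incomes with one missing value.
incomes = [2000, None, 4000, 6000, 10000]
filled = impute_missing(incomes)
scaled = min_max_normalize(filled)
binned = equal_width_bins(filled, n_bins=4)
print(filled, scaled, binned)
```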
  • 59.
    Week 4: Modeling (Machine Learning). First, Identify the Key Attributes That Most Influence the Target Attribute
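One simple way to see how attributes can be ranked against a target is information gain, sketched below. This is a swapped-in textbook measure chosen for brevity; the deck lists Minimum Description Length as OML's attribute importance algorithm, and the two generally produce different scores. The rows and function names are made up.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in target entropy after splitting the rows on `attribute`."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def rank_attributes(rows, attributes, target):
    """Return (attribute, gain) pairs sorted from most to least informative."""
    gains = {a: information_gain(rows, a, target) for a in attributes}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Made-up rows: house ownership fully determines the target, marital status does not.
rows = [
    {"MARITAL_STATUS": "Married", "HOUSE_OWNERSHIP": 1, "BUY_INSURANCE": "Yes"},
    {"MARITAL_STATUS": "Single", "HOUSE_OWNERSHIP": 1, "BUY_INSURANCE": "Yes"},
    {"MARITAL_STATUS": "Married", "HOUSE_OWNERSHIP": 0, "BUY_INSURANCE": "No"},
    {"MARITAL_STATUS": "Single", "HOUSE_OWNERSHIP": 0, "BUY_INSURANCE": "No"},
]

ranking = rank_attributes(rows, ["MARITAL_STATUS", "HOUSE_OWNERSHIP"], "BUY_INSURANCE")
print(ranking)
```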
  • 60.
    Week 4: Modeling (Machine Learning). Training and Testing ML Models using 60%/40% Random Samples. Historical data is split into train and test samples: train (build) the model on the train sample, test the model on the test sample, then evaluate the model.
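A 60/40 random split can be sketched in a few lines. The helper name, the fixed seed and the use of bare case IDs are choices made for this illustration only.

```python
import random


def split_train_test(rows, train_fraction=0.6, seed=42):
    """Shuffle a copy of `rows` and cut it into (train, test) samples."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical case IDs standing in for rows of historical data.
case_ids = list(range(1, 101))
train, test = split_train_test(case_ids)
print(len(train), len(test))
```

Every case lands in exactly one of the two samples, so the test sample stays unseen during model build.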
  • 61.
    Week 4: Modeling (Machine Learning). Build multiple models with different algorithms and settings
  • 62.
    Week 5: Model Evaluation (ML). Next, test model accuracy. Use a randomly selected “hold out” sample of data that was not used to train the ML model. Compute Cumulative Gains, Lift, Accuracy, etc. Review the attributes used in the model and the model coefficients. Make sure the model makes sense.
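The three evaluation measures named here can be computed by hand on a small hold-out sample, as in the sketch below. The actual labels, scores and the 0.5 decision threshold are invented, and the function names are hypothetical; Oracle Data Miner produces these measures through its own test and evaluation views.

```python
def accuracy(actual, predicted):
    """Fraction of cases where the predicted label equals the actual label."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def _top_n(n_cases, fraction):
    """Number of cases in the top `fraction` of a ranked list (at least one)."""
    return max(1, round(n_cases * fraction))


def cumulative_gains(actual, scores, fraction):
    """Share of all positives found in the top `fraction` of cases ranked by score."""
    total_positives = sum(actual)
    if total_positives == 0:
        raise ValueError("cumulative gains needs at least one positive case")
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    top = ranked[:_top_n(len(ranked), fraction)]
    return sum(label for _, label in top) / total_positives


def lift(actual, scores, fraction):
    """Positive rate in the top `fraction` divided by the overall positive rate."""
    share_of_cases = _top_n(len(actual), fraction) / len(actual)
    return cumulative_gains(actual, scores, fraction) / share_of_cases


# Made-up hold-out sample: 1 = bought insurance, 0 = did not.
actual = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.6, 0.2, 0.1, 0.05, 0.5]
predicted = [1 if s >= 0.5 else 0 for s in scores]

print(accuracy(actual, predicted))
print(cumulative_gains(actual, scores, 0.3))
print(lift(actual, scores, 0.3))
```

A lift above 1 in the top-scored cases means the model concentrates buyers better than random selection would.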
  • 63.
    Week 6: Deployment. Apply the Models to Predict “Best Customers”. Simple SQL Apply scripts run 100% inside the Database for immediate ML model deployment. Model Apply/“Scoring”
  • 64.
    Week 6: Deployment. Apply the Models to Predict “Best Customers”. Simple SQL Apply scripts run 100% inside the Database for model build, model apply and immediate ML model deployment. Model Build; Model Apply; Results
  • 65.
    Congratulations! You are an Oracle Data Scientist!
  • 66.
    Wait, there is more!
  • 67.
    OML + APEX: Interactively Explore Data and OML Insights and Predictions
  • 68.
    OML + Analytics Cloud: Interactively Explore Data and OML Insights and Predictions. Predictions, Probabilities and Insights. * Oracle Analytics Cloud screen from “Predicting a Good Wine” by Francesco Tisiot, Rittman Mead, and Charlie Berger, Oracle
  • 70.
    ML Model Deployment via ORDS REST API. Launch; Development; APEX
  • 71.
    For More Information. Google: Oracle Machine Learning on OTN. https://www.oracle.com/machinelearning
  • 72.
    Where should I start? Quick Starts, HOLs, Docs and Oracle Learning Library Tutorials • Hands-on Lab: How to Pick a Good Wine for <$30 using Oracle Autonomous Database, Oracle Machine Learning, APEX, Oracle Analytics Cloud and REST Services • Oracle Machine Learning for R Learning Path • Autonomous Data Warehouse for Developers: Get Hands-on with Oracle Public Cloud • Learn How to Use Oracle Data Miner UI in 45 Minutes • Hands-on Lab: Learn to Use Oracle Machine Learning Notebooks • OML Getting Started Documentation:
  • 73.
    Thank You. Charlie Berger, Senior Director, Product Management, Machine Learning, AI and Cognitive Analytics