Machine Learning and AI at Oracle
Sandesh Rao, VP AIOps for the Autonomous Database
LAOUC
@sandeshr
https://www.linkedin.com/in/raosandesh/
https://www.slideshare.net/SandeshRao4
Types of Machine Learning
• Supervised Learning: Predict future outcomes with the help of training data provided by human experts
• Semi-Supervised Learning: Discover patterns within raw data and make predictions, which are then reviewed by human experts, whose feedback is used to improve model accuracy
• Unsupervised Learning: Find patterns without any external input other than the raw data
• Reinforcement Learning: Take decisions based on past rewards for this type of action
ML Project Workflow
• Set the business objectives
• Gather, compare, and clean data
• Identify and extract features (important columns) from the imported data; this helps us assess the efficiency of the algorithm
• Take the input data, also called the training data, and apply the algorithm to it
• For the algorithm to function efficiently, it is important to pick the right values for the hyperparameters (input parameters to the algorithm)
• Once the training data and the algorithm are combined, we get a model
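The workflow steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the column names (cust_id, visits, spend), the toy rows, and the choice of a one-variable linear model fitted by gradient descent are all hypothetical and are not taken from the slides.

```python
# Hypothetical raw data: one row has a missing target and must be cleaned out.
RAW_ROWS = [
    {"cust_id": 1, "visits": 1, "spend": 3.0},
    {"cust_id": 2, "visits": 2, "spend": 5.0},
    {"cust_id": 3, "visits": 3, "spend": 7.0},
    {"cust_id": 4, "visits": 4, "spend": None},   # dirty row
    {"cust_id": 5, "visits": 4, "spend": 9.0},
    {"cust_id": 6, "visits": 5, "spend": 11.0},
]


def extract_features(rows):
    """Clean the data and keep only the feature (visits) and target (spend)."""
    return [(row["visits"], row["spend"]) for row in rows if row["spend"] is not None]


def train(pairs, learning_rate=0.05, epochs=2000):
    """Fit spend = w * visits + b by gradient descent on mean squared error.

    learning_rate and epochs are the hyperparameters of this toy algorithm.
    """
    w, b = 0.0, 0.0
    n = len(pairs)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in pairs) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in pairs) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b  # the "model" is just these two numbers


if __name__ == "__main__":
    training_data = extract_features(RAW_ROWS)
    w, b = train(training_data)
    print(f"model: spend = {w:.3f} * visits + {b:.3f}")
```

Changing learning_rate or epochs changes how quickly, and whether, the fit converges, which is why hyperparameter choice matters even for a model this small.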
ML vs AutoML
• Manual steps: Algorithm Selection, Feature Selection, Model Tuning, Model Evaluation
• AutoML automates the manual steps
• Accuracy improves through repeated retraining cycles: algorithms improve models as they get trained on greater volumes of data or on more recent/relevant data
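To make the automated steps concrete, here is a minimal sketch of algorithm selection and model tuning as a plain search over candidates scored on held-out data. It is not how Oracle's AutoML works internally; the candidate models (a mean baseline and k-nearest-neighbour regression), the search grid, and the data are hypothetical.

```python
def mean_model(train, _params):
    """Baseline: always predict the average training target."""
    avg = sum(y for _, y in train) / len(train)
    return lambda x: avg


def knn_model(train, params):
    """k-nearest-neighbour regression on a single numeric feature."""
    k = params["k"]

    def predict(x):
        nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
        return sum(y for _, y in nearest) / k

    return predict


def mse(model, data):
    """Model evaluation: mean squared error on held-out data."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


# Each entry: algorithm name, builder function, hyperparameter grid to try.
SEARCH_SPACE = [
    ("mean", mean_model, [{}]),
    ("knn", knn_model, [{"k": 1}, {"k": 2}, {"k": 4}]),
]


def auto_select(train, valid):
    """Try every algorithm and setting; return a leaderboard, best first."""
    leaderboard = []
    for name, build, grid in SEARCH_SPACE:
        for params in grid:
            leaderboard.append((mse(build(train, params), valid), name, params))
    leaderboard.sort(key=lambda entry: entry[0])
    return leaderboard


if __name__ == "__main__":
    train = [(x, 3 * x) for x in range(0, 20, 2)]   # even x values
    valid = [(1, 3), (9, 27), (17, 51)]             # held-out odd x values
    for score, name, params in auto_select(train, valid):
        print(f"{name:5s} {params} mse={score:.2f}")
```

An exhaustive loop like this grows quickly with the number of algorithms and settings, which is the cost that automated selection and tuning aim to reduce.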
Does AutoML remove the need for Data Scientists?
AutoML does not replace data scientists but rather expedites their capabilities.
At the advent of the assembly line in manufacturing, many tedious processes were automated. This enabled workers to put their time and energy into bigger issues, from quality of product to improving design and manufacturing processes. AutoML gives similar power to data scientists, delivering more time to engineer predictive features, develop data acquisition strategies, improve the data transformation pipelines, and more.
Oracle Machine Learning
Automated
• Automated machine learning supports data scientist productivity and empowers non-experts
• Algorithm-specific data preparation, integrated text mining, partitioned models
Scalable
• Over 30 high-performance, parallelized in-database machine learning algorithms that require no data movement
Production-ready
• Quickly deploy and update machine learning models in production via SQL and REST APIs
• Deploy R and Python user-defined functions using managed processes with easy data-parallel and task-parallel invocation
[Diagram: Oracle Machine Learning capabilities (Model Repository, Workspaces and Projects, Zeppelin-based Notebooks, Model Deployment, Model Building, Model Management, Prediction Details, R and Python Integration, AutoML) over Data Management (Oracle Database, Oracle Autonomous Database, Data Lake Access, Integration, Preparation, Exploration) and Infrastructure (CPU, Storage, Network; Cloud, On premises)]
Copyright © 2021, Oracle and/or its affiliates
Oracle Machine Learning interfaces to Oracle Database
Data Management Platform | Oracle Machine Learning Component | Tool
Oracle Autonomous Database | OML Notebooks (OML4SQL, OML4Py, OML4R*) | Apache Zeppelin
Oracle Database, Oracle Database Cloud Service | OML4Py | Python client, Jupyter Notebooks
Oracle Database, Oracle Database Cloud Service | Oracle Data Miner | SQL Developer
Oracle Database, Oracle Database Cloud Service | OML4R | R client, RStudio
Oracle Database, Oracle Database Cloud Service | OML4SQL | SQL Developer, SQL*Plus
* coming soon
Building on the world’s only converged database
• Most productive for developers and analysts: integrated microservices, events, REST, SaaS, ML, CI/CD, Low-Code
• Supports diverse workloads: transactions, analytics, ML, IoT, streaming, blockchain
• Supports diverse data: relational, JSON, graph, spatial, text, OLAP, XML, multimedia
• In cloud and on premises: integrated, not fragmented
Oracle Machine Learning Notebooks
Autonomous Database as a Data Science Platform
Collaborative UI
• Based on Apache Zeppelin
• Supports data scientists, data analysts, application developers, and DBAs with SQL and Python
• Easy notebook sharing
• Scheduling, versioning, access control
Included with Autonomous Database
• Automatically provisioned and managed
• In-database algorithms and analytics functions
• Explore and prepare, build and evaluate models, score data, deploy solutions
Oracle Machine Learning Algorithms and Analytics in Oracle Database
(* New in 21c. Includes support for partitioned models, transactional data, and aggregations.)
CLASSIFICATION
• Naïve Bayes
• Logistic Regression (GLM)
• Decision Tree
• Random Forest
• Neural Network
• Support Vector Machine (SVM)
• Explicit Semantic Analysis
• XGBoost*
ANOMALY DETECTION
• One-Class SVM
• MSET-SPRT*
CLUSTERING
• Hierarchical K-Means
• Hierarchical O-Cluster
• Expectation Maximization (EM)
TIME SERIES
• Forecasting: Exponential Smoothing
• Includes popular models, e.g. Holt-Winters with trends, seasonality, irregular time series
REGRESSION
• Generalized Linear Model (GLM)
• Support Vector Machine (SVM)
• Stepwise Linear Regression
• Neural Network
• XGBoost*
ATTRIBUTE IMPORTANCE
• Minimum Description Length
• Principal Component Analysis (PCA)
• Unsupervised Pairwise KL Divergence
• CUR decomposition for row and attribute importance
ASSOCIATION RULES
• Apriori
PREDICTIVE QUERIES
• Predict, cluster, detect, features
SQL ANALYTICS
• SQL Windows
• SQL Patterns
• SQL Aggregates
FEATURE EXTRACTION
• Principal Component Analysis (PCA)
• Non-negative Matrix Factorization
• Singular Value Decomposition (SVD)
• Explicit Semantic Analysis (ESA)
ROW IMPORTANCE
• CUR Decomposition
RANKING
• XGBoost*
TEXT MINING SUPPORT
• Algorithms support text columns
• Tokenization and theme extraction
• Explicit Semantic Analysis (ESA)
STATISTICAL FUNCTIONS
• min, max, median, stdev, t-test, F-test, Pearson’s, Chi-Sq, ANOVA, etc.
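Of the time-series entries above, simple exponential smoothing is small enough to sketch directly. The function below is a generic textbook version in plain Python, not the in-database implementation; the function name, the smoothing factor alpha, and the sample series are illustrative assumptions.

```python
def exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation.

    level_t = alpha * value_t + (1 - alpha) * level_(t-1), seeded with the
    first observation. The last level serves as the one-step-ahead forecast.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    smoothed = [level]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
        smoothed.append(level)
    return smoothed


if __name__ == "__main__":
    demand = [10, 12, 14]                      # hypothetical observations
    levels = exponential_smoothing(demand, alpha=0.5)
    print("smoothed levels:", levels)          # [10, 11.0, 12.5]
    print("next-period forecast:", levels[-1])
```

Holt-Winters extends the same recursion with additional trend and seasonal terms, each with its own smoothing factor.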
Oracle Machine Learning for SQL
Empower SQL users with immediate access to ML included with Oracle Database and Oracle Autonomous Database
In-database, parallelized, distributed algorithms
• No extracting data to a separate ML engine
• Fast and scalable
• Batch and real-time scoring at scale that leverages Exadata storage-tier function pushdown
• Algorithm-specific automatic data preparation
• Explanatory prediction details
ML models as first-class database objects
• Access control per model
• Audit user actions
• Export / import models across databases
• Ease of backup, recovery, and security
Faster time-to-market through immediate solution deployment
[Diagram: SQL interfaces (SQL*Plus, SQL Developer, …) connect to Oracle Database with OML; OML Notebooks connect to Oracle Autonomous Database]
OML to Predict Customer Behavior
Intuitive SQL API: OML4SQL

-- Build a machine learning model to determine which customers are likely to buy travel insurance
DECLARE
  v_setlst DBMS_DATA_MINING.SETTING_LIST;
BEGIN
  v_setlst('ALGO_NAME') := 'ALGO_SUPPORT_VECTOR_MACHINES';
  v_setlst('PREP_AUTO') := 'ON';
  DBMS_DATA_MINING.CREATE_MODEL2(
    MODEL_NAME          => 'BUY_TRVL_INSUR',
    MINING_FUNCTION     => 'CLASSIFICATION',
    DATA_QUERY          => 'select * from CUSTOMERS',
    SET_LIST            => v_setlst,
    CASE_ID_COLUMN_NAME => 'CUST_ID',
    TARGET_COLUMN_NAME  => 'BUY_TRAVEL_INSURANCE');
END;
/

-- Apply the model to predict how likely a customer with this profile is to buy
SELECT prediction_probability(BUY_TRVL_INSUR, 'Yes'
         USING 3500 AS bank_funds, 37 AS age,
               'Married' AS marital_status, 2 AS num_previous_cruises)
FROM dual;
OML4SQL: new in Database 21c
New algorithms and features
eXtreme Gradient Boosting Trees (XGBoost)
• Classification, regression, ranking
• Highly popular and powerful algorithm for speed and model accuracy
Multivariate State Estimation Technique - Sequential Probability Ratio Test (MSET-SPRT)
• Anomaly detection for sensors, IoT data sources
• Detects subtle anomalies while producing minimal false alarms
Neural Network
• Adam solver: a minibatch solver that is computationally efficient, requires little memory, and is well suited to larger data
• ReLU activation function: enables easier-to-train models with better performance
Enhanced prediction details
• Enables even higher quality understanding of factors that most contribute to a prediction
• For Support Vector Machine, Generalized Linear Model, Neural Network, k-Means
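For readers unfamiliar with the two neural network terms above, the following is a minimal sketch of the ReLU activation and the standard Adam update rule, applied to a one-dimensional toy problem. It follows the published Adam formulas with their usual default constants; it says nothing about how the database implements them, and the toy objective is an assumption made for illustration.

```python
import math


def relu(x):
    """Rectified linear unit: passes positive inputs, zeroes out the rest."""
    return max(0.0, x)


def adam_minimize(grad, x0, steps=5000, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize a 1-D function given its gradient, using the Adam update rule."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g          # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g      # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)             # bias correction for early steps
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


if __name__ == "__main__":
    print(relu(-2.0), relu(3.5))
    # Toy objective f(x) = (x - 3)^2 with gradient 2 * (x - 3); minimum at x = 3.
    print(adam_minimize(lambda x: 2 * (x - 3), x0=0.0))
```

Adam keeps only two running averages per parameter, which is the source of the low memory footprint mentioned on the slide.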
Oracle Data Miner User Interface
Create analytical workflows: a productivity tool for data scientists that also enables citizen data scientists
SQL Developer Extension for Oracle Database on premises and DBCS
• Automates typical data science steps
• Easy-to-use drag-and-drop interface
• Analytical workflows quickly defined and shared
• Wide range of algorithms and data transformations
• Generate SQL code for immediate deployment
Summary: Oracle Machine Learning on ADB-S
• Minimize or eliminate data movement for database data
• Multi-persona, collaborative, democratized machine learning for data scientists, citizen data scientists, developers
• Multi-language API (SQL, Python) and no-code user interface
• Access to broader data lake data through external tables and Cloud SQL
• Data and model governance via Oracle Database and Autonomous Database security models in development and production
• Scalable and high-performance modeling and scoring
• Elastic scaling for machine learning as part of OML on Autonomous Database
• Model explainability and prediction details support XAI in development and production
• Bridges the gap between development and production with model deployment options
• MLOps capabilities include immediate model production deployment from SQL and REST, user collaboration, queryable model repositories, and support for streamlined creation of reproducible ML pipelines
• The Oracle stack (SaaS, PaaS, IaaS) provides a strong environment in which data engineers, ML engineers and architects, corporate developers, and others can contribute to the DS and ML workflow
• On-premises and Cloud availability for ML capabilities
• Oracle tools and enterprise applications integration, including Oracle Analytics Server, Oracle Analytics Cloud, and Oracle APEX
• Simple pricing structure: ML capabilities included in core product at no additional cost
Oracle Machine Learning for R and Python
Empower data scientists with open source environments
Transparency layer
• Leverage proxy objects so data remains in the database
• Overload native functions, translating functionality to SQL
• Use familiar R / Python syntax on database data
Parallel, distributed algorithms
• Scalability and performance
• Exposes in-database algorithms available from OML4SQL
Embedded execution
• Manage and invoke R or Python scripts in Oracle Database
• Data-parallel, task-parallel, and non-parallel execution
• Use open source packages to augment functionality
OML4Py also includes AutoML and MLX
• Automated algorithm selection, feature selection, model tuning
• Algorithm-agnostic model explainability (MLX) for feature ranking
[Diagram: OML4R reaches Oracle Database through the SQL interface; OML Notebooks, OML4Py, and the REST interface reach Oracle Autonomous Database]
Embedded Execution
Example of a parallel partitioned data flow using a third-party package with OML4Py

import oml  # assumes an active database connection, e.g. inside OML Notebooks

# user-defined function using sklearn; dat is one partition of the data
def build_lm(dat):
    from sklearn import linear_model
    lm = linear_model.LinearRegression()
    X = dat[['PETAL_WIDTH']]
    y = dat[['PETAL_LENGTH']]
    lm.fit(X, y)
    return lm

# select column(s) for partitioning data; IRIS is a proxy object for the IRIS table
index = oml.DataFrame(IRIS['SPECIES'])

# invoke function in parallel on IRIS table, one call per species
mods = oml.group_apply(IRIS, index, func=build_lm, parallel=2)
mods.pull().items()

[Diagram: OML Notebooks and the REST interface call OML4Py in Oracle Autonomous Database, which spawns OML4Py Python engines over user tables]
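The partition-then-apply pattern in this example can be mimicked locally to see what each invocation receives. The sketch below is a pure-Python analogue, not OML4Py: group_apply_local and fit_line are hypothetical helpers, the rows are made-up numbers rather than real IRIS measurements, and everything runs sequentially in one process instead of in database-spawned engines.

```python
from collections import defaultdict


def fit_line(rows):
    """Least-squares fit of PETAL_LENGTH = slope * PETAL_WIDTH + intercept."""
    n = len(rows)
    mean_x = sum(r["PETAL_WIDTH"] for r in rows) / n
    mean_y = sum(r["PETAL_LENGTH"] for r in rows) / n
    sxx = sum((r["PETAL_WIDTH"] - mean_x) ** 2 for r in rows)
    sxy = sum((r["PETAL_WIDTH"] - mean_x) * (r["PETAL_LENGTH"] - mean_y) for r in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def group_apply_local(rows, key, func):
    """Split rows by the key column and call func once per partition."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return {name: func(members) for name, members in groups.items()}


if __name__ == "__main__":
    # Made-up rows: species "a" follows length = 2 * width + 1,
    # species "b" follows length = 3 * width.
    rows = [
        {"SPECIES": "a", "PETAL_WIDTH": 0.25, "PETAL_LENGTH": 1.5},
        {"SPECIES": "a", "PETAL_WIDTH": 0.50, "PETAL_LENGTH": 2.0},
        {"SPECIES": "a", "PETAL_WIDTH": 0.75, "PETAL_LENGTH": 2.5},
        {"SPECIES": "b", "PETAL_WIDTH": 1.0, "PETAL_LENGTH": 3.0},
        {"SPECIES": "b", "PETAL_WIDTH": 2.0, "PETAL_LENGTH": 6.0},
    ]
    for species, (slope, intercept) in group_apply_local(rows, "SPECIES", fit_line).items():
        print(species, round(slope, 3), round(intercept, 3))
```

Because each partition is processed independently, the per-group calls can be distributed across engines, which is what the parallel argument in the slide's example requests.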
OML AutoML UI
Enhance data scientist productivity and enable non-expert data professionals
• Accelerate new ML projects
• Automate repetitive and time-consuming tasks
• Generate editable notebooks for selected models
• Deploy models as REST endpoints
Featuring
• Monitor experiment progress
• Customize selection quality metric and metrics display
• Even faster data scoring performance for streaming and real-time applications
OML AutoML UI
Simplify the machine learning modeling and deployment process
OML AutoML UI Experiment Pipeline (from data to OML model)
Auto Algorithm Selection
• Identify in-database algorithms likely to achieve higher model quality
• Find best algorithm faster than exhaustive search
Adaptive Sampling
• Identify right sample size for training data
• Adjust sample for unbalanced data
Auto Feature Selection
• De-noise data
• Reduce features by identifying most predictive
• Improve accuracy and performance
Auto Model Tuning
• Improves model accuracy
• Automated tuning of hyperparameters
• Avoid manual or exhaustive search techniques
Plus: Feature Prediction Impact
• Rank features most influential for scoring
• Algorithm-agnostic technique
• For each final model per algorithm
Comparing OML4Py AutoML with OML AutoML UI
Step in workflow | OML4Py AutoML API | OML AutoML UI
Algorithm Selection | ✓ Optional use | ✓
Adaptive Sampling | Roadmap | ✓
Feature Selection | ✓ Optional use | ✓
Model Tuning | ✓ | ✓
Model Selection | ✓ Specific API function to return top model or user selection | ✓ Leaderboard ranks models by score metric for user choice
Feature Prediction Impact | ✓ Optional use via MLX | ✓
Generate notebook for model | Not available | ✓
Integrated model deployment to OML Services | Explicit model export and REST API import | ✓
 | Manual pipeline assembly | Experiment assembles the full pipeline
Oracle Machine Learning for Spark
R Language API Component to Oracle Big Data Connectors and on Oracle Big Data Service
• Leverage Spark 2 environment for powerful data preparation and machine learning
• Use data across a range of Data Lake sources
• Achieve scalability and performance using the full Hadoop cluster
• Parallelized and distributed ML algorithms from native and Spark MLlib implementations
[Diagram: an R client uses OML4Spark (with a Java API) on BDA, BDS, or DIY clusters; sources are HDFS, Hive, Spark DF, Impala, and JDBC]
OML Services
Supports lightweight model scoring using REST endpoints for application integration
• Enable key elements of an overall enterprise MLOps strategy
• Fast data scoring performance for streaming and real-time applications
• Pay only for actual scoring compute, with no pre-provisioned VM
• Facilitate collaboration across the data science team
Model Management and Deployment Services
• Deploy in-database (native format) and third-party (ONNX format) models
• Import ONNX for TensorFlow, PyTorch, MXNet, scikit-learn, etc.
• Store, version, compare ML models
• Organize models within namespaces
Built-in cognitive text services
• Extract topics and keywords
• Sentiment analysis
• Text summary and similarity
Oracle Machine Learning Services architecture
Connectivity and use from Client
[Diagram: a REST client sends user/pass to get a token (/omlusers), then sends the token plus actions and text/objects using GET, POST, and DELETE (/omlmod) to OML Services in the Oracle Autonomous Database PDB]
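The two-step client flow in this architecture (obtain a token with database credentials, then send the token with each action) can be sketched with the Python standard library. The code only builds the requests and sends nothing. Treat every specific as an assumption to verify against the OML Services documentation: the base URL is a placeholder, and the paths under /omlusers and /omlmod, the use of POST for the token call, the JSON field names, and the endpoint name are not taken from the slide.

```python
import json
import urllib.request

BASE_URL = "https://example.invalid"  # placeholder, not a real service location


def token_request(base_url, username, password):
    """Build the request that exchanges ADB credentials for an access token."""
    body = json.dumps(
        {"grant_type": "password", "username": username, "password": password}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/omlusers/api/oauth2/v1/token",   # assumed path
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def score_request(base_url, token, endpoint_uri, input_records):
    """Build a scoring request against a deployed model endpoint."""
    body = json.dumps({"inputRecords": input_records}).encode("utf-8")  # assumed field name
    return urllib.request.Request(
        f"{base_url}/omlmod/v1/deployment/{endpoint_uri}/score",  # assumed path
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = token_request(BASE_URL, "OMLUSER", "not-a-real-password")
    print(req.get_method(), req.full_url)
    # With a real service, urllib.request.urlopen(req) would return the token JSON.
    scoring = score_request(BASE_URL, "TOKEN", "buy_trvl_insur", [{"AGE": 37}])
    print(scoring.get_method(), scoring.full_url)
```

Keeping request construction separate from sending makes the flow easy to inspect and to adapt once the actual service URL and paths are confirmed.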
Oracle Machine Learning Services - Methods
Components with built-in Oracle Machine Learning
Repository
• Store Model
• Update Model Namespace
• Model Listing
• Model Info
• Model Metadata
• Model Content
• Model Admin
• Token using ADB user and password
Generic
• Metadata for all Versions: Version 1 Metadata
• Open API Specification
Deployment
• Create Model Endpoint
• Score Model using Endpoint
• Endpoints
• Endpoint Details
• Open API Specification for Endpoint
• Endpoint
Cognitive Text
• Get Most Relevant Topics
• Get Most Relevant Keywords
• Get Summaries
• Get Sentiments
• Get Semantic Similarities
• Numeric Features
• Get Endpoints
HTTP methods shown on the slide: GET, POST, DELETE
Demo
OML components deployment scenarios
In-database model deployment scenarios – OML AutoML UI
[Diagram: Prepared Database Table; OML AutoML UI (build in-db model, generate notebook); deploy in-database model to OML Services ({REST:API}); export and deploy in-db model; direct model access and in-database SQL scoring from Oracle APEX and Enterprise Applications]
In-database model deployment scenarios – OML Notebooks
[Diagram: OML Notebooks (SQL); deploy in-database model to OML Services ({REST:API}); export in-db model and import in-db model; direct model access and in-database SQL scoring from Oracle APEX and Enterprise Applications]
Multi-database model deployment scenarios
[Diagram: export and deploy in-db models between Oracle Database (on premises and DBCS) and Oracle Autonomous Database (ADW, ATP, AJD), and between Oracle Autonomous Database instances]
Model deployment scenarios
[Diagram: OCI Data Science exports a model in ONNX format; OML Services ({REST:API}) imports the model; Oracle APEX and Enterprise Applications use it]
OCI Language
Performs text analysis at scale
Understand unstructured text in documents, e.g.:
• Customer feedback interactions
• Support tickets
• Social media
Built-in pre-trained models eliminate the need for machine learning expertise
Empowers developers to apply:
• Sentiment analysis
• Key-phrase extraction
• Text classification
• Named entity recognition
• + more
OCI Speech
Provides automatic speech recognition
• Real-time speech recognition using prebuilt models
• Trained on thousands of native and non-native language speakers
Enables developers to easily:
• Convert file-based audio data containing human speech into highly accurate text transcriptions
• Provide in-workflow closed captions
• Index content
• Enhance analytics on audio and video content
OCI Vision
Provides pre-trained computer vision models
• Perform image recognition and document analysis tasks
• Extend the models to other use cases, e.g. scene monitoring, defect detection, document processing
• Detect visual anomalies in manufacturing
• Extract text from forms to automate business workflows
• Tag items in images to count products or shipments
OCI Anomaly Detection
Business-specific anomaly detection models
Flag critical irregularities early, which enables:
• Faster resolution
• Fewer operational disruptions
Provides REST APIs and SDKs for several programming languages
Built on the patented MSET2 algorithm, which is used worldwide, e.g.:
• Nuclear reactor health monitoring
• Fraud detection
• Predicting equipment breakdown
• Receiving data from multiple devices to predict failures
OCI Forecasting
Delivers time-series forecasts
No need for data science expertise
Helps developers to quickly create accurate forecasts, including:
• Product demand
• Revenue
• Resource requirements
All forecasts come with confidence intervals and explainability to help developers make the right business decisions
OCI Data Labeling
Helps users build labeled datasets to train AI models
Via user interfaces and public APIs, users can:
• Assemble data
• Create and browse datasets
• Apply labels to data records
The labeled datasets can be exported and used for model development across many of Oracle’s AI and data science services
Products and Services in Context
• A complete and comprehensive platform for AI/ML
• Logical layers supporting a wide range of personas
• Services integrated with data management services for data science / machine learning solution deployment
Helpful Links
ORACLE MACHINE LEARNING ON O.COM
• https://www.oracle.com/machine-learning
OML TUTORIALS
• OML LiveLab: https://apexapps.oracle.com/pls/apex/dbpm/r/livelabs/view-workshop?p180_id=560
• OML4Py LiveLab: https://apexapps.oracle.com/pls/apex/dbpm/r/livelabs/view-workshop?wid=786
• Interactive tour: https://docs.oracle.com/en/cloud/paas/autonomous-database/oml-tour
OML OFFICE HOURS
• https://asktom.oracle.com/pls/apex/asktom.search?office=6801#sessionss
ORACLE ANALYTICS CLOUD
• https://www.oracle.com/solutions/business-analytics/data-visualization/examples.html
OML4PY
• OML4Py (2m video)
• OML4Py Introduction (17m video)
• OML4Py Technical Brief
• OML4Py User’s Guide
• Blog: Introducing OML4Py
• GitHub Repository with Python notebooks
ORACLE AUTOML UI
• Oracle Machine Learning AutoML UI (2m video)
• Oracle Machine Learning Demonstration (6m video)
• OML AutoML UI Technical Brief
• Blog: Introducing Oracle Machine Learning AutoML UI
OML SERVICES
• Oracle Machine Learning Services (2m video)
• OML Services Technical Brief
• Oracle Machine Learning Services Documentation
• Blog: Introducing Oracle Machine Learning Services
• GitHub Repository with OML Services examples

Machine Learning and AI at Oracle

  • 1.
    VP AIOps forthe Autonomous Database Sandesh Rao LAOUC Machine Learning and AI at Oracle @sandeshr https://www.linkedin.com/in/raosandesh/ https://www.slideshare.net/SandeshRao4
  • 2.
    Types of MachineLearning Supervised Learning Predict future outcomes with the help of training data provided by human experts Semi-Supervised Learning Discover patterns within raw data and make predictions, which are then reviewed by human experts, who provide feedback which is used to improve the model accuracy Unsupervised Learning Find patterns without any external input other than the raw data Reinforcement Learning Take decisions based on past rewards for this type of action
  • 3.
    ML Project Workflow Setthe business objectives Gather compare and clean data Identify and extract features (important columns) from imported data This helps us identify the efficiency of the algorithm Take the input data which is also called the training data and apply the algorithm to it In order for the algorithm to function efficiently, it is important to pick the right value for hyper parameters (input parameters to the algorithm) Once the training data in the algorithm are combined we get a model 1 2 3 4 5
  • 4.
    ML vs AutoML Algorithm Selection Feature Selection Model Tuning Model Evaluation AutoML automatesthe manual steps Accuracy Repeated retraining cycles Algorithms improve models as they get trained on greater volumes of data or more recent/relevant data
  • 5.
    Does not replacedata scientists but rather expediate their capabilities Does AutoML remove the need for Data Scientists? At the advent of the assembly line in manufacturing, many tedious processes were automated. This enabled workers to put their time and energy into bigger issues, from quality of product to improving design and manufacturing processes. AutoML gives similar power to data scientists, delivering more time to engineer predictive features, develop data acquisition strategies, improve the data transformation pipelines, and more.
  • 6.
    Copyright © 2021,Oracle and/or its affiliates 6 Oracle Machine Learning Automated Automated machine learning supports data scientist productivity and empowers non-experts Algorithm-specific data preparation, integrated text mining, partitioned models Scalable Over 30 high performance, parallelized in-database machine learning algorithms that require no data movement Production-ready Quickly deploy and update machine learning models in production via SQL and REST APIs Deploy R and Python user-defined functions using managed processes with easy data-parallel and task- parallel invocation Model Repository Workspaces and Projects Zeppelin-based Notebooks Model Deployment Model Building Model Management Prediction Details R and Python Integration AutoML Data Management Infrastructure Oracle Database – Oracle Autonomous Database – Data Lake Access – Integration – Preparation – Exploration CPU – Storage – Network Cloud On premises
  • 7.
    Oracle Machine Learninginterfaces to Oracle Database Oracle Autonomous Database Oracle Database OML Notebooks Oracle Database Cloud Service OML4Py Oracle Data Miner OML4R OML4SQL Python client, Jupyter Notebooks SQL Developer R client, RStudio SQL Developer SQL*Plus Data Management Platform Oracle Machine Learning Component Tool * coming soon Apache Zeppelin OML4SQL OML4Py OML4R* Copyright © 2021 Oracle and/or its affiliates.
  • 8.
    Most Productive forDevelopers and Analysts Integrated microservices, events, REST, SaaS, ML, CI/CD, Low-Code Supports diverse Workload Transactions, analytics, ML, IoT, streaming, blockchain Supports diverse Data Relational, JSON, graph, spatial, text, OLAP, XML, multimedia In cloud and on premises – integrated, not fragmented Building on the world’s only converged database Copyright © 2021, Oracle and/or its affiliates.
  • 9.
    Oracle Machine LearningNotebooks Collaborative UI • Based on Apache Zeppelin • Supports data scientists, data analysts, application developers, and DBAs with SQL and Python • Easy notebook sharing • Scheduling, versioning, access control Included with Autonomous Database • Automatically provisioned and managed • In-database algorithms and analytics functions • Explore and prepare, build and evaluate models, score data, deploy solutions Autonomous Database as a Data Science Platform Copyright © 2021 Oracle and/or its affiliates.
  • 10.
    CLASSIFICATION • Naïve Bayes •Logistic Regression (GLM) • Decision Tree • Random Forest • Neural Network • Support Vector Machine (SVM) • Explicit Semantic Analysis • XGBoost* ANOMALY DETECTION • One-Class SVM • MSET-SPRT* CLUSTERING • Hierarchical K-Means • Hierarchical O-Cluster • Expectation Maximization (EM) TIME SERIES • Forecasting - Exponential Smoothing • Includes popular models e.g. Holt-Winters with trends, seasonality, irregular time series REGRESSION • Generalized Linear Model (GLM) • Support Vector Machine (SVM) • Stepwise Linear regression • Neural Network • XGBoost* ATTRIBUTE IMPORTANCE • Minimum Description Length • Principal Component Analysis (PCA) • Unsupervised Pairwise KL Divergence • CUR decomposition for row & AI ASSOCIATION RULES • A priori PREDICTIVE QUERIES • Predict, cluster, detect, features SQL ANALYTICS • SQL Windows • SQL Patterns • SQL Aggregates FEATURE EXTRACTION • Principal Comp Analysis (PCA) • Non-negative Matrix Factorization • Singular Value Decomposition (SVD) • Explicit Semantic Analysis (ESA) ROW IMPORTANCE • CUR Decomposition RANKING • XGBoost* TEXT MINING SUPPORT • Algorithms support text columns • Tokenization and theme extraction • Explicit Semantic Analysis (ESA) STATISTICAL FUNCTIONS • min, max, median, stdev, t-test, F-test, Pearson’s, Chi-Sq, ANOVA, etc. Oracle Machine Learning Algorithms and Analytics in Oracle Database * New in 21c Includes support for Partitioned Models, Transactional data and aggregations Copyright © 2021, Oracle and/or its affiliates
  • 11.
    Oracle Machine Learningfor SQL In-database, parallelized, distributed algorithms • No extracting data to separate ML engine • Fast and scalable • Batch and real-time scoring at scale that leverages Exadata storage-tier function pushdown • Algorithm-specific automatic data preparation • Explanatory prediction details ML models as first-class database objects • Access control per model • Audit user actions • Export / import models across databases • Ease of backup, recovery, and security Faster time-to-market through immediate solution deployment Empower SQL users with immediate access to ML included with Oracle Database and Oracle Autonomous Database SQL Interfaces SQL*Plus SQLDeveloper … Oracle Autonomous Database OML Notebooks Oracle Database with OML Copyright © 2021 Oracle and/or its affiliates.
  • 12.
    OML to PredictCustomer Behavior -- Build a machine learning model to determine which customers are likely buy Travel Insurance DECLARE v_setlst DBMS_DATA_MINING.SETTING_LIST; BEGIN v_setlst('ALGO_NAME') := 'ALGO_SUPPORT_VECTOR_MACHINES'; V_setlst('PREP_AUTO') := 'ON'; DBMS_DATA_MINING.CREATE_MODEL2( MODEL_NAME => 'BUY_TRVL_INSUR', MINING_FUNCTION => 'CLASSIFICATION', DATA_QUERY => 'select * from CUSTOMERS', SET_LIST => v_setlst, CASE_ID_COLUMN_NAME => 'CUST_ID', TARGET_COLUMN_NAME => BUY_TRAVEL_INSURANCE'); END; -- Apply a machine learning model to predict which customers are likely to buy SELECT prediction_probability(BUY_TRVL_INSUR, 'Yes' USING 3500 as bank_funds, 37 as age, 'Married' as marital_status, 2 as num_previous_cruises) FROM dual; Intuitive SQL API—OML4SQL Copyright © 2021 Oracle and/or its affiliates.
  • 13.
    New algorithms andfeatures eXtreme Gradient Boosting Trees (XGBoost) • Classification, regression, ranking • Highly popular and powerful algorithm for speed and model accuracy Multivariate State Estimation Technique- Sequential Probability Ratio Test (MSET-SPRT) • Anomaly detection for sensors, IoT data sources • Detects subtle anomalies while producing minimal false alarms Neural Network • Adam Solver - A minibatch solver – computationally efficient, requires little memory, well-suited to larger data • ReLU activation function – enables easier to train models with better performance Enhanced prediction details • Enables even higher quality understanding of factors that most contribute to a prediction • For Support Vector Machine, Generalized Linear Model, Neural Network, k-Means OML4SQL – new in Database 21c Copyright © 2021, Oracle and/or its affiliates 13
  • 14.
    Oracle Data MinerUser Interface SQL Developer Extension for Oracle Database on premises and DBCS Automates typical data science steps Easy to use drag-and- drop interface Analytical workflows quickly defined and shared Wide range of algorithms and data transformations Generate SQL code for immediate deployment Create analytical workflows – productivity tool for data scientists – enables citizen data scientists Copyright © 2021 Oracle and/or its affiliates.
  • 15.
    New algorithms andfeatures eXtreme Gradient Boosting Trees (XGBoost) • Classification, regression, ranking • Highly popular and powerful algorithm for speed and model accuracy Multivariate State Estimation Technique- Sequential Probability Ratio Test (MSET-SPRT) • Anomaly detection for sensors, IoT data sources • Detects subtle anomalies while producing minimal false alarms Neural Network • Adam Solver - A minibatch solver – computationally efficient, requires little memory, well-suited to larger data • ReLU activation function – enables easier to train models with better performance Enhanced prediction details • Enables even higher quality understanding of factors that most contribute to a prediction • For Support Vector Machine, Generalized Linear Model, Neural Network, k-Means OML4SQL – new in Database 21c Copyright © 2021, Oracle and/or its affiliates 15
  • 16.
    Summary: Oracle Machine Learning on ADB-S
      • Minimize or eliminate data movement for database data
      • Multi-persona, collaborative, democratized machine learning for data scientists, citizen data scientists, developers
      • Multi-language API (SQL, Python) and no-code user interface
      • Access to broader data lake data through external tables and Cloud SQL
      • Data and model governance via Oracle Database and Autonomous Database security models in development and production
      • Scalable and high-performance modeling and scoring
      • Elastic scaling for machine learning as part of OML on Autonomous Database
      • Model explainability and prediction details support XAI in development and production
      • Bridges gap between development and production with model deployment options
      • MLOps capabilities include immediate model production deployment from SQL and REST, user collaboration, queryable model repositories, and support for streamlined creation of reproducible ML pipelines
      • Oracle stack (SaaS, PaaS, IaaS) provides a strong environment in which data engineers, ML engineers and architects, corporate developers and others can contribute to the DS and ML workflow
      • On-premises and Cloud availability for ML capabilities
      • Oracle tools and enterprise applications integration, including Oracle Analytics Server, Oracle Analytics Cloud and Oracle APEX
      • Simple pricing structure: ML capabilities included in core product at no additional cost
  • 17.
    Oracle Machine Learning for R and Python
    Empower data scientists with open source environments
    Transparency layer
      • Leverage proxy objects so data remains in database
      • Overload native functions translating functionality to SQL
      • Use familiar R / Python syntax on database data
    Parallel, distributed algorithms
      • Scalability and performance
      • Exposes in-database algorithms available from OML4SQL
    Embedded execution
      • Manage and invoke R or Python scripts in Oracle Database
      • Data-parallel, task-parallel, and non-parallel execution
      • Use open source packages to augment functionality
    OML4Py also includes AutoML and MLX
      • Automated algorithm selection, feature selection, model tuning
      • Algorithm-agnostic model explainability (MLX) for feature ranking
    Diagram labels: OML4R, OML4Py, OML Notebooks, SQL Interface, REST Interface, Oracle Database, Oracle Autonomous Database
  • 18.
    Embedded Execution
    Example of parallel partitioned data flow using a third-party package with OML4Py

        import oml  # OML4Py client module; IRIS is assumed to be an existing proxy object for the IRIS table

        # user-defined function using sklearn
        def build_lm(dat):
            from sklearn import linear_model
            lm = linear_model.LinearRegression()
            X = dat[['PETAL_WIDTH']]
            y = dat[['PETAL_LENGTH']]
            lm.fit(X, y)
            return lm

        # select column(s) for partitioning data
        index = oml.DataFrame(IRIS['SPECIES'])

        # invoke function in parallel on IRIS table, one model per SPECIES value
        mods = oml.group_apply(IRIS, index, func=build_lm, parallel=2)
        mods.pull().items()

    Diagram labels: OML Notebooks, REST Interface, OML4Py, spawned OML4Py Python Engines, User tables, Oracle Autonomous Database
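The data-parallel pattern in the slide above (partition rows by a column, run a user function on each partition, collect one result per partition value) can be sketched locally with only the standard library. This is not OML4Py: `group_apply_local` and `fit_line` are hypothetical names, threads stand in for the database-spawned Python engines, a hand-written least-squares fit stands in for sklearn, and the sample rows are made up rather than taken from IRIS.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def fit_line(rows):
    """Ordinary least squares for PETAL_LENGTH ~ PETAL_WIDTH on one partition."""
    xs = [r["PETAL_WIDTH"] for r in rows]
    ys = [r["PETAL_LENGTH"] for r in rows]
    n = len(rows)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": my - slope * mx}

def group_apply_local(rows, key, func, parallel=2):
    """Partition rows by `key`, then apply `func` to each partition concurrently."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[key]].append(r)
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        futures = {k: pool.submit(func, part) for k, part in groups.items()}
        return {k: f.result() for k, f in futures.items()}

# Tiny made-up sample, not the real IRIS data
rows = [
    {"SPECIES": "a", "PETAL_WIDTH": 1.0, "PETAL_LENGTH": 3.0},
    {"SPECIES": "a", "PETAL_WIDTH": 2.0, "PETAL_LENGTH": 5.0},
    {"SPECIES": "a", "PETAL_WIDTH": 3.0, "PETAL_LENGTH": 7.0},
    {"SPECIES": "b", "PETAL_WIDTH": 1.0, "PETAL_LENGTH": 1.0},
    {"SPECIES": "b", "PETAL_WIDTH": 3.0, "PETAL_LENGTH": 2.0},
]
mods = group_apply_local(rows, "SPECIES", fit_line, parallel=2)
for species, model in mods.items():
    print(species, model)
```

The result is a dictionary keyed by partition value, which mirrors how the slide's example iterates over `mods.pull().items()`.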
  • 20.
    OML AutoML UI
    Enhance data scientist productivity and enable non-expert data professionals
      • Accelerate new ML projects
      • Automate repetitive and time-consuming tasks
      • Generate editable notebooks for selected models
      • Deploy models as REST endpoints
    Featuring
      • Monitor experiment progress
      • Customize selection quality metric and metrics display
      • Even faster data scoring performance for streaming and real-time applications
  • 21.
    OML AutoML UI: simplify the machine learning modeling and deployment process
    OML AutoML UI experiment pipeline (diagram labels: Data, OML Model)
    Auto Algorithm Selection
      • Identify in-database algorithms likely to achieve higher model quality
      • Find best algorithm faster than exhaustive search
    Adaptive Sampling
      • Identify right sample size for training data
      • Adjust sample for unbalanced data
    Auto Feature Selection
      • De-noise data
      • Reduce features by identifying most predictive
      • Improve accuracy and performance
    Auto Model Tuning
      • Improves model accuracy
      • Automated tuning of hyperparameters
      • Avoid manual or exhaustive search techniques
    Plus… Feature Prediction Impact
      • Rank features most influential for scoring
      • Algorithm-agnostic technique
      • For each final model per algorithm
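One common algorithm-agnostic way to rank features by their influence on scoring is permutation importance: shuffle one feature at a time and measure how much the model's error grows. The slide does not say which technique OML uses for Feature Prediction Impact, so the sketch below is a stand-in under that assumption; `permutation_impact`, the toy `model`, and the random data are hypothetical.

```python
import random

def permutation_impact(predict, X, y, n_features, seed=0):
    """Rank features by how much shuffling each one increases mean squared error."""
    rng = random.Random(seed)

    def mse(rows):
        return sum((predict(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)

    baseline = mse(X)
    impacts = []
    for j in range(n_features):
        column = [r[j] for r in X]
        rng.shuffle(column)  # break the link between feature j and the target
        permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(X, column)]
        impacts.append((j, mse(permuted) - baseline))
    # Largest error increase first
    return sorted(impacts, key=lambda p: p[1], reverse=True)

# Hypothetical "model": depends strongly on feature 0, weakly on 1, not at all on 2
def model(row):
    return 3.0 * row[0] + 0.5 * row[1]

rng = random.Random(42)
X = [[rng.random(), rng.random(), rng.random()] for _ in range(200)]
y = [model(r) for r in X]
ranking = permutation_impact(model, X, y, n_features=3)
print(ranking)
```

The function only needs a `predict` callable, which is what makes the approach independent of the underlying algorithm.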
  • 22.
    Comparing OML4Py AutoML with OML AutoML UI

    | Step in workflow | OML4Py AutoML API | OML AutoML UI |
    | Algorithm Selection | ✓ Optional use | ✓ |
    | Adaptive Sampling | Roadmap | ✓ |
    | Feature Selection | ✓ Optional use | ✓ |
    | Model Tuning | ✓ | ✓ |
    | Model Selection | ✓ Specific API function to return top model or user selection | ✓ Leaderboard ranks models by score metric for user choice |
    | Feature Prediction Impact | ✓ Optional use via MLX | ✓ |
    | Generate notebook for model | Not available | ✓ |
    | Integrated model deployment to OML Services | Explicit model export and REST API import | ✓ |
    | | Manual pipeline assembly | Experiment assembles the full pipeline |
  • 23.
    Oracle Machine Learning for Spark
      • Leverage Spark 2 environment for powerful data preparation and machine learning
      • Use data across range of Data Lake sources
      • Achieve scalability and performance using full Hadoop cluster
      • Parallelized and distributed ML algorithms from native and Spark MLlib implementations
      • R language API
      • Component of Oracle Big Data Connectors and on Oracle Big Data Service
    Data sources: HDFS | Hive | Spark DF | Impala | JDBC sources
    Diagram labels: OML4Spark R Client, Java API, BDA, BDS, DIY
  • 24.
    OML Services
    Supports lightweight model scoring using REST endpoints for application integration
      • Enable key elements of overall enterprise MLOps strategy
      • Fast data scoring performance for streaming and real-time applications
      • Pay only for actual scoring compute, with no pre-provisioned VM
      • Facilitate collaboration across data science team
    Model Management and Deployment Services
      • Deploy in-database (native format) and third-party (ONNX format) models
      • Import ONNX for TensorFlow, PyTorch, MXNet, scikit-learn, etc.
      • Store, version, compare ML models
      • Organize models within namespaces
    Built-in cognitive text services
      • Extract topics and keywords
      • Sentiment analysis
      • Text summary and similarity
  • 25.
    Oracle Machine Learning Services architecture: connectivity and use from client
    Diagram: a REST client sends user/pass to get a token (/omlusers), then sends the token plus actions and text/objects using GET, POST and DELETE to OML Services (/omlmod) in the Oracle Autonomous Database PDB
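The two-step flow in this architecture (obtain a token with database credentials, then call OML Services with that token) might look like the sketch below. It only builds the HTTP requests and does not send them. The slide shows just the `/omlusers` and `/omlmod` prefixes, so the full paths, the JSON field names (`grant_type`, `inputRecords`), and the use of POST are assumptions recalled from the OML Services REST documentation and should be checked there; the host, token, and endpoint name are placeholders.

```python
import json
import urllib.request

def token_request(base_url, username, password):
    """Build (not send) the request that exchanges ADB credentials for a token."""
    body = json.dumps({"grant_type": "password",
                       "username": username,
                       "password": password}).encode()
    return urllib.request.Request(
        base_url + "/omlusers/api/oauth2/v1/token",  # assumed path
        data=body,
        headers={"Content-Type": "application/json",
                 "Accept": "application/json"},
        method="POST",
    )

def score_request(base_url, token, endpoint_uri, records):
    """Build (not send) a scoring call against a deployed model endpoint."""
    body = json.dumps({"inputRecords": records}).encode()  # assumed payload shape
    return urllib.request.Request(
        f"{base_url}/omlmod/v1/deployment/{endpoint_uri}/score",  # assumed path
        data=body,
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )

# Placeholder values; a real call would pass each request to urllib.request.urlopen
tok = token_request("https://example.invalid", "OMLUSER", "secret")
req = score_request("https://example.invalid", "TOKEN", "my_model",
                    [{"PETAL_WIDTH": 1.3}])
print(tok.get_method(), tok.full_url)
print(req.get_method(), req.full_url)
```

Keeping the token step separate from the scoring step matches the diagram, where the token is issued once and then attached to each subsequent action.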
  • 26.
    Oracle Machine Learning Services - Methods
    Components with built-in Oracle Machine Learning
    Repository
      • Store Model
      • Update Model Namespace
      • Model Listing
      • Model Info
      • Model Metadata
      • Model Content
      • Model Admin
      • Token using ADB user and password
    Generic
      • Metadata for all Versions: Version 1 Metadata
      • Open API Specification
    Deployment
      • Create Model Endpoint
      • Score Model using Endpoint
      • Endpoints
      • Endpoint Details
      • Open API Specification for Endpoint
      • Endpoint
    Cognitive Text
      • Get Most Relevant Topics
      • Get Most Relevant Keywords
      • Get Summaries
      • Get Sentiments
      • Get Semantic Similarities
      • Numeric Features
      • Get Endpoints
    HTTP methods shown: GET, POST, DELETE
  • 27.
    Demo
  • 28.
    OML components deployment scenarios
  • 29.
    In-database model deployment scenarios: OML AutoML UI
    Diagram labels: Prepared Database Table; OML AutoML UI (Build in-db model, Generate notebook, Deploy in-database model, Export and deploy in-db model); OML Services {REST:API}; In-database SQL scoring; Direct model access and in-database SQL scoring from Oracle APEX and Enterprise Applications
  • 30.
    In-database model deployment scenarios: OML Notebooks
    Diagram labels: OML Notebooks (SQL); Export in-db model; Import in-db model; Deploy in-database model; OML Services {REST:API}; Direct model access and in-database SQL scoring from Oracle APEX and Enterprise Applications
  • 31.
    Multi-database model deployment scenarios
    Diagram labels: Oracle Database (on premises and DBCS); Oracle Autonomous Database (ADW, ATP, AJD), shown twice; Export and deploy in-db model
  • 32.
    Model deployment scenarios
    Diagram labels: OCI Data Science; Export model in ONNX format; Import model; OML Services {REST:API}; Oracle APEX; Enterprise Applications
  • 33.
    OCI Language
    Performs text analysis at scale
    Understand unstructured text in documents, e.g.:
      • Customer feedback interactions
      • Support tickets
      • Social media
    Built-in pre-trained models eliminate the need for machine learning expertise
    Empowers developers to apply:
      • Sentiment analysis
      • Key-phrase extraction
      • Text classification
      • Named entity recognition
      • + more
  • 34.
    OCI Speech
    Provides automatic speech recognition
    Real-time speech recognition using prebuilt models
    Trained on thousands of native and non-native language speakers
    Enables developers to easily:
      • Convert file-based audio data containing human speech into highly accurate text transcriptions
      • Provide in-workflow closed captions
      • Index content
      • Enhance analytics on audio and video content
  • 35.
    OCI Vision
    Provides pre-trained computer vision models
    Perform image recognition and document analysis tasks
    Extend the models to other use cases, e.g.:
      • Scene monitoring
      • Defect detection
      • Document processing
    Detect visual anomalies in manufacturing
    Extract text from forms to automate business workflows
    Tag items in images to count products or shipments
  • 36.
    OCI Anomaly Detection
    Business-specific anomaly detection models
    Flag critical irregularities early, which enables:
      • Faster resolution
      • Fewer operational disruptions
    Provides REST APIs and SDKs for several programming languages
    Built on the patented MSET2 algorithm, which is used worldwide, e.g.:
      • Nuclear reactor health monitoring
      • Fraud detection
      • Predicting equipment breakdown
      • Receiving data from multiple devices to predict failures
  • 37.
    OCI Forecasting
    Delivers time-series forecasts
    No need for data science expertise
    Helps developers to quickly create accurate forecasts, including:
      • Product demand
      • Revenue
      • Resource requirements
    All forecasts come with confidence intervals and explainability to help developers make the right business decisions
  • 38.
    OCI Data Labeling
    Helps users build labeled datasets to train AI models
    Via user interfaces and public APIs, users can:
      • Assemble data
      • Create and browse datasets
      • Apply labels to data records
    The labeled datasets can be exported and used for model development across many of Oracle’s AI and data science services
  • 39.
    Products and Services in Context
      • A complete and comprehensive platform for AI/ML
      • Logical layers supporting wide range of personas
      • Services integrated with data management services for data science / machine learning solution deployment
  • 40.
    Helpful Links
    ORACLE MACHINE LEARNING ON O.COM
      https://www.oracle.com/machine-learning
    OML TUTORIALS
      OML LiveLab: https://apexapps.oracle.com/pls/apex/dbpm/r/livelabs/view-workshop?p180_id=560
      OML4Py LiveLab: https://apexapps.oracle.com/pls/apex/dbpm/r/livelabs/view-workshop?wid=786
      Interactive tour: https://docs.oracle.com/en/cloud/paas/autonomous-database/oml-tour
    OML OFFICE HOURS
      https://asktom.oracle.com/pls/apex/asktom.search?office=6801#sessionss
    ORACLE ANALYTICS CLOUD
      https://www.oracle.com/solutions/business-analytics/data-visualization/examples.html
    OML4PY
      • OML4Py (2m video)
      • OML4Py Introduction (17m video)
      • OML4Py Technical Brief
      • OML4Py User’s Guide
      • Blog: Introducing OML4Py
      • GitHub Repository with Python notebooks
    ORACLE AUTOML UI
      • Oracle Machine Learning AutoML UI (2m video)
      • Oracle Machine Learning Demonstration (6m video)
      • OML AutoML UI Technical Brief
      • Blog: Introducing Oracle Machine Learning AutoML UI
    OML SERVICES
      • Oracle Machine Learning Services (2m video)
      • OML Services Technical Brief
      • Oracle Machine Learning Services Documentation
      • Blog: Introducing Oracle Machine Learning Services
      • GitHub Repository with OML Services examples