Oracle ML Cheat Sheet

Machine Learning in Oracle Database – What do you want to do?

Oracle Machine Learning on Oracle Database 20c. © 2020 Oracle Corporation. All rights reserved.

Oracle Machine Learning enables building AI applications and dashboards, delivering powerful in-database ML algorithms, automatic ML functionality, and integration with open source Python and R. OML algorithms support parallel execution for performance and scalability with improved memory utilization, and support partitioned models and automatic mining of text columns.

Classification: Predict a target variable containing 2 (binary) or more (multi-class) category values
  • Decision Tree: Generates human-interpretable rules; can be used for segmentation
  • Random Forest: Tree-based ensemble method that relies on bagging and feature randomness
  • Naïve Bayes: Computes conditional probabilities and yields interpretable probabilities; assumes predictor attribute independence
  • Support Vector Machine: Solves linear and non-linear problems; multiple solvers; sparsity optimizations; supports multi-target classification (a list of targets per row)
  • Logistic Regression / Generalized Linear Model: Predicts binary (0/1, Yes/No) target attributes with attribute coefficients and model statistics; narrow, wide, sparse data; enables ridge, feature selection/generation; row diagnostics
  • Neural Network: Well-suited to noisy and complex data; supports many hidden layers
  • Explicit Semantic Analysis: Text categorization suitable for large text corpora
  • Extreme Gradient Boosting: Scalable implementation of the popular XGBoost algorithm; supports tree and linear models

Regression: Predict a numeric target variable
  • Generalized Linear Model: Predicts binary (0/1, Yes/No) target attributes with attribute coefficients and model statistics; narrow, wide, sparse data; enables ridge, feature selection/generation; row diagnostics
  • Support Vector Machine: Solves linear and non-linear problems; multiple solvers; sparsity optimizations
  • Stepwise Regression: Selects the “best” set of predictors for a linear model; supports forward, backward, both, and alternate direction
  • Neural Network: Well-suited to noisy and complex data; supports many hidden layers
  • Extreme Gradient Boosting: Scalable implementation of the popular XGBoost algorithm; supports tree and linear models

Anomaly Detection: Identify cases as normal or anomalous by learning patterns of normal data
  • One-Class SVM: Special case of SVM classification that does not use a target; solves linear and non-linear problems; multiple solvers; sparsity optimizations
  • MSET-SPRT (“Multivariate State Estimation Technique”): Process monitoring to detect anomalies with non-linear, non-parametric patterns in IoT sensor data

Clustering: Group or segment cases into hierarchical clusters, producing probabilities, rules, and statistics
  • K-Means: Produces a specified number, k, of clusters; Euclidean and cosine distance functions; sparsity optimizations
  • Orthogonal Partitioning: Discovers natural clusters up to the maximum number specified; density-based
  • Expectation Maximization: Automated model search; protection against overfitting; numeric and multinomial distributions; high-quality probability estimates

Feature Extraction: Derive new values, where all input variables are considered, to generate a reduced set of variables
  • Non-negative Matrix Factorization: Derives features based on non-negative linear combinations for greater feature interpretability
  • Singular Value Decomposition: Narrow data via tall and skinny solvers; wide data via stochastic solvers
  • Principal Component Analysis: Uses SVD to obtain a set of uncorrelated variables that contain the maximum amount of variance from the dataset
  • Explicit Semantic Analysis: Text categorization with human-readable topic labels derived from the corpus; semantic similarity estimates among documents

Attribute Importance: Supervised and unsupervised ranking of variables to improve model quality
  • Minimum Description Length: Selects the most important variables for classification and regression
  • Expectation Maximization: Supports unsupervised variable ranking and pairwise dependency estimates
  • CUR Decomposition: Supports a low-rank SVD-based approach for ranking attribute importance as an unsupervised method

Row Importance: Unsupervised ranking of rows
  • CUR Decomposition: Supports a low-rank SVD-based approach for ranking row importance as an unsupervised method

Time Series: Forecast or predict sequential numeric data using a series order column with either Number or Date/Timestamp types
  • Exponential Smoothing: Single, double, and triple exponential smoothing for regular and irregular series, with and without trend and seasonality; multiple methods supported, including Holt-Winters

Association Rules: Market basket analysis using a transactional or 2D data representation to extract frequently occurring patterns and rules
  • Apriori: Finds frequent itemsets and generates human-interpretable rules; computes support, confidence, lift, and aggregate measures associated with rules

Ranking: Supervised prediction of the probability of one item ranking over other items
  • Extreme Gradient Boosting: Supports pairwise and list-wise ranking
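
The Apriori entry names three rule measures: support, confidence, and lift. As a rough illustration of what those numbers mean, here is a minimal plain-Python sketch. It is not Oracle's in-database implementation and does not use any OML API; the function name `rule_metrics` and the toy baskets are hypothetical, chosen only for this example.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    Hypothetical helper for illustration only; not part of any Oracle API.
    """
    antecedent = frozenset(antecedent)
    consequent = frozenset(consequent)
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    # Count baskets containing the antecedent, the consequent, and both.
    count_a = sum(antecedent <= b for b in baskets)
    count_c = sum(consequent <= b for b in baskets)
    count_both = sum((antecedent | consequent) <= b for b in baskets)

    if n == 0 or count_a == 0 or count_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")

    support = count_both / n              # fraction of baskets containing the whole rule
    confidence = count_both / count_a     # P(consequent | antecedent)
    lift = confidence / (count_c / n)     # confidence relative to the consequent's base rate
    return support, confidence, lift


# Toy market-basket data (made up for this sketch).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

support, confidence, lift = rule_metrics(transactions, {"bread"}, {"milk"})
```

For the rule bread -> milk on this toy data, support is 2/4, confidence is 2/3, and lift is (2/3) / (3/4) = 8/9. A lift below 1 suggests the two items co-occur slightly less often than independence would predict.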